[Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

Kernel-testers Development Archive on lore.kernel.org

* [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-08-16 19:00 2.6.27-rc3-git3: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-08-16 19:02 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-16 19:02 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* 2.6.27-rc4-git1: Reported regressions from 2.6.26
@ 2008-08-23 18:07 Rafael J. Wysocki
  2008-08-23 18:07 ` [Bug #11141] no battery or DC status - Dell i1501 Rafael J. Wysocki
                   ` (54 more replies)
  0 siblings, 55 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
	Kernel Testers List

This message contains a list of some regressions from 2.6.26 for which there
are no fixes in the mainline that I know of.  If any of them have been fixed
already, please let me know.

If you know of any other unresolved regressions from 2.6.26, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.

Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-08-23      122       48          40
  2008-08-16      103       47          37
  2008-08-10       80       52          31
  2008-08-02       47       31          20


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11414
Subject		: Random crashes with 2.6.27-rc3 on PPC
Submitter	: Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
Date		: 2008-08-23 14:10 (1 day old)
References	: http://marc.info/?l=linux-kernel&m=121950076812616&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11410
Subject		: SLUB list_lock vs obj_hash.lock...
Submitter	: Daniel J Blueman <daniel.blueman-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-22 21:48 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121944176609042&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11407
Subject		: suspend: unable to handle kernel paging request
Submitter	: Vegard Nossum <vegard.nossum-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-21 17:28 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121933974928881&w=4
Handled-By	: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
		  Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
		  Pavel Machek <pavel-AlSwsSmVLrQ@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11406
Subject		: patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug
Submitter	: Jan Beulich <jbeulich-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-21 12:59 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121932366326572&w=4
Handled-By	: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
		  Robert Richter <robert.richter-5C7GfCeVMHo@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11405
Subject		: 2.6.27-rc3 segfault on cold boot; not on warm boot.
Submitter	: David Greaves <david-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
Date		: 2008-08-21 9:45 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121931198904777&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11404
Subject		: BUG: in 2.6.23-rc3-git7 in do_cciss_intr
Submitter	: rdunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-21 5:52 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121929819616273&w=4
		  http://marc.info/?l=linux-kernel&m=121932889105368&w=4
Handled-By	: Miller, Mike (OS Dev) <Mike.Miller-VXdhtT5mjnY@public.gmane.org>
		  James Bottomley <James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11403
Subject		: 2.6.27-rc2 USB suspend regression
Submitter	: Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org>
Date		: 2008-08-20 20:48 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121926536103630&w=4
Handled-By	: Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11402
Subject		: skbuff bug?
Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-21 3:56 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121929102707658&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11401
Subject		: pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
Submitter	: Laurent Riffard <laurent.riffard-GANU6spQydw@public.gmane.org>
Date		: 2008-08-22 08:16 (2 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11398
Subject		: hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
Date		: 2008-08-21 17:17 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11388
Subject		: 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable
Submitter	: Joshua Hoblitt <j_kernel-amK9oZtvyLhBDgjK7y7TUQ@public.gmane.org>
Date		: 2008-08-20 17:38 (4 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11382
Subject		: e1000e: 2.6.27-rc1 corrupts EEPROM/NVM
Submitter	: David Vrabel <david.vrabel-kQvG35nSl+M@public.gmane.org>
Date		: 2008-08-08 10:47 (16 days old)
References	: http://marc.info/?l=linux-kernel&m=121819267211679&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11380
Subject		: lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16
Submitter	: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date		: 2008-08-20 6:44 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121921480931970&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11379
Subject		: char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
Date		: 2008-08-18 13:40 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=121906698213329&w=4
Handled-By	: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11357
Subject		: Can not boot up with zd1211rw USB-Wlan Stick
Submitter	: uwe <kender-KuiJ5kEpwI6ELgA04lAiVw@public.gmane.org>
Date		: 2008-08-16 14:17 (8 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
Subject		: Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
Date		: 2008-08-16 19:11 (8 days old)
References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11355
Subject		: Regression in 2.6.27-rc2 when cross-building the kernel
Submitter	: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
Date		: 2008-08-16 2:38 (8 days old)
References	: http://marc.info/?l=linux-kernel&m=121885432118368&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11354
Subject		: AMD Elan regression with 2.6.27-rc3
Submitter	: Sean Young <sean-hENCXIMQXOg@public.gmane.org>
Date		: 2008-08-15 18:37 (9 days old)
References	: http://marc.info/?l=linux-kernel&m=121882578430056&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11343
Subject		: SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
Submitter	: Manny Maxwell <mannymax-7UBucS1kxs3k1uMJSBkQmQ@public.gmane.org>
Date		: 2008-08-14 4:16 (10 days old)
References	: http://marc.info/?l=linux-kernel&m=121868782917600&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11342
Subject		: Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
Submitter	: Alan D. Brunelle <Alan.Brunelle-VXdhtT5mjnY@public.gmane.org>
Date		: 2008-08-13 23:03 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=121866876027629&w=4
Handled-By	: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11340
Subject		: LTP overnight run resulted in unusable box
Submitter	: Alexey Dobriyan <adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-13 9:24 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=121861951902949&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11336
Subject		: 2.6.27-rc2:stall while mounting root fs
Submitter	: Torsten Kaiser <just.for.lkml-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
Date		: 2008-08-12 12:37 (12 days old)
References	: http://marc.info/?l=linux-kernel&m=121854484015909&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11335
Subject		: 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
Submitter	: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-12 4:18 (12 days old)
References	: http://marc.info/?l=linux-kernel&m=121851477201960&w=4
		  http://lkml.org/lkml/2008/8/16/274
Handled-By	: Hugh Dickins <hugh-DTz5qymZ9yRBDgjK7y7TUQ@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11334
Subject		: myri10ge: use ioremap_wc: compilation failure on ARM
Submitter	: Martin Michlmayr <tbm-R+vWnYXSFMfQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-10 11:25 (14 days old)
References	: http://marc.info/?l=linux-netdev&m=121836771727632&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (13 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11282
Subject		: Please fix x86 defconfig regression
Submitter	: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
Date		: 2008-08-07 20:46 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=121814188805666&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11279
Subject		: 2.6.27-rc0 Power Bugs with HP/Compaq Laptops
Submitter	: Matt Parnell <mparnell-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-07 14:57 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=121812108031685&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11272
Subject		: BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835
Submitter	: Jaswinder Singh <jaswinderlinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-05 15:12 (19 days old)
References	: http://marc.info/?l=linux-kernel&m=121794900319776&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11271
Subject		: BUG: fealnx in 2.6.27-rc1
Submitter	: Jaswinder Singh <jaswinderlinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-05 14:58 (19 days old)
References	: http://marc.info/?l=linux-netdev&m=121794762016830&w=4
		  http://lkml.org/lkml/2008/8/10/98
Handled-By	: Francois Romieu <romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11264
Subject		: Invalid op opcode in kernel/workqueue
Submitter	: Jean-Luc Coulon <jean.luc.coulon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-07 04:18 (17 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11237
Subject		: corrupt PMD after resume
Submitter	: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
Date		: 2008-08-02 9:51 (22 days old)
References	: http://marc.info/?l=linux-kernel&m=121767073424952&w=4
Handled-By	: Hugh Dickins <hugh-DTz5qymZ9yRBDgjK7y7TUQ@public.gmane.org>
		  Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11230
Subject		: Kconfig no longer outputs a .config with freshly updated defconfigs
Submitter	: Josh Boyer <jwboyer-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date		: 2008-08-02 16:03 (22 days old)
References	: http://marc.info/?l=linux-kernel&m=121769306319391&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11224
Subject		: Only three cores found on quad-core machine.
Submitter	: Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-01 18:15 (23 days old)
References	: http://marc.info/?l=linux-kernel&m=121761475224719&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11220
Subject		: Screen stays black after resume
Submitter	: Nico Schottelius <nico-xuaVFQXs+5hIG4jRRZ66WA@public.gmane.org>
Date		: 2008-07-31 21:05 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121753882422899&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11219
Subject		: KVM modules break emergency reboot
Submitter	: Zdenek Kabelac <zdenek.kabelac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-01 20:25 (23 days old)
References	: http://marc.info/?l=linux-kernel&m=121762241105336&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11215
Subject		: INFO: possible recursive locking detected ps2_command
Submitter	: Zdenek Kabelac <zdenek.kabelac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-07-31 9:41 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121749737011637&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11210
Subject		: libata badness
Submitter	: Kumar Gala <galak-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
Date		: 2008-07-31 18:53 (24 days old)
References	: http://marc.info/?l=linux-ide&m=121753059307310&w=4
Handled-By	: Ben Dooks <ben-linux-elnMNo+KYs3YtjvyW6yDsg@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11209
Subject		: 2.6.27-rc1 process time accounting
Submitter	: Lukas Hejtmanek <xhejtman-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
Date		: 2008-07-31 10:43 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121750102917490&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11191
Subject		: 2.6.26-git8: spinlock lockup in c1e_idle()
Submitter	: Mikhail Kshevetskiy <mikhail.kshevetskiy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-07-24 03:22 (31 days old)
References	: http://lkml.org/lkml/2008/7/23/317
Handled-By	: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11141
Subject		: no battery or DC status - Dell i1501
Submitter	: Gu Rui <chaos.proton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-07-21 19:43 (34 days old)
Handled-By	: Zhao Yakui <yakui.zhao-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11413
Subject		: get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()
Submitter	: Mikael Pettersson <mikpe-1zs4UD6AkMk@public.gmane.org>
Date		: 2008-08-23 9:48 (1 day old)
References	: http://marc.info/?l=linux-kernel&m=121948503224161&w=4
Handled-By	: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121950734922457&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11409
Subject		: build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init'
Submitter	: Toralf Förster <toralf.foerster-Mmb7MZpHnFY@public.gmane.org>
Date		: 2008-08-22 8:33 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121939410214677&w=4
Handled-By	: Alan Cox <alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121943097320451&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11361
Subject		: my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-17 6:25 (7 days old)
References	: http://marc.info/?l=linux-kernel&m=121895439927053&w=4
Handled-By	: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121917167232014&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11360
Subject		: mpc8xxx_wdt.c doesn't build modular
Submitter	: Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-17 08:07 (7 days old)
References	: http://lkml.org/lkml/2008/8/12/465
Handled-By	: Anton Vorontsov <avorontsov-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
Patch		: http://lkml.org/lkml/2008/8/13/344


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11358
Subject		: net: forcedeth call restore mac addr in nv_shutdown path
Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-17 3:30 (7 days old)
References	: http://marc.info/?l=linux-kernel&m=121894389018584&w=4
Handled-By	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121894389018584&w=4


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11276
Subject		: build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things
Submitter	: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-06 17:18 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=121804329014332&w=4
		  http://lkml.org/lkml/2008/7/22/353
Handled-By	: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>
Patch		: http://lkml.org/lkml/2008/7/22/364


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11254
Subject		: KVM: fix userspace ABI breakage
Submitter	: Adrian Bunk <bunk-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date		: 21 Jul 2008 17:58:26 (0 days old)
References	: http://lkml.org/lkml/2008/7/21/197
Handled-By	: Adrian Bunk <bunk-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Patch		: http://lkml.org/lkml/2008/7/21/197


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11207
Subject		: VolanoMark regression with 2.6.27-rc1
Submitter	: Zhang, Yanmin <yanmin_zhang-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Date		: 2008-07-31 3:20 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121747464114335&w=4
Handled-By	: Zhang, Yanmin <yanmin_zhang-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
		  Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
		  Dhaval Giani <dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
		  Miao Xie <miaox-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121922991027344&w=4


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.26,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=11167

Please let me know if there are any Bugzilla entries that should be added to
that list.

Thanks,
Rafael

* [Bug #11141] no battery or DC status - Dell i1501
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-08-23 18:07 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11207] VolanoMark regression with 2.6.27-rc1 Rafael J. Wysocki
                   ` (53 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Gu Rui, Zhao Yakui

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11141
Subject		: no battery or DC status - Dell i1501
Submitter	: Gu Rui <chaos.proton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-07-21 19:43 (34 days old)
Handled-By	: Zhao Yakui <yakui.zhao-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>


* [Bug #11191] 2.6.26-git8: spinlock lockup in c1e_idle()
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
  2008-08-23 18:07 ` [Bug #11141] no battery or DC status - Dell i1501 Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11207] VolanoMark regression with 2.6.27-rc1 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11210] libata badness Rafael J. Wysocki
                   ` (51 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Mikhail Kshevetskiy, Thomas Gleixner

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11191
Subject		: 2.6.26-git8: spinlock lockup in c1e_idle()
Submitter	: Mikhail Kshevetskiy <mikhail.kshevetskiy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-07-24 03:22 (31 days old)
References	: http://lkml.org/lkml/2008/7/23/317
Handled-By	: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>


* [Bug #11207] VolanoMark regression with 2.6.27-rc1
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
  2008-08-23 18:07 ` [Bug #11141] no battery or DC status - Dell i1501 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11191] 2.6.26-git8: spinlock lockup in c1e_idle() Rafael J. Wysocki
                   ` (52 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Dhaval Giani, Miao Xie, Peter Zijlstra,
	Zhang, Yanmin

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11207
Subject		: VolanoMark regression with 2.6.27-rc1
Submitter	: Zhang, Yanmin <yanmin_zhang-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Date		: 2008-07-31 3:20 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121747464114335&w=4
Handled-By	: Zhang, Yanmin <yanmin_zhang-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
		  Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
		  Dhaval Giani <dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
		  Miao Xie <miaox-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121922991027344&w=4


* [Bug #11210] libata badness
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11191] 2.6.26-git8: spinlock lockup in c1e_idle() Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 22:23   ` Jeff Garzik
  2008-08-23 18:10 ` [Bug #11215] INFO: possible recursive locking detected ps2_command Rafael J. Wysocki
                   ` (50 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Ben Dooks, Kumar Gala

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11210
Subject		: libata badness
Submitter	: Kumar Gala <galak-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
Date		: 2008-07-31 18:53 (24 days old)
References	: http://marc.info/?l=linux-ide&m=121753059307310&w=4
Handled-By	: Ben Dooks <ben-linux-elnMNo+KYs3YtjvyW6yDsg@public.gmane.org>


* [Bug #11215] INFO: possible recursive locking detected ps2_command
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11210] libata badness Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11209] 2.6.27-rc1 process time accounting Rafael J. Wysocki
                   ` (49 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Peter Zijlstra, Zdenek Kabelac

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11215
Subject		: INFO: possible recursive locking detected ps2_command
Submitter	: Zdenek Kabelac <zdenek.kabelac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-07-31 9:41 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121749737011637&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>


* [Bug #11209] 2.6.27-rc1 process time accounting
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11215] INFO: possible recursive locking detected ps2_command Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11220] Screen stays black after resume Rafael J. Wysocki
                   ` (48 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Lukas Hejtmanek, Peter Zijlstra

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11209
Subject		: 2.6.27-rc1 process time accounting
Submitter	: Lukas Hejtmanek <xhejtman-8qz54MUs51PtwjQa/ONI9g@public.gmane.org>
Date		: 2008-07-31 10:43 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121750102917490&w=4
Handled-By	: Peter Zijlstra <a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>


* [Bug #11220] Screen stays black after resume
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11209] 2.6.27-rc1 process time accounting Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11219] KVM modules break emergency reboot Rafael J. Wysocki
                   ` (47 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Nico Schottelius

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11220
Subject		: Screen stays black after resume
Submitter	: Nico Schottelius <nico-xuaVFQXs+5hIG4jRRZ66WA@public.gmane.org>
Date		: 2008-07-31 21:05 (24 days old)
References	: http://marc.info/?l=linux-kernel&m=121753882422899&w=4


* [Bug #11219] KVM modules break emergency reboot
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11220] Screen stays black after resume Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11224] Only three cores found on quad-core machine Rafael J. Wysocki
                   ` (46 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Zdenek Kabelac

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11219
Subject		: KVM modules break emergency reboot
Submitter	: Zdenek Kabelac <zdenek.kabelac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-01 20:25 (23 days old)
References	: http://marc.info/?l=linux-kernel&m=121762241105336&w=4


* [Bug #11224] Only three cores found on quad-core machine.
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (7 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11219] KVM modules break emergency reboot Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11230] Kconfig no longer outputs a .config with freshly updated defconfigs Rafael J. Wysocki
                   ` (45 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Dave Jones

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11224
Subject		: Only three cores found on quad-core machine.
Submitter	: Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-01 18:15 (23 days old)
References	: http://marc.info/?l=linux-kernel&m=121761475224719&w=4


* [Bug #11230] Kconfig no longer outputs a .config with freshly updated defconfigs
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (8 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11224] Only three cores found on quad-core machine Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11237] corrupt PMD after resume Rafael J. Wysocki
                   ` (44 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Josh Boyer

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11230
Subject		: Kconfig no longer outputs a .config with freshly updated defconfigs
Submitter	: Josh Boyer <jwboyer-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date		: 2008-08-02 16:03 (22 days old)
References	: http://marc.info/?l=linux-kernel&m=121769306319391&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11237] corrupt PMD after resume
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (9 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11230] Kconfig no longer outputs a .config with freshly updated defconfigs Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11264] Invalid op opcode in kernel/workqueue Rafael J. Wysocki
                   ` (43 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Alan Jenkins, Hugh Dickins, Ingo Molnar,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11237
Subject		: corrupt PMD after resume
Submitter	: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
Date		: 2008-08-02 9:51 (22 days old)
References	: http://marc.info/?l=linux-kernel&m=121767073424952&w=4
Handled-By	: Hugh Dickins <hugh-DTz5qymZ9yRBDgjK7y7TUQ@public.gmane.org>
		  Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11254] KVM: fix userspace ABI breakage
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (12 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11271] BUG: fealnx in 2.6.27-rc1 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-24 19:27   ` Adrian Bunk
  2008-08-23 18:10 ` [Bug #11279] 2.6.27-rc0 Power Bugs with HP/Compaq Laptops Rafael J. Wysocki
                   ` (40 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Adrian Bunk

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11254
Subject		: KVM: fix userspace ABI breakage
Submitter	: Adrian Bunk <bunk-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date		: 21 Jul 2008 17:58:26 (0 days old)
References	: http://lkml.org/lkml/2008/7/21/197
Handled-By	: Adrian Bunk <bunk-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Patch		: http://lkml.org/lkml/2008/7/21/197


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11264] Invalid op opcode in kernel/workqueue
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (10 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11237] corrupt PMD after resume Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11271] BUG: fealnx in 2.6.27-rc1 Rafael J. Wysocki
                   ` (42 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jean-Luc Coulon

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11264
Subject		: Invalid op opcode in kernel/workqueue
Submitter	: Jean-Luc Coulon <jean.luc.coulon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-07 04:18 (17 days old)


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11271] BUG: fealnx in 2.6.27-rc1
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (11 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11264] Invalid op opcode in kernel/workqueue Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 22:26   ` Jeff Garzik
  2008-08-23 18:10 ` [Bug #11254] KVM: fix userspace ABI breakage Rafael J. Wysocki
                   ` (41 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Francois Romieu, Jaswinder Singh

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11271
Subject		: BUG: fealnx in 2.6.27-rc1
Submitter	: Jaswinder Singh <jaswinderlinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-05 14:58 (19 days old)
References	: http://marc.info/?l=linux-netdev&m=121794762016830&w=4
		  http://lkml.org/lkml/2008/8/10/98
Handled-By	: Francois Romieu <romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11272] BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (15 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11282] Please fix x86 defconfig regression Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11276] build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things Rafael J. Wysocki
                   ` (37 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jaswinder Singh

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11272
Subject		: BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835
Submitter	: Jaswinder Singh <jaswinderlinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-05 15:12 (19 days old)
References	: http://marc.info/?l=linux-kernel&m=121794900319776&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11279] 2.6.27-rc0 Power Bugs with HP/Compaq Laptops
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (13 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11254] KVM: fix userspace ABI breakage Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11282] Please fix x86 defconfig regression Rafael J. Wysocki
                   ` (39 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Matt Parnell

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11279
Subject		: 2.6.27-rc0 Power Bugs with HP/Compaq Laptops
Submitter	: Matt Parnell <mparnell-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-07 14:57 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=121812108031685&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11276] build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (16 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11272] BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11336] 2.6.27-rc2:stall while mounting root fs Rafael J. Wysocki
                   ` (36 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Bjorn Helgaas, Ingo Molnar, Randy Dunlap

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11276
Subject		: build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things
Submitter	: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-06 17:18 (18 days old)
References	: http://marc.info/?l=linux-kernel&m=121804329014332&w=4
		  http://lkml.org/lkml/2008/7/22/353
Handled-By	: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>
Patch		: http://lkml.org/lkml/2008/7/22/364


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11282] Please fix x86 defconfig regression
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (14 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11279] 2.6.27-rc0 Power Bugs with HP/Compaq Laptops Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11272] BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835 Rafael J. Wysocki
                   ` (38 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Andi Kleen

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11282
Subject		: Please fix x86 defconfig regression
Submitter	: Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org>
Date		: 2008-08-07 20:46 (17 days old)
References	: http://marc.info/?l=linux-kernel&m=121814188805666&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (20 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11340] LTP overnight run resulted in unusable box Rafael J. Wysocki
                   ` (32 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (13 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11335] 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (18 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11336] 2.6.27-rc2:stall while mounting root fs Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM Rafael J. Wysocki
                   ` (34 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Hugh Dickins, Randy Dunlap

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11335
Subject		: 2.6.27-rc2-git5 BUG: unable to handle kernel paging request
Submitter	: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-12 4:18 (12 days old)
References	: http://marc.info/?l=linux-kernel&m=121851477201960&w=4
		  http://lkml.org/lkml/2008/8/16/274
Handled-By	: Hugh Dickins <hugh-DTz5qymZ9yRBDgjK7y7TUQ@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11336] 2.6.27-rc2:stall while mounting root fs
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (17 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11276] build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11335] 2.6.27-rc2-git5 BUG: unable to handle kernel paging request Rafael J. Wysocki
                   ` (35 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Torsten Kaiser

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11336
Subject		: 2.6.27-rc2:stall while mounting root fs
Submitter	: Torsten Kaiser <just.for.lkml-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
Date		: 2008-08-12 12:37 (12 days old)
References	: http://marc.info/?l=linux-kernel&m=121854484015909&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (19 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11335] 2.6.27-rc2-git5 BUG: unable to handle kernel paging request Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-24 12:26   ` Martin Michlmayr
  2008-08-23 18:10 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
                   ` (33 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Brice Goglin, Martin Michlmayr

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11334
Subject		: myri10ge: use ioremap_wc: compilation failure on ARM
Submitter	: Martin Michlmayr <tbm-R+vWnYXSFMfQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-10 11:25 (14 days old)
References	: http://marc.info/?l=linux-netdev&m=121836771727632&w=2


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (22 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11340] LTP overnight run resulted in unusable box Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 20:10   ` Linus Torvalds
  2008-08-23 18:10 ` [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i Rafael J. Wysocki
                   ` (30 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Alan D. Brunelle, Andrew Morton,
	Linus Torvalds

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11342
Subject		: Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
Submitter	: Alan D. Brunelle <Alan.Brunelle-VXdhtT5mjnY@public.gmane.org>
Date		: 2008-08-13 23:03 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=121866876027629&w=4
Handled-By	: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11340] LTP overnight run resulted in unusable box
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (21 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rafael J. Wysocki
                   ` (31 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Alexey Dobriyan

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11340
Subject		: LTP overnight run resulted in unusable box
Submitter	: Alexey Dobriyan <adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-13 9:24 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=121861951902949&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (23 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 22:34   ` Jeff Garzik
  2008-08-23 18:10 ` [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps' Rafael J. Wysocki
                   ` (29 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Manny Maxwell

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11343
Subject		: SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
Submitter	: Manny Maxwell <mannymax-7UBucS1kxs3k1uMJSBkQmQ@public.gmane.org>
Date		: 2008-08-14 4:16 (10 days old)
References	: http://marc.info/?l=linux-kernel&m=121868782917600&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (25 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps' Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-24 21:34   ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11354] AMD Elan regression with 2.6.27-rc3 Rafael J. Wysocki
                   ` (27 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Larry Finger

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11355
Subject		: Regression in 2.6.27-rc2 when cross-building the kernel
Submitter	: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
Date		: 2008-08-16 2:38 (8 days old)
References	: http://marc.info/?l=linux-kernel&m=121885432118368&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (24 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-24  6:13   ` Frans Pop
  2008-08-23 18:10 ` [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel Rafael J. Wysocki
                   ` (28 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Frans Pop

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
Subject		: Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
Date		: 2008-08-16 19:11 (8 days old)
References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11354] AMD Elan regression with 2.6.27-rc3
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (26 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11357] Can not boot up with zd1211rw USB-Wlan Stick Rafael J. Wysocki
                   ` (26 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Sean Young

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11354
Subject		: AMD Elan regression with 2.6.27-rc3
Submitter	: Sean Young <sean-hENCXIMQXOg@public.gmane.org>
Date		: 2008-08-15 18:37 (9 days old)
References	: http://marc.info/?l=linux-kernel&m=121882578430056&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11357] Can not boot up with zd1211rw USB-Wlan Stick
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (27 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11354] AMD Elan regression with 2.6.27-rc3 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11358] net: forcedeth call restore mac addr in nv_shutdown path Rafael J. Wysocki
                   ` (25 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, uwe

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11357
Subject		: Can not boot up with zd1211rw USB-Wlan Stick
Submitter	: uwe <kender-KuiJ5kEpwI6ELgA04lAiVw@public.gmane.org>
Date		: 2008-08-16 14:17 (8 days old)


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11358] net: forcedeth call restore mac addr in nv_shutdown path
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (28 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11357] Can not boot up with zd1211rw USB-Wlan Stick Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11360] mpc8xxx_wdt.c doesn't build modular Rafael J. Wysocki
                   ` (24 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Jeff Garzik, Tobias Diedrich, Yinghai Lu

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11358
Subject		: net: forcedeth call restore mac addr in nv_shutdown path
Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-17 3:30 (7 days old)
References	: http://marc.info/?l=linux-kernel&m=121894389018584&w=4
Handled-By	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121894389018584&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (32 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11361] my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-24  6:18   ` Frans Pop
  2008-08-23 18:10 ` [Bug #11388] 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable Rafael J. Wysocki
                   ` (20 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bjorn Helgaas, Frans Pop

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11379
Subject		: char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
Date		: 2008-08-18 13:40 (6 days old)
References	: http://marc.info/?l=linux-kernel&m=121906698213329&w=4
Handled-By	: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11361] my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (31 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11380] lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16 Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop Rafael J. Wysocki
                   ` (21 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Rafael J. Wysocki, Yinghai Lu

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11361
Subject		: my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-17 6:25 (7 days old)
References	: http://marc.info/?l=linux-kernel&m=121895439927053&w=4
Handled-By	: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121917167232014&w=4

^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11360] mpc8xxx_wdt.c doesn't build modular
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (29 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11358] net: forcedeth call restore mac addr in nv_shutdown path Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11380] lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16 Rafael J. Wysocki
                   ` (23 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Anton Vorontsov, Dave Jones

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11360
Subject		: mpc8xxx_wdt.c doesn't build modular
Submitter	: Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-17 08:07 (7 days old)
References	: http://lkml.org/lkml/2008/8/12/465
Handled-By	: Anton Vorontsov <avorontsov-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
Patch		: http://lkml.org/lkml/2008/8/13/344


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11380] lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (30 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11360] mpc8xxx_wdt.c doesn't build modular Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11361] my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec Rafael J. Wysocki
                   ` (22 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Ingo Molnar

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11380
Subject		: lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16
Submitter	: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date		: 2008-08-20 6:44 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121921480931970&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11398] hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (34 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11388] 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11401] pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected Rafael J. Wysocki
                   ` (18 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Frans Pop

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11398
Subject		: hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
Date		: 2008-08-21 17:17 (3 days old)

^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11401] pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (35 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11398] hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11382] e1000e: 2.6.27-rc1 corrupts EEPROM/NVM Rafael J. Wysocki
                   ` (17 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Alan Cox, Alan Cox, Andrew Morton,
	Jens Axboe, Laurent Riffard

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11401
Subject		: pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
Submitter	: Laurent Riffard <laurent.riffard-GANU6spQydw@public.gmane.org>
Date		: 2008-08-22 08:16 (2 days old)


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11382] e1000e: 2.6.27-rc1 corrupts EEPROM/NVM
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (36 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11401] pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11403] 2.6.27-rc2 USB suspend regression Rafael J. Wysocki
                   ` (16 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, David Vrabel

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11382
Subject		: e1000e: 2.6.27-rc1 corrupts EEPROM/NVM
Submitter	: David Vrabel <david.vrabel-kQvG35nSl+M@public.gmane.org>
Date		: 2008-08-08 10:47 (16 days old)
References	: http://marc.info/?l=linux-kernel&m=121819267211679&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11388] 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (33 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11398] hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj Rafael J. Wysocki
                   ` (19 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Joshua Hoblitt, Yinghai Lu

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11388
Subject		: 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable
Submitter	: Joshua Hoblitt <j_kernel-amK9oZtvyLhBDgjK7y7TUQ@public.gmane.org>
Date		: 2008-08-20 17:38 (4 days old)


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11403] 2.6.27-rc2 USB suspend regression
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (37 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11382] e1000e: 2.6.27-rc1 corrupts EEPROM/NVM Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11402] skbuff bug? Rafael J. Wysocki
                   ` (15 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Alan Stern, Jeremy Fitzhardinge

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11403
Subject		: 2.6.27-rc2 USB suspend regression
Submitter	: Jeremy Fitzhardinge <jeremy-TSDbQ3PG+2Y@public.gmane.org>
Date		: 2008-08-20 20:48 (4 days old)
References	: http://marc.info/?l=linux-kernel&m=121926536103630&w=4
Handled-By	: Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11404] BUG: in 2.6.23-rc3-git7 in do_cciss_intr
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (39 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11402] skbuff bug? Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11405] 2.6.27-rc3 segfault on cold boot; not on warm boot Rafael J. Wysocki
                   ` (13 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, James Bottomley, Miller, Mike (OS Dev),
	rdunlap

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11404
Subject		: BUG: in 2.6.23-rc3-git7 in do_cciss_intr
Submitter	: rdunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-21 5:52 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121929819616273&w=4
		  http://marc.info/?l=linux-kernel&m=121932889105368&w=4
Handled-By	: Miller, Mike (OS Dev) <Mike.Miller-VXdhtT5mjnY@public.gmane.org>
		  James Bottomley <James.Bottomley-JuX6DAaQMKPCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11406] patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (41 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11405] 2.6.27-rc3 segfault on cold boot; not on warm boot Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11409] build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init' Rafael J. Wysocki
                   ` (11 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Jan Beulich, Robert Richter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11406
Subject		: patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug
Submitter	: Jan Beulich <jbeulich-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
Date		: 2008-08-21 12:59 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121932366326572&w=4
Handled-By	: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
		  Robert Richter <robert.richter-5C7GfCeVMHo@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11405] 2.6.27-rc3 segfault on cold boot; not on warm boot.
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (40 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11404] BUG: in 2.6.23-rc3-git7 in do_cciss_intr Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11406] patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug Rafael J. Wysocki
                   ` (12 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, David Greaves

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11405
Subject		: 2.6.27-rc3 segfault on cold boot; not on warm boot.
Submitter	: David Greaves <david-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
Date		: 2008-08-21 9:45 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121931198904777&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11402] skbuff bug?
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (38 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11403] 2.6.27-rc2 USB suspend regression Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11404] BUG: in 2.6.23-rc3-git7 in do_cciss_intr Rafael J. Wysocki
                   ` (14 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Bruce Allan, Jeff Garzik, Jeff Kirsher,
	Yinghai Lu

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11402
Subject		: skbuff bug?
Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-21 3:56 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121929102707658&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11410] SLUB list_lock vs obj_hash.lock...
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (43 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11409] build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init' Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11413] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Rafael J. Wysocki
                   ` (9 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Daniel J Blueman

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11410
Subject		: SLUB list_lock vs obj_hash.lock...
Submitter	: Daniel J Blueman <daniel.blueman-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-22 21:48 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121944176609042&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11409] build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init'
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (42 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11406] patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11410] SLUB list_lock vs obj_hash.lock Rafael J. Wysocki
                   ` (10 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Alan Cox, Toralf Förster

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11409
Subject		: build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init'
Submitter	: Toralf Förster <toralf.foerster-Mmb7MZpHnFY@public.gmane.org>
Date		: 2008-08-22 8:33 (2 days old)
References	: http://marc.info/?l=linux-kernel&m=121939410214677&w=4
Handled-By	: Alan Cox <alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121943097320451&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11413] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (44 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11410] SLUB list_lock vs obj_hash.lock Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11407] suspend: unable to handle kernel paging request Rafael J. Wysocki
                   ` (8 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Ingo Molnar, Mikael Pettersson

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11413
Subject		: get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt()
Submitter	: Mikael Pettersson <mikpe-1zs4UD6AkMk@public.gmane.org>
Date		: 2008-08-23 9:48 (1 day old)
References	: http://marc.info/?l=linux-kernel&m=121948503224161&w=4
Handled-By	: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Patch		: http://marc.info/?l=linux-kernel&m=121950734922457&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11407] suspend: unable to handle kernel paging request
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (45 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11413] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-23 18:10 ` [Bug #11414] Random crashes with 2.6.27-rc3 on PPC Rafael J. Wysocki
                   ` (7 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Pavel Machek, Pekka Enberg,
	Rafael J. Wysocki, Vegard Nossum

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11407
Subject		: suspend: unable to handle kernel paging request
Submitter	: Vegard Nossum <vegard.nossum-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date		: 2008-08-21 17:28 (3 days old)
References	: http://marc.info/?l=linux-kernel&m=121933974928881&w=4
Handled-By	: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
		  Pekka Enberg <penberg-bbCR+/B0CizivPeTLB3BmA@public.gmane.org>
		  Pavel Machek <pavel-AlSwsSmVLrQ@public.gmane.org>


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11414] Random crashes with 2.6.27-rc3 on PPC
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (46 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11407] suspend: unable to handle kernel paging request Rafael J. Wysocki
@ 2008-08-23 18:10 ` Rafael J. Wysocki
  2008-08-24 17:48 ` 2.6.27-rc4-git1: Reported regressions from 2.6.26 Linus Torvalds
                   ` (6 subsequent siblings)
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-23 18:10 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Michael Buesch

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11414
Subject		: Random crashes with 2.6.27-rc3 on PPC
Submitter	: Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
Date		: 2008-08-23 14:10 (1 day old)
References	: http://marc.info/?l=linux-kernel&m=121950076812616&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-23 18:10 ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rafael J. Wysocki
@ 2008-08-23 20:10   ` Linus Torvalds
       [not found]     ` <alpine.LFD.1.10.0808231257310.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-23 20:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Alan D. Brunelle,
	Andrew Morton, Arjan van de Ven, Rusty Russell

On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11342
> Subject		: Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
> Submitter	: Alan D. Brunelle <Alan.Brunelle-VXdhtT5mjnY@public.gmane.org>
> Date		: 2008-08-13 23:03 (11 days old)
> References	: http://marc.info/?l=linux-kernel&m=121866876027629&w=4
> Handled-By	: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>

This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but 
then the call chain shows that there is no interrupt going on.

Also, the bisection is senseless - there's a trivial change wrt 
"do_one_initcall()" that got merged, but everything else is trivial about 
lguest and has nothing to do with the whole CPU-init thing. But if it was 
that initcall one, then "git bisect" would have pointed to it, not the 
merge. And the merge itself had no conflicts or anything else going on..

The fact that it came and went later also implies that it's probably just 
some timing-dependent thing or some subtle memory corruption, making the 
bisection result even less likely to be exact.

But I'm adding Arjan and Rusty to the Cc, because that merge was taking 
Rusty's branch, and the "do_one_initcall()" is Arjan's commit. Since 
undoing that merge apparently does fix it, I'm wondering if something 
there just does end up triggering the problem.

The do_one_initcall() thing _is_ in the path of sys_init_module(), so it 
_is_ at least somewhat relevant from an oops standpoint. 

One thing the "do_one_initcall()" thing does is to put more pressure on the 
stack due to that whole buffer for the printk's going on.

Alan, can you try 
 - seeing how consistent it is with one kernel (ie boot a known-bad kernel 
   a few times just to see if it really is 100% consistent)
 - try enabling 'initcall_debug' on the kernel command line, to (a) see 
   the new code actually do something and (b) see what it is actually 
   calling just before.

Hmm..

			Linus
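
A rough sketch of how the 'initcall_debug' output requested above could be
scanned for an initcall that was entered but never reported a return. The
log line shapes ("calling  <fn> @ <pid>" and "initcall <fn> returned <rc>
after ...") are an assumption about the kernel's output format, and the
sample lines, including hypothetical_mod_init, are made up for
illustration; they are not taken from Alan's report.

```python
import re

# Assumed line shapes (not taken from this thread):
#   calling  foo_init+0x0/0x20 @ 1
#   initcall foo_init+0x0/0x20 returned 0 after 3 msecs
CALLING = re.compile(r"calling\s+(\S+)")
RETURNED = re.compile(r"initcall\s+(\S+)\s+returned\s+(-?\d+)")


def unfinished_initcalls(log_lines):
    """Return initcalls that were entered but never logged a return, in call order."""
    pending = []
    for line in log_lines:
        done = RETURNED.search(line)
        if done:
            # A matching "returned" line closes the earliest pending entry.
            if done.group(1) in pending:
                pending.remove(done.group(1))
            continue
        started = CALLING.search(line)
        if started:
            pending.append(started.group(1))
    return pending


# Made-up console log for illustration only.
SAMPLE = [
    "[    1.000000] calling  acpi_init+0x0/0x100 @ 1",
    "[    1.100000] initcall acpi_init+0x0/0x100 returned 0 after 95 msecs",
    "[    1.200000] calling  hypothetical_mod_init+0x0/0x40 @ 1234",
    "[    1.300000] kernel BUG at mm/vmalloc.c",
]

print(unfinished_initcalls(SAMPLE))
```

Whatever is left in the returned list is the call that was in flight when
the log stopped, which is the "what it is actually calling just before"
part of point (b). It says nothing about whether that call is the culprit.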

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]     ` <alpine.LFD.1.10.0808231257310.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-23 20:15       ` Arjan van de Ven
       [not found]         ` <48B06FE6.8060404-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
  2008-08-23 20:17       ` Linus Torvalds
  1 sibling, 1 reply; 318+ messages in thread
From: Arjan van de Ven @ 2008-08-23 20:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Alan D. Brunelle, Andrew Morton, Rusty Russell

Linus Torvalds wrote:
> 
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>> The following bug entry is on the current list of known regressions
>> from 2.6.26.  Please verify if it still should be listed and let me know
>> (either way).
>>
>>
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11342
>> Subject		: Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
>> Submitter	: Alan D. Brunelle <Alan.Brunelle-VXdhtT5mjnY@public.gmane.org>
>> Date		: 2008-08-13 23:03 (11 days old)
>> References	: http://marc.info/?l=linux-kernel&m=121866876027629&w=4
>> Handled-By	: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> 
> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but 
> then the call chain shows that there is no interrupt going on.
> 
> Also, the bisection is senseless - there's a trivial change wrt 
> "do_one_initcall()" that got merged, but everything else is trivial about 
> lguest and has nothing to do with the whole CPU-init thing. But if it was 
> that initcall one, then "git bisect" would have pointed to it, not the 
> merge. And the merge itself had no conflicts or anything else going on..
> 
> The fact that it came and went later also implies that it's probably just 
> some timing-dependent thing or some subtle memory corruption, making the 
> bisection result even less likely to be exact.
> 
> But I'm adding Arjan and Rusty to the Cc, because that merge was taking 
> Rusty's branch, and the "do_one_initcall()" is Arjan's commit. Since 
> undoing that merge apparently does fix it, I'm wondering if something 
> there just does end up triggering the problem.
> 
> The do_one_initcall() thing _is_ in the path of sys_init_module(), so it 
> _is_ at least somewhat relevant from an oops standpoint. 
> 
> One thing the "do_one_initcall()" thing does is to put more pressure on the 
> stack due to that whole buffer for the printk's going on.

but it's 64 bit.. with 8KB stack and separate irq stacks. I'd be surprised if we blow that this easily.
the trace is a tad long with a long ACPI call chain.

Wonder what gcc is in use?
(newer ones tend to be a ton better... but maybe Alan is using a really old one)

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]     ` <alpine.LFD.1.10.0808231257310.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-23 20:15       ` Arjan van de Ven
@ 2008-08-23 20:17       ` Linus Torvalds
       [not found]         ` <alpine.LFD.1.10.0808231313170.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-23 20:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Alan D. Brunelle,
	Andrew Morton, Arjan van de Ven, Rusty Russell



On Sat, 23 Aug 2008, Linus Torvalds wrote:
> 
> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but 
> then the call chain shows that there is no interrupt going on.

Ahh, later in that thread there's another totally unrelated oops in 
debug_mutex_add_waiter().

I'd guess that it is really a wild pointer corrupting memory, quite possibly 
due to a double free or something like that. Alan - it would be good to 
run with DEBUG_PAGE_ALLOC and SLUB debugging etc if you don't already do 
that?

		Linus
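
A minimal sketch for checking whether a kernel .config already has the
debugging suggested above switched on. The option names
CONFIG_DEBUG_PAGEALLOC, CONFIG_SLUB_DEBUG and CONFIG_SLUB_DEBUG_ON are my
reading of "DEBUG_PAGE_ALLOC and SLUB debugging", not something spelled
out in this thread, and the sample config text is invented.

```python
import re

# Option names assumed to correspond to "DEBUG_PAGE_ALLOC and SLUB debugging".
WANTED = ("CONFIG_DEBUG_PAGEALLOC", "CONFIG_SLUB_DEBUG", "CONFIG_SLUB_DEBUG_ON")


def missing_debug_options(config_text, wanted=WANTED):
    """Return the wanted options that are not set to 'y' in a .config text."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in wanted if opt not in enabled]


# Invented .config excerpt for illustration only.
SAMPLE_CONFIG = """\
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_DEBUG_PAGEALLOC is not set
"""

print(missing_debug_options(SAMPLE_CONFIG))
```

If CONFIG_SLUB_DEBUG is built in but CONFIG_SLUB_DEBUG_ON is not, booting
with slub_debug on the kernel command line should be an alternative to
rebuilding.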

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11210] libata badness
  2008-08-23 18:10 ` [Bug #11210] libata badness Rafael J. Wysocki
@ 2008-08-23 22:23   ` Jeff Garzik
       [not found]     ` <48B08DD8.8010906-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Jeff Garzik @ 2008-08-23 22:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ben Dooks,
	Kumar Gala

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11210
> Subject		: libata badness
> Submitter	: Kumar Gala <galak-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
> Date		: 2008-07-31 18:53 (24 days old)
> References	: http://marc.info/?l=linux-ide&m=121753059307310&w=4
> Handled-By	: Ben Dooks <ben-linux-elnMNo+KYs3YtjvyW6yDsg@public.gmane.org>


FWIW,

http://marc.info/?l=linux-kernel&m=121754161727539&w=4

So IMO handled-by is Kumar?

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11271] BUG: fealnx in 2.6.27-rc1
  2008-08-23 18:10 ` [Bug #11271] BUG: fealnx in 2.6.27-rc1 Rafael J. Wysocki
@ 2008-08-23 22:26   ` Jeff Garzik
  0 siblings, 0 replies; 318+ messages in thread
From: Jeff Garzik @ 2008-08-23 22:26 UTC (permalink / raw)
  To: Jaswinder Singh
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Francois Romieu

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11271
> Subject		: BUG: fealnx in 2.6.27-rc1
> Submitter	: Jaswinder Singh <jaswinderlinux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date		: 2008-08-05 14:58 (19 days old)
> References	: http://marc.info/?l=linux-netdev&m=121794762016830&w=4
> 		  http://lkml.org/lkml/2008/8/10/98
> Handled-By	: Francois Romieu <romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org>


Jaswinder, does reverting 28cd4289abc2c8db90344ee4ff064a9bdf086fdf help?

That's the only material change to fealnx itself in years.

If not, any chance you could bisect this problem, and add more info to 
the bug?

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
  2008-08-23 18:10 ` [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i Rafael J. Wysocki
@ 2008-08-23 22:34   ` Jeff Garzik
  0 siblings, 0 replies; 318+ messages in thread
From: Jeff Garzik @ 2008-08-23 22:34 UTC (permalink / raw)
  To: Manny Maxwell, Linux IDE mailing list
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11343
> Subject		: SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
> Submitter	: Manny Maxwell <mannymax@mannymax.net>
> Date		: 2008-08-14 4:16 (10 days old)
> References	: http://marc.info/?l=linux-kernel&m=121868782917600&w=4


hmmmm.  Looking at changes between the two csets listed in the email 
(623fa57..8f616cd), all of them are driver-specific and unrelated to 
Manny's hardware except for

	commit 2486fa561a3192bbbec39c7feef87a1e07bd6342
	Author: Tejun Heo <tj@kernel.org>
	Date:   Thu Jul 31 07:52:40 2008 +0900

	    libata: update atapi disable handling

So you could try to revert that and see what happens.  But given that 
small range of changes, it really seems like something else, maybe in 
the PCI subsystem (random guess).

Looking at the entire kernel, nothing jumps out, either.  It's mostly fs 
updates (ext4, xfs), a networking update, an ARM update, and a libata 
update.

Also, some reset-related fixes just went in, so re-testing the latest 
-git would be helpful as well.

	Jeff




^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
  2008-08-23 18:10 ` [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps' Rafael J. Wysocki
@ 2008-08-24  6:13   ` Frans Pop
       [not found]     ` <200808240813.56525.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Frans Pop @ 2008-08-24  6:13 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List

On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
> Subject	: Linux 2.6.27-rc3 - build failure: undefined reference to
> 		  `.lockdep_count_forward_deps'
> Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org> 
> Date		: 2008-08-16 19:11 (8 days old)
> References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4

Fixed as per: http://marc.info/?l=linux-kernel&m=121898767530602&w=4
Adrian mentioned that he'd closed the bug, but apparently not.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
  2008-08-23 18:10 ` [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop Rafael J. Wysocki
@ 2008-08-24  6:18   ` Frans Pop
       [not found]     ` <200808240818.09275.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Frans Pop @ 2008-08-24  6:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Bjorn Helgaas

On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11379
> Subject	: char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
> Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
> Date		: 2008-08-18 13:40 (6 days old)
> References	: http://marc.info/?l=linux-kernel&m=121906698213329&w=4
> Handled-By	: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>

Fixed with:
commit 5e4c6564c95ce127beeefe75e15cd11c93487436
Author: Kay Sievers <kay.sievers-tD+1rO4QERM@public.gmane.org>
Date:   Thu Aug 21 15:28:56 2008 +0200

    pnp: fix "add acpi:* modalias entries"

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM
  2008-08-23 18:10 ` [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM Rafael J. Wysocki
@ 2008-08-24 12:26   ` Martin Michlmayr
       [not found]     ` <20080824122643.GG8772-u+sgIaa8TU6A7rR/f+Zz5kHK5LHFu9C3@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Martin Michlmayr @ 2008-08-24 12:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Brice Goglin

* Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> [2008-08-23 20:10]:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).

Yes, this is still there.
-- 
Martin Michlmayr
http://www.cyrius.com/

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (47 preceding siblings ...)
  2008-08-23 18:10 ` [Bug #11414] Random crashes with 2.6.27-rc3 on PPC Rafael J. Wysocki
@ 2008-08-24 17:48 ` Linus Torvalds
       [not found]   ` <alpine.LFD.1.10.0808241030060.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-24 18:03 ` Linus Torvalds
                   ` (5 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-24 17:48 UTC (permalink / raw)
  To: Rafael J. Wysocki, David Greaves
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11405
> Subject		: 2.6.27-rc3 segfault on cold boot; not on warm boot.
> Submitter	: David Greaves <david-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
> Date		: 2008-08-21 9:45 (3 days old)
> References	: http://marc.info/?l=linux-kernel&m=121931198904777&w=4

It would be good to have some kind of bisection of this one, because it 
looks pretty odd. Also, google doesn't find anybody else seeing that 
"segfault at ffffffbf", even though it seems to be very consistent for 
David. So I don't think we'll be able to even _guess_ where it is without 
some more information about exactly when it started happening.

Since it's present in 2.6.26 too, it's clearly not a regression from that 
one, but perhaps more importantly, since it's apparently an old one I'd 
have expected more reports like this if it was some common problem. And 
the warm-vs-cold-boot thing makes me think it's some hardware setup issue. 

Possibly the disk controller, possibly the CPU (eg some MTRR/PAT 
setup issue or TLB thing). But the dmesg's are all from late enough at 
boot that I can't even tell what disk controller it is (except that it is 
SATA), nor can I tell what CPU it is.

But again, if it was some MTRR/PAT issue, I'd expect a _lot_ more reports 
of this. 

MD/XFS sounds unlikely, since they should have absolutely nothing that 
could possibly matter for cold/hot boot.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (48 preceding siblings ...)
  2008-08-24 17:48 ` 2.6.27-rc4-git1: Reported regressions from 2.6.26 Linus Torvalds
@ 2008-08-24 18:03 ` Linus Torvalds
       [not found]   ` <alpine.LFD.1.10.0808241050180.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-24 18:34 ` Linus Torvalds
                   ` (4 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-24 18:03 UTC (permalink / raw)
  To: Rafael J. Wysocki, Vegard Nossum, Daniel J Blueman,
	Thomas Gleixner, Ingo Molnar
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11410
> Subject		: SLUB list_lock vs obj_hash.lock...
> Submitter	: Daniel J Blueman <daniel.blueman@gmail.com>
> Date		: 2008-08-22 21:48 (2 days old)
> References	: http://marc.info/?l=linux-kernel&m=121944176609042&w=4

This one now has a suggested patch for Daniel to try from Vegard, but no 
reply yet:

	http://marc.info/?l=linux-kernel&m=121946972307110&w=4

Vegard, I think your patch is a bit odd, though. The result of your patch 
is

 - first loop:

	hlist_for_each_entry_safe(obj, node, tmp, &db->list, node) {
		hlist_del(&obj->node);
		hlist_add_head(&obj->node, &freelist);
	}

   and quite frankly, I don't see what the difference between that and 
   something like a simple

	struct hlist_node *first = db->list.first;
	if (first) {
		db->list.first = NULL;
		first->pprev = &first;
	}

   really is?

I dunno. We don't have list splicing ops for the hlist things.
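
For illustration, a whole-list move of the kind discussed above might look
like the following userspace sketch. The two structs mirror the kernel's hlist
layout, but hlist_push() and hlist_move_all() are names made up for this
example, not existing kernel helpers. The sketch also assumes the destination
head is empty, and it attaches the chain to that head rather than to a local
variable.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copies of the kernel's hlist types. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;	/* address of the pointer that points at us */
};

struct hlist_head {
	struct hlist_node *first;
};

/* Push a node at the front of a list (same logic as hlist_add_head()). */
static void hlist_push(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical helper: move the whole chain from 'from' to 'to' in O(1),
 * keeping the order of the entries. Assumes 'to' is empty; anything it
 * held before is simply overwritten.
 */
static void hlist_move_all(struct hlist_head *from, struct hlist_head *to)
{
	assert(from != to);		/* moving a list onto itself would lose it */
	to->first = from->first;
	if (to->first)
		to->first->pprev = &to->first;	/* first node now hangs off 'to' */
	from->first = NULL;
}
```

Only the first node needs its pprev fixed up, because every later node's pprev
points at the previous node's next field, which does not move. Note that the
loop version reverses the order of the entries, while this one preserves it.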

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (49 preceding siblings ...)
  2008-08-24 18:03 ` Linus Torvalds
@ 2008-08-24 18:34 ` Linus Torvalds
       [not found]   ` <alpine.LFD.1.10.0808241120460.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-24 18:52 ` Linus Torvalds
                   ` (3 subsequent siblings)
  54 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-24 18:34 UTC (permalink / raw)
  To: Rafael J. Wysocki, Alan Cox, Peter Osterlund, Jens Axboe
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11401
> Subject		: pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
> Submitter	: Laurent Riffard <laurent.riffard-GANU6spQydw@public.gmane.org>
> Date		: 2008-08-22 08:16 (2 days old)

This one looks irritating.

It's bisected to 5b6155ee70e9c4d2ad7e6f514c8eee06e2711c3a ("pktcdvd: push 
BKL down into driver"), but the problem goes deeper than that.

The "unlocked" ioctl's do not get a "struct inode *" pointer, they _only_ 
get the "struct file *". And this is very much historical usage, where 
some internal functions only passed in the inode (good or not, whatever).

And ioctl_by_bdev() doesn't have a "struct file *" and has depended on 
passing in a NULL "struct file *" and its own "struct inode *", and 
expects the ioctl's to just use that instead. But the unlocked ioctl just 
drops it on the floor, and uses just the (unusable) file pointer.

Grr.

And there are some other cases (like pkt_ioctl() itself) that simply pass in a 
_different_ inode than the file itself is attached to. It does

	blkdev_ioctl(pd->bdev->bd_inode, file, cmd, arg);

where "file" points to the pkt_ioctl thing, but "inode" points to the 
inode "behind" the pkt interface.

Double grr.

I really think the only sane model is to literally make "unlocked_ioctl()" 
have the same calling convention as the old "ioctl()" thing had, and pass 
in both file * and inode *. It was a stupid "cleanup" to try to have a 
simpler interface for the unlocked version. Having two different models, 
where we have actually _depended_ on the old model and then are trying to 
convert to a (weaker) new model, is not a good idea.

The alternative is to do this _only_ for the blkdev_ioctl's, and have 
those only take the "inode *", and then create a new fake "struct file *" 
to go with it, regardless of what "struct file" was passed in (exactly 
because the blockdev ones really think that the inode is the important 
part).

Hmm?

We need to fix this.
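
To make the failure mode concrete, here is a rough userspace sketch of the two
calling conventions. All types and function names in it (fake_inode,
fake_file, old_style_ioctl, unlocked_style_ioctl, FAKE_ENODEV) are invented
stand-ins, not the real kernel structures or prototypes. The point is only
that a handler which receives just the file pointer has nothing to fall back
on when a caller such as ioctl_by_bdev() passes NULL.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_ENODEV 19	/* stand-in error code for this sketch */

struct fake_inode { int minor; };
struct fake_file  { struct fake_inode *inode; };

/* Old convention: the inode is passed explicitly, so 'file' may be NULL. */
static int old_style_ioctl(struct fake_inode *inode, struct fake_file *file,
			   unsigned int cmd, unsigned long arg)
{
	(void)file; (void)cmd; (void)arg;
	assert(inode != NULL);		/* callers always supply the inode */
	return inode->minor;
}

/* Unlocked convention: only 'file' is passed; the inode must come from it. */
static int unlocked_style_ioctl(struct fake_file *file,
				unsigned int cmd, unsigned long arg)
{
	(void)cmd; (void)arg;
	if (!file)			/* without this check: NULL dereference */
		return -FAKE_ENODEV;
	return file->inode->minor;
}
```

With a NULL file the old-style handler still works from its inode argument,
while the unlocked-style handler can only bail out (or oops, if it forgets to
check).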

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]   ` <alpine.LFD.1.10.0808241050180.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-24 18:43     ` Vegard Nossum
       [not found]       ` <19f34abd0808241143t6f5239d7o679135e9e974fe63-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Vegard Nossum @ 2008-08-24 18:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Daniel J Blueman, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Sun, Aug 24, 2008 at 8:03 PM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>
>
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>>
>> Bug-Entry     : http://bugzilla.kernel.org/show_bug.cgi?id=11410
>> Subject               : SLUB list_lock vs obj_hash.lock...
>> Submitter     : Daniel J Blueman <daniel.blueman-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Date          : 2008-08-22 21:48 (2 days old)
>> References    : http://marc.info/?l=linux-kernel&m=121944176609042&w=4
>
> This one now has a suggested patch for Daniel to try from Vegard, but no
> reply yet:
>
>        http://marc.info/?l=linux-kernel&m=121946972307110&w=4
>

Hi!

> Vegard, I think your patch is a bit odd, though. The result of your patch
> is
>
>  - first loop:
>
>        hlist_for_each_entry_safe(obj, node, tmp, &db->list, node) {
>                hlist_del(&obj->node);
>                hlist_add_head(&obj->node, &freelist);
>        }
>
>   and quite frankly, I don't see what the difference between that and
>   something like a simple
>
>        struct hlist_node *first = db->list.first;
>        if (first) {
>                db->list.first = NULL;
>                first->pprev = &first;
>        }
>
>   really is?
>
> I dunno. We don't have list splicing ops for the hlist things.

Hm.

I haven't really used the hlists before, so my first instinct was to
do what is obvious. That's also why I put the XXX comment. Other than
that, I guess open-coding list ops is also not very good programming
practice? :-)

But... feel free to submit your own patch. Oh, what am I saying.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (50 preceding siblings ...)
  2008-08-24 18:34 ` Linus Torvalds
@ 2008-08-24 18:52 ` Linus Torvalds
  2008-08-24 22:50   ` Sean Young
       [not found]   ` <alpine.LFD.1.10.0808241141470.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-24 19:03 ` Linus Torvalds
                   ` (2 subsequent siblings)
  54 siblings, 2 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-24 18:52 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ingo Molnar, H. Peter Anvin, Alok Kataria,
	Sean Young
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11354
> Subject		: AMD Elan regression with 2.6.27-rc3
> Submitter	: Sean Young <sean-hENCXIMQXOg@public.gmane.org>
> Date		: 2008-08-15 18:37 (9 days old)
> References	: http://marc.info/?l=linux-kernel&m=121882578430056&w=4

Peter? Ingo? Alok?

This _looks_ like it might be due to "x86: merge the TSC cpu-freq code" 
thing by Alok, where we do this:

	+static struct notifier_block time_cpufreq_notifier_block = {
	+       .notifier_call  = time_cpufreq_notifier
	+};
	+
	+static int __init cpufreq_tsc(void)
	+{
	+       cpufreq_register_notifier(&time_cpufreq_notifier_block,
	+                               CPUFREQ_TRANSITION_NOTIFIER);
	+       return 0;
	+}

but that's just _insane_ if the CPU doesn't even support TSC to begin 
with. Also, in the actual time_cpufreq_notifier(), we do:

	if (cpu_has(&cpu_data(freq->cpu), X86_FEATURE_CONSTANT_TSC))
		return 0;

and this is stupid because:

 (a) if the CPU has no TSC at all, then it sure as hell won't have a 
     _constant_ one, so we'll actually continue into the function.

 (b) and why the hell is this done at run-time in the notifier, and not in 
     the "cpufreq_tsc" init function? If anybody mixes totally different 
     kinds of CPUs in SMP, they deserve whatever they get.

so why is the patch not something like the appended?

Sean, does this make any difference for you?

		Linus

---
 arch/x86/kernel/tsc.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46af716..9bed5ca 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -325,6 +325,10 @@ static struct notifier_block time_cpufreq_notifier_block = {
 
 static int __init cpufreq_tsc(void)
 {
+	if (!cpu_has_tsc)
+		return 0;
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+		return 0;
 	cpufreq_register_notifier(&time_cpufreq_notifier_block,
 				CPUFREQ_TRANSITION_NOTIFIER);
 	return 0;
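
The reasoning behind the two early returns in the patch above can be modelled
in a few lines of plain C. This is a userspace sketch only: the FEAT_* bits,
should_register_tsc_notifier() and check_cases() are hypothetical names for
illustration and are not the kernel's cpufeature interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits standing in for the TSC and constant-TSC flags. */
#define FEAT_TSC          (1u << 0)
#define FEAT_CONSTANT_TSC (1u << 1)

/* Decide once, at init time, whether the cpufreq notifier is needed at all. */
static bool should_register_tsc_notifier(unsigned int features)
{
	if (!(features & FEAT_TSC))
		return false;	/* no TSC at all: nothing to rescale */
	if (features & FEAT_CONSTANT_TSC)
		return false;	/* TSC rate does not follow cpufreq changes */
	return true;
}

/* Tiny self-check covering the three cases discussed above. */
static void check_cases(void)
{
	assert(!should_register_tsc_notifier(0));
	assert(should_register_tsc_notifier(FEAT_TSC));
	assert(!should_register_tsc_notifier(FEAT_TSC | FEAT_CONSTANT_TSC));
}
```

Testing only the constant-TSC bit, as the run-time check does, would return
"register" for a CPU with no TSC at all, which is case (a).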

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]       ` <19f34abd0808241143t6f5239d7o679135e9e974fe63-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-24 18:58         ` Linus Torvalds
       [not found]           ` <alpine.LFD.1.10.0808241152370.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-24 18:58 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Rafael J. Wysocki, Daniel J Blueman, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Sun, 24 Aug 2008, Vegard Nossum wrote:
> 
> I haven't really used the hlists before, so my first instinct was to
> do what is obvious.

I do agree that the hlist versions aren't very nice in this regard. The 
regular lists are much better at moving lists around.

> Other than that, I guess open-coding list ops is also not very good 
> programming practice? :-)

Agreed. It would be better if the people who use hlists most (I think that 
would be networking) would think about this.

> But... feel free to submit your own patch. Oh, what am I saying.

Silly boy. Next you'll ask me to _test_ any patches I send out.

Anyway, I think your patch is likely fine, I just thought it looked a bit 
odd to have a loop to move a list from one head pointer to another.

But regardless, it would need some testing. Daniel?

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (51 preceding siblings ...)
  2008-08-24 18:52 ` Linus Torvalds
@ 2008-08-24 19:03 ` Linus Torvalds
       [not found]   ` <alpine.LFD.1.10.0808241201090.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-24 21:40 ` Rafael J. Wysocki
  2008-08-25  0:48 ` Benjamin Herrenschmidt
  54 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-24 19:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
> Subject		: Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
> Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
> Date		: 2008-08-16 19:11 (8 days old)
> References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4

Hmm. Wasn't this already confirmed to be fixed by commit
df60a8441866153d691ae69b77934904c2de5e0d? 

At least Adrian sent out an email saying "Confirmed, bug closed.", but 
bugzilla seems to disagree and still show it as open.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]   ` <alpine.LFD.1.10.0808241201090.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-24 19:23     ` Adrian Bunk
  0 siblings, 0 replies; 318+ messages in thread
From: Adrian Bunk @ 2008-08-24 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Sun, Aug 24, 2008 at 12:03:37PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
> > Subject		: Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
> > Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
> > Date		: 2008-08-16 19:11 (8 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4
> 
> Hmm. Wasn't this already confirmed to be fixed by commit
> df60a8441866153d691ae69b77934904c2de5e0d? 
> 
> At least Adrian sent out an email saying "Confirmed, bug closed.", but 
> bugzilla seems to disagree and still show it as open.

There were two different reports, Rafael opened a bug for each, and I 
missed that there were two open bugs for the same issue.

The one I closed was #11344.

I've now closed #11356 as a duplicate of #11344.

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]   ` <alpine.LFD.1.10.0808241030060.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-24 19:23     ` David Greaves
       [not found]       ` <48B1B526.2030100-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Greaves @ 2008-08-24 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List

Linus Torvalds wrote:
> 
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11405
>> Subject		: 2.6.27-rc3 segfault on cold boot; not on warm boot.
>> Submitter	: David Greaves <david-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
>> Date		: 2008-08-21 9:45 (3 days old)
>> References	: http://marc.info/?l=linux-kernel&m=121931198904777&w=4
> 
> It would be good to have some kind of bisection of this one, because it 
> looks pretty odd. Also, google doesn't find anybody else seeing that 
> "segfault at ffffffbf", even though it seems to be very consistent for 
> David. So I don't think we'll be able to even _guess_ where it is without 
> some more information about exactly when it started happening.
> 
> Since it's present in 2.6.26 too, it's clearly not a regression from that 
> one, but perhaps more importantly, since it's apparently an old one I'd 
> have expected more reports like this if it was some common problem. And 
> the warm-vs-cold-boot thing makes me think it's some hardware setup issue.
> 
> Possibly the disk controller, possibly the CPU (eg some MTRR/PAT 
> setup issue or TLB thing). But the dmesg's are all from late enough at 
> boot that I can't even tell what disk controller it is (except that it is 
> SATA), nor can I tell what CPU it is.
> 
> But again, if it was some MTRR/PAT issue, I'd expect a _lot_ more reports 
> of this. 

OK, that all makes sense.

Given that I'll manage at best 1 bisect/day with a reasonable chance of data
corruption and hardware intermittency screwing it all up I thought it best to
ask first in case there was another debug approach that could work. However
since it does indeed sound somewhat hardware related and it's an isolated
problem for my wife (as opposed to a problem that others are having too) then I
think she deserves a new machine...

Thanks for the impetus to cheer her up ;)

David
PS if anyone really is interested then I am happy to try the bisection once I've
moved her to a new box; otherwise I'm happy to close this.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11254] KVM: fix userspace ABI breakage
  2008-08-23 18:10 ` [Bug #11254] KVM: fix userspace ABI breakage Rafael J. Wysocki
@ 2008-08-24 19:27   ` Adrian Bunk
       [not found]     ` <20080824192714.GC1627-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-24 19:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Avi Kivity,
	Linus Torvalds, Andrew Morton

On Sat, Aug 23, 2008 at 08:10:16PM +0200, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11254
> Subject		: KVM: fix userspace ABI breakage
> Submitter	: Adrian Bunk <bunk-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Date		: 21 Jul 2008 17:58:26 (0 days old)
> References	: http://lkml.org/lkml/2008/7/21/197
> Handled-By	: Adrian Bunk <bunk-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Patch		: http://lkml.org/lkml/2008/7/21/197

The discussion in Bugzilla about whether it is a regression at all can be 
condensed to the following question:

Can a struct that is part of the 2.6.26 userspace headers be defined to 
be part of an "experimental ABI" and therefore be changed?

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11210] libata badness
       [not found]     ` <48B08DD8.8010906-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>
@ 2008-08-24 21:04       ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-24 21:04 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linux Kernel Mailing List, Kernel Testers List, Ben Dooks,
	Kumar Gala

On Sunday, 24 of August 2008, Jeff Garzik wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11210
> > Subject		: libata badness
> > Submitter	: Kumar Gala <galak-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
> > Date		: 2008-07-31 18:53 (24 days old)
> > References	: http://marc.info/?l=linux-ide&m=121753059307310&w=4
> > Handled-By	: Ben Dooks <ben-linux-elnMNo+KYs3YtjvyW6yDsg@public.gmane.org>
> 
> 
> FWIW,
> 
> http://marc.info/?l=linux-kernel&m=121754161727539&w=4
> 
> So IMO handled-by is Kumar?

OK

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM
       [not found]     ` <20080824122643.GG8772-u+sgIaa8TU6A7rR/f+Zz5kHK5LHFu9C3@public.gmane.org>
@ 2008-08-24 21:05       ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-24 21:05 UTC (permalink / raw)
  To: Martin Michlmayr
  Cc: Linux Kernel Mailing List, Kernel Testers List, Brice Goglin

On Sunday, 24 of August 2008, Martin Michlmayr wrote:
> * Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> [2008-08-23 20:10]:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> 
> Yes, this is still there.

Thanks for the update.

Rafael

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
       [not found]     ` <200808240813.56525.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
@ 2008-08-24 21:10       ` Rafael J. Wysocki
  2008-08-25 14:03       ` Adrian Bunk
  1 sibling, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-24 21:10 UTC (permalink / raw)
  To: Frans Pop; +Cc: Linux Kernel Mailing List, Kernel Testers List

On Sunday, 24 of August 2008, Frans Pop wrote:
> On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
> > Subject	: Linux 2.6.27-rc3 - build failure: undefined reference to
> > 		  `.lockdep_count_forward_deps'
> > Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org> 
> > Date		: 2008-08-16 19:11 (8 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4
> 
> Fixed as per: http://marc.info/?l=linux-kernel&m=121898767530602&w=4
> Adrian mentioned that he'd closed the bug, but apparently not.

The bug is closed now.

Thanks,
Rafael



^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
       [not found]     ` <200808240818.09275.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
@ 2008-08-24 21:12       ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-24 21:12 UTC (permalink / raw)
  To: Frans Pop; +Cc: Linux Kernel Mailing List, Kernel Testers List, Bjorn Helgaas

On Sunday, 24 of August 2008, Frans Pop wrote:
> On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11379
> > Subject	: char/tpm: tpm_infineon no longer loaded for HP 2510p laptop
> > Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
> > Date		: 2008-08-18 13:40 (6 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121906698213329&w=4
> > Handled-By	: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>
> 
> Fixed with:
> commit 5e4c6564c95ce127beeefe75e15cd11c93487436
> Author: Kay Sievers <kay.sievers-tD+1rO4QERM@public.gmane.org>
> Date:   Thu Aug 21 15:28:56 2008 +0200
> 
>     pnp: fix "add acpi:* modalias entries"

Thanks,  closed.

Rafael

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel
  2008-08-23 18:10 ` [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel Rafael J. Wysocki
@ 2008-08-24 21:34   ` Rafael J. Wysocki
       [not found]     ` <200808242334.05993.rjw-KKrjLPT3xs0@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-24 21:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Kernel Testers List, Larry Finger, Sam Ravnborg, David Woodhouse,
	Andrew Morton, Linus Torvalds

On Saturday, 23 of August 2008, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11355
> Subject		: Regression in 2.6.27-rc2 when cross-building the kernel
> Submitter	: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> Date		: 2008-08-16 2:38 (8 days old)
> References	: http://marc.info/?l=linux-kernel&m=121885432118368&w=4

As I wrote in the Bugzilla, I'm seeing a related problem.

Namely, I build kernels on one box, with 'make O=<target>', then I mount
<target> on another one over NFS, 'cd' to it and try to install the kernel
modules with 'make modules_install'.  This results in 'HOSTCC firmware/ihex2fw'
and 'fatal error: ...: Read-only file system'.  It's readily reproducible.

Commenting out line 1130 of Makefile
("$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.fwinst obj=firmware __fw_modinst")
obviously helps, so it looks like Makefile.fwinst needs fixing.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (52 preceding siblings ...)
  2008-08-24 19:03 ` Linus Torvalds
@ 2008-08-24 21:40 ` Rafael J. Wysocki
  2008-08-25  0:48 ` Benjamin Herrenschmidt
  54 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-24 21:40 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

On Saturday, 23 of August 2008, Rafael J. Wysocki wrote:
[--snip--]

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11361
> Subject		: my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec
> Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date		: 2008-08-17 6:25 (7 days old)
> References	: http://marc.info/?l=linux-kernel&m=121895439927053&w=4
> Handled-By	: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
> Patch		: http://marc.info/?l=linux-kernel&m=121917167232014&w=4

[--snip--]
 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11358
> Subject		: net: forcedeth call restore mac addr in nv_shutdown path
> Submitter	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date		: 2008-08-17 3:30 (7 days old)
> References	: http://marc.info/?l=linux-kernel&m=121894389018584&w=4
> Handled-By	: Yinghai Lu <yhlu.kernel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Patch		: http://marc.info/?l=linux-kernel&m=121894389018584&w=4

Jeff, do you have the patches for these two in your queue?

Rafael

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-24 18:52 ` Linus Torvalds
@ 2008-08-24 22:50   ` Sean Young
       [not found]   ` <alpine.LFD.1.10.0808241141470.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 0 replies; 318+ messages in thread
From: Sean Young @ 2008-08-24 22:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, H. Peter Anvin, Alok Kataria,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Sun, Aug 24, 2008 at 11:52:06AM -0700, Linus Torvalds wrote:
> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11354
> > Subject		: AMD Elan regression with 2.6.27-rc3
> > Submitter	: Sean Young <sean@mess.org>
> > Date		: 2008-08-15 18:37 (9 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121882578430056&w=4
> 
> Peter? Ingo? Alok?
> 
> This _looks_ like it might be due to "x86: merge the TSC cpu-freq code" 
> thing by Alok, where we do this:
> 
> 	+static struct notifier_block time_cpufreq_notifier_block = {
> 	+       .notifier_call  = time_cpufreq_notifier
> 	+};
> 	+
> 	+static int __init cpufreq_tsc(void)
> 	+{
> 	+       cpufreq_register_notifier(&time_cpufreq_notifier_block,
> 	+                               CPUFREQ_TRANSITION_NOTIFIER);
> 	+       return 0;
> 	+}
> 
> but that's just _insane_ if the CPU doesn't even support TSC to begin 
> with. Also, in the actual time_cpufreq_notifier(), we do:
> 
> 	if (cpu_has(&cpu_data(freq->cpu), X86_FEATURE_CONSTANT_TSC))
> 		return 0;
> 
> and this is stupid because:
> 
>  (a) if the CPU has no TSC at all, then it sure as hell won't have a 
>      _constant_ one, so we'll actually continue into the function.
> 
>  (b) and why the hell is this done at run-time in the notifier, and not in 
>      the "cpufreq_tsc" init function? If anybody mixes totally different 
>      kinds of CPU's in SMP, they deserve whatever they want.
> 
> so why is the patch not something like the appended?
> 
> Sean, does this make any difference for you?

Yes, this patch fixes it.

Thanks
Sean

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]   ` <alpine.LFD.1.10.0808241141470.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25  0:16     ` H. Peter Anvin
  0 siblings, 0 replies; 318+ messages in thread
From: H. Peter Anvin @ 2008-08-25  0:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Ingo Molnar, Alok Kataria, Sean Young,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

Linus Torvalds wrote:
>> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 46af716..9bed5ca 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -325,6 +325,10 @@ static struct notifier_block time_cpufreq_notifier_block = {
>  
>  static int __init cpufreq_tsc(void)
>  {
> +	if (!cpu_has_tsc)
> +		return 0;
> +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +		return 0;
>  	cpufreq_register_notifier(&time_cpufreq_notifier_block,
>  				CPUFREQ_TRANSITION_NOTIFIER);
>  	return 0;

I added this patch to x86/urgent.

	-hpa
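
For readers following the thread, the decision made by the guard in the
patch quoted above can be modelled in plain userspace C. This is only a
sketch under stated assumptions, not the kernel code: struct cpu_features
and tsc_notifier_needed() are hypothetical names invented here, standing in
for cpu_has_tsc and boot_cpu_has(X86_FEATURE_CONSTANT_TSC).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the boot CPU's feature bits (not a kernel type). */
struct cpu_features {
	bool has_tsc;          /* models cpu_has_tsc */
	bool has_constant_tsc; /* models boot_cpu_has(X86_FEATURE_CONSTANT_TSC) */
};

/*
 * Hypothetical helper: returns true when registering the cpufreq
 * transition notifier makes sense, mirroring the two early returns
 * added to cpufreq_tsc() by the quoted patch.
 */
bool tsc_notifier_needed(const struct cpu_features *cpu)
{
	if (!cpu->has_tsc)
		return false;	/* no TSC at all, nothing to rescale */
	if (cpu->has_constant_tsc)
		return false;	/* TSC rate does not follow CPU frequency */
	return true;		/* TSC present and frequency dependent */
}
```

With this model, a CPU without a TSC (the AMD Elan case reported in
Bug #11354) never gets the notifier registered, which is the point of
moving the check from the notifier to the init function.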

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
                   ` (53 preceding siblings ...)
  2008-08-24 21:40 ` Rafael J. Wysocki
@ 2008-08-25  0:48 ` Benjamin Herrenschmidt
  2008-08-25 11:40   ` Rafael J. Wysocki
  54 siblings, 1 reply; 318+ messages in thread
From: Benjamin Herrenschmidt @ 2008-08-25  0:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11414
> Subject		: Random crashes with 2.6.27-rc3 on PPC
> Submitter	: Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
> Date		: 2008-08-23 14:10 (1 days old)
> References	: http://marc.info/?l=linux-kernel&m=121950076812616&w=4

This appears to be a gcc bug when -fno-omit-frame-pointer is used (which
we mostly don't need on ppc anyway except that another gcc stupidity makes
it mandatory for -pg which ftrace needs).

We're working on a two-fold workaround: removing -fno-omit-frame-pointer
in all the cases where we don't really need it, and for when we do (i.e.,
CONFIG_FTRACE because of -pg), using -mno-sched-epilog which seems to
work around it.

The root cause in gcc hasn't been fully identified yet.

Ben.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]       ` <48B1B526.2030100-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
@ 2008-08-25  0:51         ` Linus Torvalds
  0 siblings, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25  0:51 UTC (permalink / raw)
  To: David Greaves
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Adrian Bunk,
	Andrew Morton, Natalie Protasevich, Kernel Testers List

On Sun, 24 Aug 2008, David Greaves wrote:
> 
> Given that I'll manage at best 1 bisect/day with a reasonable chance of data
> corruption and hardware intermittency screwing it all up I thought it best to
> ask first in case there was another debug approach that could work.

Well, regardless, I think it would be good to fill in the hardware info, 
especially wrt CPU data and the exact SATA controller you have.

There's another regression for SATA cold/hot boot issues, and while that 
one looks very different, and is probably not really related, it's still a 
good idea to try to see if we can match them up. See

	Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=11343
	Subject         : SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i
	Submitter       : Manny Maxwell <mannymax-7UBucS1kxs3k1uMJSBkQmQ@public.gmane.org>
	Date            : 2008-08-14 4:16 (10 days old)
	References      : http://marc.info/?l=linux-kernel&m=121868782917600&w=4

which actually has a patch, and which seems to work fine in 2.6.26 (so not 
only is the failure pattern different, the point where it starts is different). 

But regardless of the big differences, it does seem to point to some 
weakness in SATA initialization. But is it limited to _that_ particular 
SATA controller, or just a few ones? Or a generic issue? Without more 
reports to really find a pattern, I don't think we have a clue, and the 
two may be _totally_ unrelated in all ways, but it would be good to at 
least report and log the information you have.

Oh, I just noticed that your dmesg _does_ mention sata_sil and sata_via, 
so we know which of two drivers it would be, at least. Not the nVidia one.

However, there's been tons of changes in some core functions: both the 
reset handling and the wait-for-ready has changed and caused lots of churn 
across most drivers in between 2.6.25 and 2.6.26. 

> PS if anyone really is interested then I am happy to try the bisection once I've
> moved her to a new box; otherwise I'm happy to close this.

I think it would be good to try to bisect. It could be something that is 
really just limited to that particular machine (maybe it really is some 
flaky hardware that just triggers some timing changes), but more likely it 
isn't. So the more information, the better. So keep the thing open as long 
as somebody is willing to try to gather more info, by all means.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11254] KVM: fix userspace ABI breakage
       [not found]     ` <20080824192714.GC1627-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
@ 2008-08-25 10:23       ` Avi Kivity
  0 siblings, 0 replies; 318+ messages in thread
From: Avi Kivity @ 2008-08-25 10:23 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Linus Torvalds, Andrew Morton

Adrian Bunk wrote:
> The discussion in Bugzilla whether it is a regression at all can be 
> condensed to the following question:
>
> Can a struct that is part of the 2.6.26 userspace headers be defined to 
> be part of an "experimental ABI" and therefore be changed?
>   

It is part of the experimental ABI.

However, as I'm going to apply your patch (as being the simplest fix, 
and as there is no measurable performance impact), the question is moot.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-25  0:48 ` Benjamin Herrenschmidt
@ 2008-08-25 11:40   ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-25 11:40 UTC (permalink / raw)
  To: benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r
  Cc: Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Kernel Testers List

On Monday, 25 of August 2008, Benjamin Herrenschmidt wrote:
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11414
> > Subject		: Random crashes with 2.6.27-rc3 on PPC
> > Submitter	: Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>
> > Date		: 2008-08-23 14:10 (1 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121950076812616&w=4
> 
> This appears to be a gcc bug when -fno-omit-frame-pointer is used (which
> we mostly don't need on ppc anyway except that another gcc stupidity makes
> it mandatory for -pg which ftrace needs).
> 
> We're working on a two-fold workaround: removing -fno-omit-frame-pointer
> in all the cases where we don't really need it, and for when we do (i.e.,
> CONFIG_FTRACE because of -pg), using -mno-sched-epilog which seems to
> work around it.
> 
> The root cause in gcc hasn't been fully identified yet.

Thanks Ben.

I've already dropped it from the list of recent regressions.

Rafael

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]         ` <alpine.LFD.1.10.0808231313170.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25 12:03           ` Alan D. Brunelle
       [not found]             ` <48B29F7B.6080405-VXdhtT5mjnY@public.gmane.org>
  2008-08-25 12:44           ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Alan D. Brunelle
  2008-08-25 14:05           ` Alan D. Brunelle
  2 siblings, 1 reply; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 12:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

Linus Torvalds wrote:
> 
> On Sat, 23 Aug 2008, Linus Torvalds wrote:
>> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but 
>> then the call chain shows that there is no interrupt going on.
> 
> Ahh, later in that thread there's another totally unrelated oops in 
> debug_mutex_add_waiter().
> 
> I'd guess that it is really a wild pointer corrupting memory, quite possibly 
> due to a double free or something like that. Alan - it would be good to 
> run with DEBUG_PAGE_ALLOC and SLUB debugging etc if you don't already do 
> that?
> 
> 		Linus
> 

I'll add those in - as to the repeatability: The "bad" kernels seem to
repeat quite reliably - not only in terms of counts (5 or 6 times in a
row before trying something else), but also in terms of the "what" -
either the original issue () or the other kernel with the later issue
(debug_mutex_add_waiter). That's /goodness/ in that it should help
narrow it down.

I'll make sure the kernel is still failing this morning, and then add in
DEBUG_PAGE_ALLOC and if that doesn't help, SLUB debugging...

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]         ` <48B06FE6.8060404-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
@ 2008-08-25 12:07           ` Alan D. Brunelle
  0 siblings, 0 replies; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 12:07 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Rusty Russell

Arjan van de Ven wrote:

> 
> Wonder what gcc is in use?
> (newer ones tend to be a ton better... but maybe Alan is using a really
> old one)

I'm running Ubuntu 8.04 w/ gcc:

gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]             ` <48B29F7B.6080405-VXdhtT5mjnY@public.gmane.org>
@ 2008-08-25 12:22               ` Alan D. Brunelle
       [not found]                 ` <48B2A421.7080705-VXdhtT5mjnY@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 12:22 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven,
	Rusty Russell

[-- Attachment #1: Type: text/plain, Size: 182 bytes --]

Before adding any more debugging, this is the status of my kernel boots:
3 times in a row w/ this same error. (Primary problem is the same,
secondary stacks differ of course.)
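
A note on reading the first oops in the attached log: the faulting address
reported there is 0x858, with CR2 holding the same value. A small non-zero
address like that is what one typically sees when a structure member is
accessed through a NULL base pointer, since the address is then just the
member's offset. The sketch below illustrates only that arithmetic. The
struct layout and both names are made up for the example and are not taken
from the kernel; which structure and member are actually involved is not
established in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: padding chosen so that 'field' sits at offset
 * 0x858, matching the fault address in the log. Not a real kernel struct.
 */
struct fake_object {
	char pad[0x858];
	void *field;
};

/*
 * Hypothetical helper: the address a CPU would touch when accessing a
 * member at 'offset' through a base pointer whose value is 'base'.
 */
uintptr_t member_fault_address(uintptr_t base, size_t offset)
{
	return base + offset;
}
```

Calling member_fault_address(0, offsetof(struct fake_object, field)) yields
0x858, so with a NULL base the fault address equals the member offset. That
is consistent with either a genuinely NULL pointer or one overwritten by
memory corruption, so it does not by itself decide between the two.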

Alan

[-- Attachment #2: prob3.txt --]
[-- Type: text/plain, Size: 25900 bytes --]

Loading, please [    6.482953] busybox used greatest stack depth: 4840 bytes left
wait...
[    6.521876] all_generic_ide used greatest stack depth: 4784 bytes left
Begin: Loading essential drivers... ...
[    6.625509] fuse init (API version 7.9)
[    6.625509] modprobe used greatest stack depth: 1720 bytes left
[    6.644854] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM  CPU_TM2        1 MSFT  100000E)
[    6.651489] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858
[    6.655631] IP: [<ffffffff8025e302>] debug_mutex_add_waiter+0x32/0x80
[    6.655631] PGD 21a0a4067 PUD 21a4bd067 PMD 0
[    6.655631] Oops: 0002 [1] SMP
[    6.655631] CPU 1
[    6.655631] Modules linked in: processor(+) fan thermal_sys fuse
[    6.655631] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #29
[    6.655631] RIP: 0010:[<ffffffff8025e302>]  [<ffffffff8025e302>] debug_mutex_add_waiter+0x32/0x80
[    6.655631] RSP: 0018:ffff88021a4e7998  EFLAGS: 00010002
[    6.655631] RAX: 0000000000000000 RBX: ffff88021a4e79d8 RCX: 0000000000000000
[    6.655631] RDX: 0000000000000001 RSI: ffff88021a4e79d8 RDI: ffffffffa0091a60
[    6.655631] RBP: ffff88021a4e79b8 R08: ffffffff811deff0 R09: ffff8800a6fdb000
[    6.655631] R10: ffffffffa008f524 R11: 0000000000000000 R12: ffffffffa0091a60
[    6.655631] R13: ffff88021a4e6000 R14: ffff88021a9c40a0 R15: ffffffffa0091a98
[    6.655631] FS:  00007f233f11d6e0(0000) GS:ffff88022fc02a00(0000) knlGS:0000000000000000
[    6.655631] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    6.655631] CR2: 0000000000000858 CR3: 000000021a07e000 CR4: 00000000000006e0
[    6.655631] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.655631] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    6.655631] Process modprobe (pid: 1259, threadinfo ffff88021a4e6000, task ffff88021a9c40a0)
[    6.655631] Stack:  0000000000000000 ffffffffa0091a60 0000000000000246 ffffffffa008f524
[    6.655631]  ffff88021a4e7a38 ffffffff8049f596 ffffffffa008f524 ffffffffa0091a18
[    6.655631]  ffff88021a4e79d8 ffff88021a4e79d8 1111111111111111 1111111111111111
[    6.655631] Call Trace:
[    6.655631]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    6.655631]  [<ffffffff8049f596>] mutex_lock_nested+0xa6/0x250
[    6.655631]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    6.655631]  [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[    6.655631]  [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
[    6.655631]  [<ffffffffa008fe43>] thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    6.655631]  [<ffffffffa019b2a3>] acpi_processor_start+0x64b/0x774 [processor]
[    6.655631]  [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[    6.655631]  [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[    6.655631]  [<ffffffff803a7f5e>] acpi_start_single_object+0x2d/0x52
[    6.655631]  [<ffffffff803a9556>] acpi_device_probe+0x7e/0x92
[    6.655631]  [<ffffffff803dd3eb>] driver_probe_device+0x9b/0x1a0
[    6.655631]  [<ffffffff803dd576>] __driver_attach+0x86/0x90
[    6.655631]  [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[    6.655631]  [<ffffffff803dc93d>] bus_for_each_dev+0x5d/0x90
[    6.655631]  [<ffffffff803dd22c>] driver_attach+0x1c/0x20
[    6.655631]  [<ffffffff803dcf79>] bus_add_driver+0x1e9/0x260
[    6.655631]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    6.655631]  [<ffffffff803dd74f>] driver_register+0x5f/0x140
[    6.655631]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    6.655631]  [<ffffffff803a9866>] acpi_bus_register_driver+0x3e/0x40
[    6.655631]  [<ffffffffa0222094>] acpi_processor_init+0x94/0x107 [processor]
[    6.655631]  [<ffffffff80209040>] _stext+0x40/0x180
[    6.655631]  [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[    6.655631]  [<ffffffff802676c2>] sys_init_module+0x142/0x1dc0
[    6.655631]  [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[    6.655631]  [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[    6.655631]  [<ffffffff8020c34b>] system_call_fastpath+0x16/0x1b
[    6.655631]
[    6.655631]
[    6.655631] Code: 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 8b 47 08 49 89 d5 49 89 fc 89 c2 25 ff ff 00 00 c1 ea 10 39 c2 74 1d 49 8b 4
[    6.655631] RIP  [<ffffffff8025e302>] debug_mutex_add_waiter+0x32/0x80
[    6.655631]  RSP <ffff88021a4e7998>
[    6.655631] CR2: 0000000000000858
[    6.655631] ---[ end trace 8bbd31df1403e48e ]---
[    7.024992] modprobe used greatest stack depth: 408 bytes left
[    7.030988] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[    7.031053] IP: [<ffffffff8023f39c>] do_exit+0x28c/0xa10
[    7.031053] PGD 0
[    7.031053] Oops: 0000 [2] SMP
[    7.031053] CPU 1
[    7.031053] Modules linked in: processor(+) fan thermal_sys fuse
[    7.031053] Pid: 1259, comm: modprobe Tainted: G      D   2.6.27-rc3 #29
[    7.031053] RIP: 0010:[<ffffffff8023f39c>]  [<ffffffff8023f39c>] do_exit+0x28c/0xa10
[    7.031053] RSP: 0018:ffff88021a4e77e8  EFLAGS: 00010246
[    7.031053] RAX: 0000000000000000 RBX: 0000000000000198 RCX: 0000000000000000
[    7.031053] RDX: 0000000000000000 RSI: ffffffff802740d0 RDI: 0000000000000000
[    7.031053] RBP: ffff88021a4e7848 R08: 0000000000000001 R09: 0000000000000000
[    7.031053] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88021a9c40a0
[    7.031053] R13: 0000000000000009 R14: ffff88021a4e78e8 R15: ffff88021a18b8a0
[    7.031053] FS:  0000000000000000(0000) GS:ffff88022fc02a00(0000) knlGS:0000000000000000
[    7.031053] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    7.031053] CR2: 0000000000000048 CR3: 0000000000201000 CR4: 00000000000006e0
[    7.031053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.031053] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    7.031053] Process modprobe (pid: 1259, threadinfo ffff88021a4e6000, task ffff88021a9c40a0)
[    7.031053] Stack:  0000000000000001 0000000000000001 0000000000000040 0000000000000001
[    7.031053]  ffff88021a4e7828 ffffffff803c7d59 0000000000000092 0000000000000092
[    7.031053]  ffff88021a4e78e8 0000000000000009 ffff88021a4e78e8 ffff88021a18b8a0
[    7.031053] Call Trace:
[    7.031053]  [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[    7.031053]  [<ffffffff804a1a57>] oops_end+0x87/0x90
[    7.031053]  [<ffffffff804a3d13>] do_page_fault+0x663/0x800
[    7.031053]  [<ffffffff804a162d>] error_exit+0x0/0x9a
[    7.031053]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.031053]  [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[    7.031053]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.031053]  [<ffffffff8049f596>] mutex_lock_nested+0xa6/0x250
[    7.031053]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.031053]  [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[    7.031053]  [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
[    7.031053]  [<ffffffffa008fe43>] thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.031053]  [<ffffffffa019b2a3>] acpi_processor_start+0x64b/0x774 [processor]
[    7.031053]  [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[    7.031053]  [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[    7.031053]  [<ffffffff803a7f5e>] acpi_start_single_object+0x2d/0x52
[    7.031053]  [<ffffffff803a9556>] acpi_device_probe+0x7e/0x92
[    7.031053]  [<ffffffff803dd3eb>] driver_probe_device+0x9b/0x1a0
[    7.031053]  [<ffffffff803dd576>] __driver_attach+0x86/0x90
[    7.031053]  [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[    7.031053]  [<ffffffff803dc93d>] bus_for_each_dev+0x5d/0x90
[    7.031053]  [<ffffffff803dd22c>] driver_attach+0x1c/0x20
[    7.031053]  [<ffffffff803dcf79>] bus_add_driver+0x1e9/0x260
[    7.031053]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.031053]  [<ffffffff803dd74f>] driver_register+0x5f/0x140
[    7.031053]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.031053]  [<ffffffff803a9866>] acpi_bus_register_driver+0x3e/0x40
[    7.031053]  [<ffffffffa0222094>] acpi_processor_init+0x94/0x107 [processor]
[    7.031053]  [<ffffffff80209040>] _stext+0x40/0x180
[    7.031053]  [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[    7.031053]  [<ffffffff802676c2>] sys_init_module+0x142/0x1dc0
[    7.031053]  [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[    7.031053]  [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[    7.031053]  [<ffffffff8020c34b>] system_call_fastpath+0x16/0x1b
[    7.031053]
[    7.031053]
[    7.031053] Code: e8 8a e3 0e 00 8b 45 b8 85 c0 74 16 49 8b 84 24 40 07 00 00 8b 80 4c 01 00 00 85 c0 0f 85 77 07 00 00 49 8b 44 24 08 48 8b 4
[    7.031053] RIP  [<ffffffff8023f39c>] do_exit+0x28c/0xa10
[    7.031053]  RSP <ffff88021a4e77e8>
[    7.031053] CR2: 0000000000000048
[    7.421063] ------------[ cut here ]------------
[    7.424883] WARNING: at kernel/sched_fair.c:884 hrtick_start_fair+0x187/0x190()
[    7.424883] Modules linked in: processor(+) fan thermal_sys fuse
[    7.424883] Pid: 1259, comm: modprobe Tainted: G      D   2.6.27-rc3 #29
[    7.424883]
[    7.424883] Call Trace:
[    7.424883]  <IRQ>  [<ffffffff8023baef>] warn_on_slowpath+0x5f/0x80
[    7.424883]  [<ffffffff8022d927>] hrtick_start_fair+0x187/0x190
[    7.424883]  [<ffffffff8022ec79>] enqueue_task_fair+0x49/0x250
[    7.424883]  [<ffffffff8022c290>] enqueue_task+0x50/0x60
[    7.424883]  [<ffffffff8022c2c3>] activate_task+0x23/0x40
[    7.424883]  [<ffffffff80231653>] try_to_wake_up+0x253/0x280
[    7.424883]  [<ffffffff8023168d>] default_wake_function+0xd/0x10
[    7.424883]  [<ffffffff802521d1>] autoremove_wake_function+0x11/0x40
[    7.424883]  [<ffffffff8022bd6a>] __wake_up_common+0x5a/0x90
[    7.424883]  [<ffffffff8022d373>] __wake_up+0x43/0x70
[    7.424883]  [<ffffffff8024e910>] ? delayed_work_timer_fn+0x0/0x40
[    7.424883]  [<ffffffff8024df68>] insert_work+0x48/0x50
[    7.424883]  [<ffffffff8024e8f1>] __queue_work+0x31/0x50
[    7.424883]  [<ffffffff8024e942>] delayed_work_timer_fn+0x32/0x40
[    7.424883]  [<ffffffff80245e5b>] run_timer_softirq+0x1bb/0x230
[    7.424883]  [<ffffffff80255afa>] ? ktime_get_ts+0x4a/0x60
[    7.424883]  [<ffffffff8024157a>] __do_softirq+0x7a/0xf0
[    7.424883]  [<ffffffff8025cd8e>] ? tick_program_event+0x3e/0x70
[    7.424883]  [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[    7.424883]  [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[    7.424883]  [<ffffffff802414f5>] irq_exit+0x85/0x90
[    7.424883]  [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[    7.424883]  [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[    7.424883]  <EOI>  [<ffffffff804a1a1a>] ? oops_end+0x4a/0x90
[    7.424883]  [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[    7.424883]  [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[    7.424883]  [<ffffffff802740d0>] ? release_css_set_taskexit+0x0/0x10
[    7.424883]  [<ffffffff8023f39c>] ? do_exit+0x28c/0xa10
[    7.424883]  [<ffffffff8023f376>] ? do_exit+0x266/0xa10
[    7.424883]  [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[    7.424883]  [<ffffffff804a1a57>] ? oops_end+0x87/0x90
[    7.424883]  [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[    7.424883]  [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff8049f596>] ? mutex_lock_nested+0xa6/0x250
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.424883]  [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[    7.424883]  [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[    7.424883]  [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[    7.424883]  [<ffffffff803a7f5e>] ? acpi_start_single_object+0x2d/0x52
[    7.424883]  [<ffffffff803a9556>] ? acpi_device_probe+0x7e/0x92
[    7.424883]  [<ffffffff803dd3eb>] ? driver_probe_device+0x9b/0x1a0
[    7.424883]  [<ffffffff803dd576>] ? __driver_attach+0x86/0x90
[    7.424883]  [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[    7.424883]  [<ffffffff803dc93d>] ? bus_for_each_dev+0x5d/0x90
[    7.424883]  [<ffffffff803dd22c>] ? driver_attach+0x1c/0x20
[    7.424883]  [<ffffffff803dcf79>] ? bus_add_driver+0x1e9/0x260
[    7.424883]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.424883]  [<ffffffff803dd74f>] ? driver_register+0x5f/0x140
[    7.424883]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.424883]  [<ffffffff803a9866>] ? acpi_bus_register_driver+0x3e/0x40
[    7.424883]  [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[    7.424883]  [<ffffffff80209040>] ? _stext+0x40/0x180
[    7.424883]  [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[    7.424883]  [<ffffffff802676c2>] ? sys_init_module+0x142/0x1dc0
[    7.424883]  [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[    7.424883]  [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[    7.424883]  [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[    7.424883]
[    7.424883] ---[ end trace 8bbd31df1403e48e ]---
[    7.424883] ------------[ cut here ]------------
[    7.424883] kernel BUG at kernel/sched.c:1155!
[    7.424883] invalid opcode: 0000 [3] SMP
[    7.424883] CPU 1
[    7.424883] Modules linked in: processor(+) fan thermal_sys fuse
[    7.424883] Pid: 1259, comm: modprobe Tainted: G      D W 2.6.27-rc3 #29
[    7.424883] RIP: 0010:[<ffffffff8022cc2b>]  [<ffffffff8022cc2b>] resched_task+0x6b/0x70
[    7.424883] RSP: 0018:ffff88022f0abce0  EFLAGS: 00010046
[    7.424883] RAX: 0000000000000709 RBX: 0000000004bfe971 RCX: ffff88021a4e6000
[    7.424883] RDX: 0000000000000709 RSI: 0000000000000000 RDI: ffff88021a9c40a0
[    7.424883] RBP: ffff88022f0abce0 R08: ffff88022f180038 R09: ffff88021a9c40d8
[    7.424883] R10: ffffffff810c9e00 R11: 0000000000000000 R12: ffff8800a6fc9000
[    7.424883] R13: ffffffff810c9e00 R14: ffff88021a9c40a0 R15: 0000000000000001
[    7.424883] FS:  0000000000000000(0000) GS:ffff88022fc02a00(0000) knlGS:0000000000000000
[    7.424883] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    7.424883] CR2: 0000000000000048 CR3: 0000000000201000 CR4: 00000000000006e0
[    7.424883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.424883] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    7.424883] Process modprobe (pid: 1259, threadinfo ffff88021a4e6000, task ffff88021a9c40a0)
[    7.424883] Stack:  ffff88022f0abd20 ffffffff802387b3 0000000000000400 0000000000400000
[    7.424883]  ffff88022f180000 ffff8800280a4e00 0000000000000001 0000000000000003
[    7.424883]  ffff88022f0abd70 ffffffff802314bf 0000000100000001 0000000000000000
[    7.424883] Call Trace:
[    7.424883]  <IRQ>  [<ffffffff802387b3>] check_preempt_wakeup+0x133/0x1c0
[    7.424883]  [<ffffffff802314bf>] try_to_wake_up+0xbf/0x280
[    7.424883]  [<ffffffff8023168d>] default_wake_function+0xd/0x10
[    7.424883]  [<ffffffff802521d1>] autoremove_wake_function+0x11/0x40
[    7.424883]  [<ffffffff8022bd6a>] __wake_up_common+0x5a/0x90
[    7.424883]  [<ffffffff8022d373>] __wake_up+0x43/0x70
[    7.424883]  [<ffffffff8024e910>] ? delayed_work_timer_fn+0x0/0x40
[    7.424883]  [<ffffffff8024df68>] insert_work+0x48/0x50
[    7.424883]  [<ffffffff8024e8f1>] __queue_work+0x31/0x50
[    7.424883]  [<ffffffff8024e942>] delayed_work_timer_fn+0x32/0x40
[    7.424883]  [<ffffffff80245e5b>] run_timer_softirq+0x1bb/0x230
[    7.424883]  [<ffffffff80255afa>] ? ktime_get_ts+0x4a/0x60
[    7.424883]  [<ffffffff8024157a>] __do_softirq+0x7a/0xf0
[    7.424883]  [<ffffffff8025cd8e>] ? tick_program_event+0x3e/0x70
[    7.424883]  [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[    7.424883]  [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[    7.424883]  [<ffffffff802414f5>] irq_exit+0x85/0x90
[    7.424883]  [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[    7.424883]  [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[    7.424883]  <EOI>  [<ffffffff804a1a1a>] ? oops_end+0x4a/0x90
[    7.424883]  [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[    7.424883]  [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[    7.424883]  [<ffffffff802740d0>] ? release_css_set_taskexit+0x0/0x10
[    7.424883]  [<ffffffff8023f39c>] ? do_exit+0x28c/0xa10
[    7.424883]  [<ffffffff8023f376>] ? do_exit+0x266/0xa10
[    7.424883]  [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[    7.424883]  [<ffffffff804a1a57>] ? oops_end+0x87/0x90
[    7.424883]  [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[    7.424883]  [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff8049f596>] ? mutex_lock_nested+0xa6/0x250
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.424883]  [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[    7.424883]  [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[    7.424883]  [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[    7.424883]  [<ffffffff803a7f5e>] ? acpi_start_single_object+0x2d/0x52
[    7.424883]  [<ffffffff803a9556>] ? acpi_device_probe+0x7e/0x92
[    7.424883]  [<ffffffff803dd3eb>] ? driver_probe_device+0x9b/0x1a0
[    7.424883]  [<ffffffff803dd576>] ? __driver_attach+0x86/0x90
[    7.424883]  [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[    7.424883]  [<ffffffff803dc93d>] ? bus_for_each_dev+0x5d/0x90
[    7.424883]  [<ffffffff803dd22c>] ? driver_attach+0x1c/0x20
[    7.424883]  [<ffffffff803dcf79>] ? bus_add_driver+0x1e9/0x260
[    7.424883]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.424883]  [<ffffffff803dd74f>] ? driver_register+0x5f/0x140
[    7.424883]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.424883]  [<ffffffff803a9866>] ? acpi_bus_register_driver+0x3e/0x40
[    7.424883]  [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[    7.424883]  [<ffffffff80209040>] ? _stext+0x40/0x180
[    7.424883]  [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[    7.424883]  [<ffffffff802676c2>] ? sys_init_module+0x142/0x1dc0
[    7.424883]  [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[    7.424883]  [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[    7.424883]  [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[    7.424883]
[    7.424883]
[    7.424883] Code: 8b 47 08 8b 50 1c 65 8b 04 25 24 00 00 00 39 c2 74 0d 0f ae f0 48 8b 47 08 f6 40 18 04 74 02 c9 c3 89 d7 ff 15 1f 7b 3c 00 c
[    7.424883] RIP  [<ffffffff8022cc2b>] resched_task+0x6b/0x70
[    7.424883]  RSP <ffff88022f0abce0>
[    7.424883] ---[ end trace 8bbd31df1403e48e ]---
[    7.424883] Kernel panic - not syncing: Aiee, killing interrupt handler!
[    7.424883] ------------[ cut here ]------------
[    7.424883] WARNING: at kernel/smp.c:328 smp_call_function_mask+0x25a/0x260()
[    7.424883] Modules linked in: processor(+) fan thermal_sys fuse
[    7.424883] Pid: 1259, comm: modprobe Tainted: G      D W 2.6.27-rc3 #29
[    7.424883]
[    7.424883] Call Trace:
[    7.424883]  <IRQ>  [<ffffffff8023baef>] warn_on_slowpath+0x5f/0x80
[    7.424883]  [<ffffffff8026514a>] smp_call_function_mask+0x25a/0x260
[    7.424883]  [<ffffffff803695bd>] ? string+0x3d/0xd0
[    7.424883]  [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[    7.424883]  [<ffffffff803695bd>] ? string+0x3d/0xd0
[    7.424883]  [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[    7.424883]  [<ffffffff803695bd>] ? string+0x3d/0xd0
[    7.424883]  [<ffffffff803695bd>] ? string+0x3d/0xd0
[    7.424883]  [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[    7.424883]  [<ffffffff80368d5e>] ? number+0x2ae/0x2d0
[    7.424883]  [<ffffffff80368d5e>] ? number+0x2ae/0x2d0
[    7.424883]  [<ffffffff80269ecd>] ? kallsyms_lookup+0x5d/0xa0
[    7.424883]  [<ffffffff80368d5e>] ? number+0x2ae/0x2d0
[    7.424883]  [<ffffffff80369a8b>] ? vsnprintf+0x43b/0x720
[    7.424883]  [<ffffffff80369dd8>] ? sprintf+0x68/0x70
[    7.424883]  [<ffffffff803695bd>] ? string+0x3d/0xd0
[    7.424883]  [<ffffffff804a3fa3>] ? __atomic_notifier_call_chain+0x83/0xa0
[    7.424883]  [<ffffffff804a3f20>] ? __atomic_notifier_call_chain+0x0/0xa0
[    7.424883]  [<ffffffff804a0ef6>] ? _spin_unlock+0x26/0x30
[    7.424883]  [<ffffffff8021c470>] ? stop_this_cpu+0x0/0x30
[    7.424883]  [<ffffffff80265190>] smp_call_function+0x40/0x50
[    7.424883]  [<ffffffff8021c4f3>] native_smp_send_stop+0x23/0x40
[    7.424883]  [<ffffffff8023be3f>] panic+0xaf/0x190
[    7.424883]  [<ffffffff8023cc97>] ? printk+0x67/0x70
[    7.424883]  [<ffffffff8049f4e9>] ? mutex_unlock+0x9/0x10
[    7.424883]  [<ffffffff80256d11>] ? blocking_notifier_call_chain+0x11/0x20
[    7.424883]  [<ffffffff8023f979>] do_exit+0x869/0xa10
[    7.424883]  [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[    7.424883]  [<ffffffff804a1a57>] oops_end+0x87/0x90
[    7.424883]  [<ffffffff8020e08e>] die+0x5e/0x90
[    7.424883]  [<ffffffff804a1f60>] do_trap+0x130/0x150
[    7.424883]  [<ffffffff8020e662>] do_invalid_op+0x92/0xb0
[    7.424883]  [<ffffffff8022cc2b>] ? resched_task+0x6b/0x70
[    7.424883]  [<ffffffff804a162d>] error_exit+0x0/0x9a
[    7.424883]  [<ffffffff8022cc2b>] ? resched_task+0x6b/0x70
[    7.424883]  [<ffffffff802387b3>] check_preempt_wakeup+0x133/0x1c0
[    7.424883]  [<ffffffff802314bf>] try_to_wake_up+0xbf/0x280
[    7.424883]  [<ffffffff8023168d>] default_wake_function+0xd/0x10
[    7.424883]  [<ffffffff802521d1>] autoremove_wake_function+0x11/0x40
[    7.424883]  [<ffffffff8022bd6a>] __wake_up_common+0x5a/0x90
[    7.424883]  [<ffffffff8022d373>] __wake_up+0x43/0x70
[    7.424883]  [<ffffffff8024e910>] ? delayed_work_timer_fn+0x0/0x40
[    7.424883]  [<ffffffff8024df68>] insert_work+0x48/0x50
[    7.424883]  [<ffffffff8024e8f1>] __queue_work+0x31/0x50
[    7.424883]  [<ffffffff8024e942>] delayed_work_timer_fn+0x32/0x40
[    7.424883]  [<ffffffff80245e5b>] run_timer_softirq+0x1bb/0x230
[    7.424883]  [<ffffffff80255afa>] ? ktime_get_ts+0x4a/0x60
[    7.424883]  [<ffffffff8024157a>] __do_softirq+0x7a/0xf0
[    7.424883]  [<ffffffff8025cd8e>] ? tick_program_event+0x3e/0x70
[    7.424883]  [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[    7.424883]  [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[    7.424883]  [<ffffffff802414f5>] irq_exit+0x85/0x90
[    7.424883]  [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[    7.424883]  [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[    7.424883]  <EOI>  [<ffffffff804a1a1a>] ? oops_end+0x4a/0x90
[    7.424883]  [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[    7.424883]  [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[    7.424883]  [<ffffffff802740d0>] ? release_css_set_taskexit+0x0/0x10
[    7.424883]  [<ffffffff8023f39c>] ? do_exit+0x28c/0xa10
[    7.424883]  [<ffffffff8023f376>] ? do_exit+0x266/0xa10
[    7.424883]  [<ffffffff803c7d59>] ? do_unblank_screen+0x19/0x130
[    7.424883]  [<ffffffff804a1a57>] ? oops_end+0x87/0x90
[    7.424883]  [<ffffffff804a3d13>] ? do_page_fault+0x663/0x800
[    7.424883]  [<ffffffff804a162d>] ? error_exit+0x0/0x9a
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff8025e302>] ? debug_mutex_add_waiter+0x32/0x80
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff8049f596>] ? mutex_lock_nested+0xa6/0x250
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffff803635c4>] ? idr_pre_get+0x44/0x90
[    7.424883]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.424883]  [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.424883]  [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[    7.424883]  [<ffffffff8031a94b>] ? __sysfs_add_one+0x6b/0xa0
[    7.424883]  [<ffffffff8031ba3c>] ? sysfs_do_create_link+0xbc/0x150
[    7.424883]  [<ffffffff803a7f5e>] ? acpi_start_single_object+0x2d/0x52
[    7.424883]  [<ffffffff803a9556>] ? acpi_device_probe+0x7e/0x92
[    7.424883]  [<ffffffff803dd3eb>] ? driver_probe_device+0x9b/0x1a0
[    7.424883]  [<ffffffff803dd576>] ? __driver_attach+0x86/0x90
[    7.424883]  [<ffffffff803dd4f0>] ? __driver_attach+0x0/0x90
[    7.424883]  [<ffffffff803dc93d>] ? bus_for_each_dev+0x5d/0x90
[    7.424883]  [<ffffffff803dd22c>] ? driver_attach+0x1c/0x20
[    7.424883]  [<ffffffff803dcf79>] ? bus_add_driver+0x1e9/0x260
[    7.424883]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.424883]  [<ffffffff803dd74f>] ? driver_register+0x5f/0x140
[    7.424883]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.424883]  [<ffffffff803a9866>] ? acpi_bus_register_driver+0x3e/0x40
[    7.424883]  [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[    7.424883]  [<ffffffff80209040>] ? _stext+0x40/0x180
[    7.424883]  [<ffffffff802a8911>] ? __vunmap+0xa1/0x110
[    7.424883]  [<ffffffff802676c2>] ? sys_init_module+0x142/0x1dc0
[    7.424883]  [<ffffffff80367b16>] ? __up_read+0x46/0xb0
[    7.424883]  [<ffffffff8048e570>] ? cpu_down+0x0/0x70
[    7.424883]  [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[    7.424883]
[    7.424883] ---[ end trace 8bbd31df1403e48e ]---

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]         ` <alpine.LFD.1.10.0808231313170.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-25 12:03           ` Alan D. Brunelle
@ 2008-08-25 12:44           ` Alan D. Brunelle
  2008-08-25 13:13             ` Alan D. Brunelle
  2008-08-25 18:02             ` Linus Torvalds
  2008-08-25 14:05           ` Alan D. Brunelle
  2 siblings, 2 replies; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 12:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

[-- Attachment #1: Type: text/plain, Size: 803 bytes --]

Linus Torvalds wrote:
> 
> On Sat, 23 Aug 2008, Linus Torvalds wrote:
>> This one makes no sense. It's triggering a BUG_ON(in_interrupt()), but 
>> then the call chain shows that there is no interrupt going on.
> 
> Ahh, later in that thread there's another totally unrelated oops in 
> debug_mutex_add_waiter().
> 
> I'd guess that it is really a wild pointer corrupting memory, quite possibly 
> due to a double free or something like that. Alan - it would be good to 
> run with DEBUG_PAGE_ALLOC and SLUB debugging etc if you don't already do 
> that?

With /just/ DEBUG_PAGE_ALLOC defined, I have seen two general panic types:

o  A new double fault, reported as "SMP DEBUG_PAGEALLOC" (prob4.txt)

o  The NULL pointer dereference at 0x858 (prob4a.txt)

I am enabling SLUB debugging next to see what that shows.

Alan
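
When reading the attached traces, it may help to separate the entries the kernel printed with a leading "?" (addresses found on the stack that were not confirmed as part of the call chain) from the unmarked ones. The following is only a minimal sketch: the regular expression, the parse_frames() helper and the SAMPLE text are illustrative assumptions, not part of any kernel tool, and the sample lines are copied from prob4a.txt.

```python
import re

# Matches one call-trace entry such as
#   [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
# The optional "?" marks an entry the kernel did not confirm as a real caller.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+(?P<guess>\?[ \t]+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)

def parse_frames(log_text):
    """Return a list of dicts, one per call-trace entry found in log_text."""
    frames = []
    for line in log_text.splitlines():
        if "RIP" in line:
            # RIP lines use the same [<addr>] symbol+off/len notation
            # but are not call-trace entries, so skip them.
            continue
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "addr": int(m.group("addr"), 16),
                "reliable": m.group("guess") is None,
                "symbol": m.group("sym"),
                "module": m.group("mod"),  # None for core-kernel symbols
            })
    return frames

# Three entries copied from the first call trace in prob4a.txt.
SAMPLE = """\
[    6.690632]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    6.690632]  [<ffffffff8049f856>] mutex_lock_nested+0xa6/0x250
[    6.690632]  [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        mark = " " if frame["reliable"] else "?"
        print(mark, frame["symbol"], frame["module"] or "")
```

Filtering on the "reliable" field gives a shorter chain to reason about, but the "?" entries can still be useful hints, so this is a reading aid rather than a way to discard data.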

[-- Attachment #2: prob4.txt --]
[-- Type: text/plain, Size: 4170 bytes --]

Begin: Loading essential drivers... ...
[    6.680626] fuse init (API version 7.9)
[    6.680626] modprobe used greatest stack depth: 1720 bytes left
[    6.704224] double fault: 0000 [1] SMP DEBUG_PAGEALLOC
[    6.704224] CPU 1
[    6.704224] Modules linked in: processor(+) fan thermal_sys fuse
[    6.710629] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #30
[    6.710629] RIP: 0010:[<ffffffff802214dc>]  [<ffffffff802214dc>] flat_send_IPI_allbutself+0x2c/0x80
[    6.710629] RSP: 0018:ffff88021a513ff8  EFLAGS: 00010282
[    6.710629] RAX: ffffffff805f5520 RBX: ffff88021a5141f8 RCX: 000000000000003f
[    6.710629] RDX: 0000000000000200 RSI: ffffffff80bf7920 RDI: ffff88021a5141f8
[    6.710629] RBP: ffff88021a514408 R08: 0000000000000040 R09: 0000000000000040
[    6.710629] R10: 0000000000001000 R11: 0000000000000000 R12: 00000000000000fc
[    6.710629] R13: ffff88021a514eb8 R14: 0000000000000001 R15: ffffffff8021cba0
[    6.710629] FS:  00007f39acc136e0(0000) GS:ffff88022fc81a00(0000) knlGS:0000000000000000
[    6.710629] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    6.710629] CR2: ffff88021a513fe8 CR3: 000000021a469000 CR4: 00000000000006e0
[    6.710629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.710629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    6.710629] Process modprobe (pid: 1259, threadinfo ffff88021a514000, task ffff88021a9da050)
[    6.710629] Stack: <1>BUG: unable to handle kernel paging request at ffff88021a513ff8
[    6.710629] IP: [<ffffffff8020dd22>] show_stack_log_lvl+0x82/0x130
[    6.710629] PGD 202063 PUD 10067 PMD 22f176163 PTE 800000021a513160
[    6.710629] Oops: 0000 [2] SMP DEBUG_PAGEALLOC
[    6.710629] CPU 1
[    6.710629] Modules linked in: processor(+) fan thermal_sys fuse
[    6.710629] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #30
[    6.710629] RIP: 0010:[<ffffffff8020dd22>]  [<ffffffff8020dd22>] show_stack_log_lvl+0x82/0x130
[    6.710629] RSP: 0018:ffff88022f12ce28  EFLAGS: 00010046
[    6.710629] RAX: ffff88022fc81a00 RBX: ffff88021a513ff8 RCX: 000000000000000c
[    6.710629] RDX: ffff88021a513ff8 RSI: ffff88022f12cf58 RDI: 0000000000000000
[    6.710629] RBP: ffff88022f12ce78 R08: ffffffff8059aea9 R09: 0000000000000001
[    6.710629] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    6.710629] R13: ffff88022f127fc0 R14: ffff88022f12bfc0 R15: 0000000000000000
[    6.710629] FS:  00007f39acc136e0(0000) GS:ffff88022fc81a00(0000) knlGS:0000000000000000
[    6.710629] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    6.710629] CR2: ffff88021a513ff8 CR3: 000000021a469000 CR4: 00000000000006e0
[    6.710629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.710629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    6.710629] Process modprobe (pid: 1259, threadinfo ffff88021a514000, task ffff88021a9da050)
[    6.710629] Stack:  ffff88022f12ce78 ffffffff8059aea9 ffff88021a514408 ffff88022f12cf58
[    6.710629]  ffff88021a513ff8 ffff88021a9da050 0000000000000000 ffff88022f12cf58
[    6.710629]  0000000000000040 000000000000002b ffff88022f12ceb8 ffffffff8020dea8
[    6.710629] Call Trace:
[    6.710629]  <#DF>  [<ffffffff8020dea8>] show_registers+0xd8/0x260
[    6.710629]  [<ffffffff8021cba0>] ? do_flush_tlb_all+0x0/0x40
[    6.710629]  [<ffffffff804a2083>] __die+0xa3/0x120
[    6.710629]  [<ffffffff8020e073>] die+0x43/0x90
[    6.710629]  [<ffffffff8020e1e3>] do_double_fault+0x63/0x70
[    6.710629]  [<ffffffff8020d4fd>] double_fault+0x7d/0x90
[    6.710629]  [<ffffffff8021cba0>] ? do_flush_tlb_all+0x0/0x40
[    6.710629]  [<ffffffff802214dc>] ? flat_send_IPI_allbutself+0x2c/0x80
[    6.710629]  <<EOE>>
[    6.710629]
[    6.710629] Code: 55 d0 85 c9 48 89 d3 7e 5a 45 31 e4 eb 44 4c 39 f3 77 44 66 0f 1f 44 00 00 74 7e 45 85 e4 74 0b 41 f6 c4 03 0f 1f 44 00 00 7
[    6.720629] RIP  [<ffffffff8020dd22>] show_stack_log_lvl+0x82/0x130
[    6.720629]  RSP <ffff88022f12ce28>
[    6.720629] CR2: ffff88021a513ff8
[    6.720629] ---[ end trace d4fdf12ff3e07cc3 ]---
[    7.052961] modprobe used greatest stack depth: 920 bytes left
Killed
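
One rough check that can be made on the double fault above is to compare the reported RSP with the threadinfo address from the same dump. The sketch below assumes that threadinfo marks the lowest address of the task's kernel stack and that the stack is 8 KiB; both are assumptions about this x86_64 build and are not stated in the log. The helper names are hypothetical.

```python
# Assumption: 8 KiB kernel stacks, with thread_info at the lowest address.
THREAD_SIZE = 8192

def stack_offset(rsp, threadinfo):
    """Offset of RSP from the stack base; negative means below the stack."""
    return rsp - threadinfo

def in_stack(rsp, threadinfo, thread_size=THREAD_SIZE):
    """True if RSP lies inside [threadinfo, threadinfo + thread_size)."""
    return threadinfo <= rsp < threadinfo + thread_size

if __name__ == "__main__":
    # Values from the double fault in prob4.txt.
    rsp = 0xffff88021a513ff8
    threadinfo = 0xffff88021a514000
    print(stack_offset(rsp, threadinfo))  # -8
    print(in_stack(rsp, threadinfo))      # False

    # Values from the NULL pointer oops in prob4a.txt, for comparison.
    print(in_stack(0xffff88021a959998, 0xffff88021a958000))  # True
```

Under those assumptions, RSP in prob4.txt sits 8 bytes below the stack base, and CR2 (ffff88021a513fe8) falls in the same page just below it. That would be consistent with the stack having been overrun, and the earlier "greatest stack depth: 1720 bytes left" message points the same way, but the log alone does not prove it.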

[-- Attachment #3: prob4a.txt --]
[-- Type: text/plain, Size: 21642 bytes --]

[    6.551876] all_generic_ide used greatest stack depth: 4784 bytes left
Begin: Loading essential drivers... ...
[    6.658003] fuse init (API version 7.9)
[    6.661876] modprobe used greatest stack depth: 1720 bytes left
[    6.683510] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM  CPU_TM2        1 MSFT  100000E)
[    6.690632] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858
[    6.690632] IP: [<ffffffff8025e512>] debug_mutex_add_waiter+0x32/0x80
[    6.690632] PGD 21a145067 PUD 22f13a067 PMD 0
[    6.690632] Oops: 0002 [1] SMP DEBUG_PAGEALLOC
[    6.690632] CPU 1
[    6.690632] Modules linked in: processor(+) fan thermal_sys fuse
[    6.690632] Pid: 1259, comm: modprobe Not tainted 2.6.27-rc3 #30
[    6.690632] RIP: 0010:[<ffffffff8025e512>]  [<ffffffff8025e512>] debug_mutex_add_waiter+0x32/0x80
[    6.690632] RSP: 0018:ffff88021a959998  EFLAGS: 00010002
[    6.690632] RAX: 0000000000000000 RBX: ffff88021a9599d8 RCX: 0000000000000000
[    6.690632] RDX: 0000000000000001 RSI: ffff88021a9599d8 RDI: ffffffffa0091a60
[    6.690632] RBP: ffff88021a9599b8 R08: ffffffff811deff0 R09: ffff8800a6fdb000
[    6.690632] R10: ffffffffa008f524 R11: 0000000000000000 R12: ffffffffa0091a60
[    6.690632] R13: ffff88021a958000 R14: ffff88021a1c2050 R15: ffffffffa0091a98
[    6.690632] FS:  00007f28063c16e0(0000) GS:ffff88022fc81a00(0000) knlGS:0000000000000000
[    6.690632] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    6.690632] CR2: 0000000000000858 CR3: 0000000219c64000 CR4: 00000000000006e0
[    6.690632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.690632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    6.690632] Process modprobe (pid: 1259, threadinfo ffff88021a958000, task ffff88021a1c2050)
[    6.690632] Stack:  0000000000000000 ffffffffa0091a60 0000000000000246 ffffffffa008f524
[    6.690632]  ffff88021a959a38 ffffffff8049f856 ffffffffa008f524 ffffffffa0091a18
[    6.690632]  ffff88021a9599d8 ffff88021a9599d8 1111111111111111 1111111111111111
[    6.690632] Call Trace:
[    6.690632]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    6.690632]  [<ffffffff8049f856>] mutex_lock_nested+0xa6/0x250
[    6.690632]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    6.690632]  [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[    6.690632]  [<ffffffffa008f524>] get_idr+0x44/0xa0 [thermal_sys]
[    6.690632]  [<ffffffffa008fe43>] thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    6.690632]  [<ffffffffa019b2a3>] acpi_processor_start+0x64b/0x774 [processor]
[    6.690632]  [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[    6.690632]  [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[    6.690632]  [<ffffffff803a821e>] acpi_start_single_object+0x2d/0x52
[    6.690632]  [<ffffffff803a9816>] acpi_device_probe+0x7e/0x92
[    6.690632]  [<ffffffff803dd6ab>] driver_probe_device+0x9b/0x1a0
[    6.690632]  [<ffffffff803dd836>] __driver_attach+0x86/0x90
[    6.690632]  [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[    6.690632]  [<ffffffff803dcbfd>] bus_for_each_dev+0x5d/0x90
[    6.690632]  [<ffffffff803dd4ec>] driver_attach+0x1c/0x20
[    6.690632]  [<ffffffff803dd239>] bus_add_driver+0x1e9/0x260
[    6.690632]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    6.690632]  [<ffffffff803dda0f>] driver_register+0x5f/0x140
[    6.690632]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    6.690632]  [<ffffffff803a9b26>] acpi_bus_register_driver+0x3e/0x40
[    6.690632]  [<ffffffffa0222094>] acpi_processor_init+0x94/0x107 [processor]
[    6.690632]  [<ffffffff80209040>] _stext+0x40/0x180
[    6.690632]  [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[    6.690632]  [<ffffffff802678d2>] sys_init_module+0x142/0x1dc0
[    6.690632]  [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[    6.690632]  [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[    6.690632]  [<ffffffff8020c34b>] system_call_fastpath+0x16/0x1b
[    6.690632]
[    6.690632]
[    6.690632] Code: 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 8b 47 08 49 89 d5 49 89 fc 89 c2 25 ff ff 00 00 c1 ea 10 39 c2 74 1d 49 8b 4
[    6.690632] RIP  [<ffffffff8025e512>] debug_mutex_add_waiter+0x32/0x80
[    6.690632]  RSP <ffff88021a959998>
[    6.690632] CR2: 0000000000000858
[    6.690632] ---[ end trace 62c38812ae35bad0 ]---
[    7.060556] ------------[ cut here ]------------
[    7.060741] WARNING: at kernel/sched_fair.c:884 hrtick_start_fair+0x187/0x190()
[    7.060741] Modules linked in: processor(+) fan thermal_sys fuse
[    7.060741] Pid: 1259, comm: modprobe Tainted: G      D   2.6.27-rc3 #30
[    7.060741]
[    7.060741] Call Trace:
[    7.060741]  <IRQ>  [<ffffffff8023bcff>] warn_on_slowpath+0x5f/0x80
[    7.060741]  [<ffffffff8022db37>] hrtick_start_fair+0x187/0x190
[    7.060741]  [<ffffffff8022ee89>] enqueue_task_fair+0x49/0x250
[    7.060741]  [<ffffffff8022c4a0>] enqueue_task+0x50/0x60
[    7.060741]  [<ffffffff8022c4d3>] activate_task+0x23/0x40
[    7.060741]  [<ffffffff80231863>] try_to_wake_up+0x253/0x280
[    7.060741]  [<ffffffff8023189d>] default_wake_function+0xd/0x10
[    7.060741]  [<ffffffff802523e1>] autoremove_wake_function+0x11/0x40
[    7.060741]  [<ffffffff8022bf7a>] __wake_up_common+0x5a/0x90
[    7.060741]  [<ffffffff8022d583>] __wake_up+0x43/0x70
[    7.060741]  [<ffffffff8024eb20>] ? delayed_work_timer_fn+0x0/0x40
[    7.060741]  [<ffffffff8024e178>] insert_work+0x48/0x50
[    7.060741]  [<ffffffff8024eb01>] __queue_work+0x31/0x50
[    7.060741]  [<ffffffff8024eb52>] delayed_work_timer_fn+0x32/0x40
[    7.060741]  [<ffffffff8024606b>] run_timer_softirq+0x1bb/0x230
[    7.060741]  [<ffffffff80255d0a>] ? ktime_get_ts+0x4a/0x60
[    7.060741]  [<ffffffff8024178a>] __do_softirq+0x7a/0xf0
[    7.060741]  [<ffffffff8025cf9e>] ? tick_program_event+0x3e/0x70
[    7.060741]  [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[    7.060741]  [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[    7.060741]  [<ffffffff80241705>] irq_exit+0x85/0x90
[    7.060741]  [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[    7.060741]  [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[    7.060741]  <EOI>  [<ffffffff804a15eb>] ? _spin_unlock_irq+0x2b/0x30
[    7.060741]  [<ffffffff804a0e75>] ? __down_read+0xa5/0xb7
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060741]  [<ffffffff8049fe57>] ? down_read+0x37/0x40
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060741]  [<ffffffff8023f4ad>] ? do_exit+0x18d/0xa10
[    7.060741]  [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[    7.060741]  [<ffffffff804a1d17>] ? oops_end+0x87/0x90
[    7.060741]  [<ffffffff804a3fe3>] ? do_page_fault+0x663/0x800
[    7.060741]  [<ffffffff804a18ed>] ? error_exit+0x0/0x9a
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffff8025e512>] ? debug_mutex_add_waiter+0x32/0x80
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffff8049f856>] ? mutex_lock_nested+0xa6/0x250
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.060741]  [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[    7.060741]  [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[    7.060741]  [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[    7.060741]  [<ffffffff803a821e>] ? acpi_start_single_object+0x2d/0x52
[    7.060741]  [<ffffffff803a9816>] ? acpi_device_probe+0x7e/0x92
[    7.060741]  [<ffffffff803dd6ab>] ? driver_probe_device+0x9b/0x1a0
[    7.060741]  [<ffffffff803dd836>] ? __driver_attach+0x86/0x90
[    7.060741]  [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[    7.060741]  [<ffffffff803dcbfd>] ? bus_for_each_dev+0x5d/0x90
[    7.060741]  [<ffffffff803dd4ec>] ? driver_attach+0x1c/0x20
[    7.060741]  [<ffffffff803dd239>] ? bus_add_driver+0x1e9/0x260
[    7.060741]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.060741]  [<ffffffff803dda0f>] ? driver_register+0x5f/0x140
[    7.060741]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.060741]  [<ffffffff803a9b26>] ? acpi_bus_register_driver+0x3e/0x40
[    7.060741]  [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[    7.060741]  [<ffffffff80209040>] ? _stext+0x40/0x180
[    7.060741]  [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[    7.060741]  [<ffffffff802678d2>] ? sys_init_module+0x142/0x1dc0
[    7.060741]  [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[    7.060741]  [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[    7.060741]  [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[    7.060741]
[    7.060741] ---[ end trace 62c38812ae35bad0 ]---
[    7.060741] ------------[ cut here ]------------
[    7.060741] kernel BUG at kernel/sched.c:1155!
[    7.060741] invalid opcode: 0000 [2] SMP DEBUG_PAGEALLOC
[    7.060741] CPU 1
[    7.060741] Modules linked in: processor(+) fan thermal_sys fuse
[    7.060741] Pid: 1259, comm: modprobe Tainted: G      D W 2.6.27-rc3 #30
[    7.060741] RIP: 0010:[<ffffffff8022ce3b>]  [<ffffffff8022ce3b>] resched_task+0x6b/0x70
[    7.060741] RSP: 0018:ffff88022f12bce0  EFLAGS: 00010046
[    7.060741] RAX: 00000000000006e5 RBX: 00000000012c627a RCX: ffff88021a958000
[    7.060741] RDX: 00000000000006e5 RSI: 0000000000000000 RDI: ffff88021a1c2050
[    7.060741] RBP: ffff88022f12bce0 R08: ffff88022f1d8038 R09: ffff88021a1c2088
[    7.060741] R10: ffffffff810c9e00 R11: 0000000000000000 R12: ffff8800a6fc9000
[    7.060741] R13: ffffffff810c9e00 R14: ffff88021a1c2050 R15: 0000000000000001
[    7.060741] FS:  00007f28063c16e0(0000) GS:ffff88022fc81a00(0000) knlGS:0000000000000000
[    7.060741] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    7.060741] CR2: 0000000000000858 CR3: 0000000219c64000 CR4: 00000000000006e0
[    7.060741] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    7.060741] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    7.060741] Process modprobe (pid: 1259, threadinfo ffff88021a958000, task ffff88021a1c2050)
[    7.060741] Stack:  ffff88022f12bd20 ffffffff802389c3 0000000000000400 0000000000400000
[    7.060741]  ffff88022f1d8000 ffff8800280a4e00 0000000000000001 0000000000000003
[    7.060741]  ffff88022f12bd70 ffffffff802316cf 0000000100000001 0000000000000000
[    7.060741] Call Trace:
[    7.060741]  <IRQ>  [<ffffffff802389c3>] check_preempt_wakeup+0x133/0x1c0
[    7.060741]  [<ffffffff802316cf>] try_to_wake_up+0xbf/0x280
[    7.060741]  [<ffffffff8023189d>] default_wake_function+0xd/0x10
[    7.060741]  [<ffffffff802523e1>] autoremove_wake_function+0x11/0x40
[    7.060741]  [<ffffffff8022bf7a>] __wake_up_common+0x5a/0x90
[    7.060741]  [<ffffffff8022d583>] __wake_up+0x43/0x70
[    7.060741]  [<ffffffff8024eb20>] ? delayed_work_timer_fn+0x0/0x40
[    7.060741]  [<ffffffff8024e178>] insert_work+0x48/0x50
[    7.060741]  [<ffffffff8024eb01>] __queue_work+0x31/0x50
[    7.060741]  [<ffffffff8024eb52>] delayed_work_timer_fn+0x32/0x40
[    7.060741]  [<ffffffff8024606b>] run_timer_softirq+0x1bb/0x230
[    7.060741]  [<ffffffff80255d0a>] ? ktime_get_ts+0x4a/0x60
[    7.060741]  [<ffffffff8024178a>] __do_softirq+0x7a/0xf0
[    7.060741]  [<ffffffff8025cf9e>] ? tick_program_event+0x3e/0x70
[    7.060741]  [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[    7.060741]  [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[    7.060741]  [<ffffffff80241705>] irq_exit+0x85/0x90
[    7.060741]  [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[    7.060741]  [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[    7.060741]  <EOI>  [<ffffffff804a15eb>] ? _spin_unlock_irq+0x2b/0x30
[    7.060741]  [<ffffffff804a0e75>] ? __down_read+0xa5/0xb7
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060741]  [<ffffffff8049fe57>] ? down_read+0x37/0x40
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060741]  [<ffffffff8023f4ad>] ? do_exit+0x18d/0xa10
[    7.060741]  [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[    7.060741]  [<ffffffff804a1d17>] ? oops_end+0x87/0x90
[    7.060741]  [<ffffffff804a3fe3>] ? do_page_fault+0x663/0x800
[    7.060741]  [<ffffffff804a18ed>] ? error_exit+0x0/0x9a
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffff8025e512>] ? debug_mutex_add_waiter+0x32/0x80
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffff8049f856>] ? mutex_lock_nested+0xa6/0x250
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[    7.060741]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060741]  [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.060741]  [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[    7.060741]  [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[    7.060741]  [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[    7.060741]  [<ffffffff803a821e>] ? acpi_start_single_object+0x2d/0x52
[    7.060741]  [<ffffffff803a9816>] ? acpi_device_probe+0x7e/0x92
[    7.060741]  [<ffffffff803dd6ab>] ? driver_probe_device+0x9b/0x1a0
[    7.060741]  [<ffffffff803dd836>] ? __driver_attach+0x86/0x90
[    7.060741]  [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[    7.060741]  [<ffffffff803dcbfd>] ? bus_for_each_dev+0x5d/0x90
[    7.060741]  [<ffffffff803dd4ec>] ? driver_attach+0x1c/0x20
[    7.060741]  [<ffffffff803dd239>] ? bus_add_driver+0x1e9/0x260
[    7.060741]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.060741]  [<ffffffff803dda0f>] ? driver_register+0x5f/0x140
[    7.060741]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.060741]  [<ffffffff803a9b26>] ? acpi_bus_register_driver+0x3e/0x40
[    7.060741]  [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[    7.060741]  [<ffffffff80209040>] ? _stext+0x40/0x180
[    7.060741]  [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[    7.060741]  [<ffffffff802678d2>] ? sys_init_module+0x142/0x1dc0
[    7.060741]  [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[    7.060741]  [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[    7.060741]  [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[    7.060741]
[    7.060741]
[    7.060741] Code: 8b 47 08 8b 50 1c 65 8b 04 25 24 00 00 00 39 c2 74 0d 0f ae f0 48 8b 47 08 f6 40 18 04 74 02 c9 c3 89 d7 ff 15 0f 79 3c 00 c
[    7.060741] RIP  [<ffffffff8022ce3b>] resched_task+0x6b/0x70
[    7.060741]  RSP <ffff88022f12bce0>
[    7.060741] ---[ end trace 62c38812ae35bad0 ]---
[    7.060741] Kernel panic - not syncing: Aiee, killing interrupt handler!
[    7.060741] ------------[ cut here ]------------
[    7.060741] WARNING: at kernel/smp.c:328 smp_call_function_mask+0x25a/0x260()
[    7.060741] Modules linked in: processor(+) fan thermal_sys fuse
[    7.060741] Pid: 1259, comm: modprobe Tainted: G      D W 2.6.27-rc3 #30
[    7.060741]
[    7.060741] Call Trace:
[    7.060741]  <IRQ>  [<ffffffff8023bcff>] warn_on_slowpath+0x5f/0x80
[    7.060741]  [<ffffffff8026535a>] smp_call_function_mask+0x25a/0x260
[    7.060741]  [<ffffffff8036987d>] ? string+0x3d/0xd0
[    7.060741]  [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[    7.060741]  [<ffffffff8036987d>] ? string+0x3d/0xd0
[    7.060741]  [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[    7.060741]  [<ffffffff8036987d>] ? string+0x3d/0xd0
[    7.060741]  [<ffffffff8036987d>] ? string+0x3d/0xd0
[    7.060741]  [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[    7.060741]  [<ffffffff8036901e>] ? number+0x2ae/0x2d0
[    7.060741]  [<ffffffff8036901e>] ? number+0x2ae/0x2d0
[    7.060741]  [<ffffffff8026a0dd>] ? kallsyms_lookup+0x5d/0xa0
[    7.060741]  [<ffffffff8036901e>] ? number+0x2ae/0x2d0
[    7.060741]  [<ffffffff80369d4b>] ? vsnprintf+0x43b/0x720
[    7.060741]  [<ffffffff8036a098>] ? sprintf+0x68/0x70
[    7.060741]  [<ffffffff8036987d>] ? string+0x3d/0xd0
[    7.060741]  [<ffffffff804a4273>] ? __atomic_notifier_call_chain+0x83/0xa0
[    7.060741]  [<ffffffff804a41f0>] ? __atomic_notifier_call_chain+0x0/0xa0
[    7.060741]  [<ffffffff804a11b6>] ? _spin_unlock+0x26/0x30
[    7.060741]  [<ffffffff8021c470>] ? stop_this_cpu+0x0/0x30
[    7.060741]  [<ffffffff802653a0>] smp_call_function+0x40/0x50
[    7.060741]  [<ffffffff8021c4f3>] native_smp_send_stop+0x23/0x40
[    7.060741]  [<ffffffff8023c04f>] panic+0xaf/0x190
[    7.060741]  [<ffffffff8023cea7>] ? printk+0x67/0x70
[    7.060741]  [<ffffffff8049f7a9>] ? mutex_unlock+0x9/0x10
[    7.060741]  [<ffffffff80256f21>] ? blocking_notifier_call_chain+0x11/0x20
[    7.060741]  [<ffffffff8023fb89>] do_exit+0x869/0xa10
[    7.060741]  [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[    7.060741]  [<ffffffff804a1d17>] oops_end+0x87/0x90
[    7.060741]  [<ffffffff8020e08e>] die+0x5e/0x90
[    7.060741]  [<ffffffff804a2230>] do_trap+0x130/0x150
[    7.060741]  [<ffffffff8020e662>] do_invalid_op+0x92/0xb0
[    7.060741]  [<ffffffff8022ce3b>] ? resched_task+0x6b/0x70
[    7.060741]  [<ffffffff804a18ed>] error_exit+0x0/0x9a
[    7.060741]  [<ffffffff8022ce3b>] ? resched_task+0x6b/0x70
[    7.060741]  [<ffffffff802389c3>] check_preempt_wakeup+0x133/0x1c0
[    7.060741]  [<ffffffff802316cf>] try_to_wake_up+0xbf/0x280
[    7.060741]  [<ffffffff8023189d>] default_wake_function+0xd/0x10
[    7.060741]  [<ffffffff802523e1>] autoremove_wake_function+0x11/0x40
[    7.060741]  [<ffffffff8022bf7a>] __wake_up_common+0x5a/0x90
[    7.060741]  [<ffffffff8022d583>] __wake_up+0x43/0x70
[    7.060741]  [<ffffffff8024eb20>] ? delayed_work_timer_fn+0x0/0x40
[    7.060741]  [<ffffffff8024e178>] insert_work+0x48/0x50
[    7.060741]  [<ffffffff8024eb01>] __queue_work+0x31/0x50
[    7.060741]  [<ffffffff8024eb52>] delayed_work_timer_fn+0x32/0x40
[    7.060741]  [<ffffffff8024606b>] run_timer_softirq+0x1bb/0x230
[    7.060741]  [<ffffffff80255d0a>] ? ktime_get_ts+0x4a/0x60
[    7.060741]  [<ffffffff8024178a>] __do_softirq+0x7a/0xf0
[    7.060741]  [<ffffffff8025cf9e>] ? tick_program_event+0x3e/0x70
[    7.060741]  [<ffffffff8020d69c>] call_softirq+0x1c/0x30
[    7.060741]  [<ffffffff8020f28d>] do_softirq+0x3d/0x80
[    7.060741]  [<ffffffff80241705>] irq_exit+0x85/0x90
[    7.060741]  [<ffffffff8021d648>] smp_apic_timer_interrupt+0x88/0xc0
[    7.060741]  [<ffffffff8020d0e6>] apic_timer_interrupt+0x66/0x70
[    7.060741]  <EOI>  [<ffffffff804a15eb>] ? _spin_unlock_irq+0x2b/0x30
[    7.060741]  [<ffffffff804a0e75>] ? __down_read+0xa5/0xb7
[    7.060741]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060742]  [<ffffffff8049fe57>] ? down_read+0x37/0x40
[    7.060742]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060742]  [<ffffffff8026fdf5>] ? acct_collect+0x45/0x1d0
[    7.060742]  [<ffffffff8023f4ad>] ? do_exit+0x18d/0xa10
[    7.060742]  [<ffffffff803c8019>] ? do_unblank_screen+0x19/0x130
[    7.060742]  [<ffffffff804a1d17>] ? oops_end+0x87/0x90
[    7.060742]  [<ffffffff804a3fe3>] ? do_page_fault+0x663/0x800
[    7.060742]  [<ffffffff804a18ed>] ? error_exit+0x0/0x9a
[    7.060742]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060742]  [<ffffffff8025e512>] ? debug_mutex_add_waiter+0x32/0x80
[    7.060742]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060742]  [<ffffffff8049f856>] ? mutex_lock_nested+0xa6/0x250
[    7.060742]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060742]  [<ffffffff80363884>] ? idr_pre_get+0x44/0x90
[    7.060742]  [<ffffffffa008f524>] ? get_idr+0x44/0xa0 [thermal_sys]
[    7.060742]  [<ffffffffa008fe43>] ? thermal_cooling_device_register+0x83/0x250 [thermal_sys]
[    7.060742]  [<ffffffffa019b2a3>] ? acpi_processor_start+0x64b/0x774 [processor]
[    7.060742]  [<ffffffff8031ac0b>] ? __sysfs_add_one+0x6b/0xa0
[    7.060742]  [<ffffffff8031bcfc>] ? sysfs_do_create_link+0xbc/0x150
[    7.060742]  [<ffffffff803a821e>] ? acpi_start_single_object+0x2d/0x52
[    7.060742]  [<ffffffff803a9816>] ? acpi_device_probe+0x7e/0x92
[    7.060742]  [<ffffffff803dd6ab>] ? driver_probe_device+0x9b/0x1a0
[    7.060742]  [<ffffffff803dd836>] ? __driver_attach+0x86/0x90
[    7.060742]  [<ffffffff803dd7b0>] ? __driver_attach+0x0/0x90
[    7.060742]  [<ffffffff803dcbfd>] ? bus_for_each_dev+0x5d/0x90
[    7.060742]  [<ffffffff803dd4ec>] ? driver_attach+0x1c/0x20
[    7.060742]  [<ffffffff803dd239>] ? bus_add_driver+0x1e9/0x260
[    7.060742]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.060742]  [<ffffffff803dda0f>] ? driver_register+0x5f/0x140
[    7.060742]  [<ffffffffa0222000>] ? acpi_processor_init+0x0/0x107 [processor]
[    7.060742]  [<ffffffff803a9b26>] ? acpi_bus_register_driver+0x3e/0x40
[    7.060742]  [<ffffffffa0222094>] ? acpi_processor_init+0x94/0x107 [processor]
[    7.060742]  [<ffffffff80209040>] ? _stext+0x40/0x180
[    7.060742]  [<ffffffff802a8bd1>] ? __vunmap+0xa1/0x110
[    7.060742]  [<ffffffff802678d2>] ? sys_init_module+0x142/0x1dc0
[    7.060742]  [<ffffffff80367dd6>] ? __up_read+0x46/0xb0
[    7.060742]  [<ffffffff8048e830>] ? cpu_down+0x0/0x70
[    7.060742]  [<ffffffff8020c34b>] ? system_call_fastpath+0x16/0x1b
[    7.060742]
[    7.060742] ---[ end trace 62c38812ae35bad0 ]---

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]           ` <alpine.LFD.1.10.0808241152370.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25 13:03             ` Daniel J Blueman
  0 siblings, 0 replies; 318+ messages in thread
From: Daniel J Blueman @ 2008-08-25 13:03 UTC (permalink / raw)
  To: Linus Torvalds, Vegard Nossum
  Cc: Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

Hi Linus, Vegard,

On Sun, Aug 24, 2008 at 7:58 PM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> On Sun, 24 Aug 2008, Vegard Nossum wrote:
[snip]
> Anyway, I think your patch is likely fine, I just thought it looked a bit
> odd to have a loop to move a list from one head pointer to another.
>
> But regardless, it would need some testing. Daniel?

This opens another lockdep report at boot-time [1] - promoting
pool_lock may not be the best fix?

We then see a new deadlock condition (on the pool_lock spinlock) [2],
which seemingly was avoided by taking the debug-bucket lock first.

We reproduce this by booting with debug_objects=1 and causing a lot of activity.
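
One possible reading of trace [2], offered as an illustrative model and not
a confirmed diagnosis: on CPU0, free_object() seems to have been interrupted
while pool_lock was held, and the softirq that then ran on the same CPU
called free_object() again and spun on the lock. The sketch below is a
user-space Python model of that pattern. The pool_lock in it is a
threading.Lock stand-in, and process_context_free() and its irqs_disabled
flag are hypothetical names, not kernel code.

```python
import threading

# Stand-in for the kernel's pool_lock: a plain, non-recursive lock.
pool_lock = threading.Lock()


def free_object(context, log):
    """Model of free_object(): take pool_lock, put the object back, unlock."""
    # A real spin_lock() would spin forever here. The model uses a
    # non-blocking acquire so the stuck case is reported instead of hanging.
    if not pool_lock.acquire(blocking=False):
        log.append(f"{context}: pool_lock already held on this CPU, would spin forever")
        return False
    log.append(f"{context}: took pool_lock")
    pool_lock.release()
    return True


def process_context_free(irqs_disabled, log):
    """Process context frees an object; an interrupt arrives while it holds the lock."""
    pending_irq = False
    ok = True
    pool_lock.acquire()
    log.append("process: took pool_lock")
    if irqs_disabled:
        # spin_lock_irqsave() style: the interrupt is held off until unlock.
        pending_irq = True
    else:
        # Plain spin_lock() style: the softirq runs now, on the same CPU.
        ok = free_object("softirq", log)
    pool_lock.release()
    if pending_irq:
        ok = free_object("softirq", log)
    return ok


if __name__ == "__main__":
    for irqs_disabled in (False, True):
        log = []
        result = process_context_free(irqs_disabled, log)
        print(f"irqs_disabled={irqs_disabled}: completed={result}")
        for line in log:
            print("   ", line)
```

In this model the irqs_disabled=False case is the one that gets stuck, which
is consistent with the inner free_object() in [2] running from softirq
context while the outer one had not yet released the lock.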

Daniel

--- [1]

[ INFO: possible irq lock inversion dependency detected ]
2.6.27-rc4-225c-debug #3
---------------------------------------------------------
rcu_sched_grace/9 just changed the state of lock:
 (pool_lock){-...}, at: [<ffffffff80466c2c>] free_object+0x7c/0xc0
but this lock was taken by another, hard-irq-safe lock in the past:
 (xtime_lock){++..}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
no locks held by rcu_sched_grace/9.

the first lock's dependencies:
-> (pool_lock){-...} ops: 59 {
   initial-use  at:
                                       [<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
                                       [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                       [<ffffffff806a2bb1>] _spin_lock+0x41/0x80
                                       [<ffffffff804675fa>]
__debug_object_init+0x14a/0x3e0
                                       [<ffffffff804678df>]
debug_object_init+0x1f/0x30
                                       [<ffffffff80260eee>]
hrtimer_init+0x2e/0x50
                                       [<ffffffff80237571>]
init_rt_bandwidth+0x41/0x60
                                       [<ffffffff812f3810>]
sched_init+0x72/0x63d
                                       [<ffffffff812e17e2>]
start_kernel+0x19c/0x456
                                       [<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
                                       [<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
                                       [<ffffffffffffffff>] 0xffffffffffffffff
   hardirq-on-W at:
                                       [<ffffffff8026fae9>]
__lock_acquire+0x4f9/0x1160
                                       [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                       [<ffffffff806a2bb1>] _spin_lock+0x41/0x80
                                       [<ffffffff80466c2c>]
free_object+0x7c/0xc0
                                       [<ffffffff8046707e>]
debug_object_free+0xbe/0x130
                                       [<ffffffff806a070e>]
schedule_timeout+0x7e/0xe0
                                       [<ffffffff806a07ce>]
schedule_timeout_interruptible+0x1e/0x20
                                       [<ffffffff8029cfb2>]
rcu_sched_grace_period+0xa2/0x3a0
                                       [<ffffffff8025d50e>] kthread+0x4e/0x90
                                       [<ffffffff8020d919>] child_rip+0xa/0x11
                                       [<ffffffffffffffff>] 0xffffffffffffffff
 }
 ... key      at: [<ffffffff8088f5d8>] pool_lock+0x18/0x40

the second lock's dependencies:
-> (xtime_lock){++..} ops: 211 {
   initial-use  at:
                                       [<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
                                       [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                       [<ffffffff806a2bb1>] _spin_lock+0x41/0x80
                                       [<ffffffff812f5628>]
timekeeping_init+0x2f/0x144
                                       [<ffffffff812e189d>]
start_kernel+0x257/0x456
                                       [<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
                                       [<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
                                       [<ffffffffffffffff>] 0xffffffffffffffff
   in-hardirq-W at:
                                       [<ffffffffffffffff>] 0xffffffffffffffff
   in-softirq-W at:
                                       [<ffffffffffffffff>] 0xffffffffffffffff
 }
 ... key      at: [<ffffffff80907220>] xtime_lock+0x20/0x40
 -> (&obj_hash[i].lock){.+..} ops: 1003901 {
    initial-use  at:
                                         [<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
                                         [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                         [<ffffffff806a2d43>]
_spin_lock_irqsave+0x53/0x90
                                         [<ffffffff80467567>]
__debug_object_init+0xb7/0x3e0
                                         [<ffffffff804678df>]
debug_object_init+0x1f/0x30
                                         [<ffffffff80260eee>]
hrtimer_init+0x2e/0x50
                                         [<ffffffff80237571>]
init_rt_bandwidth+0x41/0x60
                                         [<ffffffff812f3810>]
sched_init+0x72/0x63d
                                         [<ffffffff812e17e2>]
start_kernel+0x19c/0x456
                                         [<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
                                         [<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-softirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
  }
  ... key      at: [<ffffffff81cd2eb0>] __key.16550+0x0/0x8
  -> (pool_lock){-...} ops: 59 {
     initial-use  at:
                                           [<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
                                           [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                           [<ffffffff806a2bb1>]
_spin_lock+0x41/0x80
                                           [<ffffffff804675fa>]
__debug_object_init+0x14a/0x3e0
                                           [<ffffffff804678df>]
debug_object_init+0x1f/0x30
                                           [<ffffffff80260eee>]
hrtimer_init+0x2e/0x50
                                           [<ffffffff80237571>]
init_rt_bandwidth+0x41/0x60
                                           [<ffffffff812f3810>]
sched_init+0x72/0x63d
                                           [<ffffffff812e17e2>]
start_kernel+0x19c/0x456
                                           [<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
                                           [<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
                                           [<ffffffffffffffff>]
0xffffffffffffffff
     hardirq-on-W at:
                                           [<ffffffff8026fae9>]
__lock_acquire+0x4f9/0x1160
                                           [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                           [<ffffffff806a2bb1>]
_spin_lock+0x41/0x80
                                           [<ffffffff80466c2c>]
free_object+0x7c/0xc0
                                           [<ffffffff8046707e>]
debug_object_free+0xbe/0x130
                                           [<ffffffff806a070e>]
schedule_timeout+0x7e/0xe0
                                           [<ffffffff806a07ce>]
schedule_timeout_interruptible+0x1e/0x20
                                           [<ffffffff8029cfb2>]
rcu_sched_grace_period+0xa2/0x3a0
                                           [<ffffffff8025d50e>]
kthread+0x4e/0x90
                                           [<ffffffff8020d919>]
child_rip+0xa/0x11
                                           [<ffffffffffffffff>]
0xffffffffffffffff
   }
   ... key      at: [<ffffffff8088f5d8>] pool_lock+0x18/0x40
  ... acquired at:
   [<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
   [<ffffffff802707e1>] lock_acquire+0x91/0xc0
   [<ffffffff806a2bb1>] _spin_lock+0x41/0x80
   [<ffffffff804675fa>] __debug_object_init+0x14a/0x3e0
   [<ffffffff804678df>] debug_object_init+0x1f/0x30
   [<ffffffff80260eee>] hrtimer_init+0x2e/0x50
   [<ffffffff80237571>] init_rt_bandwidth+0x41/0x60
   [<ffffffff812f3810>] sched_init+0x72/0x63d
   [<ffffffff812e17e2>] start_kernel+0x19c/0x456
   [<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
   [<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
   [<ffffffffffffffff>] 0xffffffffffffffff

 ... acquired at:
   [<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
   [<ffffffff802707e1>] lock_acquire+0x91/0xc0
   [<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
   [<ffffffff80467567>] __debug_object_init+0xb7/0x3e0
   [<ffffffff804678df>] debug_object_init+0x1f/0x30
   [<ffffffff80260eee>] hrtimer_init+0x2e/0x50
   [<ffffffff812f577b>] ntp_init+0x1e/0x2b
   [<ffffffff812f5633>] timekeeping_init+0x3a/0x144
   [<ffffffff812e189d>] start_kernel+0x257/0x456
   [<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
   [<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
   [<ffffffffffffffff>] 0xffffffffffffffff

 -> (clocksource_lock){++..} ops: 214 {
    initial-use  at:
                                         [<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
                                         [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                         [<ffffffff806a2d43>]
_spin_lock_irqsave+0x53/0x90
                                         [<ffffffff80266245>]
clocksource_get_next+0x15/0x60
                                         [<ffffffff812f5638>]
timekeeping_init+0x3f/0x144
                                         [<ffffffff812e189d>]
start_kernel+0x257/0x456
                                         [<ffffffff812e10a9>]
x86_64_start_reservations+0x99/0xb6
                                         [<ffffffff812e11d0>]
x86_64_start_kernel+0xe2/0xe9
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-hardirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-softirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
  }
  ... key      at: [<ffffffff80877bb8>] clocksource_lock+0x18/0x40
 ... acquired at:
   [<ffffffff802703b1>] __lock_acquire+0xdc1/0x1160
   [<ffffffff802707e1>] lock_acquire+0x91/0xc0
   [<ffffffff806a2d43>] _spin_lock_irqsave+0x53/0x90
   [<ffffffff80266245>] clocksource_get_next+0x15/0x60
   [<ffffffff812f5638>] timekeeping_init+0x3f/0x144
   [<ffffffff812e189d>] start_kernel+0x257/0x456
   [<ffffffff812e10a9>] x86_64_start_reservations+0x99/0xb6
   [<ffffffff812e11d0>] x86_64_start_kernel+0xe2/0xe9
   [<ffffffffffffffff>] 0xffffffffffffffff

 -> (old_style_seqlock_init){++..} ops: 210 {
    initial-use  at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-hardirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-softirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
  }
  ... key      at: [<ffffffff812d61a0>] nl80211_policy+0xda0/0x2c00
 ... acquired at:
   [<ffffffffffffffff>] 0xffffffffffffffff

 -> (ftrace_shutdown_lock){++..} ops: 480 {
    initial-use  at:
                                         [<ffffffff8026f795>]
__lock_acquire+0x1a5/0x1160
                                         [<ffffffff802707e1>]
lock_acquire+0x91/0xc0
                                         [<ffffffff806a2d43>]
_spin_lock_irqsave+0x53/0x90
                                         [<ffffffff802a4216>]
ftrace_record_ip+0x196/0x2f0
                                         [<ffffffff8020c6b4>]
mcount_call+0x5/0x31
                                         [<ffffffff812e1c9b>]
kernel_init+0x14d/0x1b2
                                         [<ffffffff8020d919>] child_rip+0xa/0x11
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-hardirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
    in-softirq-W at:
                                         [<ffffffffffffffff>] 0xffffffffffffffff
  }
  ... key      at: [<ffffffff8087c1b8>] ftrace_shutdown_lock+0x18/0x40
 ... acquired at:
   [<ffffffffffffffff>] 0xffffffffffffffff


stack backtrace:
Pid: 9, comm: rcu_sched_grace Not tainted 2.6.27-rc4-225c-debug #3

Call Trace:
 [<ffffffff8026d8b2>] print_irq_inversion_bug+0x142/0x160
 [<ffffffff8026dd27>] check_usage_backwards+0x67/0xb0
 [<ffffffff8026ebd3>] mark_lock+0x363/0x7f0
 [<ffffffff8026fae9>] __lock_acquire+0x4f9/0x1160
 [<ffffffff802707e1>] lock_acquire+0x91/0xc0
 [<ffffffff80466c2c>] ? free_object+0x7c/0xc0
 [<ffffffff806a2bb1>] _spin_lock+0x41/0x80
 [<ffffffff80466c2c>] ? free_object+0x7c/0xc0
 [<ffffffff806a6bc9>] ? sub_preempt_count+0x69/0xd0
 [<ffffffff80466c2c>] free_object+0x7c/0xc0
 [<ffffffff8046707e>] debug_object_free+0xbe/0x130
 [<ffffffff806a070e>] schedule_timeout+0x7e/0xe0
 [<ffffffff802503a0>] ? process_timeout+0x0/0x10
 [<ffffffff806a06f2>] ? schedule_timeout+0x62/0xe0
 [<ffffffff8029cf10>] ? rcu_sched_grace_period+0x0/0x3a0
 [<ffffffff806a07ce>] schedule_timeout_interruptible+0x1e/0x20
 [<ffffffff8029cfb2>] rcu_sched_grace_period+0xa2/0x3a0
 [<ffffffff8029cf10>] ? rcu_sched_grace_period+0x0/0x3a0
 [<ffffffff8025d50e>] kthread+0x4e/0x90
 [<ffffffff8020d919>] child_rip+0xa/0x11
 [<ffffffff80239b8f>] ? finish_task_switch+0x5f/0x120
 [<ffffffff806a356b>] ? _spin_unlock_irq+0x3b/0x70
 [<ffffffff8020cf23>] ? restore_args+0x0/0x30
 [<ffffffff8025d4c0>] ? kthread+0x0/0x90
 [<ffffffff8020d90f>] ? child_rip+0x0/0x11

--- [2]

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
58  0  63972  17316     16 145172  128 21176   128 21176 6054 8662 79 21  0  0
47  2  81020  14380     16 139472 1120 17052  1240 17052 6106 9532 77 23  0  0
52  1  86276  33656     16 137140  604 5304   796  5304 5349 7954 81 19  0  0
94  0  86276  32192     16 137484  480    0   772     0 5418 7618 84 16  0  0
88  0  86264  22416     16 137800   96    0   396     0 4746 5937 87 13  0  0
63  1  86200  54636     16 137932 1380    0  1408     0 5472 8007 82 18  0  0
47  0  86020  22848     16 138132  256    0   312     0 6126 12227 72 28  0  0
75  2 103828  20252     16 135500  528 17836   592 17836 6655 12862 69 31  0  0
21  0 128568  17536     16 128888 2336 24732  2336 24732 6762 12891 66 34  0  0
159  0 154996  16888     16 124808  480 26236   504 26236 5930 7689 80 20  0  0
45  0 165616  40108     16 120544  192 10696   248 10696 6136 9163 77 23  0  0
95  0 165616  27296     16 120632  924    0   940     0 5293 7468 82 18  0  0
BUG: NMI Watchdog detected LOCKUP on CPU0, ip ffffffff80214407, registers:
CPU 0
Modules linked in: rfcomm l2cap bluetooth kvm_intel kvm microcode
dvb_usb_dtt200u dvb_usb uvcvideo dvb_core compat_ioctl32 i2c_core
videodev v4l1_compat shpchp pcig
Pid: 6948, comm: spiral Not tainted 2.6.27-rc4-225c-debug #3
RIP: 0010:[<ffffffff80214407>]  [<ffffffff80214407>] native_read_tsc+0x7/0x30
RSP: 0018:ffffffff8153ab70  EFLAGS: 00000002
RAX: 00000467c85cb001 RBX: ffffffff8088f5c0 RCX: 00000000c85cb001
RDX: 00000000c85cb001 RSI: 0000000000000103 RDI: 0000000000000001
RBP: ffffffff8153ab70 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffff8800cf978000 R12: 00000000c85cb001
R13: 0000000000000001 R14: 0000000000000000 R15: ffff88012a61c8c0
FS:  00007fb48cb796e0(0000) GS:ffffffff80e82dc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000075212e6 CR3: 00000000cc1e6000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process spiral (pid: 6948, threadinfo ffff8800cf950000, task ffff8800cf978000)
Stack:  ffffffff8153aba0 ffffffff804560c7 ffffffff8088f5c0 0000000005b61b94
 00000000d6915628 0000000000000001 ffffffff8153abb0 ffffffff80455fff
 ffffffff8153abe0 ffffffff80466302 ffffffff8088f5d8 ffffffff8088f5c0
Call Trace:
 <IRQ>  [<ffffffff804560c7>] delay_tsc+0x67/0xd0
 [<ffffffff80455fff>] __delay+0xf/0x20
 [<ffffffff80466302>] _raw_spin_lock+0x122/0x170
 [<ffffffff806a2bd1>] _spin_lock+0x61/0x80
 [<ffffffff80466c2c>] ? free_object+0x7c/0xc0
 [<ffffffff80466c2c>] free_object+0x7c/0xc0
 [<ffffffff80466e0f>] __debug_check_no_obj_freed+0x19f/0x1e0
 [<ffffffff80466e65>] debug_check_no_obj_freed+0x15/0x20
 [<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
 [<ffffffff80517fa1>] ? scsi_pool_free_command+0x51/0x60
 [<ffffffff80517fa1>] scsi_pool_free_command+0x51/0x60
 [<ffffffff805184bf>] __scsi_put_command+0x5f/0xa0
 [<ffffffff80518561>] scsi_put_command+0x61/0x70
 [<ffffffff8051e47a>] scsi_next_command+0x3a/0x60
 [<ffffffff8051e544>] scsi_end_request+0xa4/0xc0
 [<ffffffff8051e68f>] scsi_io_completion+0x12f/0x440
 [<ffffffff80517bf5>] scsi_finish_command+0x95/0xd0
 [<ffffffff8051eb36>] scsi_softirq_done+0x86/0x110
 [<ffffffff8043bbed>] blk_done_softirq+0x8d/0xa0
 [<ffffffff8024ba04>] __do_softirq+0x74/0xf0
 [<ffffffff8020dc7c>] call_softirq+0x1c/0x30
 [<ffffffff8020f485>] do_softirq+0x75/0xb0
 [<ffffffff8024b155>] irq_exit+0xa5/0xb0
 [<ffffffff8020f7d3>] do_IRQ+0xe3/0x1d0
 [<ffffffff80466c66>] ? free_object+0xb6/0xc0
 [<ffffffff8020ce76>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80270ba0>] ? lock_release+0xe0/0x210
 [<ffffffff806a3253>] ? _spin_unlock+0x23/0x60
 [<ffffffff80466c66>] ? free_object+0xb6/0xc0
 [<ffffffff8046707e>] ? debug_object_free+0xbe/0x130
 [<ffffffff806a070e>] ? schedule_timeout+0x7e/0xe0
 [<ffffffff802503a0>] ? process_timeout+0x0/0x10
 [<ffffffff806a06f2>] ? schedule_timeout+0x62/0xe0
 [<ffffffff802f8b3e>] ? do_select+0x4be/0x610
 [<ffffffff802f8c90>] ? __pollwait+0x0/0x120
 [<ffffffff8026f1d9>] ? trace_hardirqs_on_caller+0x29/0x1b0
 [<ffffffff8026f1d9>] ? trace_hardirqs_on_caller+0x29/0x1b0
 [<ffffffff8026f36d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff806a0b9e>] ? mutex_unlock+0xe/0x10
 [<ffffffff8065919e>] ? unix_stream_recvmsg+0x32e/0x6d0
 [<ffffffff8026f1d9>] ? trace_hardirqs_on_caller+0x29/0x1b0
 [<ffffffff8026f36d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff802f8f4b>] ? core_sys_select+0x19b/0x2e0
 [<ffffffff802e89e9>] ? do_sync_read+0xf9/0x140
 [<ffffffff8025d8e0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff806a6bc9>] ? sub_preempt_count+0x69/0xd0
 [<ffffffff802f94b0>] ? sys_select+0xd0/0x1c0
 [<ffffffff8020c86b>] ? system_call_fastpath+0x16/0x1b


Code: 90 90 90 90 55 89 f8 48 89 e5 e6 70 e4 71 c9 c3 0f 1f 40 00 55
89 f0 48 89 e5 e6 70 89 f8 e6 71 c9 c3 66 90 55 48 89 e5 0f 1f 00 <0f>
ae e8 0f 31 89 c1 0f 1f
BUG: NMI Watchdog detected LOCKUP<4>---[ end trace fd851c3db62e5044 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!
 on CPU1, ip ffffffff80214407, registers:
CPU 1
Modules linked in: rfcomm l2cap bluetooth kvm_intel kvm microcode
dvb_usb_dtt200u dvb_usb uvcvideo dvb_core compat_ioctl32 i2c_core
videodev v4l1_compat shpchp pcig
Pid: 10150, comm: gcc Not tainted 2.6.27-rc4-225c-debug #3
RIP: 0010:[<ffffffff80214407>]  [<ffffffff80214407>] native_read_tsc+0x7/0x30
RSP: 0018:ffff8800b6269c28  EFLAGS: 00000092
RAX: 0000000000000001 RBX: ffffffff8088f5c0 RCX: 00000000c85cafc2
RDX: 000000008b468b00 RSI: 0000000000000002 RDI: 0000000000000001
RBP: ffff8800b6269c28 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffff8800a65047e0 R12: 0000000005d22695
R13: 0000000000000001 R14: 0000000000000001 R15: ffff88009e9be940
FS:  00002b124e4fc6e0(0000) GS:ffff88012fa644b0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b8a265f5960 CR3: 00000000b61e9000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gcc (pid: 10150, threadinfo ffff8800b6268000, task ffff8800a65047e0)
Stack:  ffff8800b6269c58 ffffffff8045608a ffffffff8088f5c0 0000000005d22695
 00000000d6915628 0000000000000001 ffff8800b6269c68 ffffffff80455fff
 ffff8800b6269c98 ffffffff80466302 ffffffff8088f5d8 ffffffff8088f5c0
Call Trace:
 [<ffffffff8045608a>] delay_tsc+0x2a/0xd0
 [<ffffffff80455fff>] __delay+0xf/0x20
 [<ffffffff80466302>] _raw_spin_lock+0x122/0x170
 [<ffffffff806a2bd1>] _spin_lock+0x61/0x80
 [<ffffffff80466c2c>] ? free_object+0x7c/0xc0
 [<ffffffff80466c2c>] free_object+0x7c/0xc0
 [<ffffffff80466e0f>] __debug_check_no_obj_freed+0x19f/0x1e0
 [<ffffffff80466e65>] debug_check_no_obj_freed+0x15/0x20
 [<ffffffff802dff0c>] kmem_cache_free+0xec/0x110
 [<ffffffff8024264b>] ? __cleanup_signal+0x1b/0x20
 [<ffffffff8024264b>] __cleanup_signal+0x1b/0x20
 [<ffffffff80248273>] release_task+0x233/0x3d0
 [<ffffffff80248960>] wait_consider_task+0x550/0x8b0
 [<ffffffff80248e16>] do_wait+0x156/0x350
 [<ffffffff8023b6f0>] ? default_wake_function+0x0/0x10
 [<ffffffff802490a6>] sys_wait4+0x96/0xf0
 [<ffffffff8020c86b>] system_call_fastpath+0x16/0x1b


Code: 90 90 90 90 55 89 f8 48 89 e5 e6 70 e4 71 c9 c3 0f 1f 40 00 55
89 f0 48 89 e5 e6 70 89 f8 e6 71 c9 c3 66 90 55 48 89 e5 0f 1f 00 <0f>
ae e8 0f 31 89 c1 0f 1f
---[ end trace fd851c3db62e5044 ]---
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-25 12:44           ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Alan D. Brunelle
@ 2008-08-25 13:13             ` Alan D. Brunelle
  2008-08-25 18:02             ` Linus Torvalds
  1 sibling, 0 replies; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 13:13 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven,
	Rusty Russell

Adding in SLUB debugging doesn't show anything new (I think). Example
boot log (w/ initcall_debug enabled) is at:

http://free.linux.hp.com/~adb/bug.11342/prob5.txt

This has happened 3 times in a row as well. Whilst this is being looked
at, I'm going to fast-forward ahead to the latest in Linus' tree, and
see if the problem is still occurring (I think Linus' point earlier
about some sort of rogue timing and/or corruption bug is spot on, but
it's probably better to see how close to "today's tree" I can reproduce
this). I'll also try kernels w/ the problematic merge patch backed out
to see if that still "fixes" (or more likely(?) just patches over the
real problem).

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps'
       [not found]     ` <200808240813.56525.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
  2008-08-24 21:10       ` Rafael J. Wysocki
@ 2008-08-25 14:03       ` Adrian Bunk
  1 sibling, 0 replies; 318+ messages in thread
From: Adrian Bunk @ 2008-08-25 14:03 UTC (permalink / raw)
  To: Frans Pop
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Sun, Aug 24, 2008 at 08:13:55AM +0200, Frans Pop wrote:
> On Saturday 23 August 2008, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11356
> > Subject	: Linux 2.6.27-rc3 - build failure: undefined reference to
> > 		  `.lockdep_count_forward_deps'
> > Submitter	: Frans Pop <elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org> 
> > Date		: 2008-08-16 19:11 (8 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121891396320127&w=4
> 
> Fixed as per: http://marc.info/?l=linux-kernel&m=121898767530602&w=4
> Adrian mentioned that he'd closed the bug, but apparently not.

Sorry, I missed that Rafael had opened two bugs for two people reporting 
the same issue, and only closed the other one.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]         ` <alpine.LFD.1.10.0808231313170.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-25 12:03           ` Alan D. Brunelle
  2008-08-25 12:44           ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Alan D. Brunelle
@ 2008-08-25 14:05           ` Alan D. Brunelle
  2 siblings, 0 replies; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 14:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

I built a kernel @

commit 83097aca8567a0bd593534853b71fe0fa9a75d69
Author: Arjan van de Ven <arjan-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
Date:   Sat Aug 23 21:45:21 2008 -0700

And it fails like the others do

o  http://free.linux.hp.com/~adb/bug.11342/prob6.txt

SMP_DEBUG_PAGEALLOC

o  http://free.linux.hp.com/~adb/bug.11342/prob6a.txt

[    7.591198] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858

I then backed out /just/ the merge for

commit 1c89ac55017f982355c7761e1c912c88c941483d
Merge: 88fa08f... b1b135c...
Author: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date:   Tue Aug 12 08:40:19 2008 -0700

And the machine has booted fine 5 times in a row.

I've put the latest .config up at

http://free.linux.hp.com/~adb/bug.11342/config.txt

Is there /some/ way to break down the patches within the merged patch,
and I could by-hand bisect through those?

Here's what I did to take the latest tree, and back out that merge (to
get booting kernels):

git-diff 88fa08f67bee1a0c765237bdac106a32872f57d2..1c89ac55017f982355c7761e1c912c88c941483d | patch -p1 -R
patching file Documentation/lguest/lguest.c
patching file arch/powerpc/Kconfig
patching file arch/x86/Kconfig
patching file arch/x86/mm/Makefile
patching file drivers/char/hvc_console.c
patching file drivers/lguest/page_tables.c
patching file include/linux/Kbuild
Hunk #1 succeeded at 358 (offset 2 lines).
patching file include/linux/init.h
patching file include/linux/mm.h
patching file init/main.c
patching file kernel/module.c
patching file kernel/stop_machine.c
patching file mm/Kconfig
patching file mm/util.c

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                 ` <48B2A421.7080705-VXdhtT5mjnY@public.gmane.org>
@ 2008-08-25 18:00                   ` Linus Torvalds
       [not found]                     ` <alpine.LFD.1.10.0808251019380.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25 18:00 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
>
> Before adding any more debugging, this is the status of my kernel boots:
> 3 times in a row w/ this same error. (Primary problem is the same,
> secondary stacks differ of course.)

Ok, so I took a closer look, and the oops really is suggestive..

> [    6.482953] busybox used greatest stack depth: 4840 bytes left

Ok, 4840 bytes left out of 8kB.

> [    6.521876] all_generic_ide used greatest stack depth: 4784 bytes left

.. and this one is 4784 bytes left..

> Begin: Loading essential drivers... ...
> [    6.625509] fuse init (API version 7.9)
> [    6.625509] modprobe used greatest stack depth: 1720 bytes left

Uhhuh! The previous "modprobe" uses stack like mad.  It could be 
"fuse_init()" that has done it, but looking at fuse, I seriously doubt it. 
It doesn't seem to do anything particularly bad.

So something has used over 6kB of stack, and it may well be the module 
loading code itself.
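
The "bytes left" figures in the log can be turned into "bytes used" with simple subtraction. A minimal sketch, assuming the 8 kB stack size mentioned above; the helper name stack_used is made up for illustration and is not a kernel tool:

```python
import re

THREAD_SIZE = 8192  # assumption: 8 kB kernel stack, as stated in the thread

# Matches the kernel log line "<task> used greatest stack depth: N bytes left"
DEPTH_LINE = re.compile(r"(\S+) used greatest stack depth: (\d+) bytes left")


def stack_used(log_text, thread_size=THREAD_SIZE):
    """Return (task name, bytes used) for every stack-depth line in the log."""
    return [
        (m.group(1), thread_size - int(m.group(2)))
        for m in DEPTH_LINE.finditer(log_text)
    ]


# Log lines copied from the boot output quoted in this thread.
LOG = """\
[    6.482953] busybox used greatest stack depth: 4840 bytes left
[    6.521876] all_generic_ide used greatest stack depth: 4784 bytes left
[    6.625509] modprobe used greatest stack depth: 1720 bytes left
[    7.024992] modprobe used greatest stack depth: 408 bytes left
"""

for name, used in stack_used(LOG):
    print(f"{name:16} {used} bytes used")
```

For the first modprobe line this gives 8192 - 1720 = 6472 bytes, which is the "over 6kB" figure.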

The next stage is the actual oops itself:

> [    6.644854] ACPI: SSDT CFFD0D0A, 08C4 (r1 HPQOEM  CPU_TM2        1 MSFT  100000E)
> [    6.651489] BUG: unable to handle kernel NULL pointer dereference at 0000000000000858

This really looks like

	ti->task->blocked_on = waiter;

where "ti->task" is NULL. You probably have almost everything enabled in 
order to turn "struct task_struct" that big, but judging by your register 
state it's really an offset off a NULL pointer, not some small integer.

Now, there is no way "ti->task" can _possibly_ be NULL. No way.

Well, except that "ti" is just below the stack, so a stack overflow 
could have overwritten it.

So I seriously do believe that you have run out of stack. If that is true, 
then it's quite likely that with DEBUG_PAGEALLOC you'll actually get a 
double fault, which in turn is fairly hard to debug (you look at it wrong 
and it turns into a triple fault which is going to just reboot your 
machine immediately).

Now, the stack overflow probably happened a few calls earlier (and just 
left your thread_info corrupted), but there is more reason to believe you 
have stack overflow and thread_info corruption later in your output:

> [    7.024992] modprobe used greatest stack depth: 408 bytes left  
> [    7.030988] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [    7.031053] IP: [<ffffffff8023f39c>] do_exit+0x28c/0xa10

Here there is only 408 bytes left, which is _way_ too little, but it's 
also an optimistic measure. What the stack usage code does is to just 
see how many zeroes it can find on the stack. If you have a big stack 
frame somewhere, it's quite possible that it actually used all your stack 
and then some, but left a bunch of zeroes around.
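
Why the zero-counting measure is optimistic can be shown with a toy model. This is only a sketch under stated assumptions: the stack is a list of 8-byte words with index 0 at the lowest address, and the frame sizes passed in are invented numbers, not taken from the oops. It is not the kernel's actual implementation.

```python
WORD = 8            # bytes per stack word on x86-64
STACK_BYTES = 8192  # assumption: 8 kB kernel stack


def bytes_left_by_zero_scan(stack):
    """Count untouched (zero) words from the low end of the stack upward."""
    free_words = 0
    for word in stack:  # index 0 is the lowest address
        if word != 0:
            break
        free_words += 1
    return free_words * WORD


def simulate(frame_bytes, touched_bytes, depth_before):
    """Return (zero-scan estimate, true bytes left) for one big frame.

    depth_before : bytes already used and written above the big frame
    frame_bytes  : bytes the frame reserves (the "subq $N, %rsp" amount)
    touched_bytes: how much of that frame, from its top, is really written
    """
    stack = [0] * (STACK_BYTES // WORD)
    top = len(stack)
    written_words = (depth_before + touched_bytes) // WORD
    for i in range(top - written_words, top):
        stack[i] = 0xDEADBEEF  # any non-zero marker
    true_left = STACK_BYTES - (depth_before + frame_bytes)
    return bytes_left_by_zero_scan(stack), true_left


estimate, actual = simulate(frame_bytes=3400, touched_bytes=2784, depth_before=5000)
print(estimate, actual)  # 408 -208
```

With these made-up numbers the scan reports 408 bytes left, while the stack pointer was really 208 bytes past the end of the stack.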

And the do_exit() oops is simply because once the thread_info is 
corrupted, all the basic thread data structures are crap, and yes, you're 
almost guaranteed to oops at that point.

Could you make your kernel image available somewhere, and we can take a 
look at it? Some versions of gcc are total pigs when it comes to stack 
usage, and your exact configuration matters too.  But yes, module loading 
is a bad case, for me "sys_init_module()" contains

	subq    $392, %rsp      #,

which is probably mostly because of the insane inlining gcc does (ie it 
will likely have inlined every single function in that file that is only 
called once, and then it will make all local variables of all those 
functions alive over the whole function and allocate stack-space for them 
ALL AT THE SAME TIME).

Gcc sometimes drives me mad. Its inlining decisions are almost always 
pure and utter sh*t. But clearly something changed for you to start 
triggering this, and I think that also explains why you bisected things to 
the merge commit rather than to any individual change - because it was 
probably not any individual change that pushed it over the limit, but two 
different changes that made for bigger stack pressure, and _together_ they 
pushed you over the limit.

So it also explains why the merge you found had no possible merge errors 
on a source level - there were no actual clashes anywhere. Just a slow 
growth of stack that combined to something that overflowed.

And yes, I bet the change by Arjan to use do_one_initcall() was _part_ of 
it. It adds roughly 112 bytes of stack pressure to that module loading 
path, because of the 64-byte array and the extra function call (8 bytes 
for return address) with at least 5 quad-words saved (40 bytes) for 
register spills.
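
The 112-byte estimate is just the sum of the three pieces named above. Spelled out as a sketch (the constant names are made up for illustration):

```python
ARRAY_BYTES = 64            # the 64-byte array
RETURN_ADDRESS_BYTES = 8    # return address pushed by the extra call
SAVED_QUADWORDS = 5         # "at least 5 quad-words saved"
QUADWORD_BYTES = 8

extra = ARRAY_BYTES + RETURN_ADDRESS_BYTES + SAVED_QUADWORDS * QUADWORD_BYTES
print(extra)  # 112
```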

But there were probably other things happening too that made things worse.

So if there is some place where you can upload your 'vmlinux' binary, it 
would be good.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-25 12:44           ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Alan D. Brunelle
  2008-08-25 13:13             ` Alan D. Brunelle
@ 2008-08-25 18:02             ` Linus Torvalds
  1 sibling, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25 18:02 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell



On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
> 
> With /just/ DEBUG_PAGEALLOC defined, I have seen two general panic types:
> 
> o  A new double fault w/ SMP_DEBUG_PAGEALLOC problem (prob4.txt)

Yeah, that's a stack overflow.

Confirmed.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                     ` <alpine.LFD.1.10.0808251019380.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25 18:09                       ` Linus Torvalds
       [not found]                         ` <alpine.LFD.1.10.0808251106270.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25 18:09 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell



On Mon, 25 Aug 2008, Linus Torvalds wrote:
> 
> Could you make your kernel image available somewhere, and we can take a 
> look at it? Some versions of gcc are total pigs when it comes to stack 
> usage, and your exact configuration matters too.  But yes, module loading 
> is a bad case, for me "sys_init_module()" contains
> 
> 	subq    $392, %rsp      #,
> 
> which is probably mostly because of the insane inlining gcc does (ie it 
> will likely have inlined every single function in that file that is only 
> called once, and then it will make all local variables of all those 
> functions alive over the whole function and allocate stack-space for them 
> ALL AT THE SAME TIME).

I bet this one-liner will probably make your kernel work. It's not a full 
solution, but it will make the module-loading path lose _all_ of the above 
stack slots by just not inlining "load_module()" - the stack slots will 
still be used when the module is _loaded_, but by the time we actually 
call the ->init function they will have been released since it's not all 
in the same crazy function any more.

I _seriously_ believe that we were better off back when gcc only inlined 
what we told it to inline, and never inlined on its own. The gcc inlining 
logic is pure and utter sh*t in an environment like the kernel where stack 
space is a valuable resource.

Anyway, Alan, even if this solves your particular problem, I'd still like 
to see your kernel image, so that I can hunt for other problems like 
this..

			Linus

---
 kernel/module.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 08864d2..9db1191 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1799,7 +1799,7 @@ static void *module_alloc_update_bounds(unsigned long size)
 
 /* Allocate and load the module: note that size of section 0 is always
    zero, and we rely on this for optional sections. */
-static struct module *load_module(void __user *umod,
+static noinline struct module *load_module(void __user *umod,
 				  unsigned long len,
 				  const char __user *uargs)
 {

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                         ` <alpine.LFD.1.10.0808251106270.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25 20:19                           ` Alan D. Brunelle
       [not found]                             ` <48B313E0.1000501-VXdhtT5mjnY@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

Linus Torvalds wrote:
> 
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> Could you make your kernel image available somewhere, and we can take a 
>> look at it? Some versions of gcc are total pigs when it comes to stack 
>> usage, and your exact configuration matters too.  But yes, module loading 
>> is a bad case, for me "sys_init_module()" contains
>>
>> 	subq    $392, %rsp      #,
>>
>> which is probably mostly because of the insane inlining gcc does (ie it 
>> will likely have inlined every single function in that file that is only 
>> called once, and then it will make all local variables of all those 
>> functions alive over the whole function and allocate stack-space for them 
>> ALL AT THE SAME TIME).

Mine has:

Dump of assembler code for function sys_init_module:
0xffffffff802688c0 <sys_init_module+0>:	push   %rbp
0xffffffff802688c1 <sys_init_module+1>:	mov    %rsp,%rbp
0xffffffff802688c4 <sys_init_module+4>:	sub    $0x1c0,%rsp
0xffffffff802688cb <sys_init_module+11>:	mov    %r12,-0x20(%rbp)
0xffffffff802688cf <sys_init_module+15>:	mov    %rdi,%r12

so 448 bytes.

The kernel is up at: http://free.linux.hp.com/~adb/bug.11342/vmlinux (if
you would let me know when you are through with it so I can free up some
space there I'd appreciate it...)

By doing the patch you provided, sys_init_module now looks like:

Dump of assembler code for function sys_init_module:
0xffffffff8026aa20 <sys_init_module+0>:	push   %rbp
0xffffffff8026aa21 <sys_init_module+1>:	mov    %rsp,%rbp
0xffffffff8026aa24 <sys_init_module+4>:	sub    $0x20,%rsp
0xffffffff8026aa28 <sys_init_module+8>:	mov    %r14,0x18(%rsp)
0xffffffff8026aa2d <sys_init_module+13>:	mov    %rdi,%r14


So only 32 bytes. (But of course, load_module() exists, and now has
0x1d0 (464) bytes...)
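
The frame sizes quoted above come straight from the "sub $0x...,%rsp" line in each prologue. A small sketch that pulls that number out of a gdb disassembly line; the helper name frame_size is hypothetical, and the pattern only covers the AT&T-syntax form shown in this mail:

```python
import re

# Matches the stack reservation in a prologue, e.g. "sub    $0x1c0,%rsp"
SUB_RSP = re.compile(r"sub\s+\$0x([0-9a-fA-F]+),\s*%rsp")


def frame_size(disassembly):
    """Return the bytes reserved by the first 'sub $0x..,%rsp', or None."""
    m = SUB_RSP.search(disassembly)
    return int(m.group(1), 16) if m else None


before = "0xffffffff802688c4 <sys_init_module+4>:\tsub    $0x1c0,%rsp"
after = "0xffffffff8026aa24 <sys_init_module+4>:\tsub    $0x20,%rsp"

print(frame_size(before), frame_size(after))  # 448 32
print(int("1d0", 16))                         # 464, the new load_module() frame
```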

With the patch you provide, I /was/ able to repeatedly boot OK (latest
tree, and I also ran the patch against the 2.6.27-rc3-based kernel I was
having problems with initially, and that booted OK as well).

Alan

> 
> I bet this one-liner will probably make your kernel work. It's not a full 
> solution, but it will make the module-loading path lose _all_ of the above 
> stack slots by just not inlining "load_module()" - the stack slots will 
> still be used when the module is _loaded_, but by the time we actually 
> call the ->init function they will have been released since it's not all 
> in the same crazy function any more.
> 
> I _seriously_ believe that we were better off back when gcc only inlined 
> what we told it to inline, and never inlined on its own. The gcc inlining 
> logic is pure and utter sh*t in an environment like the kernel where stack 
> space is a valuable resource.
> 
> Anyway, Alan, even if this solves your particular problem, I'd still like 
> to see your kernel image, so that I can hunt for other problems like 
> this..
> 
> 			Linus
> 
> ---
>  kernel/module.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 08864d2..9db1191 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1799,7 +1799,7 @@ static void *module_alloc_update_bounds(unsigned long size)
>  
>  /* Allocate and load the module: note that size of section 0 is always
>     zero, and we rely on this for optional sections. */
> -static struct module *load_module(void __user *umod,
> +static noinline struct module *load_module(void __user *umod,
>  				  unsigned long len,
>  				  const char __user *uargs)
>  {

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                             ` <48B313E0.1000501-VXdhtT5mjnY@public.gmane.org>
@ 2008-08-25 20:43                               ` Linus Torvalds
       [not found]                                 ` <alpine.LFD.1.10.0808251326500.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25 20:43 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
> 
> Mine has:
> 
> Dump of assembler code for function sys_init_module:
> 0xffffffff802688c4 <sys_init_module+4>:	sub    $0x1c0,%rsp
> 
> so 448 bytes.

Yeah, your build seems to have consistently bigger stack usage, and that 
may be due to some config option, but most likely it's a compiler version 
issue.

But I think part of the reason is that you have frame pointers enabled: 
that makes the stack frames bigger not only because of the frame pointer 
save/restore, but also because you have more register pressure and thus 
spills.

> The kernel is up at: http://free.linux.hp.com/~adb/bug.11342/vmlinux (if
> you would let me know when you are through with it so I can free up some
> space there I'd appreciate it...)

I'm downloading it now, I'll probably be done by the time you get this 
email.

[ Update. Done. You can remove it ]

> By doing the patch you provided, sys_init_module now looks like:
> 
> Dump of assembler code for function sys_init_module:
> 0xffffffff8026aa24 <sys_init_module+4>:	sub    $0x20,%rsp
> 
> So only 32 bytes. (But of course, load_module() exists, and now has
> 0x1d0 (464) bytes...)

Right - the stack usage didn't go away, but the _lifetimes_ changed.

So now load_module() will still use almost 500 bytes of stack, and it will 
call other routines that use stack too, but the lifetime of that stack 
usage is no longer over the whole module loading and initialization part, 
it's purely over just the loading thing.

And since the deep callchain came much later (in the actual ->init 
routines), by the time we do that, the load_module stack usage is no 
longer active.
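
The lifetime point can be illustrated with a toy call-tree model. This is a sketch only: the 448, 32 and 464 byte frames are the ones measured in this thread, while the 1000-byte loading helpers and the 6000-byte ->init chain are invented numbers chosen purely for illustration.

```python
def peak_depth(frame, callees=()):
    """Peak stack bytes of a function: its own frame plus its deepest callee.

    Each callee is a (frame, callees) tuple. Only one callee is active at a
    time, so sibling calls do not add up; only nesting does.
    """
    return frame + max((peak_depth(*c) for c in callees), default=0)


LOAD_HELPERS = (1000, ())  # hypothetical: whatever the loading code calls
INIT_CHAIN = (6000, ())    # hypothetical: deep chain under the ->init call

# load_module() inlined: its locals live in sys_init_module's 448-byte frame,
# which is still on the stack while ->init runs.
inlined = (448, (LOAD_HELPERS, INIT_CHAIN))

# load_module() noinline: its 464-byte frame is popped before ->init runs.
split = (32, ((464, (LOAD_HELPERS,)), INIT_CHAIN))

print(peak_depth(*inlined))  # 6448
print(peak_depth(*split))    # 6032
```

The static total went up slightly (32 + 464 versus 448), but the peak on the deep path went down by 416 bytes in this model.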

> With the patch you provide, I /was/ able to repeatedly boot OK (latest
> tree, and I also ran the patch against the 2.6.27-rc3-based kernel I was
> having problems with initially, and that booted OK as well).

I had actually already committed it, because it was correct regardless 
(and gcc really is a total ass for doing that inlining to begin with), but 
it's good to have verification that the behaviour you saw was literally 
about this thing.

I'll look at your vmlinux binary to see what else sucks from a stack depth 
standpoint, but one of the problems in this whole thing is that the 
stack usage is obviously both a static thing (with some functions using 
_way_ too much stack!) _and_ a dynamic thing (with the total stack use 
being not about any individual function, but the whole chain).

My patch obviously doesn't change the static stack usage, it just moves it 
around a bit so that it's no longer on that same deep path, so the dynamic 
stack usage is much less.

But I'll look at your vmlinux, see what stands out.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                 ` <alpine.LFD.1.10.0808251326500.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25 20:45                                   ` Arjan van de Ven
  2008-08-25 20:52                                   ` Linus Torvalds
  2008-08-26  1:11                                   ` Rusty Russell
  2 siblings, 0 replies; 318+ messages in thread
From: Arjan van de Ven @ 2008-08-25 20:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Rusty Russell

Linus Torvalds wrote:
> On Mon, 25 Aug 2008, Alan D. Brunelle wrote:
>> Mine has:
>>
>> Dump of assembler code for function sys_init_module:
>> 0xffffffff802688c4 <sys_init_module+4>:	sub    $0x1c0,%rsp
>>
>> so 448 bytes.
> 
> Yeah, your build seems to have consistently bigger stack usage, and that 
> may be due to some config option, but most likely it's a compiler version 
> issue.
> 

I wonder if we ought to have a light version of "make checkstack" always run,
but in such a way that we make a file with "limits" on the stack usage for key
functions (and we can grow this list over time when we learn about critical ones)..
and either warn very loudly or even fail the build if we're way over what could work.
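
One possible shape for such a check, as a rough sketch only. It assumes checkstack.pl output lines of the form "address function [object]: bytes", as quoted elsewhere in this thread; the limits table, the 512-byte values and the function names parse_checkstack and over_limit are hypothetical and not an existing kernel facility.

```python
import re

# One checkstack.pl line: "0x<addr> <function> [<object>]: <bytes>"
CHECKSTACK_LINE = re.compile(
    r"^\s*0x[0-9a-fA-F]+\s+(\S+)\s+\[[^\]]+\]:\s*(\d+)\s*$"
)


def parse_checkstack(text):
    """Map each function name to its largest reported stack frame."""
    usage = {}
    for line in text.splitlines():
        m = CHECKSTACK_LINE.match(line)
        if m:
            name, size = m.group(1), int(m.group(2))
            usage[name] = max(size, usage.get(name, 0))
    return usage


def over_limit(usage, limits):
    """Return sorted (function, used, limit) tuples that exceed their limit."""
    return sorted(
        (name, usage[name], limit)
        for name, limit in limits.items()
        if usage.get(name, 0) > limit
    )


SAMPLE = """\
0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]:1544
"""

# Hypothetical limits file contents: key functions and their byte budgets.
LIMITS = {"smp_call_function_mask": 512, "sys_init_module": 512}

for name, used, limit in over_limit(parse_checkstack(SAMPLE), LIMITS):
    print(f"{name}: {used} bytes, limit {limit}")
```

A build step could warn or fail whenever over_limit() returns a non-empty list.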

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                 ` <alpine.LFD.1.10.0808251326500.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-25 20:45                                   ` Arjan van de Ven
@ 2008-08-25 20:52                                   ` Linus Torvalds
       [not found]                                     ` <alpine.LFD.1.10.0808251344250.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-26  1:11                                   ` Rusty Russell
  2 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25 20:52 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell



On Mon, 25 Aug 2008, Linus Torvalds wrote:
> 
> But I'll look at your vmlinux, see what stands out.

Oops. I already see the problem.

Your .config has some _huge_ CPU count, doesn't it?

checkstack.pl shows these things as the top problems:

	0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
	0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
	0xffffffff8023523f __build_sched_domains [vmlinux]:     2232
	0xffffffff8021e884 setup_IO_APIC_irq [vmlinux]:         1616
	0xffffffff8021ee24 arch_setup_ht_irq [vmlinux]:         1600
	0xffffffff8021f144 arch_setup_msi_irq [vmlinux]:        1600
	0xffffffff8021e3b0 __assign_irq_vector [vmlinux]:       1592
	0xffffffff8021e626 __assign_irq_vector [vmlinux]:       1592
	0xffffffff8023257e move_task_off_dead_cpu [vmlinux]:    1592
	0xffffffff802326e8 move_task_off_dead_cpu [vmlinux]:    1592
	0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]:1544
	0xffffffff8025dcb4 tick_handle_oneshot_broadcast [vmlinux]:1544
	0xffffffff803f3dc4 store_scaling_governor [vmlinux]:    1376
	0xffffffff80279ef4 cpuset_write_resmask [vmlinux]:      1360
	0xffffffff803f465d cpufreq_add_dev [vmlinux]:           1352
	0xffffffff803f495b cpufreq_add_dev [vmlinux]:           1352
	0xffffffff803f3fc4 store_scaling_max_freq [vmlinux]:    1328
	0xffffffff803f4064 store_scaling_min_freq [vmlinux]:    1328
	0xffffffff803f44c4 cpufreq_update_policy [vmlinux]:     1328
	..

and sys_init_module is actually way way down the list. I bet the only 
reason it showed up at all was because dynamically it was such a deep 
callchain, and part of that callchain probably called some of those really 
nasty things.

Anyway, the reason smp_call_function_mask and friends have such _huge_ 
stack usages for you is that they contain a 'cpumask_t' on the stack.

For example, for me, using a sane NR_CPUS, the size of the stack frame for 
smp_call_function_mask is under 200 bytes.  For you, it's 2736 bytes.

How about you make CONFIG_NR_CPUS something _sane_? Like 16? Or do you 
really have four thousand CPU's in that system?

Oh, I guess you have the MAXSMP config enabled? I really think that was a 
bit too aggressive.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                     ` <alpine.LFD.1.10.0808251344250.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-25 21:15                                       ` Linus Torvalds
       [not found]                                         ` <alpine.LFD.1.10.0808251401590.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-25 21:30                                       ` Alan D. Brunelle
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-25 21:15 UTC (permalink / raw)
  To: Alan D. Brunelle, Mike Travis, Ingo Molnar, Thomas Gleixner
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

On Mon, 25 Aug 2008, Linus Torvalds wrote:
> 
> checkstack.pl shows these things as the top problems:
> 
> 	0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
> 	0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
> 	0xffffffff8023523f __build_sched_domains [vmlinux]:     2232
> 
> Anyway, the reason smp_call_function_mask and friends have such _huge_ 
> stack usages for you is that they contain a 'cpumask_t' on the stack.

In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in 
size. And they tend to call each other.

Quite frankly, I don't think we were really ready for 4k CPU's. I'm going 
to commit this patch to make sure others don't do that many CPU's by 
mistake. It marks MAXSMP as being 'broken' so you cannot select it, and 
also limits the number of CPU's that you _can_ select to "just" 512.

Right now, 4k cpu's is known broken because of the stack usage. I'm not 
willing to debug more of these kinds of stack smashers, they're really 
nasty to work with. I wonder how many other random failures these have 
been involved with?

This patch also makes the ifdef mess in Kconfig much cleaner and avoids 
duplicate definitions by just conditionally suppressing the question and 
giving higher defaults. 

We can enable MAXSMP and raise the CPU limits some time in the future. But 
that future is not going to be before 2.6.27 - the code simply isn't ready 
for it.

The reason I picked 512 CPU's as the limit is that we _used_ to limit 
things to 255. So it's higher than it used to be, but low enough to still 
feel safe. Considering that a 4k-bit CPU mask (512 bytes) _almost_ worked, 
the 512-bit (64 bytes) masks are almost certainly fine.
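
The mask sizes follow directly from NR_CPUS: one bit per CPU, rounded up to whole 64-bit longs. A minimal sketch of that arithmetic; the function name cpumask_bytes is made up, and the "masks per frame" figure is only a rough upper bound, not a count taken from the source.

```python
BITS_PER_LONG = 64  # x86-64


def cpumask_bytes(nr_cpus):
    """Bytes in a bitmap with one bit per CPU, rounded up to whole longs."""
    longs = (nr_cpus + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * (BITS_PER_LONG // 8)


for nr_cpus in (16, 255, 512, 4096):
    print(nr_cpus, cpumask_bytes(nr_cpus))
# 16 8
# 255 32
# 512 64
# 4096 512

# The 2736-byte smp_call_function_mask frame has room for at most this many
# 512-byte masks:
print(2736 // cpumask_bytes(4096))  # 5
```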

Still, sane people should limit their NR_CPUS to 8 or 16 or something like 
that. Very very few people really need the pain of big NR_CPUS. Not even 
"just" 512 CPU's.

Travis, Ingo and Thomas cc'd, since they were involved in the original 
commit (1184dc2ffe2c8fb9afb766d870850f2c3165ef25) that raised the limit.

		Linus

---
 arch/x86/Kconfig |   30 ++++++++----------------------
 1 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 68d91c8..ed92864 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -577,35 +577,29 @@ config SWIOTLB

 config IOMMU_HELPER
 	def_bool (CALGARY_IOMMU || GART_IOMMU || SWIOTLB || AMD_IOMMU)
+
 config MAXSMP
 	bool "Configure Maximum number of SMP Processors and NUMA Nodes"
-	depends on X86_64 && SMP
+	depends on X86_64 && SMP && BROKEN
 	default n
 	help
 	  Configure maximum number of CPUS and NUMA Nodes for this architecture.
 	  If unsure, say N.

-if MAXSMP
-config NR_CPUS
-	int
-	default "4096"
-endif
-
-if !MAXSMP
 config NR_CPUS
-	int "Maximum number of CPUs (2-4096)"
-	range 2 4096
+	int "Maximum number of CPUs (2-512)" if !MAXSMP
+	range 2 512
 	depends on SMP
+	default "4096" if MAXSMP
 	default "32" if X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000
 	default "8"
 	help
 	  This allows you to specify the maximum number of CPUs which this
-	  kernel will support.  The maximum supported value is 4096 and the
+	  kernel will support.  The maximum supported value is 512 and the
 	  minimum value which makes sense is 2.

 	  This is purely to save memory - each supported CPU adds
 	  approximately eight kilobytes to the kernel image.
-endif

 config SCHED_SMT
 	bool "SMT (Hyperthreading) scheduler support"
@@ -996,17 +990,10 @@ config NUMA_EMU
 	  into virtual nodes when booted with "numa=fake=N", where N is the
 	  number of nodes. This is only useful for debugging.

-if MAXSMP
-
 config NODES_SHIFT
-	int
-	default "9"
-endif
-
-if !MAXSMP
-config NODES_SHIFT
-	int "Maximum NUMA Nodes (as a power of 2)"
+	int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
 	range 1 9   if X86_64
+	default "9" if MAXSMP
 	default "6" if X86_64
 	default "4" if X86_NUMAQ
 	default "3"
@@ -1014,7 +1001,6 @@ config NODES_SHIFT
 	help
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system.  Increases memory reserved to accomodate various tables.
-endif

 config HAVE_ARCH_BOOTMEM_NODE
 	def_bool y

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                     ` <alpine.LFD.1.10.0808251344250.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-25 21:15                                       ` Linus Torvalds
@ 2008-08-25 21:30                                       ` Alan D. Brunelle
       [not found]                                         ` <48B32458.5020104-VXdhtT5mjnY@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Alan D. Brunelle @ 2008-08-25 21:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

Linus Torvalds wrote:
> 
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> But I'll look at your vmlinux, see what stands out.
> 
> Oops. I already see the problem.
> 
> Your .config has some _huge_ CPU count, doesn't it?
> 
> checkstack.pl shows these things as the top problems:
> 
> 	0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
> 	0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
> 	0xffffffff8023523f __build_sched_domains [vmlinux]:     2232
> 	0xffffffff8021e884 setup_IO_APIC_irq [vmlinux]:         1616
> 	0xffffffff8021ee24 arch_setup_ht_irq [vmlinux]:         1600
> 	0xffffffff8021f144 arch_setup_msi_irq [vmlinux]:        1600
> 	0xffffffff8021e3b0 __assign_irq_vector [vmlinux]:       1592
> 	0xffffffff8021e626 __assign_irq_vector [vmlinux]:       1592
> 	0xffffffff8023257e move_task_off_dead_cpu [vmlinux]:    1592
> 	0xffffffff802326e8 move_task_off_dead_cpu [vmlinux]:    1592
> 	0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]:1544
> 	0xffffffff8025dcb4 tick_handle_oneshot_broadcast [vmlinux]:1544
> 	0xffffffff803f3dc4 store_scaling_governor [vmlinux]:    1376
> 	0xffffffff80279ef4 cpuset_write_resmask [vmlinux]:      1360
> 	0xffffffff803f465d cpufreq_add_dev [vmlinux]:           1352
> 	0xffffffff803f495b cpufreq_add_dev [vmlinux]:           1352
> 	0xffffffff803f3fc4 store_scaling_max_freq [vmlinux]:    1328
> 	0xffffffff803f4064 store_scaling_min_freq [vmlinux]:    1328
> 	0xffffffff803f44c4 cpufreq_update_policy [vmlinux]:     1328
> 	..
> 
> and sys_init_module is actually way way down the list. I bet the only 
> reason it showed up at all was because dynamically it was such a deep 
> callchain, and part of that callchain probably called some of those really 
> nasty things.
> 
> Anyway, the reason smp_call_function_mask and friends have such _huge_ 
> stack usages for you is that they contain a 'cpumask_t' on the stack.
> 
> For example, for me, using a sane NR_CPUS, the size of the stack frame for 
> smp_call_function_mask is under 200 bytes.  For you, it's 2736 bytes.
> 
> How about you make CONFIG_NR_CPUS something _sane_? Like 16? Or do you 
> really have four thousand CPU's in that system?
> 
> Oh, I guess you have the MAXSMP config enabled? I really think that was a 
> bit too aggressive.
> 
> 		Linus

This probably all started when I was working on a software tool (aiod)
that was failing because somebody ELSE had 4,096 CPUs configured.
[[Seems that glibc had/has? its max CPU value set to 1,024 (bits/sched.h
__CPU_SETSIZE), so when you issue system calls like sched_getaffinity,
it will "fail" for systems configured w/ 4,096 CPUs. I worked around it
by simply ignoring the glibc value and allocating larger and larger
CPU masks until it worked.]]
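
The workaround described above (keep growing the mask until the call succeeds) might look roughly like the sketch below. Assumptions: Linux with glibc, where sched_getaffinity(pid, cpusetsize, mask) returns 0 on success and fails with EINVAL when the buffer is too small. The helper names are hypothetical, this is not the actual aiod code, and fake_kernel exists only so the loop can be exercised without a large machine.

```python
import ctypes
import errno


def grow_until_fits(query, start_bytes=8, limit_bytes=1 << 20):
    """Double the mask size until query(size) returns a mask, not None."""
    size = start_bytes
    while size <= limit_bytes:
        mask = query(size)
        if mask is not None:
            return size, mask
        size *= 2
    raise OSError(errno.EINVAL, "no mask size up to the limit was accepted")


def libc_query(size):
    """Ask libc's sched_getaffinity() for the caller's mask (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    buf = ctypes.create_string_buffer(size)
    rc = libc.sched_getaffinity(0, ctypes.c_size_t(size), buf)
    if rc == 0:
        return buf.raw
    if ctypes.get_errno() == errno.EINVAL:
        return None  # buffer too small for this kernel's CPU mask
    raise OSError(ctypes.get_errno(), "sched_getaffinity failed")


def fake_kernel(needed_bytes):
    """Stand-in for a kernel that rejects masks smaller than needed_bytes."""
    def query(size):
        return bytes(size) if size >= needed_bytes else None
    return query


# A kernel built for 4,096 CPUs needs a 512-byte mask.
size, mask = grow_until_fits(fake_kernel(512))
print(size)  # 512
```

On a real Linux system the same loop would be called as grow_until_fits(libc_query).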

I think you're right: the kernel as a whole may not be ready for 4,096
CPUs apparently...

Thanks for taking the time to look into this...

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                         ` <48B32458.5020104-VXdhtT5mjnY@public.gmane.org>
@ 2008-08-25 22:07                                           ` Christoph Lameter
       [not found]                                             ` <48B32D39.5040709-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-08-25 22:07 UTC (permalink / raw)
  To: Alan D. Brunelle
  Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven,
	Rusty Russell, Mike Travis, Ingo Molnar

Alan D. Brunelle wrote:

> I think you're right: the kernel as a whole may not be ready for 4,096
> CPUs apparently...

Mike has been working diligently on getting all these cpumasks off the stack
for the last months and has created an infrastructure to do this. So I think
we are close. It might just be a matter of merging some more patches that are
still left in Ingo's tree.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                 ` <alpine.LFD.1.10.0808251326500.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-25 20:45                                   ` Arjan van de Ven
  2008-08-25 20:52                                   ` Linus Torvalds
@ 2008-08-26  1:11                                   ` Rusty Russell
       [not found]                                     ` <200808261111.19205.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
  2 siblings, 1 reply; 318+ messages in thread
From: Rusty Russell @ 2008-08-26  1:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven

On Tuesday 26 August 2008 06:43:03 Linus Torvalds wrote:
> So now load_module() will still use almost 500 bytes of stack

Hmm, it wants neatening anyway; I'll see if I can reduce stack usage as a side effect.

Your workaround is very random, and that scares me.  I think a huge number of 
CPUs needs a real solution (an actual cpumask allocator, then do something 
clever if we come across an actual fastpath).

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                         ` <alpine.LFD.1.10.0808251401590.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26  7:22                                           ` Ingo Molnar
       [not found]                                             ` <20080826072220.GB31876-X9Un+BFzKDI@public.gmane.org>
  2008-08-26 19:03                                             ` Mike Travis
  2008-08-26 19:01                                           ` Mike Travis
  1 sibling, 2 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-08-26  7:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan D. Brunelle, Mike Travis, Thomas Gleixner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Rusty Russell


* Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> On Mon, 25 Aug 2008, Linus Torvalds wrote:
> > 
> > checkstack.pl shows these things as the top problems:
> > 
> > 	0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
> > 	0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
> > 	0xffffffff8023523f __build_sched_domains [vmlinux]:     2232
> > 
> > Anyway, the reason smp_call_function_mask and friends have such _huge_ 
> > stack usages for you is that they contain a 'cpumask_t' on the stack.
> 
> In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in 
> size. And they tend to call each other.
> 
> Quite frankly, I don't think we were really ready for 4k CPU's. I'm 
> going to commit this patch to make sure others don't do that many 
> CPU's by mistake. It marks MAXCPU's as being 'broken' so you cannot 
> select it, and also limits the number of CPU's that you _can_ select 
> to "just" 512.

yeah, that's OK i guess - distros can still enable 4K support if they 
wish to. Someone interested in improving the stack footprint situation 
should dust off the max-stack-footprint tracer so that we can catch 
these things in a more structured way.

And i guess the next generation of 4K CPUs support should just get away 
from cpumask_t-on-kernel-stack model altogether, as the current model is 
not maintainable. We tried the on-kernel-stack variant, and it really 
does not work reliably. We can fix this in v2.6.28.

	Ingo
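
For reference, the mask sizes quoted above follow from one bit per possible
CPU, rounded up to whole unsigned longs. A minimal userspace sketch of that
arithmetic; LONG_BITS and mask_bytes() are illustrative names, not kernel
API:

```c
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long (64 on typical x86_64 builds). */
#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap with one bit per CPU, rounded up to whole longs. */
static inline size_t mask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + LONG_BITS - 1) / LONG_BITS;

	return longs * sizeof(unsigned long);
}
```

mask_bytes(4096) evaluates to 512 and mask_bytes(512) to 64, so a function
holding a handful of 4096-CPU masks as locals can plausibly account for the
multi-kilobyte frames in the checkstack.pl output.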

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                             ` <20080826072220.GB31876-X9Un+BFzKDI@public.gmane.org>
@ 2008-08-26  7:46                                               ` David Miller
       [not found]                                                 ` <20080826.004607.253712060.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-08-26 19:06                                                 ` Mike Travis
  0 siblings, 2 replies; 318+ messages in thread
From: David Miller @ 2008-08-26  7:46 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, travis-sJ/iWh9BUns,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Tue, 26 Aug 2008 09:22:20 +0200

> And i guess the next generation of 4K CPUs support should just get away 
> from cpumask_t-on-kernel-stack model altogether, as the current model is 
> not maintainable. We tried the on-kernel-stack variant, and it really 
> does not work reliably. We can fix this in v2.6.28.

I recently did some work on sparc64 to use cpumask pointers
as much as possible.

The only case that didn't work was due to a limitation in
arch interfaces for the new generic smp_call_function() code.
It passes a cpumask_t instead of a pointer to one via
arch_send_call_function_ipi().

But other than that, the whole sparc64 SMP stuff uses cpumask_t
pointers only.

What it comes down to is that you have to do the "self cpu"
and other tests in the cross-call dispatch routines themselves,
instead of at the top-level working on cpumask_t objects.

Otherwise you have to modify cpumask_t objects and thus pluck
them onto the stack where they take up silly amounts of space.
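
The pointer-only pattern described above might look roughly like the
following userspace sketch. It is not the sparc64 code; struct demo_cpumask,
demo_cross_call() and the other demo_* names are hypothetical stand-ins.
The point it illustrates is that the dispatch routine takes a read-only
pointer and skips the calling CPU while iterating, so no caller has to copy
the mask onto its stack just to clear one bit.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4096
#define LONG_BITS    (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS   ((DEMO_NR_CPUS + LONG_BITS - 1) / LONG_BITS)

/* Illustrative stand-in for the kernel's cpumask_t. */
struct demo_cpumask {
	unsigned long bits[MASK_LONGS];
};

static inline bool demo_cpu_isset(unsigned int cpu,
				  const struct demo_cpumask *mask)
{
	return (mask->bits[cpu / LONG_BITS] >> (cpu % LONG_BITS)) & 1UL;
}

static inline void demo_cpu_set(unsigned int cpu, struct demo_cpumask *mask)
{
	mask->bits[cpu / LONG_BITS] |= 1UL << (cpu % LONG_BITS);
}

/*
 * Dispatch routine: the "self cpu" test happens here, per CPU, instead of
 * the caller building a modified copy of the mask first.
 * Returns how many CPUs were (or would have been) signalled.
 */
static inline unsigned int demo_cross_call(const struct demo_cpumask *mask,
					   unsigned int self,
					   void (*send)(unsigned int cpu))
{
	unsigned int cpu, sent = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if (cpu == self || !demo_cpu_isset(cpu, mask))
			continue;
		if (send)
			send(cpu);
		sent++;
	}
	return sent;
}
```

The only stack cost per call is one pointer and a few integers, regardless
of how many CPUs the mask covers.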

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                 ` <20080826.004607.253712060.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-08-26  7:53                                                   ` Ingo Molnar
       [not found]                                                     ` <20080826075355.GA7596-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-08-26  7:53 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, travis-sJ/iWh9BUns,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

* David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Date: Tue, 26 Aug 2008 09:22:20 +0200
> 
> > And i guess the next generation of 4K CPUs support should just get away 
> > from cpumask_t-on-kernel-stack model altogether, as the current model is 
> > not maintainable. We tried the on-kernel-stack variant, and it really 
> > does not work reliably. We can fix this in v2.6.28.
> 
> I recently did some work on sparc64 to use cpumask pointers as much 
> as possible.
> 
> The only case that didn't work was due to a limitation in arch 
> interfaces for the new generic smp_call_function() code. It passes a 
> cpumask_t instead of a pointer to one via 
> arch_send_call_function_ipi().
> 
> But other than that, the whole sparc64 SMP stuff uses cpumask_t 
> pointers only.

nice!

> What it comes down to is that you have to do the "self cpu" and other 
> tests in the cross-call dispatch routines themselves, instead of at 
> the top-level working on cpumask_t objects.
> 
> Otherwise you have to modify cpumask_t objects and thus pluck them 
> onto the stack where they take up silly amounts of space.

What we did was this: we added MAXSMP which just revs up all the SMP 
tunables to the maximum, so that we can see any problems early in 
testing.

And we triggered problems, and we fixed a couple of regressions all 
around stack footprint. But we didnt catch all of them - some were gcc 
version dependent and configuration dependent. So i think it's safe to 
say that the whole concept of allowing such a large cpumask_t to be on 
the stack is fragile.

Hence, i think the best way forward is to change the whole cpumask_t 
concept and disallow explicit masks altogether. It's so easy to smack a 
cpumask_t variable on the stack and nothing really warns about it, and 
any function can become part of a nested call sequence.

So i think the dynamics of it has to be changed: we need a get/put API 
and we need to make on-stack cpumask illegal on the build level (in 
generic code at least). This has been Rusty's main argument early on i 
think, and i now concur.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                             ` <48B32D39.5040709-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2008-08-26  7:59                                               ` Ingo Molnar
       [not found]                                                 ` <20080826075937.GB7596-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-08-26  7:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Alan D. Brunelle, Linus Torvalds, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Rusty Russell, Mike Travis

* Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> Alan D. Brunelle wrote:
> 
> > I think you're right: the kernel as a whole may not be ready for 4,096
> > CPUs apparently...
> 
> Mike has been working diligently on getting all these cpumasks off the 
> stack for the last few months and has created an infrastructure to do 
> this. So I think we are close. It might just be a matter of merging 
> some more patches that are still left in Ingo's tree.

hm, there are no such patches left that i know of - the only bits in 
-tip are the zero-based percpu, which was found to be a bit fragile in 
testing:

 earth4:~/tip> git-log-line --author=Travis linus..
 d379497: Zero based percpu: infrastructure to rebase the per cpu area to zero
 b3a0cb4: x86: extend percpu ops to 64 bit

[and it has no relevance to stack footprint.]

So i dont think the current cpumask_t approach will work. We simply 
should not get into an endless fight against the windmills that 
introduce on-stack cpumask_t again and again. We should just take the 
plunge once and do a clean alloc/free cpumask model. Most of the hotpath 
cpumasks are constant or pre-constructed, so they are not a real issue.

Plus, on the general question of stack footprint problems and the 
difficulty of debugging them, the worst-case stack footprint tracer i 
wrote for -rt some time ago should be dusted off as well and put into 
ftrace. David has something quite close to that for Sparc64 already.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                     ` <20080826075355.GA7596-X9Un+BFzKDI@public.gmane.org>
@ 2008-08-26  8:36                                                       ` Yinghai Lu
       [not found]                                                         ` <86802c440808260136t3a33a9c8if53b6f70ab9df9e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-08-26 19:11                                                       ` Mike Travis
  1 sibling, 1 reply; 318+ messages in thread
From: Yinghai Lu @ 2008-08-26  8:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, travis-sJ/iWh9BUns,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

On Tue, Aug 26, 2008 at 12:53 AM, Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
>
> * David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>
>> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
>> Date: Tue, 26 Aug 2008 09:22:20 +0200
>>
>> > And i guess the next generation of 4K CPUs support should just get away
>> > from cpumask_t-on-kernel-stack model altogether, as the current model is
>> > not maintainable. We tried the on-kernel-stack variant, and it really
>> > does not work reliably. We can fix this in v2.6.28.
>>
>> I recently did some work on sparc64 to use cpumask pointers as much
>> as possible.
>>
>> The only case that didn't work was due to a limitation in arch
>> interfaces for the new generic smp_call_function() code. It passes a
>> cpumask_t instead of a pointer to one via
>> arch_send_call_function_ipi().
>>
>> But other than that, the whole sparc64 SMP stuff uses cpumask_t
>> pointers only.

wonder if could use "unsigned long *" directly.
so could use dyn_array directly, like

int cpumask_size;

unsigned long *online_cpu_map;
DEFINE_DYN_ARRAY(online_cpu_map, sizeof(unsigned long), cpumask_size,
PAGE_SIZE, NULL);

and after nr_cpu_ids is assigned, have
cpumask_size = (nr_cpu_ids + BITS_PER_LONG - 1)/BITS_PER_LONG;

then we could use an NR_CPUS=4096 kernel on a small system. ...

YH
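
A userspace sketch of the run-time sizing idea, assuming the mask length is
counted in unsigned long words (so the rounding is by bits per long).
DEFINE_DYN_ARRAY is not reproduced here; plain calloc() stands in for it,
and the demo_* names are hypothetical:

```c
#include <limits.h>
#include <stdlib.h>

#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold one bit per CPU id. */
static inline size_t demo_cpumask_longs(unsigned int nr_cpu_ids)
{
	return (nr_cpu_ids + LONG_BITS - 1) / LONG_BITS;
}

/*
 * Allocate a zeroed mask sized for the CPUs actually possible at run time,
 * rather than for the compile-time NR_CPUS. Returns NULL on failure;
 * release with free().
 */
static inline unsigned long *demo_alloc_cpu_map(unsigned int nr_cpu_ids)
{
	return calloc(demo_cpumask_longs(nr_cpu_ids), sizeof(unsigned long));
}
```

With this shape, a kernel built for 4096 CPUs but booted on an 8-CPU box
would need a single word per mask instead of 512 bytes.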

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                         ` <86802c440808260136t3a33a9c8if53b6f70ab9df9e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-26 16:51                                                           ` Linus Torvalds
       [not found]                                                             ` <alpine.LFD.1.10.0808260939070.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 16:51 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, David Miller, Alan.Brunelle-VXdhtT5mjnY,
	travis-sJ/iWh9BUns, tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	Linux Kernel Mailing List, kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, arjan-VuQAYsv1563Yd54FQh9/CA,
	rusty-8n+1lVoiYb80n/F98K4Iww



On Tue, 26 Aug 2008, Yinghai Lu wrote:
> 
> wonder if could use "unsigned long *" directly.

I would actually suggest something like this:

 - we continue to have a magic "cpumask_t".

 - we do different cases for big and small NR_CPUS:

	#if NR_CPUS <= BITS_PER_LONG

	/*
	 * Make it an array - that way passing it as an argument will
	 * always pass it as a pointer!
	 */
	typedef unsigned long cpumask_t[1];

	static inline void create_cpumask(cpumask_t *p)
	{
		(*p)[0] = 0;
	}
	static inline void free_cpumask(cpumask_t *p)
	{
	}

	#else

	typedef unsigned long *cpumask_t;

	static inline void create_cpumask(cpumask_t *p)
	{
		*p = kcalloc(..);
	}

	static inline void free_cpumask(cpumask_t *p)
	{
		kfree(*p);
	}

	#endif

and now after you do this, you can just do something like

	cpumask_t mycpu;

	create_cpumask(&mycpu);
	..
	free_cpumask(&mycpu);

and in between, you can use 'mycpu' as a pointer, because even when it 
is an array directly allocated on the stack, the array can always 
degenerate into a pointer by C type rules!

And for the small-NR_CPUS case there is zero overhead.

			Linus
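
A compilable userspace sketch of the two-variant scheme proposed above, for
experimenting with the array-versus-pointer trick. Everything prefixed
demo_ or DEMO_ is hypothetical, calloc()/free() stand in for the elided
kernel allocation call, and the int return value on create is an added
assumption so that allocation failure can be reported:

```c
#include <limits.h>
#include <stdlib.h>

#ifndef DEMO_NR_CPUS
#define DEMO_NR_CPUS 4096
#endif

#if ULONG_MAX == 0xffffffffUL
#define DEMO_BITS_PER_LONG 32
#else
#define DEMO_BITS_PER_LONG 64
#endif

#define DEMO_MASK_LONGS \
	((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

#if DEMO_NR_CPUS <= DEMO_BITS_PER_LONG

/* Small case: a one-element array living directly in the caller's frame. */
typedef unsigned long demo_cpumask_t[1];

static inline int demo_create_cpumask(demo_cpumask_t *p)
{
	(*p)[0] = 0;
	return 0;
}

static inline void demo_free_cpumask(demo_cpumask_t *p)
{
	(void)p;	/* nothing to release */
}

#else

/* Large case: only a pointer sits on the stack; the bits live on the heap. */
typedef unsigned long *demo_cpumask_t;

static inline int demo_create_cpumask(demo_cpumask_t *p)
{
	*p = calloc(DEMO_MASK_LONGS, sizeof(unsigned long));
	return *p ? 0 : -1;
}

static inline void demo_free_cpumask(demo_cpumask_t *p)
{
	free(*p);
	*p = NULL;
}

#endif

/* Usable with both variants: the argument is an unsigned long * either way. */
static inline void demo_set_cpu(unsigned long *mask, unsigned int cpu)
{
	mask[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

static inline int demo_test_cpu(const unsigned long *mask, unsigned int cpu)
{
	return (mask[cpu / DEMO_BITS_PER_LONG] >> (cpu % DEMO_BITS_PER_LONG)) & 1UL;
}
```

Building with -DDEMO_NR_CPUS=16 selects the array variant and the same
calling code compiles unchanged, since the array decays to a pointer when
passed to demo_set_cpu() or demo_test_cpu().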

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                             ` <alpine.LFD.1.10.0808260939070.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 17:08                                                               ` Yinghai Lu
  2008-09-25  1:50                                                               ` Rusty Russell
  1 sibling, 0 replies; 318+ messages in thread
From: Yinghai Lu @ 2008-08-26 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, David Miller, Alan.Brunelle-VXdhtT5mjnY,
	travis-sJ/iWh9BUns, tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	Linux Kernel Mailing List, kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, arjan-VuQAYsv1563Yd54FQh9/CA,
	rusty-8n+1lVoiYb80n/F98K4Iww

On Tue, Aug 26, 2008 at 9:51 AM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>
>
> On Tue, 26 Aug 2008, Yinghai Lu wrote:
>>
>> wonder if could use "unsigned long *" directly.
>
> I would actually suggest something like this:
>
>  - we continue to have a magic "cpumask_t".
>
>  - we do different cases for big and small NR_CPUS:
>
>        #if NR_CPUS <= BITS_PER_LONG
>
>        /*
>         * Make it an array - that way passing it as an argument will
>         * always pass it as a pointer!
>         */
>        typedef unsigned long cpumask_t[1];
>
>        static inline void create_cpumask(cpumask_t *p)
>        {
>                (*p)[0] = 0;
>        }
>        static inline void free_cpumask(cpumask_t *p)
>        {
>        }
>
>        #else
>
>        typedef unsigned long *cpumask_t;
>
>        static inline void create_cpumask(cpumask_t *p)
>        {
>                *p = kcalloc(..);
>        }
>
>        static inline void free_cpumask(cpumask_t *p)
>        {
>                kfree(*p);
>        }
>
>        #endif
>
> and now after you do this, you can just do something like
>
>        cpumask_t mycpu;
>
>        create_cpumask(&mycpu);
>        ..
>        free_cpumask(&mycpu);
>
> and in between, you can use 'mycpu' as a pointer, because even when it
> is an array directly allocated on the stack, the array can always
> degenerate into a pointer by C type rules!
>

that is good for local variables.

for global variables, we need to allocate them at some point. may need one
int cpumask_size;

cpumask_t online_cpu_map;
DEFINE_DYN_ARRAY(online_cpu_map, sizeof(unsigned long), cpumask_size,
PAGE_SIZE, NULL);

or something like that.

YH

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                     ` <200808261111.19205.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
@ 2008-08-26 17:35                                       ` Linus Torvalds
  2008-08-26 18:30                                         ` Adrian Bunk
       [not found]                                         ` <alpine.LFD.1.10.0808261019450.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 17:35 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar

On Tue, 26 Aug 2008, Rusty Russell wrote:
> 
> Your workaround is very random, and that scares me.  I think a huge number of 
> CPUs needs a real solution (an actual cpumask allocator, then do something 
> clever if we come across an actual fastpath).

The thing is, the inlining thing is a separate issue.

Yes, the cpumasks were what made stack pressure so critical to begin with, 
but no, a 400-byte stack frame in a deep callchain isn't acceptable 
_regardless_ of any cpumask_t issues.

Gcc inlining is a total and utter pile of shit. And _that_ is the problem. 
I seriously think we shouldn't allow gcc to inline anything at all unless 
we tell it to. That's how it used to work, and quite frankly, that's how 
it _should_ work.

The downsides of inlining are big enough from both a debugging and a real 
code generation angle (eg stack usage like this), that the upsides 
(_sometimes_ smaller kernel, possibly slightly faster code) simply aren't 
relevant.

So the "noinline" was random, yes, but this is a real issue. Looking at 
checkstack output for a saner config (NR_CPUS=16), the top entries for me 
are things like

	ide_generic_init [vmlinux]:             1384
	idefloppy_ioctl [vmlinux]:              1208
	e1000_check_options [vmlinux]:  	1152
	...

which are "leaf" functions. They are broken as hell (the e1000 is 
apparently because it builds structs on the stack that should all be 
"static const", for example), but they are different from something like 
the module init sequence in that they are not going to affect anything 
else.

It would be interesting to see what "-fno-default-inline" does to the 
kernel. It really would get rid of a _lot_ of gcc version issues too. 
Inlining behavior of gcc has long been a problem for us.

			Linus
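
To illustrate the stack-usage concern with automatic inlining, here is a
small userspace sketch. The function names are made up, DEMO_NOINLINE wraps
the GCC attribute that the kernel spells "noinline", and the 400-byte
buffer merely echoes the frame size mentioned above:

```c
#include <stdio.h>
#include <string.h>

/* GCC/Clang attribute; the kernel's own macro for this is 'noinline'. */
#define DEMO_NOINLINE __attribute__((noinline))

/*
 * Called from exactly one place, so the compiler would otherwise be free
 * to fold it, and its 400-byte buffer, into the caller's frame. Marking
 * it noinline keeps the buffer in a frame that is popped on return.
 */
static DEMO_NOINLINE size_t demo_format_name(const char *name)
{
	char scratch[400];

	snprintf(scratch, sizeof(scratch), "module:%s", name);
	return strlen(scratch);
}

/* Stand-in for a function in a deep call chain: its own frame stays small. */
static inline size_t demo_load(const char *name)
{
	size_t len = demo_format_name(name);

	/* Further nested calls made here no longer sit on top of scratch[]. */
	return len;
}
```

Whether a given gcc version would actually inline the helper without the
attribute depends on its heuristics and flags, which is the unpredictability
being complained about.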

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 17:35                                       ` Linus Torvalds
@ 2008-08-26 18:30                                         ` Adrian Bunk
       [not found]                                           ` <20080826183051.GB10925-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
       [not found]                                         ` <alpine.LFD.1.10.0808261019450.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-26 18:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded

On Tue, Aug 26, 2008 at 10:35:05AM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 26 Aug 2008, Rusty Russell wrote:
> > 
> > Your workaround is very random, and that scares me.  I think a huge number of 
> > CPUs needs a real solution (an actual cpumask allocator, then do something 
> > clever if we come across an actual fastpath).
> 
> The thing is, the inlining thing is a separate issue.
> 
> Yes, the cpumasks were what made stack pressure so critical to begin with, 
> but no, a 400-byte stack frame in a deep callchain isn't acceptable 
> _regardless_ of any cpumask_t issues.
> 
> Gcc inlining is a total and utter pile of shit. And _that_ is the problem. 
> I seriously think we shouldn't allow gcc to inline anything at all unless 
> we tell it to. That's how it used to work, and quite frankly, that's how 
> it _should_ work.
> 
> The downsides of inlining are big enough from both a debugging and a real 
> code generation angle (eg stack usage like this), that the upsides 
> (_sometimes_ smaller kernel, possibly slightly faster code) simply aren't 
> relevant.
>...
> It would be interesting to see what "-fno-default-inline" does to the 
> kernel. It really would get rid of a _lot_ of gcc version issues too. 
> Inlining behavior of gcc has long been a problem for us.

I added "-fno-inline-functions-called-once -fno-early-inlining" to 
KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel 
image by 2%.

And once David's "-fwhole-program --combine" work is ready, the cost
of not letting gcc inline functions will most likely increase.

A debugging option (for better traces) to disallow gcc some inlining 
might make sense (and might even make sense for distributions to 
enable in their kernels), but when you go to use cases that require
really small kernels the cost is too high.

But if you don't trust gcc's inlining you should revert
commit 3f9b5cc018566ad9562df0648395649aebdbc5e0 that increases gcc's 
freedom regarding what to inline in 2.6.27 - what gcc 4.2 does in the 
case of the regression tracked as Bugzilla #11276 is really not funny 
(two callers -> function not inlined; gcc seems to emit the function 
although both callers later get removed (or at least should be removed) 
by dead code elimination).

> 			Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                           ` <20080826183051.GB10925-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
@ 2008-08-26 18:40                                             ` Linus Torvalds
       [not found]                                               ` <alpine.LFD.1.10.0808261134530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-26 18:47                                             ` Linus Torvalds
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 18:40 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, 26 Aug 2008, Adrian Bunk wrote:
> 
> A debugging option (for better traces) to disallow gcc some inlining 
> might make sense (and might even make sense for distributions to 
> enable in their kernels), but when you go to use cases that require
> really small kernels the cost is too high.

You ignore the fact that it's really not just about debugging.

Inlining really isn't the great tool some people think it is. Especially 
not since gcc stack allocation is so horrid that it won't re-use stack 
slots etc (which I don't disagree with per se - it's _hard_ to re-use 
stack slots while still allowing code scheduling).

NOTE! I also would never claim that _our_ choices of "inline" are all that 
great, and we've often inlined too much or not inlined things that really 
could be inlined. But at least when a developer says "inline" (or forgets 
to say it), we have somebody to blame. When the compiler does insane 
things that don't suit us, we're just screwed.

> But if you don't trust gcc's inlining you should revert
> commit 3f9b5cc018566ad9562df0648395649aebdbc5e0 that increases gcc's 
> freedom regarding what to inline in 2.6.27

Actually, that just allows gcc to _not_ inline. Which is probably ok.

(Well, it would be ok if gcc did it well enough, it obviously has some 
problems at times).

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                           ` <20080826183051.GB10925-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
  2008-08-26 18:40                                             ` Linus Torvalds
@ 2008-08-26 18:47                                             ` Linus Torvalds
  2008-08-26 19:02                                               ` Jamie Lokier
       [not found]                                               ` <alpine.LFD.1.10.0808261144510.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 2 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 18:47 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA



On Tue, 26 Aug 2008, Adrian Bunk wrote:
> 
> I added "-fno-inline-functions-called-once -fno-early-inlining" to 
> KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel 
> image by 2%.

Btw, did you check with just "-fno-inline-functions-called-once"?

The -fearly-inlining decisions _should_ be mostly right. If gcc sees early 
that a function is so small (even without any constant propagation etc) 
that it can be inlined, it's probably right. 

The inline-functions-called-once thing is what causes even big functions 
to be inlined, and that's where you find the big downsides too (eg the 
stack usage).

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                         ` <alpine.LFD.1.10.0808251401590.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-26  7:22                                           ` Ingo Molnar
@ 2008-08-26 19:01                                           ` Mike Travis
       [not found]                                             ` <48B452F3.9040304-sJ/iWh9BUns@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-08-26 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan D. Brunelle, Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Rusty Russell, Siddha, Suresh B, Luck, Tony,
	Jack Steiner, Andrew Morton, Christoph Lameter

Linus Torvalds wrote:
> 
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> checkstack.pl shows these things as the top problems:
>>
>> 	0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
>> 	0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
>> 	0xffffffff8023523f __build_sched_domains [vmlinux]:     2232
>>
>> Anyway, the reason smp_call_function_mask and friends have such _huge_ 
>> stack usages for you is that they contain a 'cpumask_t' on the stack.
> 
> In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in 
> size. And they tend to call each other.
> 
> Quite frankly, I don't think we were really ready for 4k CPU's. I'm going 
> to commit this patch to make sure others don't do that many CPU's by 
> mistake. It marks MAXCPU's as being 'broken' so you cannot select it, and 
> also limits the number of CPU's that you _can_ select to "just" 512.
> 
> Right now, 4k cpu's is known broken because of the stack usage. I'm not 
> willing to debug more of these kinds of stack smashers, they're really 
> nasty to work with. I wonder how many other random failures these have 
> been involved with?
> 
> This patch also makes the ifdef mess in Kconfig much cleaner and avoids 
> duplicate definitions by just conditionally suppressing the question and 
> giving higher defaults. 
> 
> We can enable MAXSMP and raise the CPU limits some time in the future. But 
> that future is not going to be before 2.6.27 - the code simply isn't ready 
> for it.
> 
> The reason I picked 512 CPU's as the limit is that we _used_ to limit 
> things to 255. So it's higher than it used to be, but low enough to still 
> feel safe. Considering that a 4k-bit CPU mask (512 bytes) _almost_ worked, 
> the 512-bit (64 bytes) masks are almost certainly fine.
> 
> Still, sane people should limit their NR_CPUS to 8 or 16 or something like 
> that. Very very few people really need the pain of big NR_CPUS. Not even 
> "just" 512 CPU's.
> 
> Travis, Ingo and Thomas cc'd, since they were involved in the original 
> commit (1184dc2ffe2c8fb9afb766d870850f2c3165ef25) that raised the limit.
> 
> 		Linus

Hi Linus,

[Sorry for the long-winded response, but I felt that sufficient background
is needed to address this issue.... YOMV :-)]

The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is
critical to our upcoming SGI systems using what we have been calling
"UV".  This system will be capable of supporting 4096 cpu threads in a
single system image (and 16k cpus/2k nodes is right around the corner).
While obviously I cannot divulge too many details, it's sufficient to
say there are customers who not only require this extended capability,
but are extremely excited about it.

But the nature of some of these system environments is that they will
not accept a specially built kernel, but only a kernel that has been
built and certified (both from the application standpoint as well as
the security standpoint) by standard distributions.  And you probably
know how extensively these distributions test and certify for many known
defects and absolutely require that incoming source changes come from
the community supported source bases, primarily yours.

Due to the lead time required to accomplish these certifications,
the version of the distributions that will be available when this
system releases will be based on 2.6.27.  (They will allow patches
"post-2.6.27-rc.final" as long as those are committed in the source base.)
The two distributions that SGI supports for our customers are SLES
(SUSE Linux Enterprise Server) and RHEL (Red Hat Enterprise Linux).
[They, of course, are free to run any OS of their choosing, but SGI only
provides front line support for those two.]

Last August I began analyzing how to accomplish the above goals and
exactly where the hot spots in the kernel are that would require
attention.  It quickly became clear that cpumask_t and nodemask_t are
two types that are very casually used (along with NR_CPUS), because
the assumption was that 64 "was more than sufficient" for an upper limit
and even extending it to 128 or 255 (254 was the maximum IPI broadcast
ID until x2apic), only added a few more bytes here and there.
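
As a rough illustration of why those earlier bumps looked harmless and
4096 does not, here is a minimal sketch of the per-mask cost.  It
assumes the mask is a bitmap stored as an array of 64-bit words; the
helper name cpumask_bytes is made up for this example.

```python
def cpumask_bytes(nr_cpus, bits_per_long=64):
    """Bytes needed for a bitmap with one bit per possible CPU.

    Assumption: the bitmap is an array of whole machine words, so the
    bit count is rounded up to a multiple of bits_per_long.
    """
    longs = (nr_cpus + bits_per_long - 1) // bits_per_long
    return longs * (bits_per_long // 8)


if __name__ == "__main__":
    for nr_cpus in (64, 128, 255, 512, 4096):
        print(f"NR_CPUS={nr_cpus:5d}: {cpumask_bytes(nr_cpus):4d} bytes per mask")
```

Under that assumption a mask costs 8 bytes at 64 cpus, 32 bytes at 255,
64 bytes at 512 and 512 bytes at 4096, and every on-stack or embedded
copy pays that amount again.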

I chose not to introduce too many dramatic changes and instead analyzed
every instance where cpumask_t and NR_CPUS were being used (along with the
node counterparts.)  An initial proposal was to allow the default stack
size to be increased, but this was met with a lot of objections because
of the extensive work that was done to bring it to its current size.

So in summary, the goals of the changes that I have been making since
last October are:

	1. Allow a "typical distro" configured kernel with NR_CPUS=4096
	   and NODES_SHIFT=9 to be booted on an x86_64 system with 2GB
	   of memory.  (Some thought was given to using a 512MB laptop
	   as the base, but because of other memory bloat from using a
	   64-bit kernel, that was considered not very useful.)

	   [Note I frequently use an allyes and allmod config for
	   testing.]

	2. To lessen as much as possible, the impact on memory usage for
	   that same kernel on that same system.

	3. To lessen as much as possible, the impact on system performance
	   for that same kernel on that same system.  [Which mostly
	   depended on #2.]

I booted the first 4096 cpu kernel last February, and since around March
or April, Ingo has been (build and boot) testing the x86 branch using
"MAXSMP" to trigger the increased defaults quite extensively (IIRC, it
was somewhere between 75% and 90% of all kernels built.)  We here at SGI
nightly build 4 trees (linux-2.6, linux-next, linux-mm, linux-x86) to
ensure new checkins don't conflict with changes we've made in the past.
Unfortunately, our run testing wasn't sufficient to catch this latest
error (and I will be quickly fixing that.)

I will also revisit all the past areas to analyze if there have been
other abuses of stack and memory space added since the 4k cpu limit
was "certified" as usable and releasable.  (See below for an initial
survey of size increases between a 512cpu/64node configuration and a
4096cpu/512node configuration.)

So perhaps "MAXSMP" is not needed (or perhaps should be more hidden to
reduce accidental uses), but allowing the defaults listed above to be in
the standard x86/Kconfig ensures that the distros can at least attempt
certification with the maximally configured kernels for their enterprise
editions of Linux.

There are many more changes that will be proposed for the 2.6.28 window.
Most certainly your concerns, as well as others, about how to change the
current "cpumask paradigm" to be more easily manageable for systems with
huge cpu counts, will be visited.  (And surely be well discussed. :-)

Thanks,
Mike
---

linux-2.6: v2.6.27-rc4-176-gb8e6c91

====== Data (-l 500)
... files 2 vars 1421 all 0 lim 500 unch 0

    1 - 512-64-allmodconfig
    2 - 4096-512-allmodconfig

      .1.       .2.    ..final..
  1671168  +3899392 5570560  +233%  irq_desc(.data.cacheline_aligned)
   591872  +3899392 4491264  +658%  irq_cfg(.data.read_mostly)
    76800   +537600  614400  +700%  early_node_map(.init.data)
    66176   +462336  528512  +698%  init_mem_cgroup(.bss)
    65536   +458752  524288  +700%  boot_pageset(.bss)
    63648   +419328  482976  +658%  kmalloc_caches(.data.cacheline_aligned)
    15328    +61376   76704  +400%  def_root_domain(.bss)
    10240    +43008   53248  +420%  change_point_list(.init.data)
     8760      +504    9264    +5%  init_task(.data)
     8192    +57344   65536  +700%  kgdb_info(.bss)
     6404    +26880   33284  +419%  e820_saved(.bss)
     6404    +26880   33284  +419%  e820(.bss)
     6400    +26880   33280  +420%  new_bios(.init.data)
     5120    +35840   40960  +700%  node_devices(.bss)
     5120    +21504   26624  +420%  change_point(.init.data)
     4160    +29120   33280  +700%  cpu_bit_bitmap(.rodata)
     4096    +28672   32768  +700%  __cpu_pda(.init.data)
     3776    +25088   28864  +664%  hstates(.bss)
     3584    +25088   28672  +700%  bootmem_node_data(.init.data)
     2560    +10752   13312  +420%  overlap_list(.init.data)
     2048    +14336   16384  +700%  x86_cpu_to_node_map_early_map(.init.data)
     2048    +14336   16384  +700%  was_in_debug_nmi(.bss)
     2048    +14336   16384  +700%  rio_devs(.init.data)
     2048    +14336   16384  +700%  passive_cpu_wait(.bss)
     2048    +14336   16384  +700%  node_memblk_range(.init.data)
     2048    +14336   16384  +700%  ints(.init.data)
     2048    +14336   16384  +700%  cpu_in_kgdb(.bss)
     1024     +7168    8192  +700%  x86_cpu_to_apicid_early_map(.init.data)
     1024     +7168    8192  +700%  x86_bios_cpu_apicid_early_map(.init.data)
     1024     +1024    2048  +100%  pxm_to_node_map(.data)
     1024     +7168    8192  +700%  nodes_add(.bss)
     1024     +7168    8192  +700%  nodes(.init.data)
      512     +3584    4096  +700%  zone_movable_pfn(.init.data)
      512     +3584    4096  +700%  tvec_base_done(.data)
      512     +3584    4096  +700%  scal_devs(.init.data)
      512     +3584    4096  +700%  node_data(.data.read_mostly)
      512     +3584    4096  +700%  memblk_nodeid(.init.data)
        0     +2048    2048      .  node_to_pxm_map(.data)
        0     +2048    2048      .  node_order(.bss)
        0     +2048    2048      .  node_load(.bss)
        0     +2048    2048      .  fake_node_to_pxm_map(.init.data)
        0      +768     768      .  rcu_ctrlblk(.data)
        0      +768     768      .  rcu_bh_ctrlblk(.data)
        0      +768     768      .  per_cpu__cpu_info(.data.percpu)
        0      +768     768      .  boot_cpu_data(.data.read_mostly)
        0      +760     760      .  per_cpu__phys_domains(.data.percpu)
        0      +760     760      .  per_cpu__node_domains(.data.percpu)
        0      +760     760      .  per_cpu__cpu_domains(.data.percpu)
        0      +760     760      .  per_cpu__core_domains(.data.percpu)
        0      +760     760      .  per_cpu__allnodes_domains(.data.percpu)
        0      +720     720      .  top_cpuset(.data)
        0      +640     640      .  per_cpu__flush_state(.data.percpu)
        0      +632     632      .  pit_clockevent(.data)
        0      +632     632      .  per_cpu__lapic_events(.data.percpu)
        0      +632     632      .  lapic_clockevent(.data)
        0      +632     632      .  hpet_clockevent(.data)
        0      +616     616      .  net_dma(.data)
        0      +579     579      .  do_migrate_pages(.text)
        0      +568     568      .  irq2(.data)
        0      +568     568      .  irq0(.data)
        0      +528     528      .  per_cpu__sched_group_phys(.data.percpu)
        0      +528     528      .  per_cpu__sched_group_cpus(.data.percpu)
        0      +528     528      .  per_cpu__sched_group_core(.data.percpu)
        0      +528     528      .  per_cpu__sched_group_allnodes(.data.percpu)
        0      +520     520      .  out_of_memory(.text)
        0      +520     520      .  nohz(.data)
        0      +512     512      .  tick_broadcast_oneshot_mask(.bss)
        0      +512     512      .  tick_broadcast_mask(.bss)
        0      +512     512      .  prof_cpu_mask(.data)
        0      +512     512      .  per_cpu__local_cpu_mask(.data.percpu)
        0      +512     512      .  per_cpu__cpu_sibling_map(.data.percpu)
        0      +512     512      .  per_cpu__cpu_core_map(.data.percpu)
        0      +512     512      .  nohz_cpu_mask(.bss)
        0      +512     512      .  mce_device_initialized(.bss)
        0      +512     512      .  mce_cpus(.bss)
        0      +512     512      .  marked_cpus(.bss)
        0      +512     512      .  kmem_cach_cpu_free_init_once(.bss)
        0      +512     512      .  irq_default_affinity(.data)
        0      +512     512      .  frozen_cpus(.bss)
        0      +512     512      .  fallback_doms(.bss)
        0      +512     512      .  cpu_singlethread_map(.data.read_mostly)
        0      +512     512      .  cpu_sibling_setup_map(.bss)
        0      +512     512      .  cpu_present_map(.data.read_mostly)
        0      +512     512      .  cpu_possible_map(.bss)
        0      +512     512      .  cpu_populated_map(.data.read_mostly)
        0      +512     512      .  cpu_online_map(.data.read_mostly)
        0      +512     512      .  cpu_mask_none(.bss)
        0      +512     512      .  cpu_mask_all(.data.read_mostly)
        0      +512     512      .  cpu_isolated_map(.bss)
        0      +512     512      .  cpu_initialized(.data)
        0      +512     512      .  cpu_callout_map(.bss)
        0      +512     512      .  cpu_callin_map(.bss)
        0      +512     512      .  cpu_active_map(.bss)
        0      +512     512      .  cache_dev_map(.bss)
        0      +512     512      .  c1e_mask(.bss)
        0      +512     512      .  backtrace_mask(.bss)
 2647360 +10283499 12930859 +388%  Totals
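
The percent column above can be reproduced from the first and final
columns.  The sketch below assumes the tool truncates to a whole
percent, which is inferred from the rows rather than stated anywhere in
this message; growth_percent is a name invented for the example.

```python
def growth_percent(before, after):
    """Growth from `before` to `after` as a whole percent, truncated."""
    return (after - before) * 100 // before


# (first column, final column) copied from a few rows of the table.
rows = {
    "irq_desc": (1671168, 5570560),
    "irq_cfg": (591872, 4491264),
    "__cpu_pda": (4096, 32768),
    "Totals": (2647360, 12930859),
}

if __name__ == "__main__":
    for name, (before, after) in rows.items():
        print(f"{name:10s} +{growth_percent(before, after)}%")
```

An array sized directly by the cpu count (512 -> 4096) or the node count
(64 -> 512) grows by a factor of 8, which is the +700% that dominates
the list.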

====== Sections (-l 500)
... files 2 vars 36 all 0 lim 500 unch 0

    1 - 512-64-allmodconfig
    2 - 4096-512-allmodconfig

       .1.        .2.    ..final..
  66688274  +10345296 77033570   +15%  Total
  38237848     +44031 38281879    <1%  .debug_info
   8441752   +1215872  9657624   +14%  .bss
   2551715      +3136  2554851    <1%  .text
   1737600   +4318720  6056320  +248%  .data.cacheline_aligned
   1640096      +6784  1646880    <1%  .data.percpu
   1175061     +29104  1204165    +2%  .rodata
   1073400     +13712  1087112    +1%  .debug_abbrev
    901760      +1392   903152    <1%  .debug_ranges
    608192   +3906016  4514208  +642%  .data.read_mostly
    302704     +13504   316208    +4%  .data
    244896    +792112  1037008  +323%  .init.data
123603298 +20689679 144292977  +16%  Totals

====== Text/Data ()
... files 2 vars 6 all 0 lim 0 unch 0

    1 - 512-64-allmodconfig
    2 - 4096-512-allmodconfig

      .1.       .2.    ..final..
  2551808     +2048  2553856    <1%  TextSize
  1679360    +43008  1722368    +2%  DataSize
  8441856  +1216512  9658368   +14%  BssSize
  2138112   +798720  2936832   +37%  InitSize
  1640448     +6144  1646592    <1%  PerCPU
  2383872  +8228864 10612736  +345%  OtherSize
18835456 +10295296 29130752  +54%  Totals

====== PerCPU ()
... files 2 vars 22 all 0 lim 0 unch 0

    1 - 512-64-allmodconfig
    2 - 4096-512-allmodconfig

   .1.    .2.    ..final..
  2048  -2048    .  -100%  vm_event_states
  2048  -2048    .  -100%  softnet_data
  2048  -2048    .  -100%  init_sched_rt_entity
  2048  -2048    .  -100%  core_domains
     0  +2048 2048      .  sched_group_core
     0  +2048 2048      .  node_domains
     0  +2048 2048      .  lru_add_active_pvecs
     0  +2048 2048      .  init_rt_rq
     0  +2048 2048      .  cpu_domains
     0  +2048 2048      .  cpu_core_map
     0  +2048 2048      .  cpu_buffer
 8192 +6144 14336  +75%  Totals

====== Stack (-l 500)
... files 2 vars 126 all 0 lim 500 unch 0

    1 - 512-64-allmodconfig
    2 - 4096-512-allmodconfig

  .1.    .2.    ..final..
    0  +2712 2712      .  smp_call_function_mask
    0  +1576 1576      .  setup_IO_APIC_irq
    0  +1576 1576      .  move_task_off_dead_cpu
    0  +1560 1560      .  arch_setup_ht_irq
    0  +1560 1560      .  __assign_irq_vector
    0  +1544 1544      .  tick_handle_oneshot_broadcast
    0  +1544 1544      .  msi_compose_msg
    0  +1440 1440      .  cpuset_write_resmask
    0  +1352 1352      .  store_scaling_governor
    0  +1352 1352      .  cpufreq_add_dev
    0  +1320 1320      .  cpufreq_update_policy
    0  +1312 1312      .  store_scaling_min_freq
    0  +1312 1312      .  store_scaling_max_freq
    0  +1176 1176      .  threshold_create_device
    0  +1128 1128      .  setup_IO_APIC
    0  +1096 1096      .  sched_balance_self
    0  +1080 1080      .  sched_rt_period_timer
    0  +1080 1080      .  _cpu_down
    0  +1064 1064      .  set_ioapic_affinity_irq
    0  +1048 1048      .  store_interrupt_enable
    0  +1048 1048      .  setup_timer_IRQ0_pin
    0  +1048 1048      .  setup_ioapic_dest
    0  +1048 1048      .  set_msi_irq_affinity
    0  +1048 1048      .  set_ht_irq_affinity
    0  +1048 1048      .  native_machine_crash_shutdown
    0  +1048 1048      .  native_flush_tlb_others
    0  +1048 1048      .  dmar_msi_set_affinity
    0  +1040 1040      .  store_threshold_limit
    0  +1040 1040      .  show_error_count
    0  +1040 1040      .  acpi_map_lsapic
    0  +1032 1032      .  tick_do_periodic_broadcast
    0  +1032 1032      .  sched_setaffinity
    0  +1032 1032      .  native_send_call_func_ipi
    0  +1032 1032      .  local_cpus_show
    0  +1032 1032      .  local_cpulist_show
    0  +1032 1032      .  irq_select_affinity
    0  +1032 1032      .  irq_complete_move
    0  +1032 1032      .  irq_affinity_proc_write
    0  +1032 1032      .  flush_tlb_mm
    0  +1032 1032      .  flush_tlb_current_task
    0  +1032 1032      .  fixup_irqs
    0  +1032 1032      .  create_irq
    0  +1024 1024      .  uv_vector_allocation_domain
    0  +1024 1024      .  uv_send_IPI_allbutself
    0  +1024 1024      .  store_error_count
    0  +1024 1024      .  physflat_send_IPI_allbutself
    0  +1024 1024      .  pci_bus_show_cpuaffinity
    0  +1024 1024      .  move_masked_irq
    0  +1024 1024      .  flush_tlb_page
    0  +1024 1024      .  flat_send_IPI_allbutself
    0   +784  784      .  sd_init_ALLNODES
    0   +776  776      .  sd_init_SIBLING
    0   +776  776      .  sd_init_NODE
    0   +768  768      .  sd_init_MC
    0   +768  768      .  sd_init_CPU
    0   +728  728      .  update_flag
    0   +696  696      .  init_intel_cacheinfo
    0   +680  680      .  __build_sched_domains
    0   +648  648      .  thread_return
    0   +648  648      .  schedule
    0   +640  640      .  cpuset_attach
    0   +616  616      .  rebalance_domains
    0   +600  600      .  select_task_rq_fair
    0   +600  600      .  cache_add_dev
    0   +584  584      .  shmem_getpage
    0   +568  568      .  pdflush
    0   +552  552      .  tick_notify
    0   +552  552      .  partition_sched_domains
    0   +552  552      .  free_sched_groups
    0   +552  552      .  __percpu_alloc_mask
    0   +544  544      .  taskstats_user_cmd
    0   +536  536      .  sched_init_smp
    0   +536  536      .  pci_device_probe
    0   +536  536      .  cpuset_common_file_read
    0   +536  536      .  cpupri_find
    0   +536  536      .  acpi_processor_ffh_cstate_probe
    0   +536  536      .  __cpu_disable
    0   +520  520      .  uv_send_IPI_all
    0   +520  520      .  tick_do_broadcast
    0   +520  520      .  smp_call_function
    0   +520  520      .  show_related_cpus
    0   +520  520      .  show_affected_cpus
    0   +520  520      .  prof_cpu_mask_write_proc
    0   +520  520      .  physflat_send_IPI_mask
    0   +520  520      .  physflat_send_IPI_all
    0   +520  520      .  native_smp_send_reschedule
    0   +520  520      .  native_send_call_func_single_ipi
    0   +520  520      .  lapic_timer_broadcast
    0   +520  520      .  irq_set_affinity
    0   +520  520      .  flat_vector_allocation_domain
    0   +520  520      .  flat_send_IPI_all
    0   +520  520      .  find_lowest_rq
    0   +520  520      .  cpuset_can_attach
    0   +520  520      .  cpu_callback
    0   +520  520      .  compat_sys_sched_setaffinity
    0   +520  520      .  add_del_listener
    0   +512  512      .  sys_sched_setaffinity
    0   +512  512      .  sys_sched_getaffinity
    0   +512  512      .  run_rebalance_domains
    0   +512  512      .  ioapic_retrigger_irq
    0   +512  512      .  generic_processor_info
    0   +512  512      .  force_quiescent_state
    0   +512  512      .  destroy_irq
    0   +512  512      .  default_affinity_write
    0   +512  512      .  cpu_to_phys_group
    0   +512  512      .  cpu_to_allnodes_group
    0   +512  512      .  compat_sys_sched_getaffinity
    0   +512  512      .  check_preempt_curr_rt
    0   +512  512      .  assign_irq_vector
   0 +92248 92248   +0%  Totals

====== MemInfo ()
... files 0 vars 0 all 0 lim 0 unch 0

(runtime meminfo not collected.)

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 18:47                                             ` Linus Torvalds
@ 2008-08-26 19:02                                               ` Jamie Lokier
       [not found]                                                 ` <20080826190213.GA30255-yetKDKU6eevNLxjTenLetw@public.gmane.org>
       [not found]                                               ` <alpine.LFD.1.10.0808261144510.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Jamie Lokier @ 2008-08-26 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded

Linus Torvalds wrote:
> The inline-functions-called-once thing is what causes even big functions 
> to be inlined, and that's where you find the big downsides too (eg the 
> stack usage).

That's a bit bizarre, though, isn't it?

A function which is only called from one place should, if everything
made sense, _never_ use more stack through being inlined.  Inlining
should just increase the opportunities that the called function's
local variables can share the same stack slots as the caller's dead
locals.

Whereas not inlining guarantees they occupy separate, immediately
adjacent regions of the stack, and shouldn't be increasing the total
numbers of local variables.

-- Jamie

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26  7:22                                           ` Ingo Molnar
       [not found]                                             ` <20080826072220.GB31876-X9Un+BFzKDI@public.gmane.org>
@ 2008-08-26 19:03                                             ` Mike Travis
       [not found]                                               ` <48B45387.8090205-sJ/iWh9BUns@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-08-26 19:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Alan D. Brunelle, Thomas Gleixner,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell

Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>>> checkstack.pl shows these things as the top problems:
>>>
>>> 	0xffffffff80266234 smp_call_function_mask [vmlinux]:    2736
>>> 	0xffffffff80234747 __build_sched_domains [vmlinux]:     2232
>>> 	0xffffffff8023523f __build_sched_domains [vmlinux]:     2232
>>>
>>> Anyway, the reason smp_call_function_mask and friends have such _huge_ 
>>> stack usages for you is that they contain a 'cpumask_t' on the stack.
>> In fact, they contain multiple CPU-masks, each 4k-bits - 512 bytes - in 
>> size. And they tend to call each other.
>>
>> Quite frankly, I don't think we were really ready for 4k CPU's. I'm 
>> going to commit this patch to make sure others don't do that many 
>> CPU's by mistake. It marks MAXCPU's as being 'broken' so you cannot 
>> select it, and also limits the number of CPU's that you _can_ select 
>> to "just" 512.
> 
> yeah, that's OK i guess - distros can still enable 4K support if they 
> wish to. Someone interested in improving the stack footprint situation 
> should dust off the max-stack-footprint tracer so that we can catch 
> these things in a more structured way.
> 
> And i guess the next generation of 4K CPUs support should just get away 
> from cpumask_t-on-kernel-stack model altogether, as the current model is 
> not maintainable. We tried the on-kernel-stack variant, and it really 
> does not work reliably. We can fix this in v2.6.28.
> 
> 	Ingo

I would be most interested in any tools to analyze call-trees and
accumulated stack usages.  My current method of using kdb is really
time consuming.
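
For illustration only, the core of such a tool could be a walk over a
call graph that adds up frame sizes along each path.  The sketch below
assumes per-function frame sizes are already known (for example from
checkstack.pl output) and that the graph has no recursion or indirect
calls; the function names and numbers are hypothetical.

```python
def worst_case_stack(frames, calls, root):
    """Return (bytes, path) for the deepest accumulated stack from root.

    frames maps function name -> frame size in bytes.
    calls maps function name -> list of callees (must be acyclic).
    """
    best_bytes, best_path = 0, []
    for callee in calls.get(root, []):
        sub_bytes, sub_path = worst_case_stack(frames, calls, callee)
        if sub_bytes > best_bytes:
            best_bytes, best_path = sub_bytes, sub_path
    return frames[root] + best_bytes, [root] + best_path


# Hypothetical call graph, not taken from any real kernel.
frames = {"syscall_entry": 64, "set_affinity": 1032, "migrate": 200,
          "send_ipi": 520, "log": 96}
calls = {"syscall_entry": ["set_affinity", "log"],
         "set_affinity": ["migrate", "send_ipi"],
         "migrate": ["send_ipi"]}

if __name__ == "__main__":
    total, path = worst_case_stack(frames, calls, "syscall_entry")
    print(total, " -> ".join(path))
```

In this made-up graph the worst path is 1816 bytes, and most of it
comes from the two functions that hold a mask-sized local.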

Thanks!
Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26  7:46                                               ` David Miller
       [not found]                                                 ` <20080826.004607.253712060.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-08-26 19:06                                                 ` Mike Travis
       [not found]                                                   ` <48B4542A.1050004-sJ/iWh9BUns@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-08-26 19:06 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, torvalds, Alan.Brunelle, tglx, rjw, linux-kernel,
	kernel-testers, akpm, arjan, rusty

David Miller wrote:
> From: Ingo Molnar <mingo@elte.hu>
> Date: Tue, 26 Aug 2008 09:22:20 +0200
> 
>> And i guess the next generation of 4K CPUs support should just get away 
>> from cpumask_t-on-kernel-stack model altogether, as the current model is 
>> not maintainable. We tried the on-kernel-stack variant, and it really 
>> does not work reliably. We can fix this in v2.6.28.
> 
> I recently did some work on sparc64 to use cpumask pointers
> as much as possible.
> 
> The only case that didn't work was due to a limitation in
> arch interfaces for the new generic smp_call_function() code.
> It passes a cpumask_t instead of a pointer to one via
> arch_send_call_function_ipi().
> 
> But other than that, the whole sparc64 SMP stuff uses cpumask_t
> pointers only.
> 
> What it comes down to is that you have to do the "self cpu"
> and other tests in the cross-call dispatch routines themselves,
> instead of at the top-level working on cpumask_t objects.
> 
> Otherwise you have to modify cpumask_t objects and thus pluck
> them onto the stack where they take up silly amounts of space.

Yes, I had proposed either modifying the existing smp_call function
or adding a new one that takes the cpumask_t as a pointer (similar
to set_cpus_allowed_ptr.)  But an ABI change such as this was
not well received at the time.
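
A simplified model of the difference, assuming a 4096-bit mask, a
by-value argument that is copied once per nested call, and nothing else
on the stack.  CpuMask and stack_cost are names invented for this
sketch.

```python
import ctypes

NR_CPUS = 4096  # the configuration discussed in this thread


class CpuMask(ctypes.Structure):
    """Stand-in for a cpumask_t: one bit per possible CPU."""
    _fields_ = [("bits", ctypes.c_uint64 * (NR_CPUS // 64))]


def stack_cost(depth, by_pointer):
    """Bytes of mask arguments held across `depth` nested calls.

    Model assumption: passing by value costs one full copy of the mask
    per call level, passing by pointer costs one pointer per level.
    """
    if by_pointer:
        per_call = ctypes.sizeof(ctypes.c_void_p)
    else:
        per_call = ctypes.sizeof(CpuMask)
    return depth * per_call


if __name__ == "__main__":
    print("sizeof(mask)     =", ctypes.sizeof(CpuMask))
    print("3 calls by value =", stack_cost(3, by_pointer=False))
    print("3 calls by ptr   =", stack_cost(3, by_pointer=True))
```

With three nested calls the by-value model holds 1536 bytes of mask
copies, while the pointer model holds three pointers.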

Thanks,
Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                             ` <48B452F3.9040304-sJ/iWh9BUns@public.gmane.org>
@ 2008-08-26 19:09                                               ` Linus Torvalds
  2008-08-26 19:28                                                 ` Dave Jones
       [not found]                                                 ` <alpine.LFD.1.10.0808261205530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 19:09 UTC (permalink / raw)
  To: Mike Travis
  Cc: Alan D. Brunelle, Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Rusty Russell, Siddha, Suresh B, Luck, Tony,
	Jack Steiner, Andrew Morton, Christoph Lameter

On Tue, 26 Aug 2008, Mike Travis wrote:
> 
> The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is
> critical to our upcoming SGI systems using what we have been calling
> "UV".

That's fine. You can do it. The default kernel will not, because it's 
clearly not safe.

I really don't care what you do to _your_ images. But I will not 
distribute a known-broken kernel, and I will not debug random stack 
overflows that happen in it.

If you want the default kernel to support 4k cores, we'll need to fix the 
stack usage.  I don't think that is impossible, but IT IS NOT GOING TO 
HAPPEN for 2.6.27.

And quite frankly, if some vendor like RedHat enables NR_CPUS=4096 by 
default, they are totally and utterly crazy.

But some SGI-specific binary that is meant for SGI machines only, and has 
been extensively tested with the setup used on SGI machines is a different 
thing.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                     ` <20080826075355.GA7596-X9Un+BFzKDI@public.gmane.org>
  2008-08-26  8:36                                                       ` Yinghai Lu
@ 2008-08-26 19:11                                                       ` Mike Travis
  1 sibling, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-08-26 19:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

Ingo Molnar wrote:
> * David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
> 
>> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
>> Date: Tue, 26 Aug 2008 09:22:20 +0200
>>
>>> And i guess the next generation of 4K CPUs support should just get away 
>>> from cpumask_t-on-kernel-stack model altogether, as the current model is 
>>> not maintainable. We tried the on-kernel-stack variant, and it really 
>>> does not work reliably. We can fix this in v2.6.28.
>> I recently did some work on sparc64 to use cpumask pointers as much 
>> as possible.
>>
>> The only case that didn't work was due to a limitation in arch 
>> interfaces for the new generic smp_call_function() code. It passes a 
>> cpumask_t instead of a pointer to one via 
>> arch_send_call_function_ipi().
>>
>> But other than that, the whole sparc64 SMP stuff uses cpumask_t 
>> pointers only.
> 
> nice!
> 
>> What it comes down to is that you have to do the "self cpu" and other 
>> tests in the cross-call dispatch routines themselves, instead of at 
>> the top-level working on cpumask_t objects.
>>
>> Otherwise you have to modify cpumask_t objects and thus pluck them 
>> onto the stack where they take up silly amounts of space.
> 
> What we did was this: we added MAXSMP which just revs up all the SMP 
> tunables to the maximum, so that we can see any problems early in 
> testing.
> 
> And we triggered problems, and we fixed a couple of regressions all 
> around stack footprint. But we didn't catch all of them - some were gcc 
> version dependent and configuration dependent. So i think it's safe to 
> say that the whole concept of allowing such a large cpumask_t to be on 
> the stack is fragile.

IIRC, it was the problem of basing percpu variables at zero that hit
problems with various gcc toolset versions.  I don't remember any
version problems with cpumasks on the stack, they all failed the
same way... :-)
> 
> Hence, i think the best way forward is to change the whole cpumask_t 
> concept and disallow explicit masks altogether. It's so easy to smack a 
> cpumask_t variable on the stack and nothing really warns about it, and 
> any function can become part of a nested call sequence.

This is a great idea!
> 
> So i think the dynamics of it has to be changed: we need a get/put API 
> and we need to make on-stack cpumask illegal on the build level (in 
> generic code at least). This has been Rusty's main argument early on i 
> think, and i now concur.
> 
> 	Ingo

Removing cpumask_t's from the stack is fairly straightforward.  The
problem of changing all functions to expect a cpumask pointer via a
global change is much more problematic.  And of course all those
functions that return a cpumask value would need to be addressed.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                 ` <20080826190213.GA30255-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2008-08-26 19:18                                                   ` Linus Torvalds
  0 siblings, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 19:18 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, 26 Aug 2008, Jamie Lokier wrote:
> 
> A function which is only called from one place should, if everything
> made sense, _never_ use more stack through being inlined.

But that's simply not true.

See the whole discussion.

The problem is that if you inline that function, the stack usage of the 
newly inlined function is now added to ALL THE OTHER paths too!

So the case we had in module loading was that yes, we had a function with 
a big stack footprint, but it was NOT in the deep path.

But by inlining it, it now moved the stack footprint "up" one level to 
another function, and now the big stack footprint really _was_ in the deep 
path, because the caller was involved in a much deeper chain. 

So inlining moves the code up the callchain, and that is a problem for the 
backtrace, but that's "just" a debugging issue. But it also moves the 
stack footprint up the callchain, and that can actually be a correctness 
issue.

Of course, a compiler doesn't _have_ to do that. A compiler _could_ have 
multiple different stack footprints for a single function, and do liveness 
analysis etc. But no sane compiler probably does that, because it's very 
painful indeed, and it's not even an issue if you aren't stack-limited 
(and being stack-limited is really just a kernel thing).

(Yeah, it can be an issue even if you have a big stack, in that you get 
worse cache behaviour, so a dense stack footprint _would_ help. But the 
complexity of stack liveness analysis is almost certainly not worth the 
relatively small gains it would get on some odd cases).
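
A small model of this effect, assuming the compiler gives each function
one fixed-size frame and shares no stack slots after inlining.  All
function names and frame sizes are hypothetical.

```python
def deepest(frames, calls, root):
    """Worst-case accumulated stack bytes from root (acyclic graph)."""
    return frames[root] + max(
        (deepest(frames, calls, c) for c in calls.get(root, [])),
        default=0)


def inline(frames, calls, caller, callee):
    """Model inlining `callee` (called only from `caller`) into it.

    The callee's frame is added to the caller's frame, and the caller
    takes over the callee's own callees.
    """
    new_frames = dict(frames)
    new_frames[caller] = frames[caller] + frames[callee]
    del new_frames[callee]
    new_calls = {f: list(cs) for f, cs in calls.items() if f != callee}
    merged = [c for c in new_calls.get(caller, []) if c != callee]
    merged += [c for c in calls.get(callee, []) if c not in merged]
    new_calls[caller] = merged
    return new_frames, new_calls


# big_helper has a large frame but is a leaf; the deep chain is separate.
frames = {"caller": 100, "big_helper": 1000,
          "deep_a": 100, "deep_b": 300, "deep_c": 300}
calls = {"caller": ["big_helper", "deep_a"],
         "deep_a": ["deep_b"], "deep_b": ["deep_c"]}

if __name__ == "__main__":
    print("before inlining:", deepest(frames, calls, "caller"))
    inl_frames, inl_calls = inline(frames, calls, "caller", "big_helper")
    print("after inlining: ", deepest(inl_frames, inl_calls, "caller"))
```

Before inlining the worst case is the caller plus the big leaf, 1100
bytes.  After inlining, the big frame sits underneath the deep chain as
well, and the worst case becomes 1800 bytes.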

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 19:09                                               ` Linus Torvalds
@ 2008-08-26 19:28                                                 ` Dave Jones
       [not found]                                                   ` <20080826192848.GA20653-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
       [not found]                                                 ` <alpine.LFD.1.10.0808261205530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Dave Jones @ 2008-08-26 19:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Travis, Alan D. Brunelle, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell, Siddha, Suresh B,
	Luck, Tony, Jack Steiner, Christoph Lameter

On Tue, Aug 26, 2008 at 12:09:46PM -0700, Linus Torvalds wrote:

 > If you want the default kernel to support 4k cores, we'll need to fix the 
 > stack usage.  I don't think that is impossible, but IT IS NOT GOING TO 
 > HAPPEN for 2.6.27.
 > 
 > And quite frankly, if some vendor like RedHat enables NR_CPUS=4096 by 
 > default, they are totally and utterly crazy.

heh.  *picks through Fedora changelog*

* Thu Aug 14 2008 Dave Jones <davej@redhat.com>
- Bump max cpus supported on x86-64 to 4096. Just to see what happens.

I never did get to find out, unfortunately, because of the security fiasco
in Fedora infrastructure the last week or two.

 > But some SGI-specific binary that is meant for SGI machines only, and has 
 > been extensively tested with the setup used on SGI machines is a different 
 > thing.

Every extra kernel image a distro vendor ends up shipping has an associated cost.

* build time: It currently takes about 2 hours for a set of Fedora RPMs.
  (For RHEL it'll be even worse due to the extra archs.)
  Killing off -smp specific builds was a big win for us in this regard.
  Adding extra flavours is always painful.

* diskspace (distro kernels aren't small. With the associated debugging symbols,
   they take up a shitload of disk space really fast).

* Having everyone running the same kernel makes it much easier to test/debug.
  Our QA guys hate adding extra columns to their test matrix.

But yes, for this to be even remotely feasible, there has to be a negligible
performance cost associated with it, which right now, we clearly don't have.
Given that the number of people running 4096 CPU boxes even in a few years time
will still be tiny, punishing the common case is obviously absurd.

	Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                 ` <alpine.LFD.1.10.0808261205530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 19:35                                                   ` Mike Travis
  0 siblings, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-08-26 19:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan D. Brunelle, Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Rusty Russell, Siddha, Suresh B, Luck, Tony,
	Jack Steiner, Christoph Lameter

Linus Torvalds wrote:
> 
> On Tue, 26 Aug 2008, Mike Travis wrote:
>> The need to allow distros to set NR_CPUS=4096 (and NODES_SHIFT=9) is
>> critical to our upcoming SGI systems using what we have been calling
>> "UV".
> 
> That's fine. You can do it. The default kernel will not, because it's 
> clearly not safe.
> 
> I really don't care what you do to _your_ images. But I will not 
> distribute a known-broken kernel, and I will not debug random stack 
> overflows that happen in it.
> 
> If you want the default kernel to support 4k cores, we'll need to fix the 
> stack usage.  I don't think that is impossible, but IT IS NOT GOING TO 
> HAPPEN for 2.6.27.
> 
> And quite frankly, if some vendor like RedHat enables NR_CPUS=4096 by 
> default, they are totally and utterly crazy.
> 
> But some SGI-specific binary that is meant for SGI machines only, and has 
> been extensively tested with the setup used on SGI machines is a different 
> thing.
> 
> 			Linus

Ok, thanks for the reply, and looking into this issue.  We will "strongly
encourage" our distros to base the relevant releases on 2.6.28. :-)  

[Supplying an SGI-specific kernel would not be acceptable to many of our
customers because of the certification issues I mentioned.] 

Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                               ` <48B45387.8090205-sJ/iWh9BUns@public.gmane.org>
@ 2008-08-26 19:40                                                 ` Linus Torvalds
  0 siblings, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 19:40 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Alan D. Brunelle, Thomas Gleixner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Rusty Russell

On Tue, 26 Aug 2008, Mike Travis wrote:
> 
> I would be most interested in any tools to analyze call-trees and
> accumulated stack usages.  My current method of using kdb is really
> time consuming.

Well, even just scripts/checkstack.pl is quite relevant.

The fact is, anything with a stack footprint of more than a hundred bytes 
is suspect. We _do_ have a lot of cases of several hundred bytes, and some 
of them are even very intentional.

For an example of _intentional_ and valid large stacks, look at 
do_sys_poll and do_select. They both have a big stack footprint in a 
normal kernel, and that's on purpose - it's not pretty, but they are very 
common and performance-sensitive functions, and using a big stack allows 
some basic allocations to be much cheaper by default.

Same goes for early_printk(), although I don't think the reasons are 
really very strong in that case.

Sadly, while those functions are _fairly_ high up, they aren't at the top, 
and we do have a lot of other functions that have huge stack footprints 
for totally bogus reasons. But the intentional ones are at least in the 
top ten.

But the kernel that Alan had problems with was different. The 
_intentional_ ones were way down in the noise.  do_sys_poll wasn't in the 
top ten, it was barely even in the top 50! (It was in fact #49, to be 
exact).

So look at the top ten in my kernel:

     1  ide_generic_init [vmlinux]:             1384
     2  idefloppy_ioctl [vmlinux]:              1208
     3  e1000_check_options [vmlinux]:  	1152
     4  do_sys_poll [vmlinux]:          	904
     5  ide_floppy_get_capacity [vmlinux]:      872
     6  do_select [vmlinux]:                    744
     7  early_printk [vmlinux]:         	720
     8  do_task_stat [vmlinux]:         	680
     9  mmc_ioctl [vmlinux]:                    648
    10  elf_kcore_store_hdr [vmlinux]:  	576

.. and in Alan's kernel:

     1  smp_call_function_mask [vmlinux]:       2736
     2  __build_sched_domains [vmlinux]:        2232
     3  setup_IO_APIC_irq [vmlinux]:            1616
     4  arch_setup_ht_irq [vmlinux]:            1600
     5  arch_setup_msi_irq [vmlinux]:   	1600
     6  __assign_irq_vector [vmlinux]:  	1592
     7  move_task_off_dead_cpu [vmlinux]:       1592
     8  tick_handle_oneshot_broadcast [vmlinux]:1544
     9  store_scaling_governor [vmlinux]:       1376
    10  cpuset_write_resmask [vmlinux]:		1360

That's a big difference. The top #1 in my kernel would just _barely_ be in 
the top 10 in Alan's kernel (he doesn't have it at all, because he didn't 
compile the drivers I did into the kernel).

And the top three in my kernel are just because of crap code. That 
"e1000_check_options" thing is there just because it creates multiple 
"struct e1000_option" structures. I wrote an ugly but totally trivial 
patch to get it down to ~600 bytes, and it would be less if I had bothered 
to waste any more time on it.

The others are similar issues of "people just didn't think".

But look at the top ones in Alan's kernel. Not only are they _much_ bigger 
than the top ones in a sane kernel, they are _all_ due to cpumask_t, I 
think.
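
To make the cpumask_t arithmetic concrete: the mask is a bitmap with one bit per possible CPU, so its size grows linearly with NR_CPUS. The following is a minimal userspace sketch, not kernel code; the helper name cpumask_bytes() and the DEMO_ macro are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build machine (32 or 64). */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Size in bytes of a bitmap with one bit per CPU, rounded up to a
 * whole number of unsigned longs (hypothetical helper).
 */
size_t cpumask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}
```

With NR_CPUS=4096 this gives 512 bytes per mask, against a single unsigned long for NR_CPUS=16. A function that keeps three such masks as locals therefore spends about 1.5 KB of stack on them alone, which is the same order as the figures in Alan's list.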

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                 ` <20080826075937.GB7596-X9Un+BFzKDI@public.gmane.org>
@ 2008-08-26 19:48                                                   ` Mike Travis
  0 siblings, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-08-26 19:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Alan D. Brunelle, Linus Torvalds,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Rusty Russell, Jack Steiner

Ingo Molnar wrote:
> * Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> 
>> Alan D. Brunelle wrote:
>>
>>> I think you're right: the kernel as a whole may not be ready for 4,096
>>> CPUs apparently...
>> Mike has been working diligently on getting all these cpumasks off the 
>> stack for the last months and has created an infrastructure to do 
>> this. So I think we are close. It might just be a matter of merging 
>> some more patches that are still left in Ingo's tree.
> 
> hm, there are no such patches left that i know of - the only bits in 
> -tip are the zero-based percpu, which was found to be a bit fragile in 
> testing:

Yes, it's just a case of new changes abusing the stack.
> 
>  earth4:~/tip> git-log-line --author=Travis linus..
>  d379497: Zero based percpu: infrastructure to rebase the per cpu area to zero
>  b3a0cb4: x86: extend percpu ops to 64 bit
> 
> [and it has no relevance to stack footprint.]
> 
> So i dont think the current cpumask_t approach will work. We simply 
> should not get into an endless fight against the windmills that 
> introduce on-stack cpumask_t again and again. We should just take the 
> plunge once and do a clean alloc/free cpumask model. Most of the hotpath 
> cpumasks are constant or pre-constructed, so they are not a real issue.

It would have been nice to know this 9 months ago... ;-)
> 
> Plus, on the general question of stack footprint problems and the 
> difficulty of debugging them, the worst-case stack footprint tracer i 
> wrote for -rt some time ago should be dusted off as well and put into 
> ftrace. David has something quite close to that for Sparc64 already.
> 
> 	Ingo

I'll start experimenting with globally changing cpumask_t to be a pointer,
and see what falls out.
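
A rough userspace sketch of what an alloc/free cpumask model could look like is given here, purely as an illustration of the idea discussed above. Every name in it (demo_cpumask, demo_cpumask_alloc() and so on) is hypothetical, and calloc()/free() stand in for whatever allocator the kernel would really use.

```c
#include <limits.h>
#include <stdlib.h>

#define DEMO_NR_CPUS       4096
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MASK_LONGS \
	((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical stand-in for cpumask_t: 512 bytes when DEMO_NR_CPUS is 4096. */
struct demo_cpumask {
	unsigned long bits[DEMO_MASK_LONGS];
};

/* The mask lives on the heap; callers only hold a pointer on their stack. */
struct demo_cpumask *demo_cpumask_alloc(void)
{
	return calloc(1, sizeof(struct demo_cpumask));
}

void demo_cpumask_free(struct demo_cpumask *mask)
{
	free(mask);
}

/* Mark one CPU as present in the mask. */
void demo_cpu_set(unsigned int cpu, struct demo_cpumask *mask)
{
	mask->bits[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

/* Return 1 if the CPU is present in the mask, 0 otherwise. */
int demo_cpu_isset(unsigned int cpu, const struct demo_cpumask *mask)
{
	return (mask->bits[cpu / DEMO_BITS_PER_LONG] >>
		(cpu % DEMO_BITS_PER_LONG)) & 1UL;
}
```

The trade-off is the one raised elsewhere in the thread: the caller's frame shrinks to a single pointer, but every temporary mask now costs an allocation that can fail and must be freed on all exit paths.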

Thanks,
Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                         ` <alpine.LFD.1.10.0808261019450.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 19:55                                           ` Jeff Garzik
       [not found]                                             ` <48B45FA2.8040603-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Jeff Garzik @ 2008-08-26 19:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar

Linus Torvalds wrote:
> The downsides of inlining are big enough from both a debugging and a real 
> code generation angle (eg stack usage like this), that the upsides 
> (_sometimes_ smaller kernel, possibly slightly faster code) simply aren't 
> relevant.
> 
> So the "noinline" was random, yes, but this is a real issue. Looking at 
> checkstack output for a saner config (NR_CPUS=16), the top entries for me 
> are things like
> 
> 	ide_generic_init [vmlinux]:             1384
> 	idefloppy_ioctl [vmlinux]:              1208
> 	e1000_check_options [vmlinux]:  	1152
> 	...
> 
> which are "leaf" functions. They are broken as hell (the e1000 is 
> apparently because it builds structs on the stack that should all be 
> "static const", for example), but they are different from something like 
> the module init sequence in that they are not going to affect anything 
> else.


e1000_check_options builds a struct (singular) on the stack, really... 
struct e1000_option is reasonably small.

The problem, which has also shown itself in large ioctl-style switch{} 
statements, is that gcc will generate code such that the stack usage 
from independent code branches

	if (cond1) {
		char buster1[1000];
		foo(buster1);
	} else if (cond2) {
		char buster2[1000];
		foo(buster2);
	}

are added together, not noticed as mutually exclusive.
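
One common way around the summed-up frames is to give each branch its own function, which is roughly what moving from a big ioctl switch to per-operation callbacks achieves. The sketch below assumes gcc or clang (for the noinline attribute); foo() is a hypothetical consumer with an invented signature, and the helper names are made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer standing in for foo() in the snippet above. */
static size_t foo(char *buf, size_t len, char fill)
{
	memset(buf, fill, len);
	return len;
}

/*
 * Each helper owns exactly one large buffer. noinline stops the
 * compiler from folding the helpers back into the caller, which
 * would bring both buffers into a single frame again.
 */
static __attribute__((noinline)) size_t handle_cond1(void)
{
	char buster1[1000];

	return foo(buster1, sizeof(buster1), 'a');
}

static __attribute__((noinline)) size_t handle_cond2(void)
{
	char buster2[1000];

	return foo(buster2, sizeof(buster2), 'b');
}

/* The dispatcher itself no longer declares any large local. */
size_t dispatch(int cond1, int cond2)
{
	if (cond1)
		return handle_cond1();
	else if (cond2)
		return handle_cond2();
	return 0;
}
```

Only one of the two 1000-byte buffers is live at any time, and it disappears as soon as its helper returns. Whether a given compiler version would have merged the two slots in the original form is something to check in the generated code rather than assume.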

Of course, adding 'static const' as you noted is a reasonable 
workaround, but gcc is really annoying WRT stack allocation in this manner.

I had problems in the past, before struct ethtool_ops, with like ethtool 
ioctl switch statements using gobs of stack.  In fact, that was a big 
motivation for struct ethtool_ops.

	Jeff


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                   ` <20080826192848.GA20653-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-08-26 20:01                                                     ` Mike Travis
  2008-08-27  6:54                                                       ` Nick Piggin
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-08-26 20:01 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Mike Travis, Alan D. Brunelle,
	Ingo Molnar

Dave Jones wrote:
...
> 
> But yes, for this to be even remotely feasible, there has to be a negligible
> performance cost associated with it, which right now, we clearly don't have.
> Given that the number of people running 4096 CPU boxes even in a few years time
> will still be tiny, punishing the common case is obviously absurd.
> 
> 	Dave
> 

I did do some fairly extensive benchmarking between configs of NR_CPUS = 128 and
4096 and most performance hits were in the neighborhood of < 5% on systems with
8 cpus and 4GB of memory (our most common test system).  [But changing cpumask_t's
to be pointers instead of values will likely increase this.]  I've tried to be
very sensitive to this issue with all my previous changes, so convincing the distros
to set NR_CPUS=4096 would be as painless for them as possible. ;-)

Btw, I don't think systems with huge cpu counts are that far away.  I believe the
nextgen Larrabee chips will be geared towards HPC applications [instead of just GFX apps],
and putting 4 of these chips on a motherboard would add up to 512 cpu threads (1024
if they support hyperthreading.)

Thanks,
Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)
       [not found]                                             ` <48B45FA2.8040603-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>
@ 2008-08-26 20:06                                               ` Linus Torvalds
       [not found]                                                 ` <alpine.LFD.1.10.0808261257210.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 20:06 UTC (permalink / raw)
  To: Jeff Garzik, Auke Kok, Jeff Kirsher
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar



On Tue, 26 Aug 2008, Jeff Garzik wrote:
> 
> e1000_check_options builds a struct (singular) on the stack, really... struct
> e1000_option is reasonably small.

No it doesn't.

Look a bit more closely.

It builds a struct (singular) MANY MANY times. It also then builds up a 
huge e1000_opt_list[] array, even though it is const and should be static 
(and const).

I know. I wrote a patch to FIX it.

Here's the patch. It shrinks the stack from 1152 bytes to 192 bytes (the 
first version, that only did the e1000_option part, got it down to 600 
bytes). About half comes from not using multiple "e1000_option" 
structures, the other half comes from turning the "e1000_opt_list[]" 
arrays into "static const" instead, so that gcc doesn't copy them onto the 
stack.

Most of the patch is actually doing things like turning

	struct e1000_option opt = {

(which declares a _new_ e1000_option variable each time) into

	opt = (struct e1000_option) {

which just re-uses the single variable.
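
As a standalone illustration of that pattern, here is a minimal sketch with a simplified, hypothetical struct demo_option in place of struct e1000_option. The limits 80, 256 and 4096 are invented values, not the driver's.

```c
/* Hypothetical, simplified stand-in for struct e1000_option. */
struct demo_option {
	const char *name;
	int def;
	int min;
	int max;
};

/* Range check: fall back to the default when the value is out of range. */
static int demo_validate(int value, const struct demo_option *opt)
{
	if (value < opt->min || value > opt->max)
		return opt->def;
	return value;
}

int demo_check_options(int txd, int rxd)
{
	struct demo_option opt;	/* the only option variable in this function */
	int total = 0;

	{ /* Transmit Descriptor Count */
		opt = (struct demo_option) {
			.name = "Transmit Descriptors",
			.def  = 256,
			.min  = 80,
			.max  = 4096,
		};
		total += demo_validate(txd, &opt);
	}
	{ /* Receive Descriptor Count */
		opt = (struct demo_option) {
			.name = "Receive Descriptors",
			.def  = 256,
			.min  = 80,
			.max  = 4096,
		};
		total += demo_validate(rxd, &opt);
	}
	return total;
}
```

Each block assigns a compound literal to the same variable instead of declaring a fresh one. How much stack this really saves depends on the compiler; with gcc, building with -fstack-usage is one way to compare the per-function numbers before and after.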

It becomes slightly larger than that, because some places the "opt = .." 
had to be moved around, since it's no longer a variable declaration, but a 
regular assignment.

The rest is just adding "const" to the right places, and turning

	struct e1000_opt_list speed_list[] = ..

into

	static const struct e1000_opt_list speed_list[] = ..

instead, and fixing the indentation to be more straightforward.
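
The effect of "static const" on a lookup table can be seen in a small sketch like the one below. The struct, the function name and the table strings are hypothetical; the real speed_list uses empty strings.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct e1000_opt_list. */
struct demo_opt_list {
	int i;
	const char *str;
};

/* Look up a description for a link speed, or NULL if it is unknown. */
const char *demo_speed_name(int speed)
{
	/*
	 * static const: the table is emitted once into read-only data.
	 * Without "static", an automatic array has to be initialized on
	 * every call, typically by writing or copying it onto the stack.
	 */
	static const struct demo_opt_list speed_list[] = {
		{    0, "auto" },
		{   10, "10 Mbps" },
		{  100, "100 Mbps" },
		{ 1000, "1000 Mbps" },
	};
	size_t n;

	for (n = 0; n < sizeof(speed_list) / sizeof(speed_list[0]); n++) {
		if (speed_list[n].i == speed)
			return speed_list[n].str;
	}
	return NULL;
}
```

Because the table never changes, nothing is lost by making it static, and any pointer to it then has to be const-qualified, which is why the patch also adds "const" to the struct member and to the iterator.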

I have not tested the dang thing, but I think it's correct. And it turns 
stack usage from "totally horrible and broken" into "pretty reasonable".

		Linus

---
 drivers/net/e1000/e1000_param.c |   81 +++++++++++++++++++++-----------------
 1 files changed, 45 insertions(+), 36 deletions(-)

diff --git a/drivers/net/e1000/e1000_param.c b/drivers/net/e1000/e1000_param.c
index b9f90a5..213437d 100644
--- a/drivers/net/e1000/e1000_param.c
+++ b/drivers/net/e1000/e1000_param.c
@@ -208,7 +208,7 @@ struct e1000_option {
 		} r;
 		struct { /* list_option info */
 			int nr;
-			struct e1000_opt_list { int i; char *str; } *p;
+			const struct e1000_opt_list { int i; char *str; } *p;
 		} l;
 	} arg;
 };
@@ -242,7 +242,7 @@ static int __devinit e1000_validate_option(unsigned int *value,
 		break;
 	case list_option: {
 		int i;
-		struct e1000_opt_list *ent;
+		const struct e1000_opt_list *ent;
 
 		for (i = 0; i < opt->arg.l.nr; i++) {
 			ent = &opt->arg.l.p[i];
@@ -279,7 +279,9 @@ static void e1000_check_copper_options(struct e1000_adapter *adapter);
 
 void __devinit e1000_check_options(struct e1000_adapter *adapter)
 {
+	struct e1000_option opt;
 	int bd = adapter->bd_number;
+
 	if (bd >= E1000_MAX_NIC) {
 		DPRINTK(PROBE, NOTICE,
 		       "Warning: no configuration for board #%i\n", bd);
@@ -287,19 +289,21 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 	}
 
 	{ /* Transmit Descriptor Count */
-		struct e1000_option opt = {
+		struct e1000_tx_ring *tx_ring = adapter->tx_ring;
+		int i;
+		e1000_mac_type mac_type = adapter->hw.mac_type;
+
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Transmit Descriptors",
 			.err  = "using default of "
 				__MODULE_STRING(E1000_DEFAULT_TXD),
 			.def  = E1000_DEFAULT_TXD,
-			.arg  = { .r = { .min = E1000_MIN_TXD }}
+			.arg  = { .r = {
+				.min = E1000_MIN_TXD,
+				.max = mac_type < e1000_82544 ? E1000_MAX_TXD : E1000_MAX_82544_TXD
+				}}
 		};
-		struct e1000_tx_ring *tx_ring = adapter->tx_ring;
-		int i;
-		e1000_mac_type mac_type = adapter->hw.mac_type;
-		opt.arg.r.max = mac_type < e1000_82544 ?
-			E1000_MAX_TXD : E1000_MAX_82544_TXD;
 
 		if (num_TxDescriptors > bd) {
 			tx_ring->count = TxDescriptors[bd];
@@ -313,19 +317,21 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 			tx_ring[i].count = tx_ring->count;
 	}
 	{ /* Receive Descriptor Count */
-		struct e1000_option opt = {
+		struct e1000_rx_ring *rx_ring = adapter->rx_ring;
+		int i;
+		e1000_mac_type mac_type = adapter->hw.mac_type;
+
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Receive Descriptors",
 			.err  = "using default of "
 				__MODULE_STRING(E1000_DEFAULT_RXD),
 			.def  = E1000_DEFAULT_RXD,
-			.arg  = { .r = { .min = E1000_MIN_RXD }}
+			.arg  = { .r = {
+				.min = E1000_MIN_RXD,
+				.max = mac_type < e1000_82544 ? E1000_MAX_RXD : E1000_MAX_82544_RXD
+			}}
 		};
-		struct e1000_rx_ring *rx_ring = adapter->rx_ring;
-		int i;
-		e1000_mac_type mac_type = adapter->hw.mac_type;
-		opt.arg.r.max = mac_type < e1000_82544 ? E1000_MAX_RXD :
-			E1000_MAX_82544_RXD;
 
 		if (num_RxDescriptors > bd) {
 			rx_ring->count = RxDescriptors[bd];
@@ -339,7 +345,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 			rx_ring[i].count = rx_ring->count;
 	}
 	{ /* Checksum Offload Enable/Disable */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = enable_option,
 			.name = "Checksum Offload",
 			.err  = "defaulting to Enabled",
@@ -363,7 +369,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 			 { E1000_FC_FULL,    "Flow Control Enabled" },
 			 { E1000_FC_DEFAULT, "Flow Control Hardware Default" }};
 
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = list_option,
 			.name = "Flow Control",
 			.err  = "reading default settings from EEPROM",
@@ -381,7 +387,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Transmit Interrupt Delay */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Transmit Interrupt Delay",
 			.err  = "using default of " __MODULE_STRING(DEFAULT_TIDV),
@@ -399,7 +405,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Transmit Absolute Interrupt Delay */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Transmit Absolute Interrupt Delay",
 			.err  = "using default of " __MODULE_STRING(DEFAULT_TADV),
@@ -417,7 +423,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Receive Interrupt Delay */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Receive Interrupt Delay",
 			.err  = "using default of " __MODULE_STRING(DEFAULT_RDTR),
@@ -435,7 +441,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Receive Absolute Interrupt Delay */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Receive Absolute Interrupt Delay",
 			.err  = "using default of " __MODULE_STRING(DEFAULT_RADV),
@@ -453,7 +459,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Interrupt Throttling Rate */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = range_option,
 			.name = "Interrupt Throttling Rate (ints/sec)",
 			.err  = "using default of " __MODULE_STRING(DEFAULT_ITR),
@@ -497,7 +503,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Smart Power Down */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = enable_option,
 			.name = "PHY Smart Power Down",
 			.err  = "defaulting to Disabled",
@@ -513,7 +519,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Kumeran Lock Loss Workaround */
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = enable_option,
 			.name = "Kumeran Lock Loss Workaround",
 			.err  = "defaulting to Enabled",
@@ -578,16 +584,18 @@ static void __devinit e1000_check_fiber_options(struct e1000_adapter *adapter)
 
 static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
 {
+	struct e1000_option opt;
 	unsigned int speed, dplx, an;
 	int bd = adapter->bd_number;
 
 	{ /* Speed */
-		struct e1000_opt_list speed_list[] = {{          0, "" },
-						      {   SPEED_10, "" },
-						      {  SPEED_100, "" },
-						      { SPEED_1000, "" }};
+		static const struct e1000_opt_list speed_list[] = {
+			{          0, "" },
+			{   SPEED_10, "" },
+			{  SPEED_100, "" },
+			{ SPEED_1000, "" }};
 
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = list_option,
 			.name = "Speed",
 			.err  = "parameter ignored",
@@ -604,11 +612,12 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
 		}
 	}
 	{ /* Duplex */
-		struct e1000_opt_list dplx_list[] = {{           0, "" },
-						     { HALF_DUPLEX, "" },
-						     { FULL_DUPLEX, "" }};
+		static const struct e1000_opt_list dplx_list[] = {
+			{           0, "" },
+			{ HALF_DUPLEX, "" },
+			{ FULL_DUPLEX, "" }};
 
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = list_option,
 			.name = "Duplex",
 			.err  = "parameter ignored",
@@ -637,7 +646,7 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
 		       "parameter ignored\n");
 		adapter->hw.autoneg_advertised = AUTONEG_ADV_DEFAULT;
 	} else { /* Autoneg */
-		struct e1000_opt_list an_list[] =
+		static const struct e1000_opt_list an_list[] =
 			#define AA "AutoNeg advertising "
 			{{ 0x01, AA "10/HD" },
 			 { 0x02, AA "10/FD" },
@@ -671,7 +680,7 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
 			 { 0x2e, AA "1000/FD, 100/FD, 100/HD, 10/FD" },
 			 { 0x2f, AA "1000/FD, 100/FD, 100/HD, 10/FD, 10/HD" }};
 
-		struct e1000_option opt = {
+		opt = (struct e1000_option) {
 			.type = list_option,
 			.name = "AutoNeg",
 			.err  = "parameter ignored",

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)
       [not found]                                                 ` <alpine.LFD.1.10.0808261257210.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 20:14                                                   ` Kok, Auke
       [not found]                                                     ` <48B4641A.1020806-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Kok, Auke @ 2008-08-26 20:14 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Kirsher
  Cc: Jeff Garzik, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar

Linus Torvalds wrote:
> 
> On Tue, 26 Aug 2008, Jeff Garzik wrote:
>> e1000_check_options builds a struct (singular) on the stack, really... struct
>> e1000_option is reasonably small.
> 
> No it doesn't.
> 
> Look a bit more closely.
> 
> It builds a struct (singular) MANY MANY times. It also then builds up a 
> huge e1000_opt_list[] array, even though it is const and should be static 
> (and const).
> 
> I know. I wrote a patch to FIX it.

totally cool patch afaics - if I still maintained this driver I'd have this tested
and merged right away :)

I suppose Jeff Kirsher is already doing so right now.

I suppose that he'll have to look at the other Intel ethernet drivers as well :)

Jeff, please add my:

Reviewed-by: Auke Kok <auke-jan.h.kok-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Cheers,

Auke

> 
> Here's the patch. It shrinks the stack from 1152 bytes to 192 bytes (the 
> first version, that only did the e1000_option part, got it down to 600 
> bytes). About half comes from not using multiple "e1000_option" 
> structures, the other half comes from turning the "e1000_opt_list[]" 
> arrays into "static const" instead, so that gcc doesn't copy them onto the 
> stack.
> 
> Most of the patch is actually doing things like turning
> 
> 	struct e1000_option opt = {
> 
> (which declares a _new_ e1000_option variable each time) into
> 
> 	opt = (struct e1000_option) {
> 
> which just re-uses the single variable.
> 
> It becomes slightly larger than that, because some places the "opt = .." 
> had to be moved around, since it's no longer a variable declaration, but a 
> regular assignment.
> 
> The rest is just adding "const" to the right places, and turning
> 
> 	struct e1000_opt_list speed_list[] = ..
> 
> into
> 
> 	static const struct e1000_opt_list speed_list[] = ..
> 
> instead, and fixing the indentation to be more straightforward.
> 
> I have not tested the dang thing, but I think it's correct. And it turns 
> stack usage from "totally horrible and broken" into "pretty reasonable".
> 
> 		Linus
> 
> ---
>  drivers/net/e1000/e1000_param.c |   81 +++++++++++++++++++++-----------------
>  1 files changed, 45 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/e1000/e1000_param.c b/drivers/net/e1000/e1000_param.c
> index b9f90a5..213437d 100644
> --- a/drivers/net/e1000/e1000_param.c
> +++ b/drivers/net/e1000/e1000_param.c
> @@ -208,7 +208,7 @@ struct e1000_option {
>  		} r;
>  		struct { /* list_option info */
>  			int nr;
> -			struct e1000_opt_list { int i; char *str; } *p;
> +			const struct e1000_opt_list { int i; char *str; } *p;
>  		} l;
>  	} arg;
>  };
> @@ -242,7 +242,7 @@ static int __devinit e1000_validate_option(unsigned int *value,
>  		break;
>  	case list_option: {
>  		int i;
> -		struct e1000_opt_list *ent;
> +		const struct e1000_opt_list *ent;
>  
>  		for (i = 0; i < opt->arg.l.nr; i++) {
>  			ent = &opt->arg.l.p[i];
> @@ -279,7 +279,9 @@ static void e1000_check_copper_options(struct e1000_adapter *adapter);
>  
>  void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  {
> +	struct e1000_option opt;
>  	int bd = adapter->bd_number;
> +
>  	if (bd >= E1000_MAX_NIC) {
>  		DPRINTK(PROBE, NOTICE,
>  		       "Warning: no configuration for board #%i\n", bd);
> @@ -287,19 +289,21 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  	}
>  
>  	{ /* Transmit Descriptor Count */
> -		struct e1000_option opt = {
> +		struct e1000_tx_ring *tx_ring = adapter->tx_ring;
> +		int i;
> +		e1000_mac_type mac_type = adapter->hw.mac_type;
> +
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Transmit Descriptors",
>  			.err  = "using default of "
>  				__MODULE_STRING(E1000_DEFAULT_TXD),
>  			.def  = E1000_DEFAULT_TXD,
> -			.arg  = { .r = { .min = E1000_MIN_TXD }}
> +			.arg  = { .r = {
> +				.min = E1000_MIN_TXD,
> +				.max = mac_type < e1000_82544 ? E1000_MAX_TXD : E1000_MAX_82544_TXD
> +				}}
>  		};
> -		struct e1000_tx_ring *tx_ring = adapter->tx_ring;
> -		int i;
> -		e1000_mac_type mac_type = adapter->hw.mac_type;
> -		opt.arg.r.max = mac_type < e1000_82544 ?
> -			E1000_MAX_TXD : E1000_MAX_82544_TXD;
>  
>  		if (num_TxDescriptors > bd) {
>  			tx_ring->count = TxDescriptors[bd];
> @@ -313,19 +317,21 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  			tx_ring[i].count = tx_ring->count;
>  	}
>  	{ /* Receive Descriptor Count */
> -		struct e1000_option opt = {
> +		struct e1000_rx_ring *rx_ring = adapter->rx_ring;
> +		int i;
> +		e1000_mac_type mac_type = adapter->hw.mac_type;
> +
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Receive Descriptors",
>  			.err  = "using default of "
>  				__MODULE_STRING(E1000_DEFAULT_RXD),
>  			.def  = E1000_DEFAULT_RXD,
> -			.arg  = { .r = { .min = E1000_MIN_RXD }}
> +			.arg  = { .r = {
> +				.min = E1000_MIN_RXD,
> +				.max = mac_type < e1000_82544 ? E1000_MAX_RXD : E1000_MAX_82544_RXD
> +			}}
>  		};
> -		struct e1000_rx_ring *rx_ring = adapter->rx_ring;
> -		int i;
> -		e1000_mac_type mac_type = adapter->hw.mac_type;
> -		opt.arg.r.max = mac_type < e1000_82544 ? E1000_MAX_RXD :
> -			E1000_MAX_82544_RXD;
>  
>  		if (num_RxDescriptors > bd) {
>  			rx_ring->count = RxDescriptors[bd];
> @@ -339,7 +345,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  			rx_ring[i].count = rx_ring->count;
>  	}
>  	{ /* Checksum Offload Enable/Disable */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = enable_option,
>  			.name = "Checksum Offload",
>  			.err  = "defaulting to Enabled",
> @@ -363,7 +369,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  			 { E1000_FC_FULL,    "Flow Control Enabled" },
>  			 { E1000_FC_DEFAULT, "Flow Control Hardware Default" }};
>  
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = list_option,
>  			.name = "Flow Control",
>  			.err  = "reading default settings from EEPROM",
> @@ -381,7 +387,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Transmit Interrupt Delay */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Transmit Interrupt Delay",
>  			.err  = "using default of " __MODULE_STRING(DEFAULT_TIDV),
> @@ -399,7 +405,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Transmit Absolute Interrupt Delay */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Transmit Absolute Interrupt Delay",
>  			.err  = "using default of " __MODULE_STRING(DEFAULT_TADV),
> @@ -417,7 +423,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Receive Interrupt Delay */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Receive Interrupt Delay",
>  			.err  = "using default of " __MODULE_STRING(DEFAULT_RDTR),
> @@ -435,7 +441,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Receive Absolute Interrupt Delay */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Receive Absolute Interrupt Delay",
>  			.err  = "using default of " __MODULE_STRING(DEFAULT_RADV),
> @@ -453,7 +459,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Interrupt Throttling Rate */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = range_option,
>  			.name = "Interrupt Throttling Rate (ints/sec)",
>  			.err  = "using default of " __MODULE_STRING(DEFAULT_ITR),
> @@ -497,7 +503,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Smart Power Down */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = enable_option,
>  			.name = "PHY Smart Power Down",
>  			.err  = "defaulting to Disabled",
> @@ -513,7 +519,7 @@ void __devinit e1000_check_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Kumeran Lock Loss Workaround */
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = enable_option,
>  			.name = "Kumeran Lock Loss Workaround",
>  			.err  = "defaulting to Enabled",
> @@ -578,16 +584,18 @@ static void __devinit e1000_check_fiber_options(struct e1000_adapter *adapter)
>  
>  static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
>  {
> +	struct e1000_option opt;
>  	unsigned int speed, dplx, an;
>  	int bd = adapter->bd_number;
>  
>  	{ /* Speed */
> -		struct e1000_opt_list speed_list[] = {{          0, "" },
> -						      {   SPEED_10, "" },
> -						      {  SPEED_100, "" },
> -						      { SPEED_1000, "" }};
> +		static const struct e1000_opt_list speed_list[] = {
> +			{          0, "" },
> +			{   SPEED_10, "" },
> +			{  SPEED_100, "" },
> +			{ SPEED_1000, "" }};
>  
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = list_option,
>  			.name = "Speed",
>  			.err  = "parameter ignored",
> @@ -604,11 +612,12 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
>  		}
>  	}
>  	{ /* Duplex */
> -		struct e1000_opt_list dplx_list[] = {{           0, "" },
> -						     { HALF_DUPLEX, "" },
> -						     { FULL_DUPLEX, "" }};
> +		static const struct e1000_opt_list dplx_list[] = {
> +			{           0, "" },
> +			{ HALF_DUPLEX, "" },
> +			{ FULL_DUPLEX, "" }};
>  
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = list_option,
>  			.name = "Duplex",
>  			.err  = "parameter ignored",
> @@ -637,7 +646,7 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
>  		       "parameter ignored\n");
>  		adapter->hw.autoneg_advertised = AUTONEG_ADV_DEFAULT;
>  	} else { /* Autoneg */
> -		struct e1000_opt_list an_list[] =
> +		static const struct e1000_opt_list an_list[] =
>  			#define AA "AutoNeg advertising "
>  			{{ 0x01, AA "10/HD" },
>  			 { 0x02, AA "10/FD" },
> @@ -671,7 +680,7 @@ static void __devinit e1000_check_copper_options(struct e1000_adapter *adapter)
>  			 { 0x2e, AA "1000/FD, 100/FD, 100/HD, 10/FD" },
>  			 { 0x2f, AA "1000/FD, 100/FD, 100/HD, 10/FD, 10/HD" }};
>  
> -		struct e1000_option opt = {
> +		opt = (struct e1000_option) {
>  			.type = list_option,
>  			.name = "AutoNeg",
>  			.err  = "parameter ignored",

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                               ` <alpine.LFD.1.10.0808261134530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 20:21                                                 ` Adrian Bunk
  2008-08-26 20:41                                                   ` Linus Torvalds
  0 siblings, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-26 20:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 11:40:10AM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 26 Aug 2008, Adrian Bunk wrote:
> > 
> > A debugging option (for better traces) to disallow gcc some inlining 
> > might make sense (and might even make sense for distributions to 
> > enable in their kernels), but when you go to use cases that require
> > really small kernels the cost is too high.
> 
> You ignore the fact that it's really not just about debugging.

I had in mind that we anyway have to support it for tiny kernels.

I simply don't see that we add kconfig options for 5kB of code for
tiny kernels but remove something like this that can cause size
increases > 1%.

> Inlining really isn't the great tool some people think it is. Especially 
> not since gcc stack allocation is so horrid that it won't re-use stack 
> slots etc (which I don't disagree with per se - it's _hard_ to re-use 
> stack slots while still allowing code scheduling).

gcc's stack allocation has become better
(that's why we disable unit-at-a-time only for gcc 3.4 on i386).

> NOTE! I also would never claim that _our_ choices of "inline" are all that 
> great, and we've often inlined too much or not inlined things that really 
> could be inlined. But at least when a developer says "inline" (or forgets 
> to say it), we have somebody to blame. When the compiler does insane 
> things that don't suit us, we're just screwed.

Most LOCs of the kernel are not written by people like you or Al Viro or 
David Miller, and the average kernel developer is unlikely to do it as 
well as gcc.

For the average driver the choice is realistically between
"inline's randomly sprinkled across the driver" and
"no inline's, leave it to gcc".

And code evolves during the years from tiny with 1 caller to huge with 
many callers.

BTW:
I just ran checkstack on a (roughly) allyesconfig kernel, and we have a 
new driver that allocates "unsigned char recvbuf[1500];" on the stack...
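
For illustration only, the usual way around a buffer like that is to take it
from the heap instead of the stack. The sketch below is plain user-space C
with malloc()/free() standing in for kmalloc()/kfree(); sum_frame_sketch()
and RECVBUF_LEN are made-up names, not anything from the driver in question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RECVBUF_LEN 1500	/* same size as the on-stack buffer mentioned above */

/*
 * Hypothetical helper: copies at most RECVBUF_LEN bytes of a frame into a
 * heap buffer and returns the sum of the copied bytes, or -1 if the
 * allocation fails. Only a pointer lives in this function's stack frame.
 */
long sum_frame_sketch(const unsigned char *frame, size_t len)
{
	unsigned char *recvbuf;
	long sum = 0;

	assert(frame != NULL);

	recvbuf = malloc(RECVBUF_LEN);
	if (!recvbuf)
		return -1;

	if (len > RECVBUF_LEN)
		len = RECVBUF_LEN;
	memcpy(recvbuf, frame, len);

	for (size_t i = 0; i < len; i++)
		sum += recvbuf[i];

	free(recvbuf);
	return sum;
}
```

The cost is an allocation that can fail and has to be freed on every path,
which is presumably why the on-stack version is tempting in the first place.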

> > But if you don't trust gcc's inlining you should revert
> > commit 3f9b5cc018566ad9562df0648395649aebdbc5e0 that increases gcc's 
> > freedom regarding what to inline in 2.6.27
> 
> Actually, that just allows gcc to _not_ inline. Which is probably ok.
> 
> (Well, it would be ok if gcc did it well enough, it obviously has some 
> problems at times).

With the "gcc inline's static functions" you complain about we have
4-5 years of experience.

Suddenly allowing 4 release series of gcc to ignore any inline's is a 
completely new area for us. I'd generally agree with giving gcc more 
freedom here, but I'd rather do it right by removing tons of wrong 
inline's than doing one global change hoping that it will make things 
better.

And whether the "optimized inlining" actually makes the kernel bigger or 
smaller depends in my experience on the .config and the gcc version.

> 			Linus

cu
Adrian

[1] there are some rare exceptions

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 20:21                                                 ` Adrian Bunk
@ 2008-08-26 20:41                                                   ` Linus Torvalds
  2008-08-27 16:21                                                     ` Jamie Lokier
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 20:41 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded

On Tue, 26 Aug 2008, Adrian Bunk wrote:
> 
> I had in mind that we anyway have to support it for tiny kernels.

I actually don't think that is true.

If we really were to decide to be stricter about it, and it makes a big 
size difference, we can probably also add a tool to warn about functions 
that really should be inline.

> > Inlining really isn't the great tool some people think it is. Especially 
> > not since gcc stack allocation is so horrid that it won't re-use stack 
> > slots etc (which I don't disagree with per se - it's _hard_ to re-use 
> > stack slots while still allowing code scheduling).
> 
> gcc's stack allocation has become better
> (that's why we disable unit-at-a-time only for gcc 3.4 on i386).

I agree that it has become better. But it still absolutely *sucks*.

For example, see the patch I just posted about e1000 stack usage. Even 
though the variables were all in completely separate scopes, they all got 
individual space on the stack over the whole lifetime of the function, 
causing an explosion of stack-space. As such, gcc used 500 bytes too much 
of stack, just because it didn't re-use the stackspace.
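
A stripped-down sketch of the pattern that patch uses, assuming nothing about
the real driver beyond what the diff shows: one variable declared at function
scope and re-assigned with a compound literal in each block, plus static
const lookup tables. The struct, validate() and the duplex values 1 and 2 are
placeholders, and whether a particular gcc actually ends up with a smaller
frame is something to confirm with checkstack rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's option types. */
struct opt_list_entry {
	int value;
	const char *label;
};

struct option_sketch {
	const char *name;
	const char *err;
	int def;
	const struct opt_list_entry *list;
	size_t nr;
};

/* Returns v if it appears in the option's list, otherwise the default. */
static int validate(int v, const struct option_sketch *opt)
{
	assert(opt->list != NULL);
	for (size_t i = 0; i < opt->nr; i++)
		if (opt->list[i].value == v)
			return v;
	return opt->def;
}

int check_options_sketch(int speed, int duplex)
{
	struct option_sketch opt;	/* one slot, reused by every block below */
	int result = 0;

	{ /* Speed */
		/* static const: built once at compile time, not on the stack */
		static const struct opt_list_entry speed_list[] = {
			{    0, "" },
			{   10, "" },
			{  100, "" },
			{ 1000, "" }};

		opt = (struct option_sketch) {
			.name = "Speed",
			.err  = "parameter ignored",
			.def  = 0,
			.list = speed_list,
			.nr   = sizeof(speed_list) / sizeof(speed_list[0]),
		};
		result += validate(speed, &opt);
	}
	{ /* Duplex (1 and 2 are placeholder values) */
		static const struct opt_list_entry dplx_list[] = {
			{ 0, "" },
			{ 1, "" },
			{ 2, "" }};

		opt = (struct option_sketch) {
			.name = "Duplex",
			.err  = "parameter ignored",
			.def  = 0,
			.list = dplx_list,
			.nr   = sizeof(dplx_list) / sizeof(dplx_list[0]),
		};
		result += validate(duplex, &opt);
	}
	return result;
}
```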

That was with gcc-4.3.0, and no, there were hardly any inlining issues 
involved, although it is true that inlining actually did make it slightly 
worse in that case too (but since it was essentially a leaf function, that 
had little real life impact, since there were no deep callchains below it 
to care).

So the fact is, "better" simply is not "good enough". We still need to do 
a lot of optimizations _manually_, because gcc cannot see that it can 
re-use the stack-slots.

And sometimes those "optimizations" are actually performance 
pessimizations, because in order to make gcc not use all the stack at the 
same time, you simply have to break things out and force-disable inlining.
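
As a sketch of what "break things out and force-disable inlining" looks like
in practice: the helper below keeps its scratch buffer in its own frame, and
the noinline attribute (a gcc extension) stops the compiler from folding that
frame into the caller. NOINLINE_SKETCH, checksum_with_scratch() and
caller_sketch() are invented names for this example.

```c
#include <assert.h>
#include <string.h>

#if defined(__GNUC__)
#define NOINLINE_SKETCH __attribute__((noinline))
#else
#define NOINLINE_SKETCH	/* no portable equivalent; left empty elsewhere */
#endif

/*
 * The 512-byte scratch buffer exists only while this helper runs. If the
 * compiler inlined the helper, the buffer would become part of the caller's
 * frame for the caller's whole lifetime.
 */
static NOINLINE_SKETCH unsigned int checksum_with_scratch(const char *s)
{
	unsigned char scratch[512];
	unsigned int sum = 0;
	size_t n;

	assert(s != NULL);

	n = strlen(s);
	if (n > sizeof(scratch))
		n = sizeof(scratch);
	memcpy(scratch, s, n);

	for (size_t i = 0; i < n; i++)
		sum += scratch[i];
	return sum;
}

/* The caller's own frame stays small; each call pays for a real call. */
unsigned int caller_sketch(const char *a, const char *b)
{
	return checksum_with_scratch(a) + checksum_with_scratch(b);
}
```

The extra call is the "performance pessimization" side of the trade: stack
depth goes down, but the call overhead cannot be optimized away.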

> Most LOCs of the kernel are not written by people like you or Al Viro or 
> David Miller, and the average kernel developer is unlikely to do it as 
> well as gcc.

Sure. But we do have tools. We do have checkstack.pl, it's just that it 
hasn't been an issue in a long time, so I suspect many people didn't even 
_realize_ we have it, and I certainly can attest to the fact that even 
people who remember it - like me - don't actually tend to run it all that 
often.

> For the average driver the choice is realistically between
> "inline's randomly sprinkled across the driver" and
> "no inline's, leave it to gcc".

And neither is likely to be a big problem.

> BTW:
> I just ran checkstack on a (roughly) allyesconfig kernel, and we have a 
> new driver that allocates "unsigned char recvbuf[1500];" on the stack...

Yeah, it's _way_ too easy to do bad things.

> With the "gcc inline's static functions" you complain about we have
> 4-5 years of experience.

Sure. And most of it isn't all that great.

But I do agree that letting gcc make more decisions is _dangerous_. 
However, in this case, at least, the decisions it makes would at least 
make for less inlining, and thus less stack space explosion.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                   ` <48B4542A.1050004-sJ/iWh9BUns@public.gmane.org>
@ 2008-08-26 20:45                                                     ` David Miller
       [not found]                                                       ` <20080826.134535.193703558.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-08-26 20:45 UTC (permalink / raw)
  To: travis-sJ/iWh9BUns
  Cc: mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

From: Mike Travis <travis-sJ/iWh9BUns@public.gmane.org>
Date: Tue, 26 Aug 2008 12:06:18 -0700

> David Miller wrote:
> > The only case that didn't work was due to a limitation in
> > arch interfaces for the new generic smp_call_function() code.
> > It passes a cpumask_t instead of a pointer to one via
> > arch_send_call_function_ipi().
> > 
> > But other than that, the whole sparc64 SMP stuff uses cpumask_t
> > pointers only.
> > 
> > What it comes down to is that you have to do the "self cpu"
> > and other tests in the cross-call dispatch routines themselves,
> > instead of at the top-level working on cpumask_t objects.
> > 
> > Otherwise you have to modify cpumask_t objects and thus pluck
> > them onto the stack where they take up silly amounts of space.
> 
> Yes, I had proposed either modifying, or supplementing a new
> smp_call function to pass the cpumask_t as a pointer (similar
> to set_cpus_allowed_ptr.)  But an ABI change such as this was
> not well received at the time.

What it seems to come down to is that any cpumask_t not inside of
a dynamically allocated object should be marked const.

And that is something we can enforce at compile time.

Linus has just suggested dynamically allocating cpumask_t's
for such cases but I don't see that as the fix either.

Just mark them const and enforce that cpumask_t objects can only
be modified when they appear in dynamically allocated objects.

You really don't need to modify the ones that are passed around between functions
anyways.  The only code that wants to change bits in these things is
the cpu cross-call dispatch stuff, and that cpu choice logic can just
live where it belongs down in the cross-call dispatch code.
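
A rough user-space sketch of that idea, with invented names throughout
(struct cpumask_sketch, NR_CPUS_SKETCH, cross_call_sketch() are not kernel
interfaces): the mask is only ever passed as a const pointer, and the
"self cpu" exclusion is applied while walking the mask in the dispatch
routine, so no modified copy has to be built on the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH		4096
#define BITS_PER_LONG_SKETCH	(8 * sizeof(unsigned long))
#define MASK_LONGS \
	((NR_CPUS_SKETCH + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH)

/* 4096 bits: 512 bytes, which is why copies on the stack hurt. */
struct cpumask_sketch {
	unsigned long bits[MASK_LONGS];
};

void mask_set(struct cpumask_sketch *m, unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	m->bits[cpu / BITS_PER_LONG_SKETCH] |= 1UL << (cpu % BITS_PER_LONG_SKETCH);
}

static bool mask_test(const struct cpumask_sketch *m, unsigned int cpu)
{
	return (m->bits[cpu / BITS_PER_LONG_SKETCH] >>
		(cpu % BITS_PER_LONG_SKETCH)) & 1UL;
}

/*
 * Stand-in for a cross-call dispatch routine: returns how many CPUs would be
 * signalled. The caller's mask is const and is never copied or modified.
 */
unsigned int cross_call_sketch(const struct cpumask_sketch *mask,
			       unsigned int self)
{
	unsigned int sent = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (cpu == self)
			continue;	/* "self cpu" test done here */
		if (mask_test(mask, cpu))
			sent++;
	}
	return sent;
}
```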

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                               ` <alpine.LFD.1.10.0808261144510.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 20:59                                                 ` Adrian Bunk
  2008-08-26 21:04                                                   ` Linus Torvalds
  0 siblings, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-26 20:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 11:47:01AM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 26 Aug 2008, Adrian Bunk wrote:
> > 
> > I added "-fno-inline-functions-called-once -fno-early-inlining" to 
> > KBUILD_CFLAGS, and (with gcc 4.3) that increased the size of my kernel 
> > image by 2%.
> 
> Btw, did you check with just "-fno-inline-functions-called-once"?
> 
> The -fearly-inlining decisions _should_ be mostly right. If gcc sees early 
> that a function is so small (even without any constant propagation etc) 
> that it can be inlined, it's probably right. 
> 
> The inline-functions-called-once thing is what causes even big functions 
> to be inlined, and that's where you find the big downsides too (eg the 
> stack usage).

-fno-inline-functions-called-once alone costs me nearly 1% in code size.

And I'd expect it to become more with "-fwhole-program --combine".


If you think we have too many stacksize problems I'd suggest to consider 
removing the choice of 4k stacks on i386, sh and m68knommu instead of 
using -fno-inline-functions-called-once:

Now that 32bit x86 is no longer used for extreme highend configurations 
the only serious use case for 4k stacks is AFAIK space savings on 
embedded archs.

4k stacks have caused us much pain [1], and the cases where gcc inlined 
too much were the easy ones.

I'm not saying that I'd like removing the choice of 4k stacks, but if we 
want to reduce the number of stack related problems that's IMHO the 
better alternative.


> 			Linus

cu
Adrian

[1] AFAIR some callpaths in the kernel are still too big

BTW: In case anyone wonders about why I suggest removing 4k stacks:
     My position is that 4k stacks should either be enabled 
     unconditionally or no longer offered at all.
     And if we remove 4k stacks from 32bit x86 it's no longer 
     realistically maintainable for other architectures.

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 20:59                                                 ` Adrian Bunk
@ 2008-08-26 21:04                                                   ` Linus Torvalds
       [not found]                                                     ` <alpine.LFD.1.10.0808261403360.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 21:04 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded



On Tue, 26 Aug 2008, Adrian Bunk wrote:
> 
> If you think we have too many stacksize problems I'd suggest to consider 
> removing the choice of 4k stacks on i386, sh and m68knommu instead of 
> using -fno-inline-functions-called-once:

Don't be silly. That makes the problem _worse_.

We're much better off with a 1% code-size reduction than forcing big 
stacks on people. The 4kB stack option is also a good way of saying "if it 
works with this, then 8kB is certainly safe".

And embedded people (the ones that might care about 1% code size) are the 
ones that would also want smaller stacks even more!

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected)
       [not found]                                                     ` <48B4641A.1020806-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2008-08-26 22:04                                                       ` Jeff Kirsher
  0 siblings, 0 replies; 318+ messages in thread
From: Jeff Kirsher @ 2008-08-26 22:04 UTC (permalink / raw)
  To: Kok, Auke
  Cc: Linus Torvalds, Jeff Garzik, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar

On Tue, Aug 26, 2008 at 1:14 PM, Kok, Auke <auke-jan.h.kok-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Linus Torvalds wrote:
>>
>> On Tue, 26 Aug 2008, Jeff Garzik wrote:
>>> e1000_check_options builds a struct (singular) on the stack, really... struct
>>> e1000_option is reasonably small.
>>
>> No it doesn't.
>>
>> Look a bit more closely.
>>
>> It builds a struct (singular) MANY MANY times. It also then builds up a
>> huge e1000_opt_list[] array, even though it is const and should be static
>> (and const).
>>
>> I know. I wrote a patch to FIX it.
>
> totally cool patch afaics - if I still maintained this driver I'd have this tested
> and merged right away :)
>
> I suppose Jeff Kirsher is already doing so right now

You suppose correctly.
>
> I suppose that he'll have to look at the other Intel ethernet drivers as well :)
>
> Jeff, please add my:
>
> Reviewed-by: Auke Kok <auke-jan.h.kok-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>

Will do.

-- 
Cheers,
Jeff

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                     ` <alpine.LFD.1.10.0808261403360.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-26 22:54                                                       ` Parag Warudkar
       [not found]                                                         ` <f7848160808261554j2f4eaaa6i1ee8801ae75ca7bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
                                                                           ` (2 more replies)
  2008-08-26 23:24                                                       ` Adrian Bunk
  1 sibling, 3 replies; 318+ messages in thread
From: Parag Warudkar @ 2008-08-26 22:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> And embedded people (the ones that might care about 1% code size) are the
> ones that would also want smaller stacks even more!

This is something I never understood - embedded devices are not going
to run more than a few processes and 4K*(Few Processes)
 IMHO is not worth a saving nowadays even in the embedded world given
falling memory prices. Or do I misunderstand?

Parag

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                         ` <f7848160808261554j2f4eaaa6i1ee8801ae75ca7bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-26 23:00                                                           ` David VomLehn
  2008-08-26 23:45                                                             ` Adrian Bunk
  0 siblings, 1 reply; 318+ messages in thread
From: David VomLehn @ 2008-08-26 23:00 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

Parag Warudkar wrote:
> On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
> <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> 
>> And embedded people (the ones that might care about 1% code size) are the
>> ones that would also want smaller stacks even more!
> 
> This is something I never understood - embedded devices are not going
> to run more than a few processes and 4K*(Few Processes)
>  IMHO is not worth a saving nowadays even in the embedded world given
> falling memory prices. Or do I misunderstand?

Embedded applications span a huge range of sizes, from the very small devices to 
which you refer, to quite complex devices. The cable settop boxes we develop have 
over a hundred interrupt sources, typically run 250-300 threads, and have 192+ 
MiB of memory. For all that, we are very cost sensitive and are under constant 
pressure to come up with reliable ways to save memory.

> Parag
--
David VomLehn


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                     ` <alpine.LFD.1.10.0808261403360.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-26 22:54                                                       ` Parag Warudkar
@ 2008-08-26 23:24                                                       ` Adrian Bunk
       [not found]                                                         ` <20080826232411.GC11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-26 23:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 02:04:57PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 26 Aug 2008, Adrian Bunk wrote:
> > 
> > If you think we have too many stacksize problems I'd suggest to consider 
> > removing the choice of 4k stacks on i386, sh and m68knommu instead of 
> > using -fno-inline-functions-called-once:
> 
> Don't be silly. That makes the problem _worse_.
> 
> We're much better off with a 1% code-size reduction than forcing big 
> stacks on people. The 4kB stack option is also a good way of saying "if it 
> works with this, then 8kB is certainly safe".
>...

You implicitly assume both would solve the same problem.

While 4kB stacks are something we anyway never got 100% working, the 
cases where gcc inlining functions causes a critical increase in stack 
usage are usually not that hard to find, and once found the fix is 
trivial.

We should anyway monitor stack usages better since we have frequent 
programming errors in this area, and problems caused by gcc can this
way be detected en passant.

You have a good point that aiming at 4kB makes 8kB a very safe choice.

But I do not think the problem you'd solve with 
-fno-inline-functions-called-once is big enough to warrant the size 
increase it causes.

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 23:00                                                           ` David VomLehn
@ 2008-08-26 23:45                                                             ` Adrian Bunk
  0 siblings, 0 replies; 318+ messages in thread
From: Adrian Bunk @ 2008-08-26 23:45 UTC (permalink / raw)
  To: David VomLehn
  Cc: Parag Warudkar, Linus Torvalds, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded

On Tue, Aug 26, 2008 at 04:00:33PM -0700, David VomLehn wrote:
> Parag Warudkar wrote:
>> On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>
>>> And embedded people (the ones that might care about 1% code size) are the
>>> ones that would also want smaller stacks even more!
>>
>> This is something I never understood - embedded devices are not going
>> to run more than a few processes and 4K*(Few Processes)
>>  IMHO is not worth a saving nowadays even in the embedded world given
>> falling memory prices. Or do I misunderstand?
>
> Embedded applications span a huge range of sizes, from the very small 
> devices to which you refer, to quite complex devices. The cable settop 
> boxes we develop have over a hundred interrupt sources, typically run 
> 250-300 threads, and have 192+ MiB of memory. For all that, we are very 
> cost sensitive and are under constant pressure to come up with reliable 
> ways to save memory.

As you say correctly the term "embedded" gets used for many different 
devices.

And if you have 192+ MiB of memory you have so much that all these 
kernel size discussions don't really matter.

> David VomLehn

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 22:54                                                       ` Parag Warudkar
       [not found]                                                         ` <f7848160808261554j2f4eaaa6i1ee8801ae75ca7bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-26 23:47                                                         ` Linus Torvalds
       [not found]                                                           ` <alpine.LFD.1.10.0808261644260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27  8:34                                                         ` Bernd Petrovitsch
  2 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 23:47 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded



On Tue, 26 Aug 2008, Parag Warudkar wrote:
> 
> This is something I never understood - embedded devices are not going
> to run more than a few processes and 4K*(Few Processes)
>  IMHO is not worth a saving nowadays even in the embedded world given
> falling memory prices. Or do I misunderstand?

Well, by that argument, 1% of kernel size doesn't matter either..

1% of a kernel for an embedded device is roughly 10-30kB or so depending 
on how small you make the configuration. 

If that matters, then so should the difference of 3-8 processes' kernel 
stack usage when you have a 4k/8k stack choice.

And they _all_ will have at least 3-8 processes on them. Even the simplest 
ones will tend to have many more.
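
Spelled out as arithmetic (a sketch; the function names are made up, and the
image sizes in the note below are just round numbers chosen to match the
10-30kB range):

```c
#include <assert.h>

/* Bytes saved by using 4 KiB instead of 8 KiB kernel stacks for nr_tasks. */
unsigned long stack_savings_bytes(unsigned long nr_tasks)
{
	const unsigned long small_stack = 4096;
	const unsigned long big_stack = 8192;

	return nr_tasks * (big_stack - small_stack);
}

/* One percent of a kernel image of the given size, rounded down. */
unsigned long one_percent_bytes(unsigned long image_bytes)
{
	assert(image_bytes > 0);
	return image_bytes / 100;
}
```

With these, 3 to 8 tasks save 12288 to 32768 bytes of stack, and 1% of a
1 MiB to 3 MiB image is 10485 to 31457 bytes, so the two savings are the same
order of magnitude. At 300 tasks the stack side comes to 1228800 bytes.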

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                         ` <20080826232411.GC11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
@ 2008-08-26 23:51                                                           ` Linus Torvalds
       [not found]                                                             ` <alpine.LFD.1.10.0808261648140.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27  8:25                                                           ` Alan Cox
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-26 23:51 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > 
> > We're much better off with a 1% code-size reduction than forcing big 
> > stacks on people. The 4kB stack option is also a good way of saying "if it 
> > works with this, then 8kB is certainly safe".
> 
> You implicitly assume both would solve the same problem.

I'm just saying that your logic doesn't hold water.

If we can save kernel stack usage, then a 1% increase in kernel size is 
more than worth it.

> While 4kB stacks are something we anyway never got 100% working

What? Don't be silly. 

Linux _historically_ always used 4kB stacks.

No, they are likely not usable on x86-64, but dammit, they should be more 
than usable on x86-32 still.

> But I do not think the problem you'd solve with 
> -fno-inline-functions-called-once is big enough to warrant the size 
> increase it causes.

You continually try to see the inlining as a single solution to one 
problem (debuggability, stack, whatever).

The biggest problem with gcc inlining has always been that it has been 
_unpredictable_. It causes problems in many different ways. It has caused 
stability issues due to gcc versions doing random things. It causes the 
stack expansion. It makes stack traces harder for debugging, etc.

If it was any one thing, I wouldn't care. But it's exactly the fact that 
it causes all these problems in different areas.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                             ` <alpine.LFD.1.10.0808261648140.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27  0:23                                                               ` Adrian Bunk
  2008-08-27  0:28                                                                 ` Linus Torvalds
  0 siblings, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-27  0:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 04:51:52PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > > 
> > > We're much better off with a 1% code-size reduction than forcing big 
> > > stacks on people. The 4kB stack option is also a good way of saying "if it 
> > > works with this, then 8kB is certainly safe".
> > 
> > > You implicitly assume both would solve the same problem.
> 
> I'm just saying that your logic doesn't hold water.
> 
> If we can save kernel stack usage, then a 1% increase in kernel size is 
> more than worth it.

From some tests the size increase seems to become bigger for smaller 
kernels, but I don't have any really good data.


An interesting question is why most of our architectures for embedded 
devices only offer bigger stacks:

The only architectures offering a 4kB stacks option are:
- m68knommu
- sh
- 32bit x86

The following architectures that are used in embedded devices 
always use 8kB stacks (or bigger) in your tree:
- arm
- avr32
- blackfin
- cris
- frv
- h8300
- m32r
- m68k
- mips
- mn10300 (has an #ifdef CONFIG_4KSTACKS but no kconfig option)
- powerpc
- xtensa


> > While 4kB stacks are something we anyway never got 100% working
> 
> What? Don't be silly. 
> 
> Linux _historically_ always used 4kB stacks.
> 
> No, they are likely not usable on x86-64, but dammit, they should be more 
> than usable on x86-32 still.


When did we get callpaths like nfs+xfs+md+scsi reliably 
working with 4kB stacks on x86-32?


>...
> 			Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  0:23                                                               ` Adrian Bunk
@ 2008-08-27  0:28                                                                 ` Linus Torvalds
       [not found]                                                                   ` <alpine.LFD.1.10.0808261726560.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27  0:28 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded



On Wed, 27 Aug 2008, Adrian Bunk wrote:
> 
> When did we get callpaths like nfs+xfs+md+scsi reliably 
> working with 4kB stacks on x86-32?

XFS may never have been usable, but the rest, sure.

And you seem to be making this whole argument an excuse to SUCK, and an 
excuse to let gcc crap even more on our stack space.

Why?

Why aren't you saying that we should be able to do better? Instead, you 
seem to be asking us to do even worse than we do now?

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                           ` <alpine.LFD.1.10.0808261644260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27  0:53                                                             ` Greg Ungerer
  2008-08-27  1:08                                                               ` Parag Warudkar
  2008-08-27  0:58                                                             ` Parag Warudkar
  1 sibling, 1 reply; 318+ messages in thread
From: Greg Ungerer @ 2008-08-27  0:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Parag Warudkar, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA


Linus Torvalds wrote:
> On Tue, 26 Aug 2008, Parag Warudkar wrote:
>> This is something I never understood - embedded devices are not going
>> to run more than a few processes and 4K*(Few Processes)
>>  IMHO is not worth a saving nowadays even in embedded world given
>> falling memory prices. Or do I misunderstand?
> 
> Well, by that argument, 1% of kernel size doesn't matter either..
> 
> 1% of a kernel for an embedded device is roughly 10-30kB or so depending 
> on how small you make the configuration. 
> 
> If that matters, then so should the difference of 3-8 processes' kernel 
> stack usage when you have a 4k/8k stack choice.
> 
> And they _all_ will have at least 3-8 processes on them. Even the simplest 
> ones will tend to have many more.

I have some simple devices (network access/routers) with 8MB of RAM,
at power up not really being configured to do anything running 25
processes. (Heck there is over 10 kernel processes running!). Configure
some interfaces and services and that will easily push past 40.
I'd be happy with a 160k saving :-)

The init memory being freed at the end of the kernel boot is 88k,
4k stacks could save more than that.
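
The arithmetic behind the 160k figure can be written out as a small sketch. It assumes every process carries exactly one kernel stack and that the choice is strictly between 8 kB and 4 kB stacks; the function name is hypothetical and is not taken from any kernel source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes of kernel stack saved when every task
 * moves from an 8 kB stack to a 4 kB stack (one stack per task assumed).
 */
size_t stack_saving_bytes(size_t nr_tasks)
{
	const size_t big_stack = 8 * 1024;
	const size_t small_stack = 4 * 1024;

	assert(big_stack > small_stack);
	return nr_tasks * (big_stack - small_stack);
}
```

Calling stack_saving_bytes(40) gives 40 * 4096 bytes, i.e. 160 kB, which is larger than the 88 kB of init memory mentioned above; for the 25 processes present at power up the same calculation gives 100 kB.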

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     gerg-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                           ` <alpine.LFD.1.10.0808261644260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27  0:53                                                             ` Greg Ungerer
@ 2008-08-27  0:58                                                             ` Parag Warudkar
       [not found]                                                               ` <f7848160808261758q7b84aab1m188c1ebb59304818-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-08-27  9:00                                                               ` Bernd Petrovitsch
  1 sibling, 2 replies; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27  0:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 7:47 PM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> If that matters, then so should the difference of 3-8 processes' kernel
> stack usage when you have a 4k/8k stack choice.

The savings part -financial ones- are not always realizable with the
way memory is priced/sized/fitted.
Savings in few Mb of Kernel stack are not necessarily going to allow
getting rid of a single memory chip of 64M or so.
Either that or embedded manufacturing/configurations are different
than the desktop world.

(If my device has 2 memory slots and my user space requires 100Mb
including kernel memory - I anyways have to put in 64Mx2 there to take
advantage of mass manufactured, general purpose memory - so no big
deal if I saved 1.2Mb in Kernel stack or not. And savings of 64Mb
Kernel memory are not feasible anyways to allow user space to work
with 64Mb.)

On the other hand reducing  user space memory usage on those devices
(not counting savings from kernel stack size) is a way more attractive
option.

And although you said in your later reply that Linux x86 with 4K
stacks should be more than usable - my experiences running an untainted
desktop/file server with 4K stack have always been disastrous, XFS or
not.  It _might_ work for some well defined workloads but you would
not want to risk 4K stacks otherwise.

I understand that having a 4K stack option as a non-default for very
specific workloads is a good idea but apart from that I think no one
else seems to bother with reducing stack sizes (by no one I mean other
OSes.)

Parag

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  0:53                                                             ` Greg Ungerer
@ 2008-08-27  1:08                                                               ` Parag Warudkar
  2008-08-27  1:31                                                                 ` Greg Ungerer
  0 siblings, 1 reply; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27  1:08 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

On Tue, Aug 26, 2008 at 8:53 PM, Greg Ungerer <gerg@snapgear.com> wrote:

> I have some simple devices (network access/routers) with 8MB of RAM,
> at power up not really being configured to do anything running 25
> processes. (Heck there is over 10 kernel processes running!). Configure
> some interfaces and services and that will easily push past 40.
> I'd be happy with a 160k saving :-)
>

So you really need to run all 25 processes on that 8Mb box?
(For reference even the NGW100 development board comes with 16Mb RAM).

Even if you do need those all 25 processes on the 8Mb box, fixing the
memory usage of those user space hogs is a lot better than trying to
save 160Kb in kernel stacks.
Last I looked, user space wasn't particularly frugal with memory usage.

Parag

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  1:08                                                               ` Parag Warudkar
@ 2008-08-27  1:31                                                                 ` Greg Ungerer
       [not found]                                                                   ` <48B4AE68.4040205-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Greg Ungerer @ 2008-08-27  1:31 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded


Parag Warudkar wrote:
> On Tue, Aug 26, 2008 at 8:53 PM, Greg Ungerer <gerg@snapgear.com> wrote:
> 
>> I have some simple devices (network access/routers) with 8MB of RAM,
>> at power up not really being configured to do anything running 25
>> processes. (Heck there is over 10 kernel processes running!). Configure
>> some interfaces and services and that will easily push past 40.
>> I'd be happy with a 160k saving :-)
>>
> 
> So you really need to run all 25 processes on that 8Mb box?

Yes, of course. Considerable effort has been put into running
a minimal set of processes (that still fulfills the required function
set of this device).


> (For reference even the NGW100 development board comes with 16Mb RAM).

Lots of development boards are fitted with lots of RAM.

And the pressure will still be on in _real_ products to reduce
the RAM footprint as much as possible. There are exceptions but
generally less is cheaper. Simple economics really.


> Even if you do need those all 25 processes on the 8Mb box, fixing the
> memory usage of those user space hogs is a lot better than trying to
> save 160Kb in kernel stacks.

Yep, been done too. You don't squeeze a lot into these smaller
devices without looking at everything in it.


> Last I looked, user space wasn't particularly frugal with memory usage.

Then you haven't looked in the right places :-)

There are plenty of choices for making things small in user space.
Simple stuff like using uClibc, busybox, etc.

In this specific example things like /bin/init is 10k, /bin/inetd
is 10k, /bin/crond is 11k, etc. (Of course it is a shared uClibc setup,
uClibc is ~300k). And XIP can help out here too.

Regards
Greg



------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     gerg@snapgear.com
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                               ` <f7848160808261758q7b84aab1m188c1ebb59304818-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-27  1:49                                                                 ` Linus Torvalds
       [not found]                                                                   ` <alpine.LFD.1.10.0808261837530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27 11:58                                                                   ` Adrian Bunk
  0 siblings, 2 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27  1:49 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, 26 Aug 2008, Parag Warudkar wrote:
>
> And although you said in your later reply that Linux x86 with 4K
> stacks should be more than usable - my experiences running an untainted
> desktop/file server with 4K stack have always been disastrous, XFS or
> not.  It _might_ work for some well defined workloads but you would
> not want to risk 4K stacks otherwise.

Umm. How long?

4kB used to be the _only_ choice. And no, there weren't even irq stacks. 
So that 4kB was not just the whole kernel call-chain, it was also all the 
irq nesting above it.

And yes, we've gotten much worse over time, and no, I can't really suggest 
going back to that in general. The code bloat has certainly been 
accompanied by a stack bloat too.

But part of it is definitely gcc. Some versions of gcc used to be 
absolutely _horrid_ when it came to stack usage, especially with some 
flags, and especially with the crazy inlining that module-at-a-time 
caused.

But I'd be really happy if some embedded people tried to take some of that 
bloat back, and aim for 4kB stacks. Because it's definitely not 
unrealistic. At least it _shouldn't_ be. And a lot of the cases of us 
having structures on the stack is actually not worth it, and tends to be 
about being lazy rather than anything else.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                   ` <48B4AE68.4040205-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org>
@ 2008-08-27  2:16                                                                     ` Parag Warudkar
  2008-08-27  8:44                                                                       ` Bernd Petrovitsch
  0 siblings, 1 reply; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27  2:16 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 9:31 PM, Greg Ungerer <gerg-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org> wrote:

>
> And the pressure will still be on in _real_ products to reduce
> the RAM footprint as much as possible. There are exceptions but
> generally less is cheaper. Simple economics really.

Well, sure  - but the industry as a whole seems to have gone the other
way - do more with more at the similar or lower price points!
By that definition of less is better we should try and make the kernel
memory pageable (or has someone already done that?) - Windows does it,
by default ;)

Parag

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                   ` <alpine.LFD.1.10.0808261837530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27  2:36                                                                     ` Parag Warudkar
       [not found]                                                                       ` <f7848160808261936m18c69dc0r26f41850efae4b91-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-08-27  8:32                                                                       ` Alan Cox
  2008-08-27  6:01                                                                     ` Paul Mackerras
  1 sibling, 2 replies; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27  2:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 9:49 PM, Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>
>
> On Tue, 26 Aug 2008, Parag Warudkar wrote:
>>
>> And although you said in your later reply that Linux x86 with 4K
>> stacks should be more than usable - my experiences running an untainted
>> desktop/file server with 4K stack have always been disastrous, XFS or
>> not.  It _might_ work for some well defined workloads but you would
>> not want to risk 4K stacks otherwise.
>
> Umm. How long?
>

IIRC the last I tried 4K stacks with x86 was on 2.6.21 - Fedora 7
kernel, around June 07 time frame.
The oops included an ugly and long call trace that I still remember.

> And a lot of the cases of us
> having structures on the stack is actually not worth it, and tends to be
> about being lazy rather than anything else.

What about deep call chains? The problem with the uptake of 4K stacks
seems to be that it is not reliably provable that it will work under all
circumstances.

Parag

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                       ` <f7848160808261936m18c69dc0r26f41850efae4b91-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-27  2:52                                                                         ` Linus Torvalds
  0 siblings, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27  2:52 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, 26 Aug 2008, Parag Warudkar wrote:
> 
> What about deep call chains? The problem with the uptake of 4K stacks
> seems to be that it is not reliably provable that it will work under all
> circumstances.

Umm. Neither is 8k stacks. Nobody "proved" anything.

But yes, some subsystems have insanely deep call chains. And yes, things 
like the VFS recursion (for symlinks) makes that deeper yet for 
filesystems, although only on the lookup path. And that is exactly the 
kind of thing that can exacerbate the problem of the compiler artificially 
making for a bigger stack footprint of a function (*).

For things like the VFS layer, right now we allow a nesting level of 8, I 
think. If I remember correctly, it was 5 historically. Part of raising 
that depth, though, was that we actually moved the recursive part into 
fs/namei.c, and the nesting stack-depth was something pretty damn small 
when the filesystem used "follow_link" properly and let the VFS do it for 
it (ie the callchain to actually look up the link could be deep, but it 
would not recurse back, and instead just return a pointer, so that the 
actual _recursive_ part was just __do_follow_link() and is just a few 
words on the stack).

So yes, we do have some deep callchains, but they tend to be pretty well 
managed for _good_ code. The problems tend to be the areas with lots of 
indirection layers, and yeah, XFS, MD and ACPI all have those kinds of 
things.

In an embedded world, many of those should be a non-issue, though. 

			Linus

(*) ie the function that _is_ on the deep chain doesn't actually need much 
of a stack footprint at all itself, but it may call a helper function that 
is _not_ in the deep chain, and if it gets inlined it may give its 
excessive stack footprint to the deep chain - and this is _exactly_ the 
problem that happened with inlining "load_module()".
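
A minimal sketch of the effect described in the footnote, assuming GCC or Clang (the noinline attribute is a compiler extension). Both function names and the 1 kB scratch buffer are hypothetical illustrations, not code from load_module(). The helper needs a large local buffer but is not itself on the deep chain; keeping it out of line means the buffer exists only while the helper runs, instead of being folded into every frame of the recursive caller.

```c
#include <assert.h>
#include <string.h>

#define SCRATCH_BYTES 1024

/*
 * Hypothetical helper that is NOT on the deep call chain. Marked
 * noinline so its 1 kB buffer lives only in its own short-lived frame.
 */
__attribute__((noinline))
int parse_header(const char *src)
{
	char scratch[SCRATCH_BYTES];
	size_t n = strlen(src);

	if (n >= sizeof(scratch))
		n = sizeof(scratch) - 1;
	memcpy(scratch, src, n);
	scratch[n] = '\0';
	return scratch[0] == 'E';
}

/*
 * Hypothetical function that IS on the deep chain. Its own frame is
 * small; if parse_header() were inlined here, each recursion level
 * could carry the extra 1 kB as well.
 */
int deep_chain_step(const char *src, int depth)
{
	int ok = parse_header(src);	/* scratch[] is gone after this returns */

	if (depth > 0)
		return ok + deep_chain_step(src, depth - 1);
	return ok;
}
```

Whether a given compiler version would actually inline the helper without the attribute is exactly the unpredictable part; the attribute only removes that uncertainty for this one function.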

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                   ` <alpine.LFD.1.10.0808261837530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27  2:36                                                                     ` Parag Warudkar
@ 2008-08-27  6:01                                                                     ` Paul Mackerras
       [not found]                                                                       ` <18612.60878.887716.452936-nUko2b1QN/1kfgV4h6NXRTJtLkR7yuzc@public.gmane.org>
  2008-08-27 15:18                                                                       ` Linus Torvalds
  1 sibling, 2 replies; 318+ messages in thread
From: Paul Mackerras @ 2008-08-27  6:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Parag Warudkar, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

Linus Torvalds writes:

> 4kB used to be the _only_ choice. And no, there weren't even irq stacks. 
> So that 4kB was not just the whole kernel call-chain, it was also all the 
> irq nesting above it.

I think your memory is failing you.  In 2.4 and earlier, the kernel
stack was 8kB minus the size of the task_struct, which sat at the
start of the 8kB.  For instance, from include/asm-i386/processor.h for
2.4.29:

#define THREAD_SIZE (2*PAGE_SIZE)
#define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))
#define free_task_struct(p) free_pages((unsigned long) (p), 1)

Paul.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 20:01                                                     ` Mike Travis
@ 2008-08-27  6:54                                                       ` Nick Piggin
  2008-08-27  7:05                                                         ` David Miller
       [not found]                                                         ` <200808271654.32721.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Nick Piggin @ 2008-08-27  6:54 UTC (permalink / raw)
  To: Mike Travis
  Cc: Dave Jones, Linus Torvalds, Alan D. Brunelle, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven,
	Rusty Russell, Siddha, Suresh B, Luck, Tony, Jack Steiner,
	Christoph Lameter

On Wednesday 27 August 2008 06:01, Mike Travis wrote:
> Dave Jones wrote:
> ...
>
> > But yes, for this to be even remotely feasible, there has to be a
> > negligible performance cost associated with it, which right now, we
> > clearly don't have. Given that the number of people running 4096 CPU
> > boxes even in a few years time will still be tiny, punishing the common
> > case is obviously absurd.
> >
> > 	Dave
>
> I did do some fairly extensive benchmarking between configs of NR_CPUS =
> 128 and 4096 and most performance hits were in the neighborhood of < 5% on
> systems with 8 cpus and 4GB of memory (our most common test system).

5% is a pretty nasty performance hit... what sort of benchmarks are we
talking about here?

I just made some pretty crazy changes to the VM to get "only" around 5
or so % performance improvement in some workloads.

What places are making heavy use of cpumasks that causes such a slowdown?
Hopefully callers can mostly be improved so they don't need to use cpumasks
for common cases.

Until then, it would be kind of sad for a distro to ship a generic x86
kernel and lose 5% performance because it is set to 4096 CPUs...

But if I misunderstand and you're talking about specific microbenchmarks to
find the worst case for huge cpumasks, then I take that back.


> [But 
> changing cpumask_t's to be pointers instead of values will likely increase
> this.]  I've tried to be very sensitive to this issue with all my previous
> changes, so convincing the distros to set NR_CPUS=4096 would be as painless
> for them as possible. ;-)
>
> Btw, huge count cpu systems I don't think are that far away.  I believe the
> nextgen Larrabee chips will be geared towards HPC applications [instead of
> just GFX apps], and putting 4 of these chips on a motherboard would add up
> to 512 cpu threads (1024 if they support hyperthreading.)

It would be quite interesting if they make them cache coherent / MP capable.
Will they be?

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  6:54                                                       ` Nick Piggin
@ 2008-08-27  7:05                                                         ` David Miller
       [not found]                                                           ` <20080827.000506.177643294.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
       [not found]                                                         ` <200808271654.32721.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: David Miller @ 2008-08-27  7:05 UTC (permalink / raw)
  To: nickpiggin
  Cc: travis, davej, torvalds, Alan.Brunelle, mingo, tglx, rjw,
	linux-kernel, kernel-testers, akpm, arjan, rusty, suresh.b.siddha,
	tony.luck, steiner, cl

From: Nick Piggin <nickpiggin@yahoo.com.au>
Date: Wed, 27 Aug 2008 16:54:32 +1000

> 5% is a pretty nasty performance hit... what sort of benchmarks are we
> talking about here?
> 
> I just made some pretty crazy changes to the VM to get "only" around 5
> or so % performance improvement in some workloads.
> 
> What places are making heavy use of cpumasks that causes such a slowdown?
> Hopefully callers can mostly be improved so they don't need to use cpumasks
> for common cases.

It's almost certainly from the cross-call dispatch call chain.

As just one example, just to do a TLB flush mm->cpu_vm_mask probably
gets passed around as an aggregate two or three times on the way down
to the APIC programming code on x86.  That's two or three 512 byte
copies on the stack :)

Look at the sparc64 SMP code for how I solved the problem there.
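
The cost of passing a mask as an aggregate can be seen in a small sketch. The type and function names below are hypothetical and are not the kernel's cpumask_t API; the only assumption carried over from the thread is a 4096-CPU configuration, which makes the mask 4096 / 8 = 512 bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS		4096
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS	((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for a CPU mask: one bit per possible CPU. */
struct big_cpumask {
	unsigned long bits[MASK_LONGS];
};

void mask_set_cpu(struct big_cpumask *m, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* By pointer: only an address crosses the call boundary. */
int first_cpu_by_ptr(const struct big_cpumask *m)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (m->bits[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG)))
			return (int)cpu;
	return -1;	/* no CPU set */
}

/* By value: the caller makes a full 512-byte copy for this call. */
int first_cpu_by_value(struct big_cpumask m)
{
	return first_cpu_by_ptr(&m);
}
```

Each level of a call chain that takes the mask by value adds another copy of that size, which is the "two or three 512 byte copies" pattern described above; taking a const pointer at each level avoids it.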

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                           ` <20080827.000506.177643294.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-08-27  7:47                                                             ` Nick Piggin
       [not found]                                                               ` <200808271747.14690.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
  2008-08-27 14:36                                                             ` Mike Travis
  1 sibling, 1 reply; 318+ messages in thread
From: Nick Piggin @ 2008-08-27  7:47 UTC (permalink / raw)
  To: David Miller
  Cc: travis-sJ/iWh9BUns, davej-H+wXaHxf7aLQT0dZR+AlfA,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, mingo-X9Un+BFzKDI,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww,
	suresh.b.siddha-ral2JQCrhuEAvxtiuMwx3w,
	tony.luck-ral2JQCrhuEAvxtiuMwx3w, steiner-sJ/iWh9BUns,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

On Wednesday 27 August 2008 17:05, David Miller wrote:
> From: Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
> Date: Wed, 27 Aug 2008 16:54:32 +1000
>
> > 5% is a pretty nasty performance hit... what sort of benchmarks are we
> > talking about here?
> >
> > I just made some pretty crazy changes to the VM to get "only" around 5
> > or so % performance improvement in some workloads.
> >
> > What places are making heavy use of cpumasks that causes such a slowdown?
> > Hopefully callers can mostly be improved so they don't need to use
> > cpumasks for common cases.
>
> It's almost certainly from the cross-call dispatch call chain.
>
> As just one example, just to do a TLB flush mm->cpu_vm_mask probably
> gets passed around as an aggregate two or three times on the way down
> to the APIC programming code on x86.  That's two or three 512 byte
> copies on the stack :)

Yeah, I see. That's stupid isn't it? (Well, I guess it was completely
sane when cpumasks were word sized ;))

Hopefully that accounts for a significant chunk...

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                         ` <20080826232411.GC11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
  2008-08-26 23:51                                                           ` Linus Torvalds
@ 2008-08-27  8:25                                                           ` Alan Cox
  2008-08-27 12:52                                                             ` Parag Warudkar
  1 sibling, 1 reply; 318+ messages in thread
From: Alan Cox @ 2008-08-27  8:25 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

> You have a good point that aiming at 4kB makes 8kB a very safe choice.

Not really no - we use separate IRQ stacks in 4K but not 8K mode on
x86-32. That means you've actually got no more space if you are unlucky
with the timing of events. The 8K mode is merely harder to debug.

If 4K stacks really are not safe then x86-32 really really needs to
switch to using IRQ stacks in 8K stack mode as well.

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  2:36                                                                     ` Parag Warudkar
       [not found]                                                                       ` <f7848160808261936m18c69dc0r26f41850efae4b91-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-27  8:32                                                                       ` Alan Cox
  1 sibling, 0 replies; 318+ messages in thread
From: Alan Cox @ 2008-08-27  8:32 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

> What about deep call chains? The problem with the uptake of 4K stacks
> seems to be that it is not reliably provable that it will work under all
> circumstances.

On x86-32 with 8K stacks your IRQ paths share them so that is even harder
to prove (not that you can prove any of them) and the bugs are more
obscure and random.


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 22:54                                                       ` Parag Warudkar
       [not found]                                                         ` <f7848160808261554j2f4eaaa6i1ee8801ae75ca7bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-08-26 23:47                                                         ` Linus Torvalds
@ 2008-08-27  8:34                                                         ` Bernd Petrovitsch
  2 siblings, 0 replies; 318+ messages in thread
From: Bernd Petrovitsch @ 2008-08-27  8:34 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

On Tue, 2008-08-26 at 18:54 -0400, Parag Warudkar wrote:
> On Tue, Aug 26, 2008 at 5:04 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> 
> > And embedded people (the ones that might care about 1% code size) are the
> > ones that would also want smaller stacks even more!
> 
> This is something I never understood - embedded devices are not going
> to run more than a few processes and 4K*(Few Processes)
> IMHO is not worth saving nowadays, even in the embedded world, given
> falling memory prices. Or do I misunderstand?

Falling prices are no reason to increase the amount of available RAM (or
other hardware).
Especially if you (intend to) build >1E5 devices - where every Euro
counts.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services



* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  2:16                                                                     ` Parag Warudkar
@ 2008-08-27  8:44                                                                       ` Bernd Petrovitsch
  0 siblings, 0 replies; 318+ messages in thread
From: Bernd Petrovitsch @ 2008-08-27  8:44 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Greg Ungerer, Linus Torvalds, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded

On Tue, 2008-08-26 at 22:16 -0400, Parag Warudkar wrote:
[...]
> Well, sure  - but the industry as a whole seems to have gone the other

"The industry as a whole" doesn't exist on that low level. You can't
compare the laptop and/or desktop computer market (where one may buy
today hardware that runs in 3 years with the next generation/release of
the OS and applications) with the e.g. "WLAN router" market where - from
the commercial point of view - every Euro counts (and where the
requirements for the lifetime of the device are long frozen before the
thing gets in a shop).

> way - do more with more at the similar or lower price points!
> By that definition of less is better we should try and make the kernel
> memory pageable (or has someone already done that?) - Windows does it,

That doesn't help as in really small devices (like WLAN routers, cable
modems, etc.) you run without any means of paging/swapping. And even
binaries/read-only files are not necessarily executable in place (but
must be loaded into RAM). So you can't flush these pages.

And pageable kernel memory doesn't come for free - even if one only
counts the increased code and its complexity.

> by default ;)

Which is more a sign that it is probably a very bad idea.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                               ` <200808271747.14690.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
@ 2008-08-27  8:44                                                                 ` David Miller
       [not found]                                                                   ` <20080827.014457.140528687.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-08-27  8:44 UTC (permalink / raw)
  To: nickpiggin-/E1597aS9LT0CCvOHzKKcA
  Cc: travis-sJ/iWh9BUns, davej-H+wXaHxf7aLQT0dZR+AlfA,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, mingo-X9Un+BFzKDI,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww,
	suresh.b.siddha-ral2JQCrhuEAvxtiuMwx3w,
	tony.luck-ral2JQCrhuEAvxtiuMwx3w, steiner-sJ/iWh9BUns,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
Date: Wed, 27 Aug 2008 17:47:14 +1000

> Yeah, I see. That's stupid isn't it? (Well, I guess it was completely
> sane when cpumasks were word sized ;))
> 
> Hopefully that accounts for a significant chunk...

There are a lot of indirect costs that are hard to see as well.

Two things a lot of these cross-call dispatch paths do are:

1) Clear self-cpu

2) AND with cpus_online

#1 can normally be a simple bit clear, but some places can also
implement this with something like "cpus_andn(X, cpumask_of_cpu(cpu))"

It's simply easier to move those two things down to the bottom of
the APIC programming code, they just loop over the cpumask doing
an expensive APIC I/O operation anyways, might as well overlap it
with these "skip self-cpu" and "skip not-online cpus" checks.

And oh yeah we get the stack wastage fixed too, isn't that what we
were talking about? :-)
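
A minimal user-space sketch of the idea, for comparison. It is not the x86 or sparc64 code; every `model_` name is hypothetical, and the "send" is just a counter standing in for the APIC write. The first function builds the temporary mask up front, the second folds both checks into the send loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these names are kernel API. */
#define MODEL_NR_CPUS 4096U
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_LONGS ((MODEL_NR_CPUS + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

struct model_cpumask {
	unsigned long bits[MODEL_LONGS];
};

static inline void model_cpu_set(struct model_cpumask *m, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	m->bits[cpu / MODEL_BITS_PER_LONG] |= 1UL << (cpu % MODEL_BITS_PER_LONG);
}

static inline int model_cpu_isset(const struct model_cpumask *m, unsigned int cpu)
{
	return (int)((m->bits[cpu / MODEL_BITS_PER_LONG] >>
		      (cpu % MODEL_BITS_PER_LONG)) & 1UL);
}

/* "Before": build a temporary mask first, costing a full copy on the stack. */
static inline unsigned int model_dispatch_with_temp(const struct model_cpumask *targets,
						    const struct model_cpumask *online,
						    unsigned int self)
{
	struct model_cpumask tmp = *targets;	/* aggregate copy of the whole mask */
	unsigned int cpu, sent = 0;
	size_t i;

	assert(self < MODEL_NR_CPUS);
	/* 1) clear self-cpu */
	tmp.bits[self / MODEL_BITS_PER_LONG] &= ~(1UL << (self % MODEL_BITS_PER_LONG));
	/* 2) AND with the online mask */
	for (i = 0; i < MODEL_LONGS; i++)
		tmp.bits[i] &= online->bits[i];
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_isset(&tmp, cpu))
			sent++;		/* stand-in for the APIC write */
	return sent;
}

/* "After": no temporary mask; both checks happen inside the send loop. */
static inline unsigned int model_dispatch_in_loop(const struct model_cpumask *targets,
						  const struct model_cpumask *online,
						  unsigned int self)
{
	unsigned int cpu, sent = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (cpu == self)
			continue;	/* skip self-cpu */
		if (!model_cpu_isset(targets, cpu) || !model_cpu_isset(online, cpu))
			continue;	/* skip non-target and not-online cpus */
		sent++;			/* stand-in for the APIC write */
	}
	return sent;
}
```

Both variants pick the same set of cpus; the second one only ever takes the masks by pointer.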


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  0:58                                                             ` Parag Warudkar
       [not found]                                                               ` <f7848160808261758q7b84aab1m188c1ebb59304818-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-27  9:00                                                               ` Bernd Petrovitsch
       [not found]                                                                 ` <1219827609.30209.29.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Bernd Petrovitsch @ 2008-08-27  9:00 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

On Tue, 2008-08-26 at 20:58 -0400, Parag Warudkar wrote:
[...]
> The savings part -financial ones- are not always realizable with the
> way memory is priced/sized/fitted.
> Savings in few Mb of Kernel stack are not necessarily going to allow
> getting rid of a single memory chip of 64M or so.

No, but you can put an additional service(s) on it and sales people have
one (or two or ....) line more for their sales brochures.

> Either that or embedded manufacturing/configurations are different
> than the desktop world.

They are different. Think of running the complete system acting as a
bridge, router and/or firewall (Kernel early 2.4 though) from 4MB flash
in 32MB RAM and - listing the outside visible services - having a
command-line interface, web-GUI (implying an HTTP server) and a
(net-)SNMP agent on it.
Running a glibc without thread support is a win there (implying that there
is no thread support available on that device).

> (If my device has 2 memory slots and my user space requires 100Mb
> including kernel memory - I anyways have to put in 64Mx2 there to take
> advantage of mass manufactured, general purpose memory - so no big
> deal if I saved 1.2Mb in Kernel stack or not. And savings of 64Mb
> Kernel memory are not feasible anyways to allow user space to work
> with 64Mb.)

As soon as product management realizes that there is space left on the
device, they get new ideas and/or customer requirements to run more
services on that device.

> On the other hand reducing  user space memory usage on those devices
> (not counting savings from kernel stack size) is a way more attractive
> option.

The question is not whether to save space here or there. You save it - sooner
or later - on all fronts. Period.

> And although you said in your later reply that Linux x86 with 4K
> stacks should be more than usable - my experiences running an untainted
> desktop/file server with 4K stack have always been disastrous, XFS or
> not.  It _might_ work for some well defined workloads but you would
> not want to risk 4K stacks otherwise.

The embedded world of really small devices usually doesn't run XFS (or
ext? or reiser* or jfs or NFS or ...) or stack block devices on files
or .....

> I understand the having 4K stack option as a non-default for very
> specific workloads is a good idea but apart from that I think no one
> else seems to bother with reducing stack sizes (by no one I mean other
> OSes.)

They probably gave up the idea pretty soon because you need to
rework/improve large parts of the kernel + drivers (and that has two
major problems - it consumes a lot of man power for "no new features and
everything must be completely tested again"[0] and it adds new risks).
And that is practically impossible if one sells "stable driver APIs" for
3rd party (commercial) drivers because these must be changed too.

	Bernd

[0]: Let alone if you (or your customers) need certificates from some
     governmental agencies.
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                       ` <18612.60878.887716.452936-nUko2b1QN/1kfgV4h6NXRTJtLkR7yuzc@public.gmane.org>
@ 2008-08-27 10:58                                                                         ` Arjan van de Ven
  0 siblings, 0 replies; 318+ messages in thread
From: Arjan van de Ven @ 2008-08-27 10:58 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Linus Torvalds, Parag Warudkar, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

Paul Mackerras wrote:
> Linus Torvalds writes:
> 
>> 4kB used to be the _only_ choice. And no, there weren't even irq stacks. 
>> So that 4kB was not just the whole kernel call-chain, it was also all the 
>> irq nesting above it.
> 
> I think your memory is failing you.  In 2.4 and earlier, the kernel
> stack was 8kB minus the size of the task_struct, which sat at the
> start of the 8kB.  For instance, from include/asm-i386/processor.h for
> 2.4.29:

but was shared with interrupts; so out of the 6Kb left, you had still really only 4Kb for user context stack


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                   ` <alpine.LFD.1.10.0808261726560.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27 11:58                                                                     ` Adrian Bunk
       [not found]                                                                       ` <20080827115829.GF11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Adrian Bunk @ 2008-08-27 11:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > 
> > When did we get callpaths like nfs+xfs+md+scsi reliably
> > working with 4kB stacks on x86-32?
> 
> XFS may never have been usable, but the rest, sure.
> 
> And you seem to be making this whole argument an excuse to SUCK, and an
> excuse to let gcc crap even more on our stack space.
> 
> Why?
> 
> Why aren't you saying that we should be able to do better? Instead, you 
> seem to be asking us to do even worse than we do now?

My main point is:
- getting 4kB stacks working reliably is a hard task
- having an eye on gcc increasing the stack usage, and fixing it if
  required, is relatively easy

If we should be able to do better at getting (and keeping) 4kB stacks 
working, then coping with possible inlining problems caused by gcc
should not be a big problem for us.

> 			Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  1:49                                                                 ` Linus Torvalds
       [not found]                                                                   ` <alpine.LFD.1.10.0808261837530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27 11:58                                                                   ` Adrian Bunk
  1 sibling, 0 replies; 318+ messages in thread
From: Adrian Bunk @ 2008-08-27 11:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Parag Warudkar, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

On Tue, Aug 26, 2008 at 06:49:19PM -0700, Linus Torvalds wrote:
>...
> But part of it is definitely gcc. Some versions of gcc used to be 
> absolutely _horrid_ when it came to stack usage, especially with some 
> flags, and especially with the crazy inlining that module-at-a-time 
> caused.
>...

That was gcc 3.4.

And due to that we disable unit-at-a-time for gcc 3.4 on 32bit x86.

> 			Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  8:25                                                           ` Alan Cox
@ 2008-08-27 12:52                                                             ` Parag Warudkar
       [not found]                                                               ` <f7848160808270552u2ee66167x912a68e0bf8b25bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27 12:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: Adrian Bunk, Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

On Wed, Aug 27, 2008 at 4:25 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> You have a good point that aiming at 4kB makes 8kB a very safe choice.
>
> Not really no - we use separate IRQ stacks in 4K but not 8K mode on
> x86-32. That means you've actually got no more space if you are unlucky
> with the timing of events. The 8K mode is merely harder to debug.
>

By your logic though, XFS on x86 should work fine with 4K stacks -
many will attest that it does not and blows up due to stack issues.

I have first hand experiences of things blowing up with deep call
chains when using 4K stacks where 8K worked just fine on same
workload.

So there is definitely some other problem with 4K stacks.

Thanks
Parag


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                 ` <1219827609.30209.29.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
@ 2008-08-27 12:56                                                                   ` Parag Warudkar
  2008-08-27 13:17                                                                     ` Bernd Petrovitsch
  0 siblings, 1 reply; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27 12:56 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 27, 2008 at 5:00 AM, Bernd Petrovitsch <bernd-GBwJepH+xoVeoWH0uzbU5w@public.gmane.org> wrote:

>
> They probably gave up the idea pretty soon because you need to
> rework/improve large parts of the kernel + drivers (and that has two
> major problems - it consumes a lot of man power for "no new features and
> everything must be completely tested again"[0] and it adds new risks).
> And that is practically impossible if one sells "stable driver APIs" for
> 3rd party (commercial) drivers because these must be changed too.
>

But not many embedded Linux arches support 4K stacks like Adrian
pointed out earlier.
So the same (lot of man power requirement) would apply to Linux.

Sure it will be good - but how reasonable it is to attempt it and how
reliably it will work under all conceived loads - those are the
questions.

Thanks

Parag


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27 12:56                                                                   ` Parag Warudkar
@ 2008-08-27 13:17                                                                     ` Bernd Petrovitsch
       [not found]                                                                       ` <1219843032.30209.51.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Bernd Petrovitsch @ 2008-08-27 13:17 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Linus Torvalds, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded


On Wed, 2008-08-27 at 08:56 -0400, Parag Warudkar wrote:
> On Wed, Aug 27, 2008 at 5:00 AM, Bernd Petrovitsch <bernd@firmix.at> wrote:
> > They probably gave up the idea pretty soon because you need to
> > rework/improve large parts of the kernel + drivers (and that has two
> > major problems - it consumes a lot of man power for "no new features and
> > everything must be completely tested again"[0] and it adds new risks).
> > And that is practically impossible if one sells "stable driver APIs" for
> > 3rd party (commercial) drivers because these must be changed too.
> 
> But not many embedded Linux arches support 4K stacks like Adrian

What is an "embedded Linux arch"?
Personally I encountered i386, ARM, MIPS and PPC in the embedded world.

> pointed out earlier.
> So the same (lot of man power requirement) would apply to Linux.

Of course. Look at the amount of work done by lots of people in that
area (including stack frame size reductions) and on-going discussions.

> Sure it will be good - but how reasonable it is to attempt it and how
> reliably it will work under all conceived loads - those are the
> questions.

If you "develop" an embedded system (which is partly system integration
of existing apps) to be installed in the field, you don't have that many
conceivable work loads compared to a desktop/server system. And you have
a fixed list of drivers and applications.
A usual approach is to run stress tests on several (or all)
subsystems/services/... in parallel, and if the device survives them while
still functioning correctly, it is at least good enough.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services



* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                               ` <f7848160808270552u2ee66167x912a68e0bf8b25bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-08-27 13:21                                                                 ` Alan Cox
       [not found]                                                                   ` <20080827142142.303cdba8-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Alan Cox @ 2008-08-27 13:21 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: Adrian Bunk, Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

> By your logic though, XFS on x86 should work fine with 4K stacks -
> many will attest that it does not and blows up due to stack issues.
> 
> I have first hand experiences of things blowing up with deep call
> chains when using 4K stacks where 8K worked just fine on same
> workload.
> 
> So there is definitely some other problem with 4K stacks.

Nothing of the sort. If it blows up with a 4K stack it will almost
certainly blow up with an 8K stack *eventually* - when a heavy stack usage
coincides with a heavy stack using IRQ handler.

You won't catch it in simple testing, you won't catch it in trivial
simulation and it'll be incredibly hard to reproduce. Not the kind of bug
you want in a production system really. IRQ stacks make things much more
predictable.

Alan


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                         ` <200808271654.32721.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
@ 2008-08-27 14:35                                                           ` Mike Travis
  0 siblings, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-08-27 14:35 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave Jones, Linus Torvalds, Alan D. Brunelle, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven,
	Rusty Russell, Siddha, Suresh B, Luck, Tony, Jack Steiner,
	Christoph Lameter

Nick Piggin wrote:
> On Wednesday 27 August 2008 06:01, Mike Travis wrote:
>> Dave Jones wrote:
>> ...
>>
>>> But yes, for this to be even remotely feasible, there has to be a
>>> negligible performance cost associated with it, which right now, we
>>> clearly don't have. Given that the number of people running 4096 CPU
>>> boxes even in a few years time will still be tiny, punishing the common
>>> case is obviously absurd.
>>>
>>> 	Dave
>> I did do some fairly extensive benchmarking between configs of NR_CPUS =
>> 128 and 4096 and most performance hits were in the neighborhood of < 5% on
>> systems with 8 cpus and 4GB of memory (our most common test system).
> 
> 5% is a pretty nasty performance hit... what sort of benchmarks are we
> talking about here?

It's been a while now, I should go back and check my notes.  Many of the
benchmarks did not show any change.  I believe the ones that were right on the
edge of paging were affected by the fact that less memory was available.
> 
> I just made some pretty crazy changes to the VM to get "only" around 5
> or so % performance improvement in some workloads.
> 
> What places are making heavy use of cpumasks that causes such a slowdown?
> Hopefully callers can mostly be improved so they don't need to use cpumasks
> for common cases.

That's another study I did, and it seemed that maybe 95% of the functions
would not be affected by passing pointers to cpumasks instead of the cpumasks
themselves, because the data was processed by a cpu_xxx function that
uses a pointer.  The most common pattern was to create a temp cpumask, using
cpus_and(temp_mask, callers_mask, cpu_online_map);  The speedup from using nr_cpu_ids
instead of NR_CPUS in the traversal functions helped quite a bit.  Using this
same method in the cpus_xxx functions would further speed up things.  (As
well as only allocating the cpumask sized by nr_cpu_ids instead of NR_CPUS
as the current cpumask_t definition specifies.)
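
As a rough illustration of the two points above (pointer passing, and bounding the work by nr_cpu_ids rather than NR_CPUS), here is a small user-space sketch. The `model_` names are hypothetical and only mimic the shape of the kernel's cpumask helpers; `nr_ids` stands in for nr_cpu_ids.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for NR_CPUS=4096; not kernel API. */
#define MODEL_NR_CPUS 4096U
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_LONGS(n) (((n) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

struct model_cpumask {
	unsigned long bits[MODEL_LONGS(MODEL_NR_CPUS)];
};

/* 4096 bits is 512 bytes, so every by-value pass copies this much. */
_Static_assert(sizeof(struct model_cpumask) == 512,
	       "a 4096-cpu mask occupies 512 bytes");

/* Number of words a traversal has to touch when bounded by nr_ids. */
static inline size_t model_longs_scanned(unsigned int nr_ids)
{
	return MODEL_LONGS(nr_ids);
}

/*
 * AND two masks through pointers, touching only the words that can hold
 * a possible cpu.  Returns non-zero if the result is non-empty.
 * Words of dst beyond the nr_ids bound are left untouched.
 */
static inline int model_cpus_and(struct model_cpumask *dst,
				 const struct model_cpumask *a,
				 const struct model_cpumask *b,
				 unsigned int nr_ids)
{
	size_t i, words = model_longs_scanned(nr_ids);
	unsigned long any = 0;

	assert(nr_ids <= MODEL_NR_CPUS);
	for (i = 0; i < words; i++) {
		dst->bits[i] = a->bits[i] & b->bits[i];
		any |= dst->bits[i];
	}
	return any != 0;
}
```

On an 8-cpu system this touches a single word per mask instead of the full 512 bytes, and nothing is copied at the call site.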

> 
> Until then, it would be kind of sad for a distro to ship a generic x86
> kernel and lose 5% performance because it is set to 4096 CPUs...
> 
> But if I misunderstand and you're talking about specific microbenchmarks to
> find the worst case for huge cpumasks, then I take that back.

Yes, I was (at the time) trying to determine how many of the cpumask functions
were actually in play by user tasks, so I was zeroing in on those (cpusets,
rescheds, etc.)

> 
> 
>> [But 
>> changing cpumask_t's to be pointers instead of values will likely increase
>> this.]  I've tried to be very sensitive to this issue with all my previous
>> changes, so convincing the distros to set NR_CPUS=4096 would be as painless
>> for them as possible. ;-)
>>
>> Btw, huge count cpu systems I don't think are that far away.  I believe the
>> nextgen Larrabee chips will be geared towards HPC applications [instead of
>> just GFX apps], and putting 4 of these chips on a motherboard would add up
>> to 512 cpu threads (1024 if they support hyperthreading.)
> 
> It would be quite interesting if they make them cache coherent / MP capable.
> Will they be?

There's not been a lot of info available yet, but I think the 128 cores will
share at least an L2 cache + memory controller.  How the APICs interact is
also another big question.  And most likely some standard system controller
CPU will be needed, but that could be a tiny VIA processor... ;-)

Thanks,
Mike


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                           ` <20080827.000506.177643294.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-08-27  7:47                                                             ` Nick Piggin
@ 2008-08-27 14:36                                                             ` Mike Travis
  1 sibling, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-08-27 14:36 UTC (permalink / raw)
  To: David Miller
  Cc: nickpiggin-/E1597aS9LT0CCvOHzKKcA, davej-H+wXaHxf7aLQT0dZR+AlfA,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, mingo-X9Un+BFzKDI,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww,
	suresh.b.siddha-ral2JQCrhuEAvxtiuMwx3w,
	tony.luck-ral2JQCrhuEAvxtiuMwx3w, steiner-sJ/iWh9BUns,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

David Miller wrote:
> From: Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
> Date: Wed, 27 Aug 2008 16:54:32 +1000
> 
>> 5% is a pretty nasty performance hit... what sort of benchmarks are we
>> talking about here?
>>
>> I just made some pretty crazy changes to the VM to get "only" around 5
>> or so % performance improvement in some workloads.
>>
>> What places are making heavy use of cpumasks that causes such a slowdown?
>> Hopefully callers can mostly be improved so they don't need to use cpumasks
>> for common cases.
> 
> It's almost certainly from the cross-call dispatch call chain.
> 
> As just one example, just to do a TLB flush mm->cpu_vm_mask probably
> gets passed around as an aggregate two or three times on the way down
> to the APIC programming code on x86.  That's two or three 512 byte
> copies on the stack :)
> 
> Look at the sparc64 SMP code for how I solved the problem there.

I will, thanks!

Mike


* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                   ` <20080827.014457.140528687.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-08-27 14:48                                                                     ` Mike Travis
  0 siblings, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-08-27 14:48 UTC (permalink / raw)
  To: David Miller
  Cc: nickpiggin-/E1597aS9LT0CCvOHzKKcA, davej-H+wXaHxf7aLQT0dZR+AlfA,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, mingo-X9Un+BFzKDI,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww,
	suresh.b.siddha-ral2JQCrhuEAvxtiuMwx3w,
	tony.luck-ral2JQCrhuEAvxtiuMwx3w, steiner-sJ/iWh9BUns,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

David Miller wrote:
> From: Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
> Date: Wed, 27 Aug 2008 17:47:14 +1000
> 
>> Yeah, I see. That's stupid isn't it? (Well, I guess it was completely
>> sane when cpumasks were word sized ;))
>>
>> Hopefully that accounts for a significant chunk...
> 
> There are a lot of indirect costs that are hard to see as well.
> 
> Two things a lot of these cross-call dispatch paths do are:
> 
> 1) Clear self-cpu
> 
> 2) AND with cpus_online
> 
> #1 can normally be a simple bit clear, but some places can also
> implement this with something like "cpus_andn(X, cpumask_of_cpu(cpu))"
> 
> It's simply easier to move those two things down to the bottom of
> the APIC programming code, they just loop over the cpumask doing
> an expensive APIC I/O operation anyways, might as well overlap it
> with these "skip self-cpu" and "skip not-online cpus" checks.
> 
> And oh yeah we get the stack wastage fixed too, isn't that what we
> were talking about? :-)

Yes, the most time consuming part was determining whether a kmalloc
could safely be used in the context of the function, and what to
do about the out-of-memory problem.  Pushing that down to something
like:  for_each_cpu_thats_online(cpu, *maskptr) would remove the need for
many of the temp masks.  A simple if (cpu != me) would take care of
excluding self.  It might have better interaction with cpu hotplug
as well, since the online map would be checked just before the call
to that cpu is made.
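
A rough userspace sketch of that idea, assuming a 64-bit word as a
stand-in for the cpumask. All toy_* names are made up for illustration
and are not the kernel's cpumask or APIC API; the callback stands in for
the expensive per-cpu APIC write:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

#define TOY_NR_CPUS 64

/* Callback standing in for the expensive per-CPU APIC operation. */
typedef void (*toy_ipi_fn)(int cpu, void *data);

/*
 * Walk the caller's mask once.  The "is it online?" and "is it me?"
 * checks happen inside the loop, so the caller never has to build a
 * temporary mask.  Returns the number of CPUs actually signalled.
 */
int toy_send_ipi_mask(const toy_cpumask_t *mask, const toy_cpumask_t *online,
                      int me, toy_ipi_fn send, void *data)
{
    int cpu, sent = 0;

    assert(me >= 0 && me < TOY_NR_CPUS);
    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_cpumask_t bit = (toy_cpumask_t)1 << cpu;

        if (!(*mask & bit))
            continue;           /* not requested by the caller */
        if (!(*online & bit))
            continue;           /* skip not-online cpus */
        if (cpu == me)
            continue;           /* skip self-cpu */
        send(cpu, data);
        sent++;
    }
    return sent;
}

/* Example callback: record which CPUs were hit in a result mask. */
void toy_record_cpu(int cpu, void *data)
{
    *(toy_cpumask_t *)data |= (toy_cpumask_t)1 << cpu;
}
```

The online mask is read per cpu inside the loop rather than ANDed up
front, which is the property that might help with hotplug.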

Thanks,
Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27  6:01                                                                     ` Paul Mackerras
       [not found]                                                                       ` <18612.60878.887716.452936-nUko2b1QN/1kfgV4h6NXRTJtLkR7yuzc@public.gmane.org>
@ 2008-08-27 15:18                                                                       ` Linus Torvalds
  1 sibling, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27 15:18 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Parag Warudkar, Adrian Bunk, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded



On Wed, 27 Aug 2008, Paul Mackerras wrote:
> 
> I think your memory is failing you.  In 2.4 and earlier, the kernel
> stack was 8kB minus the size of the task_struct, which sat at the
> start of the 8kB.

Yup, you're right.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                       ` <1219843032.30209.51.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
@ 2008-08-27 15:48                                                                         ` Jamie Lokier
  2008-08-27 16:38                                                                           ` Bernd Petrovitsch
       [not found]                                                                           ` <20080827154805.GA25387-yetKDKU6eevNLxjTenLetw@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Jamie Lokier @ 2008-08-27 15:48 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Parag Warudkar, Linus Torvalds, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

Bernd Petrovitsch wrote:
> If you "develop" an embedded system (which is partly system integration
> of existing apps) to be installed in the field, you don't have that many
> conceivable work loads compared to a desktop/server system. And you have
> a fixed list of drivers and applications.

Hah!  Not in my line of embedded device.

32MB no-MMU ARM boards which people run new things and attach new
devices to rather often - without making new hardware.  Volume's too
low per individual application to get new hardware designed and made.

I'm seriously thinking of forward porting the 4 year old firmware
from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
Backporting is tedious, so's feeling wretchedly far from the mainline
world.

> A usual approach is to run stress tests on several (or all)
> subsystems/services/... in parallel and if the device survives it
> functioning correctly, it is at least good enough.

Per application.

Some little devices run hundreds of different applications and
customers expect to customise, script themselves, and attach different
devices (over USB).  The next customer in the chain expects the bits
you supplied to work in a variety of unexpected situations, even when
you advise that it probably won't do that.

Much like desktop/server Linux, but on a small device where silly
little things like 'create a process' are a stress for the dear little
thing.

(My biggest lesson: insist on an MMU next time!)

-- Jamie

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                       ` <20080827115829.GF11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
@ 2008-08-27 16:00                                                                         ` Paul Mundt
       [not found]                                                                           ` <20080827173544.GH11734@cs181140183.pp.htv.fi>
       [not found]                                                                           ` <20080827160052.GA15968-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Paul Mundt @ 2008-08-27 16:00 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
> > On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > > 
> > > When did we get callpaths like nfs+xfs+md+scsi reliably
> > > working with 4kB stacks on x86-32?
> > 
> > XFS may never have been usable, but the rest, sure.
> > 
> > And you seem to be making this whole argument an excuse to SUCK, and an
> > excuse to let gcc crap even more on our stack space.
> > 
> > Why?
> > 
> > Why aren't you saying that we should be able to do better? Instead, you 
> > seem to be asking us to do even worse than we do now?
> 
> My main point is:
> - getting 4kB stacks working reliably is a hard task
> - having an eye on gcc increasing the stack usage, and fixing it if
>   required, is relatively easy
> 
> If we should be able to do better at getting (and keeping) 4kB stacks 
> working, then coping with possible inlining problems caused by gcc
> should not be a big problem for us.
> 
Out of the architectures you've mentioned for 4k stacks, they also tend
to do IRQ stacks, which is something you seem to have overlooked.

In addition to that, debugging the runaway stack users on 4k tends to be
easier anyways since you end up blowing the stack a lot sooner. On sh
we've had pretty good luck with it, though most of our users are using
fairly deterministic workloads and continually profiling the footprint.
Anything that runs away or uses an insane amount of stack space needs to
be fixed well before that anyways, so catching it sooner is always
preferable. I imagine the same case is true for m68knommu (even sans IRQ
stacks).

Things might be more sensitive on x86, but it's certainly not something
that's a huge problem for the various embedded platforms to wire up,
whether they want to go the IRQ stack route or not.

In any event, lack of support for something on embedded architectures in
the kernel is more often due to apathy/utter indifference on the part of
the architecture maintainer rather than being indicative of any intrinsic
difficulty in supporting the thing in question. Most new "features" on the
lesser maintained architectures tend to end up there either out of peer
pressure or copying-and-pasting accidents rather than any sort of design.
;-)

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-26 20:41                                                   ` Linus Torvalds
@ 2008-08-27 16:21                                                     ` Jamie Lokier
  0 siblings, 0 replies; 318+ messages in thread
From: Jamie Lokier @ 2008-08-27 16:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Adrian Bunk, Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded

Linus Torvalds wrote:
> > Most LOCs of the kernel are not written by people like you or Al Viro or 
> > David Miller, and the average kernel developer is unlikely to do it as 
> > good as gcc.
> 
> Sure. But we do have tools. We do have checkstack.pl, it's just that it 
> hasn't been an issue in a long time, so I suspect many people didn't even 
> _realize_ we have it, and I certainly can attest to the fact that even 
> people who remember it - like me - don't actually tend to run it all that 
> often.

Sounds like what's really desired here isn't more worry and
unpredictability, but for GCC+Binutils to gain the ability to
calculate the stack depth over all callchains (doesn't have to be
exact, just an upper bound; annotate recursions) in a way that's good
enough to do on every compile, complain if a depth is exceeded
statically (or it can't be proven), and to gain the
architecture-independent option "optimise to reduce stack usage".
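
To make the "upper bound over all callchains" part concrete, here is a
toy sketch of the calculation, assuming the per-function frame sizes and
the call graph are already known.  The toy_* names and the table layout
are invented for illustration; this is not how GCC or Binutils represent
any of this:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CALLEES 4

/* Hypothetical per-function record: frame size plus indices of callees. */
struct toy_func {
    const char *name;
    int frame_bytes;
    int ncallees;
    int callees[TOY_MAX_CALLEES];
};

enum { TOY_UNSEEN = 0, TOY_ON_PATH, TOY_DONE };

/*
 * Worst-case stack depth in bytes for a call chain starting at funcs[idx],
 * or -1 if a recursion is reachable (no static bound can be proven).
 * state[] and memo[] need one zero-initialised entry per function.
 */
int toy_stack_depth(const struct toy_func *funcs, int idx,
                    int *state, int *memo)
{
    int i, deepest = 0;

    assert(idx >= 0);
    if (state[idx] == TOY_DONE)
        return memo[idx];           /* already computed */
    if (state[idx] == TOY_ON_PATH)
        return -1;                  /* call cycle: needs an annotation */

    state[idx] = TOY_ON_PATH;
    for (i = 0; i < funcs[idx].ncallees; i++) {
        int d = toy_stack_depth(funcs, funcs[idx].callees[i], state, memo);

        if (d < 0)
            return -1;              /* leave ON_PATH: this node reaches a cycle */
        if (d > deepest)
            deepest = d;
    }
    state[idx] = TOY_DONE;
    memo[idx] = funcs[idx].frame_bytes + deepest;
    return memo[idx];
}
```

A build step could then compare the result for each entry point against
the stack size and complain when it is exceeded or comes back as -1.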

> > BTW:
> > I just ran checkstack on a (roughly) allyesconfig kernel, and we have a 
> > new driver that allocates "unsigned char recvbuf[1500];" on the stack...
> 
> Yeah, it's _way_ too easy to do bad things.

In my userspace code, I have macros tmp_alloc and tmp_free.  They must
be matched in the same function:

     unsigned char * recvbuf = tmp_alloc(1500);
     ....
     tmp_free(recvbuf);

When stack is plentiful, it maps to alloca() which is roughly
equivalent to using a stack variable.

When stack is constrained (as it is on my little devices), that maps
to xmalloc/free.  The kernel equivalent would be kmalloc GFP_ATOMIC
(perhaps).
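
A minimal userspace sketch of that pair of macros, for illustration
only.  TOY_STACK_CONSTRAINED is a made-up build-time switch, plain
malloc() stands in for xmalloc, and toy_checksum() is just an example
caller:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* TOY_STACK_CONSTRAINED is a hypothetical knob set for small targets. */
#ifdef TOY_STACK_CONSTRAINED
#define tmp_alloc(size)  malloc(size)      /* heap: stack is precious */
#define tmp_free(ptr)    free(ptr)
#else
#include <alloca.h>
#define tmp_alloc(size)  alloca(size)      /* stack: roughly a local array */
#define tmp_free(ptr)    ((void)(ptr))     /* released on function return */
#endif

/* Example user: sums a copy of the input held in a temporary buffer. */
unsigned toy_checksum(const unsigned char *data, size_t len)
{
    unsigned char *buf = tmp_alloc(len ? len : 1);
    unsigned sum = 0;
    size_t i;

    assert(buf != NULL);
    memcpy(buf, data, len);
    for (i = 0; i < len; i++)
        sum += buf[i];
    tmp_free(buf);              /* must be in the same function as tmp_alloc */
    return sum;
}
```

Because the alloca() variant frees on return, the alloc and free have to
stay matched within one function for both builds to behave the same.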

With different macros to mine, it may be possible to map small
fixed-size requests exactly onto local variables, and large ones to
kmalloc().  A stab at it (not tested):

    #define LOCAL_ALLOC_THRESHOLD     128

    #define LOCAL_ALLOC(type, ptr)                                        \
        __typeof__(type) __attribute__((__unused__)) ptr##_local_struct;  \
        __typeof__(type) * ptr =                                          \
              ((__builtin_constant_p(sizeof(type))                        \
                && sizeof(type) <= LOCAL_ALLOC_THRESHOLD)                 \
               ? &ptr##_local_struct : kmalloc(sizeof(type), GFP_ATOMIC))

    #define LOCAL_FREE(ptr)                           \
        ((__builtin_constant_p(sizeof (*(ptr)))       \
          && sizeof(*(ptr)) <= LOCAL_ALLOC_THRESHOLD) \
         ? (void) 0 : kfree(ptr))

Would that be useful in the kernel?

I'm thinking if it were a commonly used pattern for temporary buffers,
unknown structures and arrays of macro-determined size, the "new
driver" author would be less likely to accidentally drop a big object
on the stack.

Obviously it would be nicer for GCC to code such a thing
automatically, but that really is wishful thinking.

-- Jamie

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                   ` <20080827142142.303cdba8-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
@ 2008-08-27 16:24                                                                     ` Parag Warudkar
  0 siblings, 0 replies; 318+ messages in thread
From: Parag Warudkar @ 2008-08-27 16:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: Adrian Bunk, Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 27, 2008 at 9:21 AM, Alan Cox <alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org> wrote:
>> By your logic though, XFS on x86 should work fine with 4K stacks -
>> many will attest that it does not and blows up due to stack issues.
>>
>> I have first hand experiences of things blowing up with deep call
>> chains when using 4K stacks where 8K worked just fine on same
>> workload.
>>
>> So there is definitely some other problem with 4K stacks.
>
> Nothing of the sort. If it blows up with a 4K stack it will almost
> certainly blow up with an 8K stack *eventually* - when a heavy stack usage
> coincides with a heavy stack using IRQ handler.
>
> You won't catch it in simple testing, you won't catch it in trivial
> simulation and it'll be incredibly hard to reproduce. Not the kind of bug
> you want in a production system really. IRQ stacks make things much more
> predictable.


I see - so if I end up having a workload on 8k where heavy stack using
IRQs and deep kernel call chains come at the same time - even 8K will
blow up.
So 4K will blow too except that it doesn't require IRQs also to use
heavy stack, just XFS is good enough :)

It then seems like IRQs using a lot of stack is not so much of a
problem in the current kernel as much as deeper call chains and stack
usage of normal non-irq path code is.
So 8k makes it possible for the deeper call chains of non-irq path to
survive since they get better part of the 8K to themselves and IRQs
can do with less almost always.

At least that's what I can derive from the fact that we do not have
lots of reports of 8K stack blowing up.

Thanks

Parag

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27 15:48                                                                         ` Jamie Lokier
@ 2008-08-27 16:38                                                                           ` Bernd Petrovitsch
       [not found]                                                                             ` <1219855121.30209.112.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
       [not found]                                                                           ` <20080827154805.GA25387-yetKDKU6eevNLxjTenLetw@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Bernd Petrovitsch @ 2008-08-27 16:38 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Parag Warudkar, Linus Torvalds, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded

On Wed, 2008-08-27 at 16:48 +0100, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > If you "develop" an embedded system (which is partly system integration
> > of existing apps) to be installed in the field, you don't have that many
> > conceivable work loads compared to a desktop/server system. And you have
> > a fixed list of drivers and applications.
> 
> Hah!  Not in my line of embedded device.
> 
> 32MB no-MMU ARM boards which people run new things and attach new
> devices to rather often - without making new hardware.  Volume's too
> low per individual application to get new hardware designed and made.

Yes, you may have several products on the same hardware with somewhat
differing requirements (or not). But that is much less than a general
purpose system IMHO.

> I'm seriously thinking of forward porting the 4 year old firmware
> from 2.4.26 to 2.6.current, just to get new drivers and capabilities.

That sounds reasonable (and I never meant maintaining the old system
infinitely. Actually once the thing is shipped it usually enters deep
maintenance mode and the next is more a fork from the old).

> Backporting is tedious, so's feeling wretchedly far from the mainline
> world.

ACK. But that also depends on the amount of local changes (and sorry, but not
all locally necessary patches would be accepted in mainline in any way).

> > A usual approach is to run stress tests on several (or all)
> > subsystems/services/... in parallel and if the device survives it
> > functioning correctly, it is at least good enough.
> 
> Per application.
> 
> Some little devices run hundreds of different applications and
> customers expect to customise, script themselves, and attach different
> devices (over USB).  The next customer in the chain expects the bits
> you supplied to work in a variety of unexpected situations, even when
> you advise that it probably won't do that.

Basically their problem. Yes, "they" actually think they get a Linux
system where they can do everything and it simply works.

Oh, that's obviously not a usual "WLAN-router style" of product (where
you are not expected to actually login on a console or per ssh).

> Much like desktop/server Linux, but on a small device where silly
> little things like 'create a process' are a stress for the dear little
> thing.
> 
> (My biggest lesson: insist on an MMU next time!)

ACK. We avoid MMU-less hardware too - especially since there is enough
hardware with a MMU around.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                           ` <20080827160052.GA15968-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
@ 2008-08-27 17:35                                                                             ` Adrian Bunk
  2008-08-28  1:05                                                                             ` Greg Ungerer
  1 sibling, 0 replies; 318+ messages in thread
From: Adrian Bunk @ 2008-08-27 17:35 UTC (permalink / raw)
  To: Paul Mundt, Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael 

On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote:
> On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> > On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
> > > On Wed, 27 Aug 2008, Adrian Bunk wrote:
> > > > 
> > > > When did we get callpaths like nfs+xfs+md+scsi reliably
> > > > working with 4kB stacks on x86-32?
> > > 
> > > XFS may never have been usable, but the rest, sure.
> > > 
> > > And you seem to be making this whole argument an excuse to SUCK, and an
> > > excuse to let gcc crap even more on our stack space.
> > > 
> > > Why?
> > > 
> > > Why aren't you saying that we should be able to do better? Instead, you 
> > > seem to be asking us to do even worse than we do now?
> > 
> > My main point is:
> > - getting 4kB stacks working reliably is a hard task
> > - having an eye on gcc increasing the stack usage, and fixing it if
> >   required, is relatively easy
> > 
> > If we should be able to do better at getting (and keeping) 4kB stacks 
> > working, then coping with possible inlining problems caused by gcc
> > should not be a big problem for us.
> > 
> Out of the architectures you've mentioned for 4k stacks, they also tend
> to do IRQ stacks, which is something you seem to have overlooked.

No, I am aware of that, and on i386 IRQ stacks are only used with
4kB stacks.

On i386 it is effectively a step from 6kB to 4kB.

> In addition to that, debugging the runaway stack users on 4k tends to be
> easier anyways since you end up blowing the stack a lot sooner. On sh
> we've had pretty good luck with it, though most of our users are using
> fairly deterministic workloads and continually profiling the footprint.
> Anything that runs away or uses an insane amount of stack space needs to
> be fixed well before that anyways, so catching it sooner is always
> preferable. I imagine the same case is true for m68knommu (even sans IRQ
> stacks).

CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
wanted with an arbitrary limit.

> Things might be more sensitive on x86, but it's certainly not something
> that's a huge problem for the various embedded platforms to wire up,
> whether they want to go the IRQ stack route or not.

How many platforms use 4kB stacks on sh?

Only 1 out of 34 defconfigs uses it.

Are there any numbers for real life usage?

> In any event, lack of support for something on embedded architectures in
> the kernel is more often due to apathy/utter indifference on the part of
> the architecture maintainer rather than being indicative of any intrinsic
> difficulty in supporting the thing in question. Most new "features" on the
> lesser maintained architectures tend to end up there either out of peer
> pressure or copying-and-pasting accidents rather than any sort of design.
> ;-)

arm or powerpc aren't exactly lesser maintained architectures.

4kB has proven to be a hard-to-achieve limit. After more than 4 years of
being available in mainline on i386 there are still cases where 4kB is not
enough.

IMHO there currently seems to be a mismatch between its maintenance
cost and the actual number of users. That's in my opinion the main 
problem with it, no matter in which direction it gets resolved.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                             ` <1219855121.30209.112.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
@ 2008-08-27 17:51                                                                               ` Jamie Lokier
  2008-08-27 19:30                                                                                 ` Bernd Petrovitsch
  2008-08-28  0:06                                                                                 ` Greg Ungerer
  0 siblings, 2 replies; 318+ messages in thread
From: Jamie Lokier @ 2008-08-27 17:51 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Parag Warudkar, Linus Torvalds, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

Bernd Petrovitsch wrote:
> > 32MB no-MMU ARM boards which people run new things and attach new
> > devices to rather often - without making new hardware.  Volume's too
> > low per individual application to get new hardware designed and made.
> 
> Yes, you may have several products on the same hardware with somewhat
> differing requirements (or not). But that is much less than a general
> purpose system IMHO.

It is, but the idea that small embedded systems go through an 'all
components are known, drivers are known, test and if it passes it's
shippable' process does not always apply.

> > I'm seriously thinking of forward porting the 4 year old firmware
> > from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
> 
> That sounds reasonable (and I never meant maintaining the old system
> infinitely.

Sounds reasonable, but it's vetoed for anticipated time and cost,
compared with backporting on demand.  Fair enough, since 2.6.current
doesn't support ARM no-MMU last I heard ('soon'?).

On the other hand, the 2.6 anti-fragmentation patches, including
latest SLUB stuff, ironically meant to help big machines, sound really
appealing for my current problem and totally unrealistic to
backport...

> ACK. We avoid MMU-less hardware too - especially since there is enough
> hardware with a MMU around.

I can't emphasise enough how much difference MMU makes to Linux userspace.

It's practically: MMU = standard Linux (with less RAM), have everything.
No-MMU = lots of familiar 'Linux' things not available or break.

-- Jamie

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27 17:51                                                                               ` Jamie Lokier
@ 2008-08-27 19:30                                                                                 ` Bernd Petrovitsch
  2008-08-28  0:06                                                                                 ` Greg Ungerer
  1 sibling, 0 replies; 318+ messages in thread
From: Bernd Petrovitsch @ 2008-08-27 19:30 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Parag Warudkar, Linus Torvalds, Adrian Bunk, Rusty Russell,
	Alan D. Brunelle, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Andrew Morton, Arjan van de Ven, Ingo Molnar,
	linux-embedded

On Wed, 2008-08-27 at 18:51 +0100, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
[...]
> It is, but the idea that small embedded systems go through an 'all
> components are known, drivers are known, test and if it passes it's
> shippable' process does not always apply.

Not always but often enough. And yes, there is ARM-based embedded
hardware with 1GB Flash-RAM and 128MB RAM.

> > > I'm seriously thinking of forward porting the 4 year old firmware
> > > from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
> > 
> > That sounds reasonable (and I never meant maintaining the old system
> > infinitely.
> 
> Sounds reasonable, but it's vetoed for anticipated time and cost,

That is to be expected;-)

[....]
> > ACK. We avoid MMU-less hardware too - especially since there is enough
> > hardware with a MMU around.
> 
> I can't emphasise enough how much difference MMU makes to Linux userspace.
> 
> It's practically: MMU = standard Linux (with less RAM), have everything.
> No-MMU = lots of familiar 'Linux' things not available or break.

ACK. And try telling a customer that everything is more effort and more
risk and not just "simply cross-compile it as it runs on my desktop
too".

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]   ` <alpine.LFD.1.10.0808241120460.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27 20:17     ` Peter Osterlund
       [not found]       ` <m3k5e2qkk2.fsf-zq6IREYz3ykAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Peter Osterlund @ 2008-08-27 20:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> writes:

> On Sat, 23 Aug 2008, Rafael J. Wysocki wrote:
>> 
>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11401
>> Subject		: pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected
>> Submitter	: Laurent Riffard <laurent.riffard-GANU6spQydw@public.gmane.org>
>> Date		: 2008-08-22 08:16 (2 days old)
>
> This one looks irritating.
>
> It's bisected to 5b6155ee70e9c4d2ad7e6f514c8eee06e2711c3a ("pktcdvd: push 
> BKL down into driver"), but the problem goes deeper than that.
...
> Grr.
...
> Double grr.
...
> Hmm?
>
> We need to fix this.

Why not just revert the offending change and try again during the next
merge window, assuming someone has figured out an acceptable way to
handle this mess by then?

-- 
Peter Osterlund - petero2-zq6IREYz3ykAvxtiuMwx3w@public.gmane.org
http://web.telia.com/~u89404340

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]       ` <m3k5e2qkk2.fsf-zq6IREYz3ykAvxtiuMwx3w@public.gmane.org>
@ 2008-08-27 20:40         ` Linus Torvalds
       [not found]           ` <alpine.LFD.1.10.0808271335260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27 22:08         ` Alan Cox
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27 20:40 UTC (permalink / raw)
  To: Peter Osterlund
  Cc: Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Wed, 27 Aug 2008, Peter Osterlund wrote:
> 
> Why not just revert the offending change and try again during the next
> merge window, assuming someone has figured out an acceptable way to
> handle this mess by then?

Well, for 2.6.27 that's what we'll have to do. But there's actually a 
real problem here - the unlocked ioctl's (which we _should_ prefer) have a 
strictly weaker and worse interface. I also wonder if any other 
block_ioctl users were converted..

Anyway, I'll take your email as an ack for the revert.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]           ` <alpine.LFD.1.10.0808271335260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27 20:45             ` Linus Torvalds
  2008-08-28 13:52             ` Christoph Hellwig
  1 sibling, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27 20:45 UTC (permalink / raw)
  To: Peter Osterlund
  Cc: Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Wed, 27 Aug 2008, Linus Torvalds wrote:
>
> I also wonder if any other block_ioctl users were converted..

Well, doing

	git log -p v2.6.26.. -Sunlocked_ioctl

and looking for blkdev_ioctl, that does seem to be the only one. So 
hopefully no other case like this is lurking, although it is possible that 
non-block areas have similar issues.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]       ` <m3k5e2qkk2.fsf-zq6IREYz3ykAvxtiuMwx3w@public.gmane.org>
  2008-08-27 20:40         ` Linus Torvalds
@ 2008-08-27 22:08         ` Alan Cox
       [not found]           ` <20080827230828.4285022b-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Alan Cox @ 2008-08-27 22:08 UTC (permalink / raw)
  To: Peter Osterlund
  Cc: Linus Torvalds, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

> >
> > We need to fix this.
> 
> Why not just revert the offending change and try again during the next
> merge window, assuming someone has figured out an acceptable way to
> handle this mess by then?

Easier just to fix it. It's a case of building everything until it
compiles with the prototype change. Almost all stuff will just take the
argument initially and not use it.

Anyone else plan to do it or shall I hit all the x86 cases and post a
patch ?

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]               ` <alpine.LFD.1.10.0808271530350.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27 22:28                 ` Alan Cox
       [not found]                   ` <20080827232803.2ba8dd96-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
  2008-08-27 22:43                 ` David Miller
  2008-08-27 22:45                 ` Alexey Dobriyan
  2 siblings, 1 reply; 318+ messages in thread
From: Alan Cox @ 2008-08-27 22:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Osterlund, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

> Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that 
> too? That's another difference between the "unlocked" and the traditional 
> version..

I don't know - a lot of syscall returns got defined as long and I guess
someone thought propagating the right type was a good idea?
> 
> As to the "x86 cases", I think you should try to hit them all. Doing a 
> "git grep unlocked_ioctl" gets 185 entries, and it looks like only 
> something like 8 of them are non-x86 (3 in the arch/ directory, five in 
> s390 drivers).
> 
> Of course, some of them may be drivers that aren't available on x86 for 
> other reasons (ie the ARM embedded stuff), but regardless..
> 
> Anyway, the pure size of that patch makes me suspect that we might as well 
> leave it until the next merge window, but if you do it and it's obviously 
> totally mechanical, I'd be likely to just let it slip in early.

I'll take a crack at it tomorrow - but if it's 185 entries then it
probably wants to go into -next instead.

Alan

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]           ` <20080827230828.4285022b-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
@ 2008-08-27 22:38             ` Linus Torvalds
       [not found]               ` <alpine.LFD.1.10.0808271530350.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27 22:38 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Osterlund, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Wed, 27 Aug 2008, Alan Cox wrote:
> 
> Easier just to fix it. It's a case of building everything until it
> compiles with the prototype change. Almost all stuff will just take the
> argument initially and not use it.
> 
> Anyone else plan to do it or shall I hit all the x86 cases and post a
> patch ?

Well, I already reverted it, but if you actually fix unlocked_ioctl() to
have the same calling convention as regular ioctl() then a lot of the 
noise from ioctl conversion goes away, and all that remains is literally 
just the BKL part.

Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that 
too? That's another difference between the "unlocked" and the traditional 
version..

As to the "x86 cases", I think you should try to hit them all. Doing a 
"git grep unlocked_ioctl" gets 185 entries, and it looks like only 
something like 8 of them are non-x86 (3 in the arch/ directory, five in 
s390 drivers).

Of course, some of them may be drivers that aren't available on x86 for 
other reasons (ie the ARM embedded stuff), but regardless..

Anyway, the pure size of that patch makes me suspect that we might as well 
leave it until the next merge window, but if you do it and it's obviously 
totally mechanical, I'd be likely to just let it slip in early.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]               ` <alpine.LFD.1.10.0808271530350.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27 22:28                 ` Alan Cox
@ 2008-08-27 22:43                 ` David Miller
  2008-08-27 22:45                 ` Alexey Dobriyan
  2 siblings, 0 replies; 318+ messages in thread
From: David Miller @ 2008-08-27 22:43 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	petero2-zq6IREYz3ykAvxtiuMwx3w, rjw-KKrjLPT3xs0,
	alan-H+wXaHxf7aLQT0dZR+AlfA, jens.axboe-QHcLZuEGTsvQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, bunk-DgEjT+Ai2ygdnm+yROfE0A,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	protasnb-Re5JQEeQqe8AvxtiuMwx3w,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Wed, 27 Aug 2008 15:38:16 -0700 (PDT)

> Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that 
> too? That's another difference between the "unlocked" and the traditional 
> version..

The return values want to be "long" sign extended all the way back
down to syscall dispatch, I think this is just an effort to add
some consistency here so that the int --> long extension eventually
can be eliminated once unlocked_ioctl is the only case left.
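
To make the sign-extension point concrete, here is a small user-space
sketch (not kernel code). The handler names are made up for
illustration, and -22 simply mirrors -EINVAL. It shows that widening an
int return to long keeps a negative error code intact, while a
zero-extension of the same 32-bit value would turn it into a large
positive number.

```c
#include <stdint.h>

/* Hypothetical stand-ins: an int-returning handler (old ->ioctl style)
 * and a long-returning one (->unlocked_ioctl style). */
static int old_style_ioctl(unsigned int cmd)
{
	return cmd == 1 ? 0 : -22;	/* -22 mirrors -EINVAL */
}

static long new_style_ioctl(unsigned int cmd)
{
	return cmd == 1 ? 0 : -22L;
}

/* Widening int to long in C preserves the value, i.e. sign-extends. */
static long dispatch_old(unsigned int cmd)
{
	return old_style_ioctl(cmd);
}

/* What a caller would see if the 32-bit result were zero-extended to
 * 64 bits instead: the error code becomes a large positive value. */
static int64_t zero_extended(unsigned int cmd)
{
	return (int64_t)(uint32_t)old_style_ioctl(cmd);
}
```

With a long return type throughout, the widening step in dispatch_old()
is no longer needed, which seems to be the consistency argued for
above.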

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]               ` <alpine.LFD.1.10.0808271530350.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27 22:28                 ` Alan Cox
  2008-08-27 22:43                 ` David Miller
@ 2008-08-27 22:45                 ` Alexey Dobriyan
  2 siblings, 0 replies; 318+ messages in thread
From: Alexey Dobriyan @ 2008-08-27 22:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Peter Osterlund, Rafael J. Wysocki, Alan Cox,
	Jens Axboe, Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List

On Wed, Aug 27, 2008 at 03:38:16PM -0700, Linus Torvalds wrote:
> On Wed, 27 Aug 2008, Alan Cox wrote:
> > 
> > Easier just to fix it. It's a case of building everything until it
> > compiles with the prototype change. Almost all stuff will just take the
> > argument initially and not use it.
> > 
> > Anyone else plan to do it or shall I hit all the x86 cases and post a
> > patch ?
> 
> Well, I already reverted it, but if you actually fix unlocked_ioctl() to
> have the same calling convention as regular ioctl() then a lot of the 
> noise from ioctl conversion goes away, and all that remains is literally 
> just the BKL part.
> 
> Btw, why is unlocked_ioctl returning "long"? Does anybody depend on that 
> too? That's another difference between the "unlocked" and the traditional 
> version..
> 
> As to the "x86 cases", I think you should try to hit them all. Doing a 
> "git grep unlocked_ioctl" gets 185 entries, and it looks like only 
> something like 8 of them are non-x86 (3 in the arch/ directory, five in 
> s390 drivers).
> 
> Of course, some of them may be drivers that aren't available on x86 for 
> other reasons (ie the ARM embedded stuff), but regardless..
> 
> Anyway, the pure size of that patch makes me suspect that we might as well 
> leave it until the next merge window, but if you do it and it's obviously 
> totally mechanical, I'd be likely to just let it slip in early.

Anybody doing this, don't forget to actually use "inode" instead of all those
dereferences:

	struct inode *inode = filp->f_path.dentry->d_inode;
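
A rough user-space sketch of the difference, assuming heavily
simplified mock structures (the real kernel types carry far more, and
the demo_* names are hypothetical): the first handler has to dig the
inode out of the file, the second just uses the one it was passed.

```c
/* Minimal mock types, only deep enough to follow the pointer chain. */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path   { struct dentry *dentry; };
struct file   { struct path f_path; };

/* unlocked_ioctl style: no inode argument, so look it up manually. */
static int demo_ioctl_lookup(struct file *filp, unsigned int cmd,
			     unsigned long arg)
{
	struct inode *inode = filp->f_path.dentry->d_inode;

	(void)cmd;
	(void)arg;
	return (int)inode->i_ino;
}

/* Converted style: the caller passes the inode in, so use it directly
 * and drop the dereference chain. */
static int demo_ioctl_passed(struct inode *inode, struct file *filp,
			     unsigned int cmd, unsigned long arg)
{
	(void)filp;
	(void)cmd;
	(void)arg;
	return (int)inode->i_ino;
}
```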

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]                   ` <20080827232803.2ba8dd96-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
@ 2008-08-27 23:00                     ` Linus Torvalds
       [not found]                       ` <alpine.LFD.1.10.0808271551380.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27 23:00 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Osterlund, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Wed, 27 Aug 2008, Alan Cox wrote:
> 
> I'll take a crack at it tomorrow - but if it's 185 entries then it
> probably wants to go into -next instead.

Being more careful.. This:

	git grep 'unlocked_ioctl.*=' |
		sed 's/^.*=[ 	]*\([_a-zA-Z0-9]*\).*$/\1/' |
		uniq | wc

says that there are 160 distinct cases. I'm not sure it catches everything 
exactly, but it will be reasonably close, at least.

I wonder if I could essentially automate something to do the conversion..

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]                       ` <alpine.LFD.1.10.0808271551380.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-27 23:12                         ` Linus Torvalds
  2008-08-28  0:35                           ` Linus Torvalds
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-27 23:12 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Osterlund, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Wed, 27 Aug 2008, Linus Torvalds wrote:
> 
> I wonder if I could essentially automate something to do the conversion..

Hmm. compat_ioctl() actually has exactly the same issue. Damn.

So you can't just add the new argument, you also have to _pass_ the 
argument in the compat_ioctl handlers to the non-compat ones.
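
In other words, something along these lines. This is only a user-space
sketch with mock types and made-up names (demo_ioctl,
demo_compat_ioctl, DEMO_GET_INO), not any real driver: the compat
handler gains the inode parameter and must hand it on to the native
handler rather than just accept it.

```c
/* Mock types standing in for the kernel's struct inode / struct file. */
struct inode { unsigned long i_ino; };
struct file  { int unused; };

#define DEMO_GET_INO 1u		/* made-up command number */

/* Native handler with the added inode argument. */
static int demo_ioctl(struct inode *inode, struct file *filp,
		      unsigned int cmd, unsigned long arg)
{
	(void)filp;
	(void)arg;
	if (cmd != DEMO_GET_INO)
		return -25;	/* mirrors -ENOTTY */
	return (int)inode->i_ino;
}

/* Compat handler: a real one would translate 32-bit layouts here.
 * This sketch only truncates arg, then forwards the inode unchanged. */
static int demo_compat_ioctl(struct inode *inode, struct file *filp,
			     unsigned int cmd, unsigned long arg)
{
	return demo_ioctl(inode, filp, cmd, arg & 0xffffffffUL);
}
```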

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-27 17:51                                                                               ` Jamie Lokier
  2008-08-27 19:30                                                                                 ` Bernd Petrovitsch
@ 2008-08-28  0:06                                                                                 ` Greg Ungerer
  1 sibling, 0 replies; 318+ messages in thread
From: Greg Ungerer @ 2008-08-28  0:06 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Petrovitsch, Parag Warudkar, Linus Torvalds, Adrian Bunk,
	Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar, linux-embedded


Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
>>> 32MB no-MMU ARM boards which people run new things and attach new
>>> devices to rather often - without making new hardware.  Volume's too
>>> low per individual application to get new hardware designed and made.
>> Yes, you may have several products on the same hardware with somewhat
>> differing requirements (or not). But that is much less than a general
>> purpose system IMHO.
> 
> It is, but the idea that small embedded systems go through a 'all
> components are known, drivers are known, test and if it passes it's
> shippable' does not always apply.
> 
>>> I'm seriously thinking of forward porting the 4 year old firmware
>>> from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
>> That sounds reasonable (and I never meant maintaining the old system
>> infinitely).
> 
> Sounds reasonable, but it's vetoed for anticipated time and cost,
> compared with backporting on demand.  Fair enough, since 2.6.current
> doesn't support ARM no-MMU last I heard ('soon'?).
> 
> On the other hand, the 2.6 anti-fragmentation patches, including
> latest SLUB stuff, ironically meant to help big machines, sound really
> appealing for my current problem and totally unrealistic to
> backport...
> 
>> ACK. We avoid MMU-less hardware too - especially since there is enough
>> hardware with a MMU around.
> 
> I can't emphasise enough how much difference MMU makes to Linux userspace.
> 
> It's practically: MMU = standard Linux (with less RAM), have everything.
> No-MMU = lots of familiar 'Linux' things not available or break.

And lots of things work in the usual way...

Of course the flip side is that for people who have platforms
without MMU they can run something more than the mostly "toy"
like operating systems typically available. There are plenty of
problem domains that the non-MMU limitations are not a problem for.
(Yours doesn't sound like one of them :-)

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     gerg@snapgear.com
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                           ` <20080827154805.GA25387-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2008-08-28  0:11                                                                             ` Greg Ungerer
  0 siblings, 0 replies; 318+ messages in thread
From: Greg Ungerer @ 2008-08-28  0:11 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Petrovitsch, Parag Warudkar, Linus Torvalds, Adrian Bunk,
	Rusty Russell, Alan D. Brunelle, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Andrew Morton,
	Arjan van de Ven, Ingo Molnar,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA


Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
>> If you "develop" an embedded system (which is partly system integration
>> of existing apps) to be installed in the field, you don't have that many
>> conceivable work loads compared to a desktop/server system. And you have
>> a fixed list of drivers and applications.
> 
> Hah!  Not in my line of embedded device.
> 
> 32MB no-MMU ARM boards which people run new things and attach new
> devices to rather often - without making new hardware.  Volume's too
> low per individual application to get new hardware designed and made.
> 
> I'm seriously thinking of forward porting the 4 year old firmware
> from 2.4.26 to 2.6.current, just to get new drivers and capabilities.
> Backporting is tedious, so's feeling wretchedly far from the mainline
> world.
> 
>> A usual approach is to run stress tests on several (or all)
>> subsystems/services/... in parallel and if the device survives it
>> functioning correctly, it is at least good enough.
> 
> Per application.
> 
> Some little devices run hundreds of different applications and
> customers expect to customise, script themselves, and attach different
> devices (over USB).  The next customer in the chain expects the bits
> you supplied to work in a variety of unexpected situations, even when
> you advise that it probably won't do that.
> 
> Much like desktop/server Linux, but on a small device where silly
> little things like 'create a process' are a stress for the dear little
> thing.
> 
> (My biggest lesson: insist on an MMU next time!)

But given you have hardware you can't change would you choose
to not run Linux, even with the limitations of non-MMU?

Hell no :-)

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     gerg-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                           ` <20080827173544.GH11734@cs181140183.pp.htv.fi>
@ 2008-08-28  0:32                                                                             ` Paul Mundt
  2008-08-28  0:46                                                                               ` David Miller
       [not found]                                                                               ` <20080828003211.GA18893-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Paul Mundt @ 2008-08-28  0:32 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Andrew Morton, Arjan van de Ven, Ingo Molnar, linux-embedded

On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote:
> > On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> > In addition to that, debugging the runaway stack users on 4k tends to be
> > easier anyways since you end up blowing the stack a lot sooner. On sh
> > we've had pretty good luck with it, though most of our users are using
> > fairly deterministic workloads and continually profiling the footprint.
> > Anything that runs away or uses an insane amount of stack space needs to
> > be fixed well before that anyways, so catching it sooner is always
> > preferable. I imagine the same case is true for m68knommu (even sans IRQ
> > stacks).
> 
> CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> wanted with an arbitrary limit.
> 
In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
only performed from do_IRQ(), which is sporadic at best, especially on
tickless. While it catches some things, it's not a complete solution in
and of itself.
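
For illustration, a check of this kind can be sketched in user-space C
as below. It assumes a stack that grows down inside a power-of-two
sized, size-aligned region with some reserved bytes at the bottom; the
function and parameter names are hypothetical and not taken from any
architecture's code. The point is that it only tells you anything at
the moment it is called, so if the only caller is the interrupt entry
path, an overflow between interrupts goes unnoticed.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when the stack pointer is within warn_margin bytes of
 * the reserved area at the bottom of its stack region.
 *
 * sp           current stack pointer value
 * thread_size  size of the stack region (power of two, size-aligned)
 * reserved     bytes at the bottom that must never be overwritten
 * warn_margin  how close to the reserved area counts as "nearly full"
 */
static bool stack_nearly_full(unsigned long sp, unsigned long thread_size,
			      size_t reserved, size_t warn_margin)
{
	/* Offset of sp within its region == bytes still free below it. */
	unsigned long left = sp & (thread_size - 1);

	return left < reserved + warn_margin;
}
```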

In addition to this, there are even fewer platforms that support it than
there are platforms that do 4k stacks. At first glance, it looks like
it's only m32r, powerpc, sh, x86, and xtensa. Others support the Kconfig
option, but don't seem to realize that it's not an option that the kernel
does anything with by itself, and so don't actually do anything (ie,
FRV).

> > Things might be more sensitive on x86, but it's certainly not something
> > that's a huge problem for the various embedded platforms to wire up,
> > whether they want to go the IRQ stack route or not.
> 
> How many platforms use 4kB stacks on sh?
> 
> Only 1 out of 34 defconfigs uses it.
> 
The defconfigs tend to enable as much random stuff as people are
interested in for development and testing purposes. Most of these end up
being reference boards and are the basis for products, rather than
shipping products themselves. In the latter case, everything is gradually
tightened down, and 4k stack utilization in that case is the norm, rather
than the exception.

> > In any event, lack of support for something on embedded architectures in
> > the kernel is more often due to apathy/utter indifference on the part of
> > the architecture maintainer rather than being indicative of any intrinsic
> > difficulty in supporting the thing in question. Most new "features" on the
> > lesser maintained architectures tend to end up there either out of peer
> > pressure or copying-and-pasting accidents rather than any sort of design.
> > ;-)
> 
> arm or powerpc aren't exactly lesser maintained architectures.
> 
Indeed, which is why I find it bizarre that you would even bother
applying what was said to those platforms. Specifically I was referring
to the embedded platforms that don't do 4k stacks today. The fact they
don't support them today has much less to do with 4k being an
unattainable limit as it does with people simply not bothering to
implement it on their platform.

> IMHO there seems to currently be a mismatch between its maintenance 
> cost and the actual number of users. That's in my opinion the main 
> problem with it, no matter in which direction it gets resolved.
> 
Perhaps that's true on x86, but in general I take issue with that. On sh
we've had to do very little maintenance for it and most shipping products
are using it today (at least on MMU-Linux, we don't bother with it on
nommu). Most of the problems we ran into with 4k stacks tended to be
stuff that we wanted to fix for 8k anyways. I suspect that this case is
true for the other embedded platforms also.

Note that on sh we also conditionalize IRQ stacks separately, so while
they are often used together, it's possible to use 4k stacks without
resorting to IRQ stacks (as m68knommu also seems to do).

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
  2008-08-27 23:12                         ` Linus Torvalds
@ 2008-08-28  0:35                           ` Linus Torvalds
  0 siblings, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-08-28  0:35 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Osterlund, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List



On Wed, 27 Aug 2008, Linus Torvalds wrote:
> 
> Hmm. compat_ioctl() actually has exactly the same issue. Damn.
> 
> So you can't just add the new argument, you also have to _pass_ the 
> argument in the compat_ioctl handlers to the non-compat ones.

What the hell. Here's a test patch. A largish part of it was generated 
through a stupid script that basically did a number of grep + 'sed' on a 
lot of files, and then the rest was fixed up manually after running "make 
allmodconfig".

I'm not going to guarantee anything, but it gets close. A starting point 
for somebody else, and considering that it is

 208 files changed, 370 insertions(+), 376 deletions(-)

this is definitely linux-next material.

The extra deletions are mainly because the passing of "inode" as an 
argument means that some functions don't need to look it up manually any 
more.

And yeah, I changed the return type to "int". There's no way the kernel 
can validly return anything bigger than that anyway. And this way all the 
ioctl functions have the same type, no confusion.

TOTALLY UNTESTED apart from the fact that it compiles.

			Linus

---
 arch/mips/sibyte/common/sb_tbprof.c   |    2 +-
 arch/parisc/kernel/perf.c             |    4 +-
 arch/sparc/kernel/apc.c               |    2 +-
 arch/x86/kernel/apm_32.c              |    2 +-
 arch/x86/kernel/cpu/mcheck/mce_64.c   |    2 +-
 arch/x86/kernel/cpu/mtrr/if.c         |    3 +-
 block/bsg.c                           |    2 +-
 block/compat_ioctl.c                  |   18 +++++++++++-----
 block/ioctl.c                         |    3 +-
 drivers/block/DAC960.c                |    2 +-
 drivers/block/cciss.c                 |    4 +-
 drivers/block/loop.c                  |    3 +-
 drivers/block/paride/pt.c             |    4 +-
 drivers/char/agp/agp.h                |    2 +-
 drivers/char/agp/compat_ioctl.c       |    2 +-
 drivers/char/agp/frontend.c           |    2 +-
 drivers/char/ds1302.c                 |    2 +-
 drivers/char/dsp56k.c                 |    2 +-
 drivers/char/efirtc.c                 |    4 +-
 drivers/char/ip2/ip2main.c            |    6 ++--
 drivers/char/ip27-rtc.c               |    4 +-
 drivers/char/ipmi/ipmi_devintf.c      |    2 +-
 drivers/char/mmtimer.c                |    4 +-
 drivers/char/mwave/mwavedd.c          |    4 +-
 drivers/char/pcmcia/cm4000_cs.c       |    3 +-
 drivers/char/ppdev.c                  |    2 +-
 drivers/char/random.c                 |    2 +-
 drivers/char/rio/rio_linux.c          |    4 +-
 drivers/char/rtc.c                    |    4 +-
 drivers/char/sx.c                     |    4 +-
 drivers/char/tty_io.c                 |   14 +++++-------
 drivers/char/viotape.c                |    2 +-
 drivers/firewire/fw-cdev.c            |    8 +++---
 drivers/gpu/drm/i915/i915_drv.h       |    2 +-
 drivers/gpu/drm/i915/i915_ioc32.c     |    2 +-
 drivers/gpu/drm/mga/mga_drv.h         |    2 +-
 drivers/gpu/drm/mga/mga_ioc32.c       |    2 +-
 drivers/gpu/drm/r128/r128_drv.h       |    2 +-
 drivers/gpu/drm/r128/r128_ioc32.c     |    2 +-
 drivers/gpu/drm/radeon/radeon_drv.h   |    2 +-
 drivers/gpu/drm/radeon/radeon_ioc32.c |    2 +-
 drivers/hid/hidraw.c                  |    3 +-
 drivers/hid/usbhid/hiddev.c           |    6 ++--
 drivers/i2c/i2c-dev.c                 |    2 +-
 drivers/ieee1394/dv1394.c             |   20 +++++++++---------
 drivers/ieee1394/raw1394.c            |    4 +-
 drivers/ieee1394/video1394.c          |   34 ++++++++++++++++----------------
 drivers/infiniband/core/user_mad.c    |    4 +-
 drivers/input/evdev.c                 |    4 +-
 drivers/input/joydev.c                |    4 +-
 drivers/input/misc/uinput.c           |    2 +-
 drivers/md/dm-ioctl.c                 |    6 ++--
 drivers/media/video/compat_ioctl32.c  |   26 ++++++++++++------------
 drivers/message/fusion/mptctl.c       |    8 +++---
 drivers/message/i2o/i2o_config.c      |    2 +-
 drivers/misc/phantom.c                |    6 ++--
 drivers/misc/sgi-gru/grufile.c        |    2 +-
 drivers/net/ppp_generic.c             |    2 +-
 drivers/pci/proc.c                    |    2 +-
 drivers/rtc/rtc-dev.c                 |    2 +-
 drivers/s390/block/dasd_int.h         |    2 +-
 drivers/s390/char/tape_char.c         |    2 +-
 drivers/s390/char/vmcp.c              |    2 +-
 drivers/s390/cio/chsc_sch.c           |    2 +-
 drivers/s390/crypto/zcrypt_api.c      |    4 +-
 drivers/s390/scsi/zfcp_cfdc.c         |    2 +-
 drivers/sbus/char/cpwatchdog.c        |    2 +-
 drivers/sbus/char/display7seg.c       |    2 +-
 drivers/sbus/char/openprom.c          |    2 +-
 drivers/scsi/aacraid/linit.c          |    2 +-
 drivers/scsi/ch.c                     |    6 ++--
 drivers/scsi/dpt_i2o.c                |    7 +----
 drivers/scsi/megaraid/megaraid_mm.c   |   10 ++++----
 drivers/scsi/megaraid/megaraid_sas.c  |   10 ++++----
 drivers/scsi/osst.c                   |    2 +-
 drivers/scsi/sd.c                     |    2 +-
 drivers/scsi/sg.c                     |    2 +-
 drivers/scsi/st.c                     |    4 +-
 drivers/spi/spidev.c                  |    4 +-
 drivers/telephony/ixj.c               |    2 +-
 drivers/usb/class/usblp.c             |    2 +-
 drivers/usb/gadget/inode.c            |    6 ++--
 drivers/usb/gadget/printer.c          |    4 +-
 drivers/usb/misc/iowarrior.c          |    2 +-
 drivers/usb/misc/rio500.c             |    2 +-
 drivers/usb/misc/sisusbvga/sisusb.c   |   10 ++++----
 drivers/usb/misc/usblcd.c             |    2 +-
 drivers/video/fbmem.c                 |    4 +--
 drivers/watchdog/acquirewdt.c         |    2 +-
 drivers/watchdog/advantechwdt.c       |    2 +-
 drivers/watchdog/alim1535_wdt.c       |    2 +-
 drivers/watchdog/alim7101_wdt.c       |    2 +-
 drivers/watchdog/ar7_wdt.c            |    2 +-
 drivers/watchdog/at32ap700x_wdt.c     |    2 +-
 drivers/watchdog/at91rm9200_wdt.c     |    2 +-
 drivers/watchdog/bfin_wdt.c           |    2 +-
 drivers/watchdog/booke_wdt.c          |    2 +-
 drivers/watchdog/cpu5wdt.c            |    2 +-
 drivers/watchdog/davinci_wdt.c        |    2 +-
 drivers/watchdog/ep93xx_wdt.c         |    2 +-
 drivers/watchdog/eurotechwdt.c        |    2 +-
 drivers/watchdog/hpwdt.c              |    2 +-
 drivers/watchdog/i6300esb.c           |    2 +-
 drivers/watchdog/iTCO_wdt.c           |    2 +-
 drivers/watchdog/ib700wdt.c           |    2 +-
 drivers/watchdog/ibmasr.c             |    2 +-
 drivers/watchdog/indydog.c            |    2 +-
 drivers/watchdog/iop_wdt.c            |    2 +-
 drivers/watchdog/it8712f_wdt.c        |    2 +-
 drivers/watchdog/ixp2000_wdt.c        |    2 +-
 drivers/watchdog/ixp4xx_wdt.c         |    2 +-
 drivers/watchdog/ks8695_wdt.c         |    2 +-
 drivers/watchdog/machzwd.c            |    2 +-
 drivers/watchdog/mixcomwd.c           |    2 +-
 drivers/watchdog/mpc5200_wdt.c        |    2 +-
 drivers/watchdog/mpc8xxx_wdt.c        |    2 +-
 drivers/watchdog/mpcore_wdt.c         |    2 +-
 drivers/watchdog/mtx-1_wdt.c          |    2 +-
 drivers/watchdog/mv64x60_wdt.c        |    2 +-
 drivers/watchdog/omap_wdt.c           |    2 +-
 drivers/watchdog/pc87413_wdt.c        |    2 +-
 drivers/watchdog/pcwd.c               |    2 +-
 drivers/watchdog/pcwd_pci.c           |    2 +-
 drivers/watchdog/pcwd_usb.c           |    2 +-
 drivers/watchdog/pnx4008_wdt.c        |    2 +-
 drivers/watchdog/rm9k_wdt.c           |    4 +-
 drivers/watchdog/s3c2410_wdt.c        |    2 +-
 drivers/watchdog/sa1100_wdt.c         |    2 +-
 drivers/watchdog/sb_wdog.c            |    2 +-
 drivers/watchdog/sbc60xxwdt.c         |    2 +-
 drivers/watchdog/sbc7240_wdt.c        |    2 +-
 drivers/watchdog/sbc_epx_c3.c         |    2 +-
 drivers/watchdog/sc1200wdt.c          |    2 +-
 drivers/watchdog/sc520_wdt.c          |    2 +-
 drivers/watchdog/scx200_wdt.c         |    2 +-
 drivers/watchdog/shwdt.c              |    2 +-
 drivers/watchdog/smsc37b787_wdt.c     |    2 +-
 drivers/watchdog/softdog.c            |    2 +-
 drivers/watchdog/txx9wdt.c            |    2 +-
 drivers/watchdog/w83627hf_wdt.c       |    2 +-
 drivers/watchdog/w83697hf_wdt.c       |    2 +-
 drivers/watchdog/w83877f_wdt.c        |    2 +-
 drivers/watchdog/w83977f_wdt.c        |    2 +-
 drivers/watchdog/wafer5823wdt.c       |    2 +-
 drivers/watchdog/wdrtas.c             |    2 +-
 drivers/watchdog/wdt.c                |    2 +-
 drivers/watchdog/wdt285.c             |    2 +-
 drivers/watchdog/wdt977.c             |    2 +-
 drivers/watchdog/wdt_pci.c            |    2 +-
 fs/bad_inode.c                        |    4 +-
 fs/block_dev.c                        |    7 +++++-
 fs/cifs/cifsfs.h                      |    2 +-
 fs/cifs/ioctl.c                       |    3 +-
 fs/compat_ioctl.c                     |    3 +-
 fs/ext2/ext2.h                        |    4 +-
 fs/ext2/ioctl.c                       |    7 ++---
 fs/ext3/ioctl.c                       |    3 +-
 fs/ext4/ext4.h                        |    4 +-
 fs/ext4/ioctl.c                       |    7 ++---
 fs/fat/dir.c                          |    3 +-
 fs/gfs2/ops_file.c                    |    2 +-
 fs/inotify_user.c                     |    2 +-
 fs/ioctl.c                            |    8 +++---
 fs/jffs2/ioctl.c                      |    2 +-
 fs/jffs2/os-linux.h                   |    2 +-
 fs/jfs/ioctl.c                        |    7 ++---
 fs/jfs/jfs_inode.h                    |    4 +-
 fs/ncpfs/ioctl.c                      |    3 +-
 fs/ocfs2/ioctl.c                      |    7 ++---
 fs/ocfs2/ioctl.h                      |    4 +-
 fs/pipe.c                             |    3 +-
 fs/proc/inode.c                       |   14 ++++++------
 fs/reiserfs/ioctl.c                   |    3 +-
 fs/ubifs/ioctl.c                      |    7 ++---
 fs/ubifs/ubifs.h                      |    4 +-
 fs/xfs/linux-2.6/xfs_file.c           |    8 +++---
 fs/xfs/linux-2.6/xfs_ioctl32.c        |    6 +++-
 fs/xfs/linux-2.6/xfs_ioctl32.h        |    4 +-
 include/linux/ext3_fs.h               |    2 +-
 include/linux/fs.h                    |   10 ++++----
 include/linux/ncp_fs.h                |    2 +-
 include/linux/reiserfs_fs.h           |    2 +-
 include/linux/tty.h                   |    2 +-
 include/linux/wanrouter.h             |    2 +-
 include/media/v4l2-ioctl.h            |    2 +-
 kernel/power/user.c                   |    2 +-
 net/irda/irnet/irnet_ppp.c            |    3 +-
 net/irda/irnet/irnet_ppp.h            |    5 ++-
 net/socket.c                          |    8 +++---
 net/wanrouter/wanmain.c               |    3 +-
 sound/core/control.c                  |    2 +-
 sound/core/control_compat.c           |    4 +-
 sound/core/hwdep.c                    |    2 +-
 sound/core/hwdep_compat.c             |    4 +-
 sound/core/info.c                     |    2 +-
 sound/core/init.c                     |    2 +-
 sound/core/oss/mixer_oss.c            |    2 +-
 sound/core/oss/pcm_oss.c              |    2 +-
 sound/core/pcm_compat.c               |    2 +-
 sound/core/pcm_native.c               |    4 +-
 sound/core/rawmidi.c                  |    2 +-
 sound/core/rawmidi_compat.c           |    4 +-
 sound/core/seq/oss/seq_oss.c          |    6 ++--
 sound/core/seq/seq_clientmgr.c        |    2 +-
 sound/core/seq/seq_compat.c           |    2 +-
 sound/core/timer.c                    |    2 +-
 sound/core/timer_compat.c             |    4 +-
 virt/kvm/kvm_main.c                   |    8 +++---
 208 files changed, 370 insertions(+), 376 deletions(-)

diff --git a/arch/mips/sibyte/common/sb_tbprof.c b/arch/mips/sibyte/common/sb_tbprof.c
index 66e3e3f..5419f85 100644
--- a/arch/mips/sibyte/common/sb_tbprof.c
+++ b/arch/mips/sibyte/common/sb_tbprof.c
@@ -507,7 +507,7 @@ static ssize_t sbprof_tb_read(struct file *filp, char *buf,
 	return count;
 }
 
-static long sbprof_tb_ioctl(struct file *filp,
+static int sbprof_tb_ioctl(struct inode *inode, struct file *filp,
 			    unsigned int command,
 			    unsigned long arg)
 {
diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
index f696f57..6d98acc 100644
--- a/arch/parisc/kernel/perf.c
+++ b/arch/parisc/kernel/perf.c
@@ -198,7 +198,7 @@ static int perf_open(struct inode *inode, struct file *file);
 static ssize_t perf_read(struct file *file, char __user *buf, size_t cnt, loff_t *ppos);
 static ssize_t perf_write(struct file *file, const char __user *buf, size_t count, 
 	loff_t *ppos);
-static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int perf_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 static void perf_start_counters(void);
 static int perf_stop_counters(uint32_t *raddr);
 static const struct rdr_tbl_ent * perf_rdr_get_entry(uint32_t rdr_num);
@@ -442,7 +442,7 @@ static void perf_patch_images(void)
  * must be running on the processor that you wish to change.
  */
 
-static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int perf_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long error_start;
 	uint32_t raddr[4];
diff --git a/arch/sparc/kernel/apc.c b/arch/sparc/kernel/apc.c
index 5267d48..d49a35a 100644
--- a/arch/sparc/kernel/apc.c
+++ b/arch/sparc/kernel/apc.c
@@ -85,7 +85,7 @@ static int apc_release(struct inode *inode, struct file *f)
 	return 0;
 }
 
-static long apc_ioctl(struct file *f, unsigned int cmd, unsigned long __arg)
+static int apc_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long __arg)
 {
 	__u8 inarg, __user *arg;
 
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 9ee24e6..329e4c5 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -1460,7 +1460,7 @@ static unsigned int do_poll(struct file *fp, poll_table *wait)
 	return 0;
 }
 
-static long do_ioctl(struct file *filp, u_int cmd, u_long arg)
+static int do_ioctl(struct inode *inode, struct file *filp, u_int cmd, u_long arg)
 {
 	struct apm_user *as;
 	int ret;
diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index 726a5fc..91f970f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -645,7 +645,7 @@ static unsigned int mce_poll(struct file *file, poll_table *wait)
 	return 0;
 }
 
-static long mce_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int mce_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
 {
 	int __user *p = (int __user *)arg;
 
diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
index 84c480b..d6b053b 100644
--- a/arch/x86/kernel/cpu/mtrr/if.c
+++ b/arch/x86/kernel/cpu/mtrr/if.c
@@ -145,8 +145,7 @@ mtrr_write(struct file *file, const char __user *buf, size_t len, loff_t * ppos)
 	return -EINVAL;
 }
 
-static long
-mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
+static int mtrr_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long __arg)
 {
 	int err = 0;
 	mtrr_type type;
diff --git a/block/bsg.c b/block/bsg.c
index 0aae8d7..1ec2e02 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -872,7 +872,7 @@ static unsigned int bsg_poll(struct file *file, poll_table *wait)
 	return mask;
 }
 
-static long bsg_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int bsg_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct bsg_device *bd = file->private_data;
 	int __user *uarg = (int __user *) arg;
diff --git a/block/compat_ioctl.c b/block/compat_ioctl.c
index c23177e..2c32818 100644
--- a/block/compat_ioctl.c
+++ b/block/compat_ioctl.c
@@ -709,7 +709,7 @@ static int compat_blkdev_driver_ioctl(struct inode *inode, struct file *file,
 	}
 
 	if (disk->fops->unlocked_ioctl)
-		return disk->fops->unlocked_ioctl(file, cmd, arg);
+		return disk->fops->unlocked_ioctl(inode, file, cmd, arg);
 
 	if (disk->fops->ioctl) {
 		lock_kernel();
@@ -773,10 +773,16 @@ static int compat_blkdev_locked_ioctl(struct inode *inode, struct file *file,
 	return -ENOIOCTLCMD;
 }
 
-/* Most of the generic ioctls are handled in the normal fallback path.
-   This assumes the blkdev's low level compat_ioctl always returns
-   ENOIOCTLCMD for unknown ioctls. */
-long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+/*
+ * Most of the generic ioctls are handled in the normal fallback path.
+ * This assumes the blkdev's low level compat_ioctl always returns
+ * ENOIOCTLCMD for unknown ioctls.
+ *
+ * NOTE! We ignore the on-disk inode that was passed as
+ * an argument, and use the "f_mapping->host" inode for
+ * all block ioctls!
+ */
+int compat_blkdev_ioctl(struct inode *unused, struct file *file, unsigned cmd, unsigned long arg)
 {
 	int ret = -ENOIOCTLCMD;
 	struct inode *inode = file->f_mapping->host;
@@ -806,7 +812,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 	ret = compat_blkdev_locked_ioctl(inode, file, bdev, cmd, arg);
 	/* FIXME: why do we assume -> compat_ioctl needs the BKL? */
 	if (ret == -ENOIOCTLCMD && disk->fops->compat_ioctl)
-		ret = disk->fops->compat_ioctl(file, cmd, arg);
+		ret = disk->fops->compat_ioctl(inode, file, cmd, arg);
 	unlock_kernel();
 
 	if (ret != -ENOIOCTLCMD)
diff --git a/block/ioctl.c b/block/ioctl.c
index 77185e5..a85824e 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -204,8 +204,9 @@ int blkdev_driver_ioctl(struct inode *inode, struct file *file,
 			struct gendisk *disk, unsigned cmd, unsigned long arg)
 {
 	int ret;
+
 	if (disk->fops->unlocked_ioctl)
-		return disk->fops->unlocked_ioctl(file, cmd, arg);
+		return disk->fops->unlocked_ioctl(inode, file, cmd, arg);
 
 	if (disk->fops->ioctl) {
 		lock_kernel();
diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index a002a38..972539d 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -6628,7 +6628,7 @@ static void DAC960_DestroyProcEntries(DAC960_Controller_T *Controller)
  * DAC960_gam_ioctl is the ioctl function for performing RAID operations.
 */
 
-static long DAC960_gam_ioctl(struct file *file, unsigned int Request,
+static int DAC960_gam_ioctl(struct inode *inode, struct file *file, unsigned int Request,
 						unsigned long Argument)
 {
   long ErrorCode = 0;
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index b73116e..67404dd 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -192,7 +192,7 @@ static void cciss_procinit(int i)
 #endif				/* CONFIG_PROC_FS */
 
 #ifdef CONFIG_COMPAT
-static long cciss_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg);
+static int cciss_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg);
 #endif
 
 static struct block_device_operations cciss_fops = {
@@ -618,7 +618,7 @@ static int cciss_ioctl32_passthru(struct file *f, unsigned cmd,
 static int cciss_ioctl32_big_passthru(struct file *f, unsigned cmd,
 				      unsigned long arg);
 
-static long cciss_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg)
+static int cciss_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case CCISS_GETPCIINFO:
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d3a25b0..bfa4f44 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1292,9 +1292,8 @@ loop_get_status_compat(struct loop_device *lo,
 	return err;
 }
 
-static long lo_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int lo_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	struct loop_device *lo = inode->i_bdev->bd_disk->private_data;
 	int err;
 
diff --git a/drivers/block/paride/pt.c b/drivers/block/paride/pt.c
index 673b8b2..5a6fe4a 100644
--- a/drivers/block/paride/pt.c
+++ b/drivers/block/paride/pt.c
@@ -190,7 +190,7 @@ module_param_array(drive3, int, NULL, 0);
 #define ATAPI_LOG_SENSE		0x4d
 
 static int pt_open(struct inode *inode, struct file *file);
-static long pt_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int pt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 static int pt_release(struct inode *inode, struct file *file);
 static ssize_t pt_read(struct file *filp, char __user *buf,
 		       size_t count, loff_t * ppos);
@@ -690,7 +690,7 @@ out:
 	return err;
 }
 
-static long pt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int pt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct pt_unit *tape = file->private_data;
 	struct mtop __user *p = (void __user *)arg;
diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index 4bada0e..acdeee0 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -313,7 +313,7 @@ extern const struct aper_size_info_16 agp3_generic_sizes[];
 extern int agp_off;
 extern int agp_try_unsupported_boot;
 
-long compat_agp_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int compat_agp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 
 /* Chipset independant registers (from AGP Spec) */
 #define AGP_APBASE	0x10
diff --git a/drivers/char/agp/compat_ioctl.c b/drivers/char/agp/compat_ioctl.c
index 58c57cb..abd8974 100644
--- a/drivers/char/agp/compat_ioctl.c
+++ b/drivers/char/agp/compat_ioctl.c
@@ -202,7 +202,7 @@ static int compat_agpioc_unbind_wrap(struct agp_file_private *priv, void __user
 	return agp_unbind_memory(memory);
 }
 
-long compat_agp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int compat_agp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct agp_file_private *curr_priv = file->private_data;
 	int ret_val = -ENOTTY;
diff --git a/drivers/char/agp/frontend.c b/drivers/char/agp/frontend.c
index a96f319..0a2d134 100644
--- a/drivers/char/agp/frontend.c
+++ b/drivers/char/agp/frontend.c
@@ -971,7 +971,7 @@ int agpioc_chipset_flush_wrap(struct agp_file_private *priv)
 	return 0;
 }
 
-static long agp_ioctl(struct file *file,
+static int agp_ioctl(struct inode *inode, struct file *file,
 		     unsigned int cmd, unsigned long arg)
 {
 	struct agp_file_private *curr_priv = file->private_data;
diff --git a/drivers/char/ds1302.c b/drivers/char/ds1302.c
index c5e67a6..95aac80 100644
--- a/drivers/char/ds1302.c
+++ b/drivers/char/ds1302.c
@@ -154,7 +154,7 @@ static unsigned char days_in_mo[] =
 
 /* ioctl that supports RTC_RD_TIME and RTC_SET_TIME (read and set time/date). */
 
-static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	unsigned long flags;
 
diff --git a/drivers/char/dsp56k.c b/drivers/char/dsp56k.c
index ca7c72a..e4866bc 100644
--- a/drivers/char/dsp56k.c
+++ b/drivers/char/dsp56k.c
@@ -303,7 +303,7 @@ static ssize_t dsp56k_write(struct file *file, const char __user *buf, size_t co
 	}
 }
 
-static long dsp56k_ioctl(struct file *file, unsigned int cmd,
+static int dsp56k_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 			 unsigned long arg)
 {
 	int dev = iminor(file->f_path.dentry->d_inode) & 0x0f;
diff --git a/drivers/char/efirtc.c b/drivers/char/efirtc.c
index 34d15d5..3131dc0 100644
--- a/drivers/char/efirtc.c
+++ b/drivers/char/efirtc.c
@@ -51,7 +51,7 @@
 
 static DEFINE_SPINLOCK(efi_rtc_lock);
 
-static long efi_rtc_ioctl(struct file *file, unsigned int cmd,
+static int efi_rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg);
 
 #define is_leap(year) \
@@ -146,7 +146,7 @@ convert_from_efi_time(efi_time_t *eft, struct rtc_time *wtime)
 	}
 }
 
-static long efi_rtc_ioctl(struct file *file, unsigned int cmd,
+static int efi_rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 
diff --git a/drivers/char/ip2/ip2main.c b/drivers/char/ip2/ip2main.c
index 689f9dc..5ac4c8d 100644
--- a/drivers/char/ip2/ip2main.c
+++ b/drivers/char/ip2/ip2main.c
@@ -203,7 +203,7 @@ static int set_serial_info(i2ChanStrPtr, struct serial_struct __user *);
 
 static ssize_t ip2_ipl_read(struct file *, char __user *, size_t, loff_t *);
 static ssize_t ip2_ipl_write(struct file *, const char __user *, size_t, loff_t *);
-static long ip2_ipl_ioctl(struct file *, UINT, ULONG);
+static int ip2_ipl_ioctl(struct inode *inode, struct file *, UINT, ULONG);
 static int ip2_ipl_open(struct inode *, struct file *);
 
 static int DumpTraceBuffer(char __user *, int);
@@ -2845,8 +2845,8 @@ ip2_ipl_write(struct file *pFile, const char __user *pData, size_t count, loff_t
 /*                                                                            */
 /*                                                                            */
 /******************************************************************************/
-static long
-ip2_ipl_ioctl (struct file *pFile, UINT cmd, ULONG arg )
+static int
+ip2_ipl_ioctl(struct inode *inode, struct file *pFile, UINT cmd, ULONG arg )
 {
 	unsigned int iplminor = iminor(pFile->f_path.dentry->d_inode);
 	int rc = 0;
diff --git a/drivers/char/ip27-rtc.c b/drivers/char/ip27-rtc.c
index ec9d044..f85a353 100644
--- a/drivers/char/ip27-rtc.c
+++ b/drivers/char/ip27-rtc.c
@@ -47,7 +47,7 @@
 #include <asm/sn/sn0/hub.h>
 #include <asm/sn/sn_private.h>
 
-static long rtc_ioctl(struct file *filp, unsigned int cmd,
+static int rtc_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 			unsigned long arg);
 
 static int rtc_read_proc(char *page, char **start, off_t off,
@@ -76,7 +76,7 @@ static unsigned long epoch = 1970;	/* year corresponding to 0x00	*/
 static const unsigned char days_in_mo[] =
 {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
 
-static long rtc_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int rtc_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 
 	struct rtc_time wtime;
diff --git a/drivers/char/ipmi/ipmi_devintf.c b/drivers/char/ipmi/ipmi_devintf.c
index 64e1c16..02a8511 100644
--- a/drivers/char/ipmi/ipmi_devintf.c
+++ b/drivers/char/ipmi/ipmi_devintf.c
@@ -762,7 +762,7 @@ static long put_compat_ipmi_recv(struct ipmi_recv *p64,
 /*
  * Handle compatibility ioctls
  */
-static long compat_ipmi_ioctl(struct file *filep, unsigned int cmd,
+static int compat_ipmi_ioctl(struct inode *inode, struct file *filep, unsigned int cmd,
 			      unsigned long arg)
 {
 	int rc;
diff --git a/drivers/char/mmtimer.c b/drivers/char/mmtimer.c
index 918711a..e2b2463 100644
--- a/drivers/char/mmtimer.c
+++ b/drivers/char/mmtimer.c
@@ -58,7 +58,7 @@ extern unsigned long sn_rtc_cycles_per_second;
 
 #define rtc_time()              (*RTC_COUNTER_ADDR)
 
-static long mmtimer_ioctl(struct file *file, unsigned int cmd,
+static int mmtimer_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg);
 static int mmtimer_mmap(struct file *file, struct vm_area_struct *vma);
 
@@ -365,7 +365,7 @@ restart:
  * %MMTIMER_GETCOUNTER - Gets the current value in the counter and places it
  * in the address specified by @arg.
  */
-static long mmtimer_ioctl(struct file *file, unsigned int cmd,
+static int mmtimer_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	int ret = 0;
diff --git a/drivers/char/mwave/mwavedd.c b/drivers/char/mwave/mwavedd.c
index 4f8d67f..41a3af0 100644
--- a/drivers/char/mwave/mwavedd.c
+++ b/drivers/char/mwave/mwavedd.c
@@ -86,7 +86,7 @@ module_param(mwave_uart_io, int, 0);
 
 static int mwave_open(struct inode *inode, struct file *file);
 static int mwave_close(struct inode *inode, struct file *file);
-static long mwave_ioctl(struct file *filp, unsigned int iocmd,
+static int mwave_ioctl(struct inode *inode, struct file *filp, unsigned int iocmd,
 							unsigned long ioarg);
 
 MWAVE_DEVICE_DATA mwave_s_mdd;
@@ -119,7 +119,7 @@ static int mwave_close(struct inode *inode, struct file *file)
 	return retval;
 }
 
-static long mwave_ioctl(struct file *file, unsigned int iocmd,
+static int mwave_ioctl(struct inode *inode, struct file *file, unsigned int iocmd,
 							unsigned long ioarg)
 {
 	unsigned int retval = 0;
diff --git a/drivers/char/pcmcia/cm4000_cs.c b/drivers/char/pcmcia/cm4000_cs.c
index f070ae7..f556c56 100644
--- a/drivers/char/pcmcia/cm4000_cs.c
+++ b/drivers/char/pcmcia/cm4000_cs.c
@@ -1406,11 +1406,10 @@ static void stop_monitor(struct cm4000_dev *dev)
 	DEBUGP(3, dev, "<- stop_monitor\n");
 }
 
-static long cmm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int cmm_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct cm4000_dev *dev = filp->private_data;
 	unsigned int iobase = dev->p_dev->io.BasePort1;
-	struct inode *inode = filp->f_path.dentry->d_inode;
 	struct pcmcia_device *link;
 	int size;
 	int rc;
diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c
index bee39fd..fafcc15 100644
--- a/drivers/char/ppdev.c
+++ b/drivers/char/ppdev.c
@@ -633,7 +633,7 @@ static int pp_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	return 0;
 }
 
-static long pp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int pp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long ret;
 	lock_kernel();
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 1838aa3..93e26d0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1061,7 +1061,7 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
 	return (ssize_t)count;
 }
 
-static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int random_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
 {
 	int size, ent_count;
 	int __user *p = (int __user *)arg;
diff --git a/drivers/char/rio/rio_linux.c b/drivers/char/rio/rio_linux.c
index a8f68a3..1fad0e4 100644
--- a/drivers/char/rio/rio_linux.c
+++ b/drivers/char/rio/rio_linux.c
@@ -179,7 +179,7 @@ static int rio_set_real_termios(void *ptr);
 static void rio_hungup(void *ptr);
 static void rio_close(void *ptr);
 static int rio_chars_in_buffer(void *ptr);
-static long rio_fw_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+static int rio_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);
 static int rio_init_drivers(void);
 
 static void my_hd(void *addr, int len);
@@ -560,7 +560,7 @@ static void rio_close(void *ptr)
 
 
 
-static long rio_fw_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int rio_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	int rc = 0;
 	func_enter();
diff --git a/drivers/char/rtc.c b/drivers/char/rtc.c
index f53d4d0..3bb7b51 100644
--- a/drivers/char/rtc.c
+++ b/drivers/char/rtc.c
@@ -142,7 +142,7 @@ static DEFINE_TIMER(rtc_irq_timer, rtc_dropped_irq, 0, 0);
 static ssize_t rtc_read(struct file *file, char __user *buf,
 			size_t count, loff_t *ppos);
 
-static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 static void rtc_get_rtc_time(struct rtc_time *rtc_tm);
 
 #ifdef RTC_IRQ
@@ -717,7 +717,7 @@ static int rtc_do_ioctl(unsigned int cmd, unsigned long arg, int kernel)
 			    &wtime, sizeof wtime) ? -EFAULT : 0;
 }
 
-static long rtc_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int rtc_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long ret;
 	lock_kernel();
diff --git a/drivers/char/sx.c b/drivers/char/sx.c
index c385206..54d0c48 100644
--- a/drivers/char/sx.c
+++ b/drivers/char/sx.c
@@ -286,7 +286,7 @@ static void sx_close(void *ptr);
 static int sx_chars_in_buffer(void *ptr);
 static int sx_init_board(struct sx_board *board);
 static int sx_init_portstructs(int nboards, int nports);
-static long sx_fw_ioctl(struct file *filp, unsigned int cmd,
+static int sx_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 						unsigned long arg);
 static int sx_init_drivers(void);
 
@@ -1686,7 +1686,7 @@ static int do_memtest_w(struct sx_board *board, int min, int max)
 }
 #endif
 
-static long sx_fw_ioctl(struct file *filp, unsigned int cmd,
+static int sx_fw_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 							unsigned long arg)
 {
 	long rc = 0;
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index daeb8f7..835658b 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -150,9 +150,9 @@ ssize_t redirected_tty_write(struct file *, const char __user *,
 static unsigned int tty_poll(struct file *, poll_table *);
 static int tty_open(struct inode *, struct file *);
 static int tty_release(struct inode *, struct file *);
-long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 #ifdef CONFIG_COMPAT
-static long tty_compat_ioctl(struct file *file, unsigned int cmd,
+static int tty_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				unsigned long arg);
 #else
 #define tty_compat_ioctl NULL
@@ -785,13 +785,13 @@ static unsigned int hung_up_tty_poll(struct file *filp, poll_table *wait)
 	return POLLIN | POLLOUT | POLLERR | POLLHUP | POLLRDNORM | POLLWRNORM;
 }
 
-static long hung_up_tty_ioctl(struct file *file, unsigned int cmd,
+static int hung_up_tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 		unsigned long arg)
 {
 	return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
 }
 
-static long hung_up_tty_compat_ioctl(struct file *file,
+static int hung_up_tty_compat_ioctl(struct inode *inode, struct file *file,
 				     unsigned int cmd, unsigned long arg)
 {
 	return cmd == TIOCSPGRP ? -ENOTTY : -EIO;
@@ -2941,13 +2941,12 @@ static int tty_tiocmset(struct tty_struct *tty, struct file *file, unsigned int
 /*
  * Split this up, as gcc can choke on it otherwise..
  */
-long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct tty_struct *tty, *real_tty;
 	void __user *p = (void __user *)arg;
 	int retval;
 	struct tty_ldisc *ld;
-	struct inode *inode = file->f_dentry->d_inode;
 
 	tty = (struct tty_struct *)file->private_data;
 	if (tty_paranoia_check(tty, inode, "tty_ioctl"))
@@ -3075,10 +3074,9 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 }
 
 #ifdef CONFIG_COMPAT
-static long tty_compat_ioctl(struct file *file, unsigned int cmd,
+static int tty_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				unsigned long arg)
 {
-	struct inode *inode = file->f_dentry->d_inode;
 	struct tty_struct *tty = file->private_data;
 	struct tty_ldisc *ld;
 	int retval = -ENOIOCTLCMD;
diff --git a/drivers/char/viotape.c b/drivers/char/viotape.c
index 7a70a40..649b50e 100644
--- a/drivers/char/viotape.c
+++ b/drivers/char/viotape.c
@@ -678,7 +678,7 @@ free_op:
 	return ret;
 }
 
-static long viotap_unlocked_ioctl(struct file *file,
+static int viotap_unlocked_ioctl(struct inode *inode, struct file *file,
 		unsigned int cmd, unsigned long arg)
 {
 	long rc;
diff --git a/drivers/firewire/fw-cdev.c b/drivers/firewire/fw-cdev.c
index 2e6d584..c7b1e3d 100644
--- a/drivers/firewire/fw-cdev.c
+++ b/drivers/firewire/fw-cdev.c
@@ -916,8 +916,8 @@ dispatch_ioctl(struct client *client, unsigned int cmd, void __user *arg)
 	return 0;
 }
 
-static long
-fw_device_op_ioctl(struct file *file,
+static int
+fw_device_op_ioctl(struct inode *inode, struct file *file,
 		   unsigned int cmd, unsigned long arg)
 {
 	struct client *client = file->private_data;
@@ -929,8 +929,8 @@ fw_device_op_ioctl(struct file *file,
 }
 
 #ifdef CONFIG_COMPAT
-static long
-fw_device_op_compat_ioctl(struct file *file,
+static int
+fw_device_op_compat_ioctl(struct inode *inode, struct file *file,
 			  unsigned int cmd, unsigned long arg)
 {
 	struct client *client = file->private_data;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d7326d9..ecc9ce6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -216,7 +216,7 @@ extern void i915_driver_lastclose(struct drm_device * dev);
 extern void i915_driver_preclose(struct drm_device *dev,
 				 struct drm_file *file_priv);
 extern int i915_driver_device_is_agp(struct drm_device * dev);
-extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int i915_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 			      unsigned long arg);
 
 /* i915_irq.c */
diff --git a/drivers/gpu/drm/i915/i915_ioc32.c b/drivers/gpu/drm/i915/i915_ioc32.c
index 1fe68a2..f8f623e 100644
--- a/drivers/gpu/drm/i915/i915_ioc32.c
+++ b/drivers/gpu/drm/i915/i915_ioc32.c
@@ -199,7 +199,7 @@ drm_ioctl_compat_t *i915_compat_ioctls[] = {
  * \param arg user argument.
  * \return zero on success or negative number on failure.
  */
-long i915_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int i915_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	unsigned int nr = DRM_IOCTL_NR(cmd);
 	drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/gpu/drm/mga/mga_drv.h b/drivers/gpu/drm/mga/mga_drv.h
index f6ebd24..dfe6cd7 100644
--- a/drivers/gpu/drm/mga/mga_drv.h
+++ b/drivers/gpu/drm/mga/mga_drv.h
@@ -187,7 +187,7 @@ extern irqreturn_t mga_driver_irq_handler(DRM_IRQ_ARGS);
 extern void mga_driver_irq_preinstall(struct drm_device * dev);
 extern void mga_driver_irq_postinstall(struct drm_device * dev);
 extern void mga_driver_irq_uninstall(struct drm_device * dev);
-extern long mga_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int mga_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 			     unsigned long arg);
 
 #define mga_flush_write_combine()	DRM_WRITEMEMORYBARRIER()
diff --git a/drivers/gpu/drm/mga/mga_ioc32.c b/drivers/gpu/drm/mga/mga_ioc32.c
index 30d0047..b5d0826 100644
--- a/drivers/gpu/drm/mga/mga_ioc32.c
+++ b/drivers/gpu/drm/mga/mga_ioc32.c
@@ -208,7 +208,7 @@ drm_ioctl_compat_t *mga_compat_ioctls[] = {
  * \param arg user argument.
  * \return zero on success or negative number on failure.
  */
-long mga_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int mga_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	unsigned int nr = DRM_IOCTL_NR(cmd);
 	drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/gpu/drm/r128/r128_drv.h b/drivers/gpu/drm/r128/r128_drv.h
index 011105e..e145952 100644
--- a/drivers/gpu/drm/r128/r128_drv.h
+++ b/drivers/gpu/drm/r128/r128_drv.h
@@ -159,7 +159,7 @@ extern void r128_driver_lastclose(struct drm_device * dev);
 extern void r128_driver_preclose(struct drm_device * dev,
 				 struct drm_file *file_priv);
 
-extern long r128_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int r128_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 			      unsigned long arg);
 
 /* Register definitions, register access macros and drmAddMap constants
diff --git a/drivers/gpu/drm/r128/r128_ioc32.c b/drivers/gpu/drm/r128/r128_ioc32.c
index d3cb676..f242fdb 100644
--- a/drivers/gpu/drm/r128/r128_ioc32.c
+++ b/drivers/gpu/drm/r128/r128_ioc32.c
@@ -198,7 +198,7 @@ drm_ioctl_compat_t *r128_compat_ioctls[] = {
  * \param arg user argument.
  * \return zero on success or negative number on failure.
  */
-long r128_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int r128_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	unsigned int nr = DRM_IOCTL_NR(cmd);
 	drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/gpu/drm/radeon/radeon_drv.h b/drivers/gpu/drm/radeon/radeon_drv.h
index 0993816..4b55abd 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.h
+++ b/drivers/gpu/drm/radeon/radeon_drv.h
@@ -401,7 +401,7 @@ extern void radeon_driver_preclose(struct drm_device * dev, struct drm_file *fil
 extern void radeon_driver_postclose(struct drm_device * dev, struct drm_file * filp);
 extern void radeon_driver_lastclose(struct drm_device * dev);
 extern int radeon_driver_open(struct drm_device * dev, struct drm_file * filp_priv);
-extern long radeon_compat_ioctl(struct file *filp, unsigned int cmd,
+extern int radeon_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 				unsigned long arg);
 
 /* r300_cmdbuf.c */
diff --git a/drivers/gpu/drm/radeon/radeon_ioc32.c b/drivers/gpu/drm/radeon/radeon_ioc32.c
index 56decda..6b518cb 100644
--- a/drivers/gpu/drm/radeon/radeon_ioc32.c
+++ b/drivers/gpu/drm/radeon/radeon_ioc32.c
@@ -401,7 +401,7 @@ drm_ioctl_compat_t *radeon_compat_ioctls[] = {
  * \param arg user argument.
  * \return zero on success or negative number on failure.
  */
-long radeon_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int radeon_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	unsigned int nr = DRM_IOCTL_NR(cmd);
 	drm_ioctl_compat_t *fn = NULL;
diff --git a/drivers/hid/hidraw.c b/drivers/hid/hidraw.c
index c40f040..0a15260 100644
--- a/drivers/hid/hidraw.c
+++ b/drivers/hid/hidraw.c
@@ -217,10 +217,9 @@ static int hidraw_release(struct inode * inode, struct file * file)
 	return 0;
 }
 
-static long hidraw_ioctl(struct file *file, unsigned int cmd,
+static int hidraw_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	unsigned int minor = iminor(inode);
 	long ret = 0;
 	/* FIXME: What stops hidraw_table going NULL */
diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
index 842e9ed..0b08caf 100644
--- a/drivers/hid/usbhid/hiddev.c
+++ b/drivers/hid/usbhid/hiddev.c
@@ -544,7 +544,7 @@ static noinline int hiddev_ioctl_string(struct hiddev *hiddev, unsigned int cmd,
 	return len;
 }
 
-static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int hiddev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct hiddev_list *list = file->private_data;
 	struct hiddev *hiddev = list->hiddev;
@@ -761,9 +761,9 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 }
 
 #ifdef CONFIG_COMPAT
-static long hiddev_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int hiddev_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
-	return hiddev_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+	return hiddev_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
 }
 #endif
 
diff --git a/drivers/i2c/i2c-dev.c b/drivers/i2c/i2c-dev.c
index af4491f..98ec3d2 100644
--- a/drivers/i2c/i2c-dev.c
+++ b/drivers/i2c/i2c-dev.c
@@ -367,7 +367,7 @@ static noinline int i2cdev_ioctl_smbus(struct i2c_client *client,
 	return res;
 }
 
-static long i2cdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int i2cdev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct i2c_client *client = (struct i2c_client *)file->private_data;
 	unsigned long funcs;
diff --git a/drivers/ieee1394/dv1394.c b/drivers/ieee1394/dv1394.c
index b6eb2cf..a8bdc2c 100644
--- a/drivers/ieee1394/dv1394.c
+++ b/drivers/ieee1394/dv1394.c
@@ -158,7 +158,7 @@ static void it_tasklet_func(unsigned long data);
 static void ir_tasklet_func(unsigned long data);
 
 #ifdef CONFIG_COMPAT
-static long dv1394_compat_ioctl(struct file *file, unsigned int cmd,
+static int dv1394_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 			       unsigned long arg);
 #endif
 
@@ -1533,7 +1533,7 @@ static ssize_t dv1394_read(struct file *file,  char __user *buffer, size_t count
 
 /*** DEVICE IOCTL INTERFACE ************************************************/
 
-static long dv1394_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int dv1394_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct video_card *video = file_to_video_card(file);
 	unsigned long flags;
@@ -2457,7 +2457,7 @@ struct dv1394_status32 {
 
 /* RED-PEN: this should use compat_alloc_userspace instead */
 
-static int handle_dv1394_init(struct file *file, unsigned int cmd, unsigned long arg)
+static int handle_dv1394_init(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct dv1394_init32 dv32;
 	struct dv1394_init dv;
@@ -2480,13 +2480,13 @@ static int handle_dv1394_init(struct file *file, unsigned int cmd, unsigned long
 
 	old_fs = get_fs();
 	set_fs(KERNEL_DS);
-	ret = dv1394_ioctl(file, DV1394_IOC_INIT, (unsigned long)&dv);
+	ret = dv1394_ioctl(inode, file, DV1394_IOC_INIT, (unsigned long)&dv);
 	set_fs(old_fs);
 
 	return ret;
 }
 
-static int handle_dv1394_get_status(struct file *file, unsigned int cmd, unsigned long arg)
+static int handle_dv1394_get_status(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct dv1394_status32 dv32;
 	struct dv1394_status dv;
@@ -2498,7 +2498,7 @@ static int handle_dv1394_get_status(struct file *file, unsigned int cmd, unsigne
 
 	old_fs = get_fs();
 	set_fs(KERNEL_DS);
-	ret = dv1394_ioctl(file, DV1394_IOC_GET_STATUS, (unsigned long)&dv);
+	ret = dv1394_ioctl(inode, file, DV1394_IOC_GET_STATUS, (unsigned long)&dv);
 	set_fs(old_fs);
 
 	if (!ret) {
@@ -2523,7 +2523,7 @@ static int handle_dv1394_get_status(struct file *file, unsigned int cmd, unsigne
 
 
 
-static long dv1394_compat_ioctl(struct file *file, unsigned int cmd,
+static int dv1394_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 			       unsigned long arg)
 {
 	switch (cmd) {
@@ -2532,12 +2532,12 @@ static long dv1394_compat_ioctl(struct file *file, unsigned int cmd,
 	case DV1394_IOC_WAIT_FRAMES:
 	case DV1394_IOC_RECEIVE_FRAMES:
 	case DV1394_IOC_START_RECEIVE:
-		return dv1394_ioctl(file, cmd, arg);
+		return dv1394_ioctl(inode, file, cmd, arg);
 
 	case DV1394_IOC32_INIT:
-		return handle_dv1394_init(file, cmd, arg);
+		return handle_dv1394_init(inode, file, cmd, arg);
 	case DV1394_IOC32_GET_STATUS:
-		return handle_dv1394_get_status(file, cmd, arg);
+		return handle_dv1394_get_status(inode, file, cmd, arg);
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/drivers/ieee1394/raw1394.c b/drivers/ieee1394/raw1394.c
index 6fa9e4a..6cf46fa 100644
--- a/drivers/ieee1394/raw1394.c
+++ b/drivers/ieee1394/raw1394.c
@@ -2656,7 +2656,7 @@ static long do_raw1394_ioctl(struct file *file, unsigned int cmd,
 	return -EINVAL;
 }
 
-static long raw1394_ioctl(struct file *file, unsigned int cmd,
+static int raw1394_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	long ret;
@@ -2717,7 +2717,7 @@ static long raw1394_read_cycle_timer32(struct file_info *fi, void __user * uaddr
 	return err;
 }
 
-static long raw1394_compat_ioctl(struct file *file,
+static int raw1394_compat_ioctl(struct inode *inode, struct file *file,
 				 unsigned int cmd, unsigned long arg)
 {
 	struct file_info *fi = file->private_data;
diff --git a/drivers/ieee1394/video1394.c b/drivers/ieee1394/video1394.c
index 25db6e6..ed4eb78 100644
--- a/drivers/ieee1394/video1394.c
+++ b/drivers/ieee1394/video1394.c
@@ -716,7 +716,7 @@ static inline unsigned video1394_buffer_state(struct dma_iso_ctx *d,
 	return ret;
 }
 
-static long video1394_ioctl(struct file *file,
+static int video1394_ioctl(struct inode *inode, struct file *file,
 			    unsigned int cmd, unsigned long arg)
 {
 	struct file_ctx *ctx = (struct file_ctx *)file->private_data;
@@ -1272,7 +1272,7 @@ static int video1394_release(struct inode *inode, struct file *file)
 }
 
 #ifdef CONFIG_COMPAT
-static long video1394_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg);
+static int video1394_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg);
 #endif
 
 static struct cdev video1394_cdev;
@@ -1386,7 +1386,7 @@ struct video1394_wait32 {
 	struct compat_timeval filltime;
 };
 
-static int video1394_wr_wait32(struct file *file, unsigned int cmd, unsigned long arg)
+static int video1394_wr_wait32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
         struct video1394_wait32 __user *argp = (void __user *)arg;
         struct video1394_wait32 wait32;
@@ -1405,11 +1405,11 @@ static int video1394_wr_wait32(struct file *file, unsigned int cmd, unsigned lon
         old_fs = get_fs();
         set_fs(KERNEL_DS);
         if (cmd == VIDEO1394_IOC32_LISTEN_WAIT_BUFFER)
-		ret = video1394_ioctl(file,
+		ret = video1394_ioctl(inode, file,
 				      VIDEO1394_IOC_LISTEN_WAIT_BUFFER,
 				      (unsigned long) &wait);
         else
-		ret = video1394_ioctl(file,
+		ret = video1394_ioctl(inode, file,
 				      VIDEO1394_IOC_LISTEN_POLL_BUFFER,
 				      (unsigned long) &wait);
         set_fs(old_fs);
@@ -1427,7 +1427,7 @@ static int video1394_wr_wait32(struct file *file, unsigned int cmd, unsigned lon
         return ret;
 }
 
-static int video1394_w_wait32(struct file *file, unsigned int cmd, unsigned long arg)
+static int video1394_w_wait32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
         struct video1394_wait32 wait32;
         struct video1394_wait wait;
@@ -1445,11 +1445,11 @@ static int video1394_w_wait32(struct file *file, unsigned int cmd, unsigned long
         old_fs = get_fs();
         set_fs(KERNEL_DS);
         if (cmd == VIDEO1394_IOC32_LISTEN_QUEUE_BUFFER)
-		ret = video1394_ioctl(file,
+		ret = video1394_ioctl(inode, file,
 				      VIDEO1394_IOC_LISTEN_QUEUE_BUFFER,
 				      (unsigned long) &wait);
         else
-		ret = video1394_ioctl(file,
+		ret = video1394_ioctl(inode, file,
 				      VIDEO1394_IOC_TALK_WAIT_BUFFER,
 				      (unsigned long) &wait);
         set_fs(old_fs);
@@ -1457,33 +1457,33 @@ static int video1394_w_wait32(struct file *file, unsigned int cmd, unsigned long
         return ret;
 }
 
-static int video1394_queue_buf32(struct file *file, unsigned int cmd, unsigned long arg)
+static int video1394_queue_buf32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
         return -EFAULT;   /* ??? was there before. */
 
-	return video1394_ioctl(file,
+	return video1394_ioctl(inode, file,
 				VIDEO1394_IOC_TALK_QUEUE_BUFFER, arg);
 }
 
-static long video1394_compat_ioctl(struct file *f, unsigned cmd, unsigned long arg)
+static int video1394_compat_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case VIDEO1394_IOC_LISTEN_CHANNEL:
 	case VIDEO1394_IOC_UNLISTEN_CHANNEL:
 	case VIDEO1394_IOC_TALK_CHANNEL:
 	case VIDEO1394_IOC_UNTALK_CHANNEL:
-		return video1394_ioctl(f, cmd, arg);
+		return video1394_ioctl(inode, f, cmd, arg);
 
 	case VIDEO1394_IOC32_LISTEN_QUEUE_BUFFER:
-		return video1394_w_wait32(f, cmd, arg);
+		return video1394_w_wait32(inode, f, cmd, arg);
 	case VIDEO1394_IOC32_LISTEN_WAIT_BUFFER:
-		return video1394_wr_wait32(f, cmd, arg);
+		return video1394_wr_wait32(inode, f, cmd, arg);
 	case VIDEO1394_IOC_TALK_QUEUE_BUFFER:
-		return video1394_queue_buf32(f, cmd, arg);
+		return video1394_queue_buf32(inode, f, cmd, arg);
 	case VIDEO1394_IOC32_TALK_WAIT_BUFFER:
-		return video1394_w_wait32(f, cmd, arg);
+		return video1394_w_wait32(inode, f, cmd, arg);
 	case VIDEO1394_IOC32_LISTEN_POLL_BUFFER:
-		return video1394_wr_wait32(f, cmd, arg);
+		return video1394_wr_wait32(inode, f, cmd, arg);
 	default:
 		return -ENOIOCTLCMD;
 	}
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 268a2d2..6cd0bc3 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -743,7 +743,7 @@ static long ib_umad_enable_pkey(struct ib_umad_file *file)
 	return ret;
 }
 
-static long ib_umad_ioctl(struct file *filp, unsigned int cmd,
+static int ib_umad_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 			  unsigned long arg)
 {
 	switch (cmd) {
@@ -759,7 +759,7 @@ static long ib_umad_ioctl(struct file *filp, unsigned int cmd,
 }
 
 #ifdef CONFIG_COMPAT
-static long ib_umad_compat_ioctl(struct file *filp, unsigned int cmd,
+static int ib_umad_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 				 unsigned long arg)
 {
 	switch (cmd) {
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 3524bef..9fd8fa9 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -888,13 +888,13 @@ static long evdev_ioctl_handler(struct file *file, unsigned int cmd,
 	return retval;
 }
 
-static long evdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int evdev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	return evdev_ioctl_handler(file, cmd, (void __user *)arg, 0);
 }
 
 #ifdef CONFIG_COMPAT
-static long evdev_ioctl_compat(struct file *file,
+static int evdev_ioctl_compat(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	return evdev_ioctl_handler(file, cmd, compat_ptr(arg), 1);
diff --git a/drivers/input/joydev.c b/drivers/input/joydev.c
index 65d7077..d4db145 100644
--- a/drivers/input/joydev.c
+++ b/drivers/input/joydev.c
@@ -555,7 +555,7 @@ static int joydev_ioctl_common(struct joydev *joydev,
 }
 
 #ifdef CONFIG_COMPAT
-static long joydev_compat_ioctl(struct file *file,
+static int joydev_compat_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	struct joydev_client *client = file->private_data;
@@ -622,7 +622,7 @@ static long joydev_compat_ioctl(struct file *file,
 }
 #endif /* CONFIG_COMPAT */
 
-static long joydev_ioctl(struct file *file,
+static int joydev_ioctl(struct inode *inode, struct file *file,
 			 unsigned int cmd, unsigned long arg)
 {
 	struct joydev_client *client = file->private_data;
diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c
index 223d56d..a37877e 100644
--- a/drivers/input/misc/uinput.c
+++ b/drivers/input/misc/uinput.c
@@ -455,7 +455,7 @@ static int uinput_release(struct inode *inode, struct file *file)
 	__ret;						\
 })
 
-static long uinput_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int uinput_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int			retval;
 	struct uinput_device	*udev;
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index b262c00..c21cbdc 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1470,15 +1470,15 @@ static int ctl_ioctl(uint command, struct dm_ioctl __user *user)
 	return r;
 }
 
-static long dm_ctl_ioctl(struct file *file, uint command, ulong u)
+static int dm_ctl_ioctl(struct inode *inode, struct file *file, uint command, ulong u)
 {
 	return (long)ctl_ioctl(command, (struct dm_ioctl __user *)u);
 }
 
 #ifdef CONFIG_COMPAT
-static long dm_compat_ctl_ioctl(struct file *file, uint command, ulong u)
+static int dm_compat_ctl_ioctl(struct inode *inode, struct file *file, uint command, ulong u)
 {
-	return (long)dm_ctl_ioctl(file, command, (ulong) compat_ptr(u));
+	return dm_ctl_ioctl(inode, file, command, (ulong) compat_ptr(u));
 }
 #else
 #define dm_compat_ctl_ioctl NULL
diff --git a/drivers/media/video/compat_ioctl32.c b/drivers/media/video/compat_ioctl32.c
index bd5d9de..7eacc2d 100644
--- a/drivers/media/video/compat_ioctl32.c
+++ b/drivers/media/video/compat_ioctl32.c
@@ -110,15 +110,15 @@ struct video_window32 {
 };
 #endif
 
-static int native_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int native_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int ret = -ENOIOCTLCMD;
 
 	if (file->f_op->unlocked_ioctl)
-		ret = file->f_op->unlocked_ioctl(file, cmd, arg);
+		ret = file->f_op->unlocked_ioctl(inode, file, cmd, arg);
 	else if (file->f_op->ioctl) {
 		lock_kernel();
-		ret = file->f_op->ioctl(file->f_path.dentry->d_inode, file, cmd, arg);
+		ret = file->f_op->ioctl(inode, file, cmd, arg);
 		unlock_kernel();
 	}
 
@@ -549,7 +549,7 @@ enum {
 	MaxClips = (~0U-sizeof(struct video_window))/sizeof(struct video_clip)
 };
 
-static int do_set_window(struct file *file, unsigned int cmd, unsigned long arg)
+static int do_set_window(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct video_window32 __user *up = compat_ptr(arg);
 	struct video_window __user *vw;
@@ -607,11 +607,11 @@ static int do_set_window(struct file *file, unsigned int cmd, unsigned long arg)
 		}
 	}
 
-	return native_ioctl(file, VIDIOCSWIN, (unsigned long)vw);
+	return native_ioctl(inode, file, VIDIOCSWIN, (unsigned long)vw);
 }
 #endif
 
-static int do_video_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int do_video_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	union {
 #ifdef CONFIG_VIDEO_V4L1_COMPAT
@@ -754,12 +754,12 @@ static int do_video_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 		goto out;
 
 	if(compatible_arg)
-		err = native_ioctl(file, realcmd, (unsigned long)up);
+		err = native_ioctl(inode, file, realcmd, (unsigned long)up);
 	else {
 		mm_segment_t old_fs = get_fs();
 
 		set_fs(KERNEL_DS);
-		err = native_ioctl(file, realcmd, (unsigned long) &karg);
+		err = native_ioctl(inode, file, realcmd, (unsigned long) &karg);
 		set_fs(old_fs);
 	}
 	if(err == 0) {
@@ -827,7 +827,7 @@ out:
 	return err;
 }
 
-long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
+int v4l_compat_ioctl32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int ret = -ENOIOCTLCMD;
 
@@ -837,7 +837,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
 	switch (cmd) {
 #ifdef CONFIG_VIDEO_V4L1_COMPAT
 	case VIDIOCSWIN32:
-		ret = do_set_window(file, cmd, arg);
+		ret = do_set_window(inode, file, cmd, arg);
 		break;
 	case VIDIOCGTUNER32:
 	case VIDIOCSTUNER32:
@@ -885,7 +885,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
 	case VIDIOC_S_INPUT32:
 	case VIDIOC_TRY_FMT32:
 	case VIDIOC_S_HW_FREQ_SEEK:
-		ret = do_video_ioctl(file, cmd, arg);
+		ret = do_video_ioctl(inode, file, cmd, arg);
 		break;
 
 #ifdef CONFIG_VIDEO_V4L1_COMPAT
@@ -913,7 +913,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
 	case _IOR('v' , BASE_VIDIOCPRIVATE+5, int):
 	case _IOR('v' , BASE_VIDIOCPRIVATE+6, int):
 	case _IOR('v' , BASE_VIDIOCPRIVATE+7, int):
-		ret = native_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+		ret = native_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
 		break;
 #endif
 	default:
@@ -922,7 +922,7 @@ long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
 	return ret;
 }
 #else
-long v4l_compat_ioctl32(struct file *file, unsigned int cmd, unsigned long arg)
+int v4l_compat_ioctl32(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	return -ENOIOCTLCMD;
 }
diff --git a/drivers/message/fusion/mptctl.c b/drivers/message/fusion/mptctl.c
index f5233f3..3d20a3c 100644
--- a/drivers/message/fusion/mptctl.c
+++ b/drivers/message/fusion/mptctl.c
@@ -116,7 +116,7 @@ static int  mptctl_probe(struct pci_dev *, const struct pci_device_id *);
 static void mptctl_remove(struct pci_dev *);
 
 #ifdef CONFIG_COMPAT
-static long compat_mpctl_ioctl(struct file *f, unsigned cmd, unsigned long arg);
+static int compat_mpctl_ioctl(struct inode *inode, struct file *f, unsigned cmd, unsigned long arg);
 #endif
 /*
  * Private function calls.
@@ -652,8 +652,8 @@ __mptctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	return ret;
 }
 
-static long
-mptctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+mptctl_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	long ret;
 	lock_kernel();
@@ -2818,7 +2818,7 @@ compat_mpt_command(struct file *filp, unsigned int cmd,
 	return ret;
 }
 
-static long compat_mpctl_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int compat_mpctl_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
 {
 	long ret;
 	lock_kernel();
diff --git a/drivers/message/i2o/i2o_config.c b/drivers/message/i2o/i2o_config.c
index 4238de9..442cdb3 100644
--- a/drivers/message/i2o/i2o_config.c
+++ b/drivers/message/i2o/i2o_config.c
@@ -746,7 +746,7 @@ out:
 	return rcode;
 }
 
-static long i2o_cfg_compat_ioctl(struct file *file, unsigned cmd,
+static int i2o_cfg_compat_ioctl(struct inode *inode, struct file *file, unsigned cmd,
 				 unsigned long arg)
 {
 	int ret;
diff --git a/drivers/misc/phantom.c b/drivers/misc/phantom.c
index daf5856..4902f28 100644
--- a/drivers/misc/phantom.c
+++ b/drivers/misc/phantom.c
@@ -83,7 +83,7 @@ static int phantom_status(struct phantom_device *dev, unsigned long newstat)
  * File ops
  */
 
-static long phantom_ioctl(struct file *file, unsigned int cmd,
+static int phantom_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 		unsigned long arg)
 {
 	struct phantom_device *dev = file->private_data;
@@ -195,14 +195,14 @@ static long phantom_ioctl(struct file *file, unsigned int cmd,
 }
 
 #ifdef CONFIG_COMPAT
-static long phantom_compat_ioctl(struct file *filp, unsigned int cmd,
+static int phantom_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 		unsigned long arg)
 {
 	if (_IOC_NR(cmd) <= 3 && _IOC_SIZE(cmd) == sizeof(compat_uptr_t)) {
 		cmd &= ~(_IOC_SIZEMASK << _IOC_SIZESHIFT);
 		cmd |= sizeof(void *) << _IOC_SIZESHIFT;
 	}
-	return phantom_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
+	return phantom_ioctl(inode, filp, cmd, (unsigned long)compat_ptr(arg));
 }
 #else
 #define phantom_compat_ioctl NULL
diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 23c91f5..fb6d7ad 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -233,7 +233,7 @@ static long gru_get_chiplet_status(unsigned long arg)
  *
  * Called to update file attributes via IOCTL calls.
  */
-static long gru_file_unlocked_ioctl(struct file *file, unsigned int req,
+static int gru_file_unlocked_ioctl(struct inode *inode, struct file *file, unsigned int req,
 				    unsigned long arg)
 {
 	int err = -EBADRQC;
diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index ddccc07..3ec394d 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -547,7 +547,7 @@ static int get_filter(void __user *arg, struct sock_filter **p)
 }
 #endif /* CONFIG_PPP_FILTER */
 
-static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int ppp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct ppp_file *pf = file->private_data;
 	struct ppp *ppp;
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index e1098c3..db5903f 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -201,7 +201,7 @@ struct pci_filp_private {
 	int write_combine;
 };
 
-static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
+static int proc_bus_pci_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 			       unsigned long arg)
 {
 	const struct proc_dir_entry *dp = PDE(file->f_dentry->d_inode);
diff --git a/drivers/rtc/rtc-dev.c b/drivers/rtc/rtc-dev.c
index f118252..ac41969 100644
--- a/drivers/rtc/rtc-dev.c
+++ b/drivers/rtc/rtc-dev.c
@@ -203,7 +203,7 @@ static unsigned int rtc_dev_poll(struct file *file, poll_table *wait)
 	return (data != 0) ? (POLLIN | POLLRDNORM) : 0;
 }
 
-static long rtc_dev_ioctl(struct file *file,
+static int rtc_dev_ioctl(struct inode *inode, struct file *file,
 		unsigned int cmd, unsigned long arg)
 {
 	int err = 0;
diff --git a/drivers/s390/block/dasd_int.h b/drivers/s390/block/dasd_int.h
index 31ecaa4..ab20e29 100644
--- a/drivers/s390/block/dasd_int.h
+++ b/drivers/s390/block/dasd_int.h
@@ -611,7 +611,7 @@ void dasd_destroy_partitions(struct dasd_block *);
 
 /* externals in dasd_ioctl.c */
 int  dasd_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
-long dasd_compat_ioctl(struct file *, unsigned int, unsigned long);
+int dasd_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 
 /* externals in dasd_proc.c */
 int dasd_proc_init(void);
diff --git a/drivers/s390/char/tape_char.c b/drivers/s390/char/tape_char.c
index be0ce22..5b42a1f 100644
--- a/drivers/s390/char/tape_char.c
+++ b/drivers/s390/char/tape_char.c
@@ -37,7 +37,7 @@ static int tapechar_open(struct inode *,struct file *);
 static int tapechar_release(struct inode *,struct file *);
 static int tapechar_ioctl(struct inode *, struct file *, unsigned int,
 			  unsigned long);
-static long tapechar_compat_ioctl(struct file *, unsigned int,
+static int tapechar_compat_ioctl(struct inode *inode, struct file *, unsigned int,
 			  unsigned long);
 
 static const struct file_operations tape_fops =
diff --git a/drivers/s390/char/vmcp.c b/drivers/s390/char/vmcp.c
index 09e7d9b..3de2abe 100644
--- a/drivers/s390/char/vmcp.c
+++ b/drivers/s390/char/vmcp.c
@@ -138,7 +138,7 @@ vmcp_write(struct file *file, const char __user *buff, size_t count,
  * let userspace to change the response size, if userspace expects a bigger
  * response
  */
-static long vmcp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int vmcp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct vmcp_session *session;
 	int temp;
diff --git a/drivers/s390/cio/chsc_sch.c b/drivers/s390/cio/chsc_sch.c
index 91ca87a..6a0904e 100644
--- a/drivers/s390/cio/chsc_sch.c
+++ b/drivers/s390/cio/chsc_sch.c
@@ -737,7 +737,7 @@ out_free:
 	return ret;
 }
 
-static long chsc_ioctl(struct file *filp, unsigned int cmd,
+static int chsc_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 		       unsigned long arg)
 {
 	CHSC_MSG(2, "chsc_ioctl called, cmd=%x\n", cmd);
diff --git a/drivers/s390/crypto/zcrypt_api.c b/drivers/s390/crypto/zcrypt_api.c
index cb22b97..6e82f85 100644
--- a/drivers/s390/crypto/zcrypt_api.c
+++ b/drivers/s390/crypto/zcrypt_api.c
@@ -621,7 +621,7 @@ static long zcrypt_ica_status(struct file *filp, unsigned long arg)
 	return ret;
 }
 
-static long zcrypt_unlocked_ioctl(struct file *filp, unsigned int cmd,
+static int zcrypt_unlocked_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 				  unsigned long arg)
 {
 	int rc;
@@ -872,7 +872,7 @@ static long trans_xcRB32(struct file *filp, unsigned int cmd,
 	return rc;
 }
 
-static long zcrypt_compat_ioctl(struct file *filp, unsigned int cmd,
+static int zcrypt_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 			 unsigned long arg)
 {
 	if (cmd == ICARSAMODEXPO)
diff --git a/drivers/s390/scsi/zfcp_cfdc.c b/drivers/s390/scsi/zfcp_cfdc.c
index ec2abce..de0380f 100644
--- a/drivers/s390/scsi/zfcp_cfdc.c
+++ b/drivers/s390/scsi/zfcp_cfdc.c
@@ -160,7 +160,7 @@ static void zfcp_cfdc_req_to_sense(struct zfcp_cfdc_data *data,
 	       sizeof(req->qtcb->bottom.support.els));
 }
 
-static long zfcp_cfdc_dev_ioctl(struct file *file, unsigned int command,
+static int zfcp_cfdc_dev_ioctl(struct inode *inode, struct file *file, unsigned int command,
 				unsigned long buffer)
 {
 	struct zfcp_cfdc_data *data;
diff --git a/drivers/sbus/char/cpwatchdog.c b/drivers/sbus/char/cpwatchdog.c
index 23abfdf..1d272b6 100644
--- a/drivers/sbus/char/cpwatchdog.c
+++ b/drivers/sbus/char/cpwatchdog.c
@@ -397,7 +397,7 @@ static int wd_ioctl(struct inode *inode, struct file *file,
 	return(0);
 }
 
-static long wd_compat_ioctl(struct file *file, unsigned int cmd,
+static int wd_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 		unsigned long arg)
 {
 	int rval = -ENOIOCTLCMD;
diff --git a/drivers/sbus/char/display7seg.c b/drivers/sbus/char/display7seg.c
index d8f5c0c..74842f3 100644
--- a/drivers/sbus/char/display7seg.c
+++ b/drivers/sbus/char/display7seg.c
@@ -117,7 +117,7 @@ static int d7s_release(struct inode *inode, struct file *f)
 	return 0;
 }
 
-static long d7s_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int d7s_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	__u8 regs = readb(d7s_regs);
 	__u8 ireg = 0;
diff --git a/drivers/sbus/char/openprom.c b/drivers/sbus/char/openprom.c
index 29dc735..9a37df0 100644
--- a/drivers/sbus/char/openprom.c
+++ b/drivers/sbus/char/openprom.c
@@ -650,7 +650,7 @@ static int openprom_ioctl(struct inode * inode, struct file * file,
 	};
 }
 
-static long openprom_compat_ioctl(struct file *file, unsigned int cmd,
+static int openprom_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 		unsigned long arg)
 {
 	long rval = -ENOTTY;
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 9aa301c..ff0ec51 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -751,7 +751,7 @@ static int aac_compat_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
 	return aac_compat_do_ioctl(dev, cmd, (unsigned long)arg);
 }
 
-static long aac_compat_cfg_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+static int aac_compat_cfg_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
 {
 	if (!capable(CAP_SYS_RAWIO))
 		return -EPERM;
diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index 3c257fe..a9ac914 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -596,7 +596,7 @@ ch_checkrange(scsi_changer *ch, unsigned int type, unsigned int unit)
 	return 0;
 }
 
-static long ch_ioctl(struct file *file,
+static int ch_ioctl(struct inode *inode, struct file *file,
 		    unsigned int cmd, unsigned long arg)
 {
 	scsi_changer *ch = file->private_data;
@@ -843,7 +843,7 @@ struct changer_element_status32 {
 };
 #define CHIOGSTATUS32  _IOW('c', 8,struct changer_element_status32)
 
-static long ch_ioctl_compat(struct file * file,
+static int ch_ioctl_compat(struct inode *inode, struct file * file,
 			    unsigned int cmd, unsigned long arg)
 {
 	scsi_changer *ch = file->private_data;
@@ -858,7 +858,7 @@ static long ch_ioctl_compat(struct file * file,
 	case CHIOINITELEM:
 	case CHIOSVOLTAG:
 		/* compatible */
-		return ch_ioctl(file, cmd, arg);
+		return ch_ioctl(inode, file, cmd, arg);
 	case CHIOGSTATUS32:
 	{
 		struct changer_element_status32 ces32;
diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 1fe0901..0c4e821 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -115,7 +115,7 @@ static int hba_count = 0;
 static struct class *adpt_sysfs_class;
 
 #ifdef CONFIG_COMPAT
-static long compat_adpt_ioctl(struct file *, unsigned int, unsigned long);
+static int compat_adpt_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 #endif
 
 static const struct file_operations adpt_fops = {
@@ -2147,14 +2147,11 @@ static int adpt_ioctl(struct inode *inode, struct file *file, uint cmd,
 }
 
 #ifdef CONFIG_COMPAT
-static long compat_adpt_ioctl(struct file *file,
+static int compat_adpt_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode;
 	long ret;
  
-	inode = file->f_dentry->d_inode;
- 
 	lock_kernel();
  
 	switch(cmd) {
diff --git a/drivers/scsi/megaraid/megaraid_mm.c b/drivers/scsi/megaraid/megaraid_mm.c
index f680561..e3d9a55 100644
--- a/drivers/scsi/megaraid/megaraid_mm.c
+++ b/drivers/scsi/megaraid/megaraid_mm.c
@@ -44,7 +44,7 @@ static void mraid_mm_free_adp_resources(mraid_mmadp_t *);
 static void mraid_mm_teardown_dma_pools(mraid_mmadp_t *);
 
 #ifdef CONFIG_COMPAT
-static long mraid_mm_compat_ioctl(struct file *, unsigned int, unsigned long);
+static int mraid_mm_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 #endif
 
 MODULE_AUTHOR("LSI Logic Corporation");
@@ -1218,13 +1218,13 @@ mraid_mm_init(void)
  * @cmd		: ioctl command
  * @arg		: user ioctl packet
  */
-static long
-mraid_mm_compat_ioctl(struct file *filep, unsigned int cmd,
-		      unsigned long arg)
+static int
+mraid_mm_compat_ioctl(struct inode *inode, struct file *filep,
+		      unsigned int cmd, unsigned long arg)
 {
 	int err;
 
-	err = mraid_mm_ioctl(NULL, filep, cmd, arg);
+	err = mraid_mm_ioctl(inode, filep, cmd, arg);
 
 	return err;
 }
diff --git a/drivers/scsi/megaraid/megaraid_sas.c b/drivers/scsi/megaraid/megaraid_sas.c
index 97b7633..8bcd1bd 100644
--- a/drivers/scsi/megaraid/megaraid_sas.c
+++ b/drivers/scsi/megaraid/megaraid_sas.c
@@ -3269,8 +3269,8 @@ static int megasas_mgmt_ioctl_aen(struct file *file, unsigned long arg)
 /**
  * megasas_mgmt_ioctl -	char node ioctl entry point
  */
-static long
-megasas_mgmt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+megasas_mgmt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case MEGASAS_IOC_FIRMWARE:
@@ -3324,9 +3324,9 @@ static int megasas_mgmt_compat_ioctl_fw(struct file *file, unsigned long arg)
 	return error;
 }
 
-static long
-megasas_mgmt_compat_ioctl(struct file *file, unsigned int cmd,
-			  unsigned long arg)
+static int
+megasas_mgmt_compat_ioctl(struct inode *inode, struct file *file,
+			  unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case MEGASAS_IOC_FIRMWARE32:
diff --git a/drivers/scsi/osst.c b/drivers/scsi/osst.c
index 1c79f97..4d6867f 100644
--- a/drivers/scsi/osst.c
+++ b/drivers/scsi/osst.c
@@ -5191,7 +5191,7 @@ out:
 }
 
 #ifdef CONFIG_COMPAT
-static long osst_compat_ioctl(struct file * file, unsigned int cmd_in, unsigned long arg)
+static int osst_compat_ioctl(struct inode *inode, struct file * file, unsigned int cmd_in, unsigned long arg)
 {
 	struct osst_tape *STp = file->private_data;
 	struct scsi_device *sdev = STp->device;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e5e7d78..e283650 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -921,7 +921,7 @@ static void sd_rescan(struct device *dev)
  * This gets directly called from VFS. When the ioctl 
  * is not recognized we go back to the other translation paths. 
  */
-static long sd_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int sd_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct block_device *bdev = file->f_path.dentry->d_inode->i_bdev;
 	struct gendisk *disk = bdev->bd_disk;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 661f9f2..5d4e1aa 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1113,7 +1113,7 @@ sg_ioctl(struct inode *inode, struct file *filp,
 }
 
 #ifdef CONFIG_COMPAT
-static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
+static int sg_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd_in, unsigned long arg)
 {
 	Sg_device *sdp;
 	Sg_fd *sfp;
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index c2bb53e..245c8ba 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -3233,7 +3233,7 @@ static int partition_tape(struct scsi_tape *STp, int size)
 
 
 /* The ioctl command */
-static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
+static int st_ioctl(struct inode *inode, struct file *file, unsigned int cmd_in, unsigned long arg)
 {
 	int i, cmd_nr, cmd_type, bt;
 	int retval = 0;
@@ -3586,7 +3586,7 @@ static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
 }
 
 #ifdef CONFIG_COMPAT
-static long st_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int st_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct scsi_tape *STp = file->private_data;
 	struct scsi_device *sdev = STp->device;
diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index e5e0cfe..70b3a16 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -299,8 +299,8 @@ done:
 	return status;
 }
 
-static long
-spidev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int
+spidev_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	int			err = 0;
 	int			retval = 0;
diff --git a/drivers/telephony/ixj.c b/drivers/telephony/ixj.c
index ec7aeb5..afea51d 100644
--- a/drivers/telephony/ixj.c
+++ b/drivers/telephony/ixj.c
@@ -6661,7 +6661,7 @@ static long do_ixj_ioctl(struct file *file_p, unsigned int cmd, unsigned long ar
 	return retval;
 }
 
-static long ixj_ioctl(struct file *file_p, unsigned int cmd, unsigned long arg)
+static int ixj_ioctl(struct inode *inode, struct file *file_p, unsigned int cmd, unsigned long arg)
 {
 	long ret;
 	lock_kernel();
diff --git a/drivers/usb/class/usblp.c b/drivers/usb/class/usblp.c
index 0647164..0fdca42 100644
--- a/drivers/usb/class/usblp.c
+++ b/drivers/usb/class/usblp.c
@@ -487,7 +487,7 @@ static unsigned int usblp_poll(struct file *file, struct poll_table_struct *wait
 	return ret;
 }
 
-static long usblp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int usblp_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct usblp *usblp = file->private_data;
 	int length, err, i;
diff --git a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
index f4585d3..c772d34 100644
--- a/drivers/usb/gadget/inode.c
+++ b/drivers/usb/gadget/inode.c
@@ -482,7 +482,7 @@ ep_release (struct inode *inode, struct file *fd)
 	return 0;
 }
 
-static long ep_ioctl(struct file *fd, unsigned code, unsigned long value)
+static int ep_ioctl(struct inode *inode, struct file *fd, unsigned code, unsigned long value)
 {
 	struct ep_data		*data = fd->private_data;
 	int			status;
@@ -1292,7 +1292,7 @@ out:
        return mask;
 }
 
-static long dev_ioctl (struct file *fd, unsigned code, unsigned long value)
+static int dev_ioctl(struct inode *inode, struct file *fd, unsigned code, unsigned long value)
 {
 	struct dev_data		*dev = fd->private_data;
 	struct usb_gadget	*gadget = dev->gadget;
@@ -1300,7 +1300,7 @@ static long dev_ioctl (struct file *fd, unsigned code, unsigned long value)
 
 	if (gadget->ops->ioctl) {
 		lock_kernel();
-		ret = gadget->ops->ioctl (gadget, code, value);
+		ret = gadget->ops->ioctl(gadget, code, value);
 		unlock_kernel();
 	}
 	return ret;
diff --git a/drivers/usb/gadget/printer.c b/drivers/usb/gadget/printer.c
index e009008..d02ce89 100644
--- a/drivers/usb/gadget/printer.c
+++ b/drivers/usb/gadget/printer.c
@@ -828,8 +828,8 @@ printer_poll(struct file *fd, poll_table *wait)
 	return status;
 }
 
-static long
-printer_ioctl(struct file *fd, unsigned int code, unsigned long arg)
+static int
+printer_ioctl(struct inode *inode, struct file *fd, unsigned int code, unsigned long arg)
 {
 	struct printer_dev	*dev = fd->private_data;
 	unsigned long		flags;
diff --git a/drivers/usb/misc/iowarrior.c b/drivers/usb/misc/iowarrior.c
index a4ef77e..5e3411a 100644
--- a/drivers/usb/misc/iowarrior.c
+++ b/drivers/usb/misc/iowarrior.c
@@ -473,7 +473,7 @@ exit:
 /**
  *	iowarrior_ioctl
  */
-static long iowarrior_ioctl(struct file *file, unsigned int cmd,
+static int iowarrior_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	struct iowarrior *dev = NULL;
diff --git a/drivers/usb/misc/rio500.c b/drivers/usb/misc/rio500.c
index 248a12a..3ba8ef2 100644
--- a/drivers/usb/misc/rio500.c
+++ b/drivers/usb/misc/rio500.c
@@ -104,7 +104,7 @@ static int close_rio(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long ioctl_rio(struct file *file, unsigned int cmd, unsigned long arg)
+static int ioctl_rio(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct RioCommand rio_cmd;
 	struct rio_usb_data *rio = &rio_instance;
diff --git a/drivers/usb/misc/sisusbvga/sisusb.c b/drivers/usb/misc/sisusbvga/sisusb.c
index 69c34a5..26142aa 100644
--- a/drivers/usb/misc/sisusbvga/sisusb.c
+++ b/drivers/usb/misc/sisusbvga/sisusb.c
@@ -2982,8 +2982,8 @@ sisusb_handle_command(struct sisusb_usb_data *sisusb, struct sisusb_command *y,
 	return retval;
 }
 
-static long
-sisusb_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+sisusb_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct sisusb_usb_data *sisusb;
 	struct sisusb_info x;
@@ -3058,8 +3058,8 @@ err_out:
 }
 
 #ifdef SISUSB_NEW_CONFIG_COMPAT
-static long
-sisusb_compat_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int
+sisusb_compat_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
 {
 	long retval;
 
@@ -3067,7 +3067,7 @@ sisusb_compat_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 		case SISUSB_GET_CONFIG_SIZE:
 		case SISUSB_GET_CONFIG:
 		case SISUSB_COMMAND:
-			retval = sisusb_ioctl(f, cmd, arg);
+			retval = sisusb_ioctl(inode, f, cmd, arg);
 			return retval;
 
 		default:
diff --git a/drivers/usb/misc/usblcd.c b/drivers/usb/misc/usblcd.c
index 2db4228..3f46226 100644
--- a/drivers/usb/misc/usblcd.c
+++ b/drivers/usb/misc/usblcd.c
@@ -146,7 +146,7 @@ static ssize_t lcd_read(struct file *file, char __user * buffer, size_t count, l
 	return retval;
 }
 
-static long lcd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int lcd_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct usb_lcd *dev;
 	u16 bcdDevice;
diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c
index 98843c2..aebf6f0 100644
--- a/drivers/video/fbmem.c
+++ b/drivers/video/fbmem.c
@@ -1223,10 +1223,8 @@ static int fb_get_fscreeninfo(struct inode *inode, struct file *file,
 	return err;
 }
 
-static long
-fb_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fb_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	int fbidx = iminor(inode);
 	struct fb_info *info = registered_fb[fbidx];
 	struct fb_ops *fb = info->fbops;
diff --git a/drivers/watchdog/acquirewdt.c b/drivers/watchdog/acquirewdt.c
index 6e46a55..7579e79 100644
--- a/drivers/watchdog/acquirewdt.c
+++ b/drivers/watchdog/acquirewdt.c
@@ -145,7 +145,7 @@ static ssize_t acq_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long acq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int acq_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int options, retval = -EINVAL;
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/advantechwdt.c b/drivers/watchdog/advantechwdt.c
index a5110f9..518c159 100644
--- a/drivers/watchdog/advantechwdt.c
+++ b/drivers/watchdog/advantechwdt.c
@@ -132,7 +132,7 @@ static ssize_t advwdt_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long advwdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int advwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int new_timeout;
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/alim1535_wdt.c b/drivers/watchdog/alim1535_wdt.c
index 2a7690e..bde5fbc 100644
--- a/drivers/watchdog/alim1535_wdt.c
+++ b/drivers/watchdog/alim1535_wdt.c
@@ -176,7 +176,7 @@ static ssize_t ali_write(struct file *file, const char __user *data,
  *	we want an extension to enable irq ack monitoring and the like
  */
 
-static long ali_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int ali_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/alim7101_wdt.c b/drivers/watchdog/alim7101_wdt.c
index a045ef8..4c0ef21 100644
--- a/drivers/watchdog/alim7101_wdt.c
+++ b/drivers/watchdog/alim7101_wdt.c
@@ -234,7 +234,7 @@ static int fop_close(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/ar7_wdt.c b/drivers/watchdog/ar7_wdt.c
index 55dcbfe..2dcd13e 100644
--- a/drivers/watchdog/ar7_wdt.c
+++ b/drivers/watchdog/ar7_wdt.c
@@ -240,7 +240,7 @@ static ssize_t ar7_wdt_write(struct file *file, const char *data,
 	return len;
 }
 
-static long ar7_wdt_ioctl(struct file *file,
+static int ar7_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	static struct watchdog_info ident = {
diff --git a/drivers/watchdog/at32ap700x_wdt.c b/drivers/watchdog/at32ap700x_wdt.c
index e8ae638..aafa445 100644
--- a/drivers/watchdog/at32ap700x_wdt.c
+++ b/drivers/watchdog/at32ap700x_wdt.c
@@ -212,7 +212,7 @@ static struct watchdog_info at32_wdt_info = {
 /*
  * Handle commands from user-space.
  */
-static long at32_wdt_ioctl(struct file *file,
+static int at32_wdt_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/at91rm9200_wdt.c b/drivers/watchdog/at91rm9200_wdt.c
index 993e5f5..8658fc7 100644
--- a/drivers/watchdog/at91rm9200_wdt.c
+++ b/drivers/watchdog/at91rm9200_wdt.c
@@ -128,7 +128,7 @@ static struct watchdog_info at91_wdt_info = {
 /*
  * Handle commands from user-space.
  */
-static long at91_wdt_ioctl(struct file *file,
+static int at91_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/bfin_wdt.c b/drivers/watchdog/bfin_wdt.c
index 31b4225..e82f2ed 100644
--- a/drivers/watchdog/bfin_wdt.c
+++ b/drivers/watchdog/bfin_wdt.c
@@ -248,7 +248,7 @@ static ssize_t bfin_wdt_write(struct file *file, const char __user *data,
  *	Query basic information from the device or ping it, as outlined by the
  *	watchdog API.
  */
-static long bfin_wdt_ioctl(struct file *file,
+static int bfin_wdt_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/booke_wdt.c b/drivers/watchdog/booke_wdt.c
index c3b78a7..c5a7ce1 100644
--- a/drivers/watchdog/booke_wdt.c
+++ b/drivers/watchdog/booke_wdt.c
@@ -82,7 +82,7 @@ static struct watchdog_info ident = {
 	.identity = "PowerPC Book-E Watchdog",
 };
 
-static long booke_wdt_ioctl(struct file *file,
+static int booke_wdt_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	u32 tmp = 0;
diff --git a/drivers/watchdog/cpu5wdt.c b/drivers/watchdog/cpu5wdt.c
index 71f6d7e..feb30cd 100644
--- a/drivers/watchdog/cpu5wdt.c
+++ b/drivers/watchdog/cpu5wdt.c
@@ -148,7 +148,7 @@ static int cpu5wdt_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long cpu5wdt_ioctl(struct file *file, unsigned int cmd,
+static int cpu5wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/davinci_wdt.c b/drivers/watchdog/davinci_wdt.c
index 2e13602..81f676a 100644
--- a/drivers/watchdog/davinci_wdt.c
+++ b/drivers/watchdog/davinci_wdt.c
@@ -142,7 +142,7 @@ static struct watchdog_info ident = {
 	.identity = "DaVinci Watchdog",
 };
 
-static long davinci_wdt_ioctl(struct file *file,
+static int davinci_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/ep93xx_wdt.c b/drivers/watchdog/ep93xx_wdt.c
index e9f950f..496a5fa 100644
--- a/drivers/watchdog/ep93xx_wdt.c
+++ b/drivers/watchdog/ep93xx_wdt.c
@@ -135,7 +135,7 @@ static struct watchdog_info ident = {
 	.identity = "EP93xx Watchdog",
 };
 
-static long ep93xx_wdt_ioctl(struct file *file,
+static int ep93xx_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/eurotechwdt.c b/drivers/watchdog/eurotechwdt.c
index bbd14e3..ecb704d 100644
--- a/drivers/watchdog/eurotechwdt.c
+++ b/drivers/watchdog/eurotechwdt.c
@@ -233,7 +233,7 @@ size_t count, loff_t *ppos)
  * according to their available features.
  */
 
-static long eurwdt_ioctl(struct file *file,
+static int eurwdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index a3765e0..de4a065 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -556,7 +556,7 @@ static struct watchdog_info ident = {
 	.identity = "HP iLO2 HW Watchdog Timer",
 };
 
-static long hpwdt_ioctl(struct file *file, unsigned int cmd,
+static int hpwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 	unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/i6300esb.c b/drivers/watchdog/i6300esb.c
index c13383f..013eb0d 100644
--- a/drivers/watchdog/i6300esb.c
+++ b/drivers/watchdog/i6300esb.c
@@ -256,7 +256,7 @@ static ssize_t esb_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long esb_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int esb_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int new_options, retval = -EINVAL;
 	int new_heartbeat;
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index bfb93bc..4d1015a 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -510,7 +510,7 @@ static ssize_t iTCO_wdt_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long iTCO_wdt_ioctl(struct file *file, unsigned int cmd,
+static int iTCO_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int new_options, retval = -EINVAL;
diff --git a/drivers/watchdog/ib700wdt.c b/drivers/watchdog/ib700wdt.c
index 05a2810..53bf64d 100644
--- a/drivers/watchdog/ib700wdt.c
+++ b/drivers/watchdog/ib700wdt.c
@@ -187,7 +187,7 @@ static ssize_t ibwdt_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long ibwdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int ibwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int new_margin;
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/ibmasr.c b/drivers/watchdog/ibmasr.c
index b82405c..14a32af 100644
--- a/drivers/watchdog/ibmasr.c
+++ b/drivers/watchdog/ibmasr.c
@@ -270,7 +270,7 @@ static ssize_t asr_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long asr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int asr_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	static const struct watchdog_info ident = {
 		.options =	WDIOF_KEEPALIVEPING |
diff --git a/drivers/watchdog/indydog.c b/drivers/watchdog/indydog.c
index 73c9e79..97e8619 100644
--- a/drivers/watchdog/indydog.c
+++ b/drivers/watchdog/indydog.c
@@ -108,7 +108,7 @@ static ssize_t indydog_write(struct file *file, const char *data,
 	return len;
 }
 
-static long indydog_ioctl(struct file *file, unsigned int cmd,
+static int indydog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int options, retval = -EINVAL;
diff --git a/drivers/watchdog/iop_wdt.c b/drivers/watchdog/iop_wdt.c
index 96eb2cb..91070a7 100644
--- a/drivers/watchdog/iop_wdt.c
+++ b/drivers/watchdog/iop_wdt.c
@@ -130,7 +130,7 @@ static const struct watchdog_info ident = {
 	.identity = "iop watchdog",
 };
 
-static long iop_wdt_ioctl(struct file *file,
+static int iop_wdt_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	int options;
diff --git a/drivers/watchdog/it8712f_wdt.c b/drivers/watchdog/it8712f_wdt.c
index 2270ee0..a2851a6 100644
--- a/drivers/watchdog/it8712f_wdt.c
+++ b/drivers/watchdog/it8712f_wdt.c
@@ -231,7 +231,7 @@ static ssize_t it8712f_wdt_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long it8712f_wdt_ioctl(struct file *file, unsigned int cmd,
+static int it8712f_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/ixp2000_wdt.c b/drivers/watchdog/ixp2000_wdt.c
index 4f4b35a..4e8c501 100644
--- a/drivers/watchdog/ixp2000_wdt.c
+++ b/drivers/watchdog/ixp2000_wdt.c
@@ -105,7 +105,7 @@ static struct watchdog_info ident = {
 	.identity	= "IXP2000 Watchdog",
 };
 
-static long ixp2000_wdt_ioctl(struct file *file, unsigned int cmd,
+static int ixp2000_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/ixp4xx_wdt.c b/drivers/watchdog/ixp4xx_wdt.c
index 8302ef0..0933442 100644
--- a/drivers/watchdog/ixp4xx_wdt.c
+++ b/drivers/watchdog/ixp4xx_wdt.c
@@ -96,7 +96,7 @@ static struct watchdog_info ident = {
 };
 
 
-static long ixp4xx_wdt_ioctl(struct file *file, unsigned int cmd,
+static int ixp4xx_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/ks8695_wdt.c b/drivers/watchdog/ks8695_wdt.c
index 0b798fd..76b310f 100644
--- a/drivers/watchdog/ks8695_wdt.c
+++ b/drivers/watchdog/ks8695_wdt.c
@@ -152,7 +152,7 @@ static struct watchdog_info ks8695_wdt_info = {
 /*
  * Handle commands from user-space.
  */
-static long ks8695_wdt_ioctl(struct file *file, unsigned int cmd,
+static int ks8695_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/machzwd.c b/drivers/watchdog/machzwd.c
index 2dfc275..fb840c5 100644
--- a/drivers/watchdog/machzwd.c
+++ b/drivers/watchdog/machzwd.c
@@ -303,7 +303,7 @@ static ssize_t zf_write(struct file *file, const char __user *buf, size_t count,
 	return count;
 }
 
-static long zf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int zf_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/mixcomwd.c b/drivers/watchdog/mixcomwd.c
index 407b025..cc1d238 100644
--- a/drivers/watchdog/mixcomwd.c
+++ b/drivers/watchdog/mixcomwd.c
@@ -195,7 +195,7 @@ static ssize_t mixcomwd_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long mixcomwd_ioctl(struct file *file,
+static int mixcomwd_ioctl(struct inode *inode, struct file *file,
 				unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/mpc5200_wdt.c b/drivers/watchdog/mpc5200_wdt.c
index db91892..614bc2c 100644
--- a/drivers/watchdog/mpc5200_wdt.c
+++ b/drivers/watchdog/mpc5200_wdt.c
@@ -94,7 +94,7 @@ static struct watchdog_info mpc5200_wdt_info = {
 	.options	= WDIOF_SETTIMEOUT | WDIOF_KEEPALIVEPING,
 	.identity	= "mpc5200 watchdog on GPT0",
 };
-static long mpc5200_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mpc5200_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	struct mpc5200_wdt *wdt = file->private_data;
diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
index 38c588e..243bce7 100644
--- a/drivers/watchdog/mpc8xxx_wdt.c
+++ b/drivers/watchdog/mpc8xxx_wdt.c
@@ -143,7 +143,7 @@ static int mpc8xxx_wdt_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long mpc8xxx_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mpc8xxx_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/mpcore_wdt.c b/drivers/watchdog/mpcore_wdt.c
index 2a9bfa8..91db86e 100644
--- a/drivers/watchdog/mpcore_wdt.c
+++ b/drivers/watchdog/mpcore_wdt.c
@@ -218,7 +218,7 @@ static struct watchdog_info ident = {
 	.identity		= "MPcore Watchdog",
 };
 
-static long mpcore_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mpcore_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	struct mpcore_wdt *wdt = file->private_data;
diff --git a/drivers/watchdog/mtx-1_wdt.c b/drivers/watchdog/mtx-1_wdt.c
index b4b7b0a..fd7f85d 100644
--- a/drivers/watchdog/mtx-1_wdt.c
+++ b/drivers/watchdog/mtx-1_wdt.c
@@ -136,7 +136,7 @@ static int mtx1_wdt_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long mtx1_wdt_ioctl(struct file *file, unsigned int cmd,
+static int mtx1_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/mv64x60_wdt.c b/drivers/watchdog/mv64x60_wdt.c
index acf589d..b3dea2d 100644
--- a/drivers/watchdog/mv64x60_wdt.c
+++ b/drivers/watchdog/mv64x60_wdt.c
@@ -173,7 +173,7 @@ static ssize_t mv64x60_wdt_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long mv64x60_wdt_ioctl(struct file *file,
+static int mv64x60_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	int timeout;
diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
index 3a11dad..8bbc9bf 100644
--- a/drivers/watchdog/omap_wdt.c
+++ b/drivers/watchdog/omap_wdt.c
@@ -185,7 +185,7 @@ static ssize_t omap_wdt_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long omap_wdt_ioctl(struct file *file, unsigned int cmd,
+static int omap_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	int new_margin;
diff --git a/drivers/watchdog/pc87413_wdt.c b/drivers/watchdog/pc87413_wdt.c
index 484c215..9417f9c 100644
--- a/drivers/watchdog/pc87413_wdt.c
+++ b/drivers/watchdog/pc87413_wdt.c
@@ -397,7 +397,7 @@ static ssize_t pc87413_write(struct file *file, const char __user *data,
  *	querying capabilities and current status.
  */
 
-static long pc87413_ioctl(struct file *file, unsigned int cmd,
+static int pc87413_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	int new_timeout;
diff --git a/drivers/watchdog/pcwd.c b/drivers/watchdog/pcwd.c
index 9e1331a..0f6f9a6 100644
--- a/drivers/watchdog/pcwd.c
+++ b/drivers/watchdog/pcwd.c
@@ -594,7 +594,7 @@ static int pcwd_get_temperature(int *temperature)
  *	/dev/watchdog handling
  */
 
-static long pcwd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int pcwd_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int rv;
 	int status;
diff --git a/drivers/watchdog/pcwd_pci.c b/drivers/watchdog/pcwd_pci.c
index 90eb1d4..dae0372 100644
--- a/drivers/watchdog/pcwd_pci.c
+++ b/drivers/watchdog/pcwd_pci.c
@@ -453,7 +453,7 @@ static ssize_t pcipcwd_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long pcipcwd_ioctl(struct file *file, unsigned int cmd,
+static int pcipcwd_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/pcwd_usb.c b/drivers/watchdog/pcwd_usb.c
index c1685c9..68419ee 100644
--- a/drivers/watchdog/pcwd_usb.c
+++ b/drivers/watchdog/pcwd_usb.c
@@ -368,7 +368,7 @@ static ssize_t usb_pcwd_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long usb_pcwd_ioctl(struct file *file, unsigned int cmd,
+static int usb_pcwd_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/pnx4008_wdt.c b/drivers/watchdog/pnx4008_wdt.c
index 0ed8416..44ab680 100644
--- a/drivers/watchdog/pnx4008_wdt.c
+++ b/drivers/watchdog/pnx4008_wdt.c
@@ -173,7 +173,7 @@ static const struct watchdog_info ident = {
 	.identity = "PNX4008 Watchdog",
 };
 
-static long pnx4008_wdt_ioctl(struct inode *inode, struct file *file,
+static int pnx4008_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/rm9k_wdt.c b/drivers/watchdog/rm9k_wdt.c
index f1ae372..384738c 100644
--- a/drivers/watchdog/rm9k_wdt.c
+++ b/drivers/watchdog/rm9k_wdt.c
@@ -55,7 +55,7 @@ static int wdt_gpi_open(struct inode *, struct file *);
 static int wdt_gpi_release(struct inode *, struct file *);
 static ssize_t wdt_gpi_write(struct file *, const char __user *, size_t,
 								loff_t *);
-static long wdt_gpi_ioctl(struct file *, unsigned int, unsigned long);
+static int wdt_gpi_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
 static int wdt_gpi_notify(struct notifier_block *, unsigned long, void *);
 static const struct resource *wdt_gpi_get_resource(struct platform_device *,
 						const char *, unsigned int);
@@ -244,7 +244,7 @@ static ssize_t wdt_gpi_write(struct file *f, const char __user *d, size_t s,
 	return s ? 1 : 0;
 }
 
-static long wdt_gpi_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+static int wdt_gpi_ioctl(struct inode *inode, struct file *f, unsigned int cmd, unsigned long arg)
 {
 	long res = -ENOTTY;
 	const long size = _IOC_SIZE(cmd);
diff --git a/drivers/watchdog/s3c2410_wdt.c b/drivers/watchdog/s3c2410_wdt.c
index 86d4280..28e9488 100644
--- a/drivers/watchdog/s3c2410_wdt.c
+++ b/drivers/watchdog/s3c2410_wdt.c
@@ -272,7 +272,7 @@ static const struct watchdog_info s3c2410_wdt_ident = {
 };
 
 
-static long s3c2410wdt_ioctl(struct file *file,	unsigned int cmd,
+static int s3c2410wdt_ioctl(struct inode *inode, struct file *file,	unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/sa1100_wdt.c b/drivers/watchdog/sa1100_wdt.c
index 31a4843..30d2bda 100644
--- a/drivers/watchdog/sa1100_wdt.c
+++ b/drivers/watchdog/sa1100_wdt.c
@@ -86,7 +86,7 @@ static const struct watchdog_info ident = {
 	.identity	= "SA1100/PXA255 Watchdog",
 };
 
-static long sa1100dog_ioctl(struct file *file, unsigned int cmd,
+static int sa1100dog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/sb_wdog.c b/drivers/watchdog/sb_wdog.c
index 27e526a..55aa97b 100644
--- a/drivers/watchdog/sb_wdog.c
+++ b/drivers/watchdog/sb_wdog.c
@@ -164,7 +164,7 @@ static ssize_t sbwdog_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long sbwdog_ioctl(struct file *file, unsigned int cmd,
+static int sbwdog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	int ret = -ENOTTY;
diff --git a/drivers/watchdog/sbc60xxwdt.c b/drivers/watchdog/sbc60xxwdt.c
index 3266daa..9507175 100644
--- a/drivers/watchdog/sbc60xxwdt.c
+++ b/drivers/watchdog/sbc60xxwdt.c
@@ -225,7 +225,7 @@ static int fop_close(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/sbc7240_wdt.c b/drivers/watchdog/sbc7240_wdt.c
index 67ddeb1..74d648c 100644
--- a/drivers/watchdog/sbc7240_wdt.c
+++ b/drivers/watchdog/sbc7240_wdt.c
@@ -168,7 +168,7 @@ static const struct watchdog_info ident = {
 };
 
 
-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case WDIOC_GETSUPPORT:
diff --git a/drivers/watchdog/sbc_epx_c3.c b/drivers/watchdog/sbc_epx_c3.c
index e5e470c..a367d9f 100644
--- a/drivers/watchdog/sbc_epx_c3.c
+++ b/drivers/watchdog/sbc_epx_c3.c
@@ -100,7 +100,7 @@ static ssize_t epx_c3_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long epx_c3_ioctl(struct file *file, unsigned int cmd,
+static int epx_c3_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	int options, retval = -EINVAL;
diff --git a/drivers/watchdog/sc1200wdt.c b/drivers/watchdog/sc1200wdt.c
index 23da3cc..0f57878 100644
--- a/drivers/watchdog/sc1200wdt.c
+++ b/drivers/watchdog/sc1200wdt.c
@@ -182,7 +182,7 @@ static int sc1200wdt_open(struct inode *inode, struct file *file)
 }
 
 
-static long sc1200wdt_ioctl(struct file *file, unsigned int cmd,
+static int sc1200wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 						unsigned long arg)
 {
 	int new_timeout;
diff --git a/drivers/watchdog/sc520_wdt.c b/drivers/watchdog/sc520_wdt.c
index a2b6c10..d2851bf 100644
--- a/drivers/watchdog/sc520_wdt.c
+++ b/drivers/watchdog/sc520_wdt.c
@@ -279,7 +279,7 @@ static int fop_close(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/scx200_wdt.c b/drivers/watchdog/scx200_wdt.c
index 9e19a10..8203518 100644
--- a/drivers/watchdog/scx200_wdt.c
+++ b/drivers/watchdog/scx200_wdt.c
@@ -155,7 +155,7 @@ static ssize_t scx200_wdt_write(struct file *file, const char __user *data,
 	return 0;
 }
 
-static long scx200_wdt_ioctl(struct file *file, unsigned int cmd,
+static int scx200_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/shwdt.c b/drivers/watchdog/shwdt.c
index cdc7138..8eebcb3 100644
--- a/drivers/watchdog/shwdt.c
+++ b/drivers/watchdog/shwdt.c
@@ -338,7 +338,7 @@ static int sh_wdt_mmap(struct file *file, struct vm_area_struct *vma)
  * 	Query basic information from the device or ping it, as outlined by the
  * 	watchdog API.
  */
-static long sh_wdt_ioctl(struct file *file, unsigned int cmd,
+static int sh_wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int new_heartbeat;
diff --git a/drivers/watchdog/smsc37b787_wdt.c b/drivers/watchdog/smsc37b787_wdt.c
index 988ff1d..09828f1 100644
--- a/drivers/watchdog/smsc37b787_wdt.c
+++ b/drivers/watchdog/smsc37b787_wdt.c
@@ -423,7 +423,7 @@ static ssize_t wb_smsc_wdt_write(struct file *file, const char __user *data,
 
 /* ioctl => control interface */
 
-static long wb_smsc_wdt_ioctl(struct file *file,
+static int wb_smsc_wdt_ioctl(struct inode *inode, struct file *file,
 					unsigned int cmd, unsigned long arg)
 {
 	int new_timeout;
diff --git a/drivers/watchdog/softdog.c b/drivers/watchdog/softdog.c
index c650464..9a2d3fa 100644
--- a/drivers/watchdog/softdog.c
+++ b/drivers/watchdog/softdog.c
@@ -192,7 +192,7 @@ static ssize_t softdog_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long softdog_ioctl(struct file *file, unsigned int cmd,
+static int softdog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/txx9wdt.c b/drivers/watchdog/txx9wdt.c
index 6adab77..8184c55 100644
--- a/drivers/watchdog/txx9wdt.c
+++ b/drivers/watchdog/txx9wdt.c
@@ -127,7 +127,7 @@ static ssize_t txx9wdt_write(struct file *file, const char __user *data,
 	return len;
 }
 
-static long txx9wdt_ioctl(struct file *file, unsigned int cmd,
+static int txx9wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/w83627hf_wdt.c b/drivers/watchdog/w83627hf_wdt.c
index 69396ad..9ec4bed 100644
--- a/drivers/watchdog/w83627hf_wdt.c
+++ b/drivers/watchdog/w83627hf_wdt.c
@@ -191,7 +191,7 @@ static ssize_t wdt_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/w83697hf_wdt.c b/drivers/watchdog/w83697hf_wdt.c
index 445d30a..b969baa 100644
--- a/drivers/watchdog/w83697hf_wdt.c
+++ b/drivers/watchdog/w83697hf_wdt.c
@@ -229,7 +229,7 @@ static ssize_t wdt_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/w83877f_wdt.c b/drivers/watchdog/w83877f_wdt.c
index 24587d2..36ed0b2 100644
--- a/drivers/watchdog/w83877f_wdt.c
+++ b/drivers/watchdog/w83877f_wdt.c
@@ -242,7 +242,7 @@ static int fop_close(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long fop_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int fop_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/w83977f_wdt.c b/drivers/watchdog/w83977f_wdt.c
index 2525da5..ab029dd 100644
--- a/drivers/watchdog/w83977f_wdt.c
+++ b/drivers/watchdog/w83977f_wdt.c
@@ -377,7 +377,7 @@ static struct watchdog_info ident = {
 	.identity = WATCHDOG_NAME,
 };
 
-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int status;
 	int new_options, retval = -EINVAL;
diff --git a/drivers/watchdog/wafer5823wdt.c b/drivers/watchdog/wafer5823wdt.c
index 68377ae..06e6cae 100644
--- a/drivers/watchdog/wafer5823wdt.c
+++ b/drivers/watchdog/wafer5823wdt.c
@@ -121,7 +121,7 @@ static ssize_t wafwdt_write(struct file *file, const char __user *buf,
 	return count;
 }
 
-static long wafwdt_ioctl(struct file *file, unsigned int cmd,
+static int wafwdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int new_timeout;
diff --git a/drivers/watchdog/wdrtas.c b/drivers/watchdog/wdrtas.c
index 5d3b1a8..823ed73 100644
--- a/drivers/watchdog/wdrtas.c
+++ b/drivers/watchdog/wdrtas.c
@@ -305,7 +305,7 @@ out:
  * wdrtas_ioctl implements the watchdog API ioctls
  */
 
-static long wdrtas_ioctl(struct file *file, unsigned int cmd,
+static int wdrtas_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int __user *argp = (void __user *)arg;
diff --git a/drivers/watchdog/wdt.c b/drivers/watchdog/wdt.c
index deeebb2..cfbba80 100644
--- a/drivers/watchdog/wdt.c
+++ b/drivers/watchdog/wdt.c
@@ -349,7 +349,7 @@ static ssize_t wdt_write(struct file *file, const char __user *buf,
  *	querying capabilities and current status.
  */
 
-static long wdt_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int wdt_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
 	int __user *p = argp;
diff --git a/drivers/watchdog/wdt285.c b/drivers/watchdog/wdt285.c
index db362c3..e799311 100644
--- a/drivers/watchdog/wdt285.c
+++ b/drivers/watchdog/wdt285.c
@@ -132,7 +132,7 @@ static const struct watchdog_info ident = {
 	.identity	= "Footbridge Watchdog",
 };
 
-static long watchdog_ioctl(struct file *file, unsigned int cmd,
+static int watchdog_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	unsigned int new_margin;
diff --git a/drivers/watchdog/wdt977.c b/drivers/watchdog/wdt977.c
index 60e28d4..348674f 100644
--- a/drivers/watchdog/wdt977.c
+++ b/drivers/watchdog/wdt977.c
@@ -351,7 +351,7 @@ static const struct watchdog_info ident = {
  *      according to their available features.
  */
 
-static long wdt977_ioctl(struct file *file, unsigned int cmd,
+static int wdt977_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int status;
diff --git a/drivers/watchdog/wdt_pci.c b/drivers/watchdog/wdt_pci.c
index ed02bdb..2d6a3e5 100644
--- a/drivers/watchdog/wdt_pci.c
+++ b/drivers/watchdog/wdt_pci.c
@@ -403,7 +403,7 @@ static ssize_t wdtpci_write(struct file *file, const char __user *buf,
  *	querying capabilities and current status.
  */
 
-static long wdtpci_ioctl(struct file *file, unsigned int cmd,
+static int wdtpci_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 							unsigned long arg)
 {
 	int new_heartbeat;
diff --git a/fs/bad_inode.c b/fs/bad_inode.c
index 5f1538c..acb7af1 100644
--- a/fs/bad_inode.c
+++ b/fs/bad_inode.c
@@ -61,13 +61,13 @@ static int bad_file_ioctl (struct inode *inode, struct file *filp,
 	return -EIO;
 }
 
-static long bad_file_unlocked_ioctl(struct file *file, unsigned cmd,
+static int bad_file_unlocked_ioctl(struct inode *inode, struct file *file, unsigned cmd,
 			unsigned long arg)
 {
 	return -EIO;
 }
 
-static long bad_file_compat_ioctl(struct file *file, unsigned int cmd,
+static int bad_file_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 			unsigned long arg)
 {
 	return -EIO;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aff5421..d1384f0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1150,8 +1150,13 @@ static int blkdev_close(struct inode * inode, struct file * filp)
 	return blkdev_put(bdev);
 }
 
-static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+static int block_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
 {
+	/*
+	 * NOTE! We ignore the on-disk inode that was passed as
+	 * an argument, and use the "f_mapping->host" inode for
+	 * all block ioctls!
+	 */
 	return blkdev_ioctl(file->f_mapping->host, file, cmd, arg);
 }
 
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
index 135c965..f46a281 100644
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -95,7 +95,7 @@ extern int 	cifs_setxattr(struct dentry *, const char *, const void *,
 			size_t, int);
 extern ssize_t	cifs_getxattr(struct dentry *, const char *, void *, size_t);
 extern ssize_t	cifs_listxattr(struct dentry *, char *, size_t);
-extern long cifs_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
+extern int cifs_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg);
 
 #ifdef CONFIG_CIFS_EXPERIMENTAL
 extern const struct export_operations cifs_export_ops;
diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
index 0088a5b..c6b9fa4 100644
--- a/fs/cifs/ioctl.c
+++ b/fs/cifs/ioctl.c
@@ -30,9 +30,8 @@
 
 #define CIFS_IOC_CHECKUMOUNT _IO(0xCF, 2)
 
-long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg)
+int cifs_ioctl(struct inode *inode, struct file *filep, unsigned int command, unsigned long arg)
 {
-	struct inode *inode = filep->f_dentry->d_inode;
 	int rc = -ENOTTY; /* strange error - but the precedent */
 	int xid;
 	struct cifs_sb_info *cifs_sb;
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 5235c67..d3a3093 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -2804,7 +2804,8 @@ asmlinkage long compat_sys_ioctl(unsigned int fd, unsigned int cmd,
 
 	default:
 		if (filp->f_op && filp->f_op->compat_ioctl) {
-			error = filp->f_op->compat_ioctl(filp, cmd, arg);
+			struct inode *inode = filp->f_dentry->d_inode;
+			error = filp->f_op->compat_ioctl(inode, filp, cmd, arg);
 			if (error != -ENOIOCTLCMD)
 				goto out_fput;
 		}
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 47d88da..6924f85 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -138,8 +138,8 @@ int __ext2_write_begin(struct file *file, struct address_space *mapping,
 		struct page **pagep, void **fsdata);
 
 /* ioctl.c */
-extern long ext2_ioctl(struct file *, unsigned int, unsigned long);
-extern long ext2_compat_ioctl(struct file *, unsigned int, unsigned long);
+extern int ext2_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
+extern int ext2_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 
 /* namei.c */
 struct dentry *ext2_get_parent(struct dentry *child);
diff --git a/fs/ext2/ioctl.c b/fs/ext2/ioctl.c
index de876fa..ba84585 100644
--- a/fs/ext2/ioctl.c
+++ b/fs/ext2/ioctl.c
@@ -18,9 +18,8 @@
 #include <asm/uaccess.h>
 
 
-long ext2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int ext2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = filp->f_dentry->d_inode;
 	struct ext2_inode_info *ei = EXT2_I(inode);
 	unsigned int flags;
 	unsigned short rsv_window_size;
@@ -156,7 +155,7 @@ setflags_out:
 }
 
 #ifdef CONFIG_COMPAT
-long ext2_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ext2_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	/* These are just misnamed, they actually get/put from/to user an int */
 	switch (cmd) {
@@ -175,6 +174,6 @@ long ext2_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	default:
 		return -ENOIOCTLCMD;
 	}
-	return ext2_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+	return ext2_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
 }
 #endif
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index 0d0c701..7cf4617 100644
--- a/fs/ext3/ioctl.c
+++ b/fs/ext3/ioctl.c
@@ -294,9 +294,8 @@ group_add_out:
 }
 
 #ifdef CONFIG_COMPAT
-long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ext3_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	int ret;
 
 	/* These are just misnamed, they actually get/put from/to user an int */
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2950032..4bee000 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1079,8 +1079,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
 /* ioctl.c */
-extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
-extern long ext4_compat_ioctl (struct file *, unsigned int, unsigned long);
+extern int ext4_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
+extern int ext4_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 
 /* migrate.c */
 extern int ext4_ext_migrate(struct inode *, struct file *, unsigned int,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 7a6c2f1..f72db70 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -18,9 +18,8 @@
 #include "ext4_jbd2.h"
 #include "ext4.h"
 
-long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int ext4_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = filp->f_dentry->d_inode;
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	unsigned int flags;
 	unsigned short rsv_window_size;
@@ -275,7 +274,7 @@ setversion_out:
 }
 
 #ifdef CONFIG_COMPAT
-long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ext4_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	/* These are just misnamed, they actually get/put from/to user an int */
 	switch (cmd) {
@@ -316,6 +315,6 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	default:
 		return -ENOIOCTLCMD;
 	}
-	return ext4_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
+	return ext4_ioctl(inode, file, cmd, (unsigned long) compat_ptr(arg));
 }
 #endif
diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index cd4a016..32c94b1 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -796,10 +796,9 @@ static int fat_dir_ioctl(struct inode *inode, struct file *filp,
 
 FAT_IOCTL_FILLDIR_FUNC(fat_compat_ioctl_filldir, compat_dirent)
 
-static long fat_compat_dir_ioctl(struct file *filp, unsigned cmd,
+static int fat_compat_dir_ioctl(struct inode *inode, struct file *filp, unsigned cmd,
 				 unsigned long arg)
 {
-	struct inode *inode = filp->f_path.dentry->d_inode;
 	struct compat_dirent __user *d1 = compat_ptr(arg);
 	int short_only, both;
 
diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index e9a366d..b7bf87c 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -289,7 +289,7 @@ static int gfs2_set_flags(struct file *filp, u32 __user *ptr)
 	return do_gfs2_set_flags(filp, gfsflags, ~GFS2_DIF_JDATA);
 }
 
-static long gfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int gfs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	switch(cmd) {
 	case FS_IOC_GETFLAGS:
diff --git a/fs/inotify_user.c b/fs/inotify_user.c
index 6024942..9cfde4e 100644
--- a/fs/inotify_user.c
+++ b/fs/inotify_user.c
@@ -533,7 +533,7 @@ static int inotify_release(struct inode *ignored, struct file *file)
 	return 0;
 }
 
-static long inotify_ioctl(struct file *file, unsigned int cmd,
+static int inotify_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 			  unsigned long arg)
 {
 	struct inotify_device *dev;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 7db32b3..2adb993 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -31,20 +31,20 @@
 static long vfs_ioctl(struct file *filp, unsigned int cmd,
 		      unsigned long arg)
 {
+	struct inode *inode;
 	int error = -ENOTTY;
 
 	if (!filp->f_op)
 		goto out;
 
+	inode = filp->f_path.dentry->d_inode;
 	if (filp->f_op->unlocked_ioctl) {
-		error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
+		error = filp->f_op->unlocked_ioctl(inode, filp, cmd, arg);
 		if (error == -ENOIOCTLCMD)
 			error = -EINVAL;
-		goto out;
 	} else if (filp->f_op->ioctl) {
 		lock_kernel();
-		error = filp->f_op->ioctl(filp->f_path.dentry->d_inode,
-					  filp, cmd, arg);
+		error = filp->f_op->ioctl(inode, filp, cmd, arg);
 		unlock_kernel();
 	}
 
diff --git a/fs/jffs2/ioctl.c b/fs/jffs2/ioctl.c
index 9d41f43..80aa967 100644
--- a/fs/jffs2/ioctl.c
+++ b/fs/jffs2/ioctl.c
@@ -12,7 +12,7 @@
 #include <linux/fs.h>
 #include "nodelist.h"
 
-long jffs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int jffs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	/* Later, this will provide for lsattr.jffs2 and chattr.jffs2, which
 	   will include compression support etc. */
diff --git a/fs/jffs2/os-linux.h b/fs/jffs2/os-linux.h
index 5e194a5..7ef2c62 100644
--- a/fs/jffs2/os-linux.h
+++ b/fs/jffs2/os-linux.h
@@ -167,7 +167,7 @@ int jffs2_fsync(struct file *, struct dentry *, int);
 int jffs2_do_readpage_unlock (struct inode *inode, struct page *pg);
 
 /* ioctl.c */
-long jffs2_ioctl(struct file *, unsigned int, unsigned long);
+int jffs2_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 
 /* symlink.c */
 extern const struct inode_operations jffs2_symlink_inode_operations;
diff --git a/fs/jfs/ioctl.c b/fs/jfs/ioctl.c
index afe222b..0fdf047 100644
--- a/fs/jfs/ioctl.c
+++ b/fs/jfs/ioctl.c
@@ -52,9 +52,8 @@ static long jfs_map_ext2(unsigned long flags, int from)
 }
 
 
-long jfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int jfs_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = filp->f_dentry->d_inode;
 	struct jfs_inode_info *jfs_inode = JFS_IP(inode);
 	unsigned int flags;
 
@@ -129,7 +128,7 @@ setflags_out:
 }
 
 #ifdef CONFIG_COMPAT
-long jfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int jfs_compat_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	/* While these ioctl numbers defined with 'long' and have different
 	 * numbers than the 64bit ABI,
@@ -143,6 +142,6 @@ long jfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		cmd = JFS_IOC_SETFLAGS;
 		break;
 	}
-	return jfs_ioctl(filp, cmd, arg);
+	return jfs_ioctl(inode, filp, cmd, arg);
 }
 #endif
diff --git a/fs/jfs/jfs_inode.h b/fs/jfs/jfs_inode.h
index adb2faf..a94ca32 100644
--- a/fs/jfs/jfs_inode.h
+++ b/fs/jfs/jfs_inode.h
@@ -22,8 +22,8 @@ struct fid;
 
 extern struct inode *ialloc(struct inode *, umode_t);
 extern int jfs_fsync(struct file *, struct dentry *, int);
-extern long jfs_ioctl(struct file *, unsigned int, unsigned long);
-extern long jfs_compat_ioctl(struct file *, unsigned int, unsigned long);
+extern int jfs_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
+extern int jfs_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 extern struct inode *jfs_iget(struct super_block *, unsigned long);
 extern int jfs_commit_inode(struct inode *, int);
 extern int jfs_write_inode(struct inode*, int);
diff --git a/fs/ncpfs/ioctl.c b/fs/ncpfs/ioctl.c
index 3a97c95..75c3c29 100644
--- a/fs/ncpfs/ioctl.c
+++ b/fs/ncpfs/ioctl.c
@@ -874,9 +874,8 @@ int ncp_ioctl(struct inode *inode, struct file *filp,
 }
 
 #ifdef CONFIG_COMPAT
-long ncp_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ncp_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	int ret;
 
 	lock_kernel();
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index 7b142f0..bf5c6a2 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -109,9 +109,8 @@ bail:
 	return status;
 }
 
-long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+int ocfs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = filp->f_path.dentry->d_inode;
 	unsigned int flags;
 	int new_clusters;
 	int status;
@@ -168,7 +167,7 @@ long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 }
 
 #ifdef CONFIG_COMPAT
-long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+int ocfs2_compat_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case OCFS2_IOC32_GETFLAGS:
@@ -189,6 +188,6 @@ long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 		return -ENOIOCTLCMD;
 	}
 
-	return ocfs2_ioctl(file, cmd, arg);
+	return ocfs2_ioctl(inode, file, cmd, arg);
 }
 #endif
diff --git a/fs/ocfs2/ioctl.h b/fs/ocfs2/ioctl.h
index cf9a5ee..0632b05 100644
--- a/fs/ocfs2/ioctl.h
+++ b/fs/ocfs2/ioctl.h
@@ -10,7 +10,7 @@
 #ifndef OCFS2_IOCTL_H
 #define OCFS2_IOCTL_H
 
-long ocfs2_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
-long ocfs2_compat_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+int ocfs2_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);
+int ocfs2_compat_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg);
 
 #endif /* OCFS2_IOCTL_H */
diff --git a/fs/pipe.c b/fs/pipe.c
index fcba654..8765108 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -577,9 +577,8 @@ bad_pipe_w(struct file *filp, const char __user *buf, size_t count,
 	return -EBADF;
 }
 
-static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int pipe_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = filp->f_path.dentry->d_inode;
 	struct pipe_inode_info *pipe;
 	int count, buf, nrbufs;
 
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 8bb03f0..711bb4f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -239,11 +239,11 @@ static unsigned int proc_reg_poll(struct file *file, struct poll_table_struct *p
 	return rv;
 }
 
-static long proc_reg_unlocked_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int proc_reg_unlocked_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
 	long rv = -ENOTTY;
-	long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long);
+	int (*unlocked_ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
 	int (*ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
 
 	spin_lock(&pde->pde_unload_lock);
@@ -257,12 +257,12 @@ static long proc_reg_unlocked_ioctl(struct file *file, unsigned int cmd, unsigne
 	spin_unlock(&pde->pde_unload_lock);
 
 	if (unlocked_ioctl) {
-		rv = unlocked_ioctl(file, cmd, arg);
+		rv = unlocked_ioctl(inode, file, cmd, arg);
 		if (rv == -ENOIOCTLCMD)
 			rv = -EINVAL;
 	} else if (ioctl) {
 		lock_kernel();
-		rv = ioctl(file->f_path.dentry->d_inode, file, cmd, arg);
+		rv = ioctl(inode, file, cmd, arg);
 		unlock_kernel();
 	}
 
@@ -271,11 +271,11 @@ static long proc_reg_unlocked_ioctl(struct file *file, unsigned int cmd, unsigne
 }
 
 #ifdef CONFIG_COMPAT
-static long proc_reg_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int proc_reg_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
 	long rv = -ENOTTY;
-	long (*compat_ioctl)(struct file *, unsigned int, unsigned long);
+	int (*compat_ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
 
 	spin_lock(&pde->pde_unload_lock);
 	if (!pde->proc_fops) {
@@ -287,7 +287,7 @@ static long proc_reg_compat_ioctl(struct file *file, unsigned int cmd, unsigned
 	spin_unlock(&pde->pde_unload_lock);
 
 	if (compat_ioctl)
-		rv = compat_ioctl(file, cmd, arg);
+		rv = compat_ioctl(inode, file, cmd, arg);
 
 	pde_users_dec(pde);
 	return rv;
diff --git a/fs/reiserfs/ioctl.c b/fs/reiserfs/ioctl.c
index 8303320..d85fe0d 100644
--- a/fs/reiserfs/ioctl.c
+++ b/fs/reiserfs/ioctl.c
@@ -115,10 +115,9 @@ setversion_out:
 }
 
 #ifdef CONFIG_COMPAT
-long reiserfs_compat_ioctl(struct file *file, unsigned int cmd,
+int reiserfs_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	int ret;
 
 	/* These are just misnamed, they actually get/put from/to user an int */
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index 5e82cff..08cf595 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -145,10 +145,9 @@ out_unlock:
 	return err;
 }
 
-long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ubifs_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	int flags, err;
-	struct inode *inode = file->f_path.dentry->d_inode;
 
 	switch (cmd) {
 	case FS_IOC_GETFLAGS:
@@ -187,7 +186,7 @@ long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 }
 
 #ifdef CONFIG_COMPAT
-long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int ubifs_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	switch (cmd) {
 	case FS_IOC32_GETFLAGS:
@@ -199,6 +198,6 @@ long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	default:
 		return -ENOIOCTLCMD;
 	}
-	return ubifs_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
+	return ubifs_ioctl(inode, file, cmd, (unsigned long)compat_ptr(arg));
 }
 #endif
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index d7f706f..d82737e 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1639,10 +1639,10 @@ int ubifs_recover_size(struct ubifs_info *c);
 void ubifs_destroy_size_tree(struct ubifs_info *c);
 
 /* ioctl.c */
-long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int ubifs_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 void ubifs_set_inode_flags(struct inode *inode);
 #ifdef CONFIG_COMPAT
-long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int ubifs_compat_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 #endif
 
 /* compressor.c */
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 5311c1a..a4c1d10 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -376,14 +376,14 @@ xfs_file_mmap(
 	return 0;
 }
 
-STATIC long
+STATIC int
 xfs_file_ioctl(
+	struct inode	*inode,
 	struct file	*filp,
 	unsigned int	cmd,
 	unsigned long	p)
 {
 	int		error;
-	struct inode	*inode = filp->f_path.dentry->d_inode;
 
 	error = xfs_ioctl(XFS_I(inode), filp, 0, cmd, (void __user *)p);
 	xfs_iflags_set(XFS_I(inode), XFS_IMODIFIED);
@@ -397,14 +397,14 @@ xfs_file_ioctl(
 	return error;
 }
 
-STATIC long
+STATIC int
 xfs_file_ioctl_invis(
+	struct inode	*inode,
 	struct file	*filp,
 	unsigned int	cmd,
 	unsigned long	p)
 {
 	int		error;
-	struct inode	*inode = filp->f_path.dentry->d_inode;
 
 	error = xfs_ioctl(XFS_I(inode), filp, IO_INVIS, cmd, (void __user *)p);
 	xfs_iflags_set(XFS_I(inode), XFS_IMODIFIED);
diff --git a/fs/xfs/linux-2.6/xfs_ioctl32.c b/fs/xfs/linux-2.6/xfs_ioctl32.c
index a4b254e..dfe42ab 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl32.c
+++ b/fs/xfs/linux-2.6/xfs_ioctl32.c
@@ -469,8 +469,9 @@ xfs_compat_ioctl(
 	return error;
 }
 
-long
+int
 xfs_file_compat_ioctl(
+	struct inode		*inode,
 	struct file		*file,
 	unsigned		cmd,
 	unsigned long		arg)
@@ -478,8 +479,9 @@ xfs_file_compat_ioctl(
 	return xfs_compat_ioctl(0, file, cmd, arg);
 }
 
-long
+int
 xfs_file_compat_invis_ioctl(
+	struct inode		*inode,
 	struct file		*file,
 	unsigned		cmd,
 	unsigned long		arg)
diff --git a/fs/xfs/linux-2.6/xfs_ioctl32.h b/fs/xfs/linux-2.6/xfs_ioctl32.h
index 02de6e6..7e64783 100644
--- a/fs/xfs/linux-2.6/xfs_ioctl32.h
+++ b/fs/xfs/linux-2.6/xfs_ioctl32.h
@@ -18,7 +18,7 @@
 #ifndef __XFS_IOCTL32_H__
 #define __XFS_IOCTL32_H__
 
-extern long xfs_file_compat_ioctl(struct file *, unsigned, unsigned long);
-extern long xfs_file_compat_invis_ioctl(struct file *, unsigned, unsigned long);
+extern int xfs_file_compat_ioctl(struct inode *inode, struct file *, unsigned, unsigned long);
+extern int xfs_file_compat_invis_ioctl(struct inode *inode, struct file *, unsigned, unsigned long);
 
 #endif /* __XFS_IOCTL32_H__ */
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 80171ee..c30e0ab 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -841,7 +841,7 @@ extern void ext3_set_aops(struct inode *inode);
 /* ioctl.c */
 extern int ext3_ioctl (struct inode *, struct file *, unsigned int,
 		       unsigned long);
-extern long ext3_compat_ioctl (struct file *, unsigned int, unsigned long);
+extern int ext3_compat_ioctl (struct inode *, struct file *, unsigned int, unsigned long);
 
 /* namei.c */
 extern int ext3_orphan_add(handle_t *, struct inode *);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 580b513..9bcfbcd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1211,8 +1211,8 @@ struct block_device_operations {
 	int (*open) (struct inode *, struct file *);
 	int (*release) (struct inode *, struct file *);
 	int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
-	long (*unlocked_ioctl) (struct file *, unsigned, unsigned long);
-	long (*compat_ioctl) (struct file *, unsigned, unsigned long);
+	int (*unlocked_ioctl) (struct inode *, struct file *, unsigned, unsigned long);
+	int (*compat_ioctl) (struct inode *, struct file *, unsigned, unsigned long);
 	int (*direct_access) (struct block_device *, sector_t,
 						void **, unsigned long *);
 	int (*media_changed) (struct gendisk *);
@@ -1242,8 +1242,8 @@ struct file_operations {
 	int (*readdir) (struct file *, void *, filldir_t);
 	unsigned int (*poll) (struct file *, struct poll_table_struct *);
 	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
-	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
-	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+	int (*unlocked_ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+	int (*compat_ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
@@ -1656,7 +1656,7 @@ extern int blkdev_ioctl(struct inode *, struct file *, unsigned, unsigned long);
 extern int blkdev_driver_ioctl(struct inode *inode, struct file *file,
 			       struct gendisk *disk, unsigned cmd,
 			       unsigned long arg);
-extern long compat_blkdev_ioctl(struct file *, unsigned, unsigned long);
+extern int compat_blkdev_ioctl(struct inode *inode, struct file *, unsigned, unsigned long);
 extern int blkdev_get(struct block_device *, mode_t, unsigned);
 extern int blkdev_put(struct block_device *);
 extern int bd_claim(struct block_device *, void *);
diff --git a/include/linux/ncp_fs.h b/include/linux/ncp_fs.h
index 9f2d763..af7d026 100644
--- a/include/linux/ncp_fs.h
+++ b/include/linux/ncp_fs.h
@@ -211,7 +211,7 @@ void ncp_date_unix2dos(int unix_date, __le16 * time, __le16 * date);
 
 /* linux/fs/ncpfs/ioctl.c */
 int ncp_ioctl(struct inode *, struct file *, unsigned int, unsigned long);
-long ncp_compat_ioctl(struct file *, unsigned int, unsigned long);
+int ncp_compat_ioctl(struct inode *inode, struct file *, unsigned int, unsigned long);
 
 /* linux/fs/ncpfs/sock.c */
 int ncp_request2(struct ncp_server *server, int function,
diff --git a/include/linux/reiserfs_fs.h b/include/linux/reiserfs_fs.h
index e9963af..3422037 100644
--- a/include/linux/reiserfs_fs.h
+++ b/include/linux/reiserfs_fs.h
@@ -2174,7 +2174,7 @@ __u32 r5_hash(const signed char *msg, int len);
 /* prototypes from ioctl.c */
 int reiserfs_ioctl(struct inode *inode, struct file *filp,
 		   unsigned int cmd, unsigned long arg);
-long reiserfs_compat_ioctl(struct file *filp,
+int reiserfs_compat_ioctl(struct inode *inode, struct file *filp,
 		   unsigned int cmd, unsigned long arg);
 int reiserfs_unpack(struct inode *inode, struct file *filp);
 
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 0cbec74..bdb65a2 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -365,7 +365,7 @@ extern const struct file_operations tty_ldiscs_proc_fops;
 extern void tty_wakeup(struct tty_struct *tty);
 extern void tty_ldisc_flush(struct tty_struct *tty);
 
-extern long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+extern int tty_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 extern int tty_mode_ioctl(struct tty_struct *tty, struct file *file,
 			unsigned int cmd, unsigned long arg);
 extern int tty_perform_flush(struct tty_struct *tty, unsigned long arg);
diff --git a/include/linux/wanrouter.h b/include/linux/wanrouter.h
index e0aa396..82d2547 100644
--- a/include/linux/wanrouter.h
+++ b/include/linux/wanrouter.h
@@ -522,7 +522,7 @@ extern int wanrouter_proc_init(void);
 extern void wanrouter_proc_cleanup(void);
 extern int wanrouter_proc_add(struct wan_device *wandev);
 extern int wanrouter_proc_delete(struct wan_device *wandev);
-extern long wanrouter_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+extern int wanrouter_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 
 /* Public Data */
 /* list of registered devices */
diff --git a/include/media/v4l2-ioctl.h b/include/media/v4l2-ioctl.h
index dc64046..9ab9474 100644
--- a/include/media/v4l2-ioctl.h
+++ b/include/media/v4l2-ioctl.h
@@ -286,7 +286,7 @@ int v4l_compat_translate_ioctl(struct inode *inode, struct file *file,
 #endif
 
 /* 32 Bits compatibility layer for 64 bits processors */
-extern long v4l_compat_ioctl32(struct file *file, unsigned int cmd,
+extern int v4l_compat_ioctl32(struct inode *inode, struct file *file, unsigned int cmd,
 				unsigned long arg);
 
 extern int video_ioctl2(struct inode *inode, struct file *file,
diff --git a/kernel/power/user.c b/kernel/power/user.c
index a6332a3..6f8b19d 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -187,7 +187,7 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
 	return res;
 }
 
-static long snapshot_ioctl(struct file *filp, unsigned int cmd,
+static int snapshot_ioctl(struct inode *inode, struct file *filp, unsigned int cmd,
 							unsigned long arg)
 {
 	int error = 0;
diff --git a/net/irda/irnet/irnet_ppp.c b/net/irda/irnet/irnet_ppp.c
index 6d8ae03..ae45f37 100644
--- a/net/irda/irnet/irnet_ppp.c
+++ b/net/irda/irnet/irnet_ppp.c
@@ -631,8 +631,9 @@ dev_irnet_poll(struct file *	file,
  * This is the way pppd configure us and control us while the PPP
  * instance is active.
  */
-static long
+static int
 dev_irnet_ioctl(
+		struct inode *	inode,
 		struct file *	file,
 		unsigned int	cmd,
 		unsigned long	arg)
diff --git a/net/irda/irnet/irnet_ppp.h b/net/irda/irnet/irnet_ppp.h
index d9f8bd4..44bd8ec 100644
--- a/net/irda/irnet/irnet_ppp.h
+++ b/net/irda/irnet/irnet_ppp.h
@@ -76,8 +76,9 @@ static ssize_t
 static unsigned int
 	dev_irnet_poll(struct file *,
 		       poll_table *);
-static long
-	dev_irnet_ioctl(struct file *,
+static int
+	dev_irnet_ioctl(struct inode *,
+			struct file *,
 			unsigned int,
 			unsigned long);
 /* ------------------------ PPP INTERFACE ------------------------ */
diff --git a/net/socket.c b/net/socket.c
index 8ef8ba8..5d6824b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -107,9 +107,9 @@ static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 static int sock_close(struct inode *inode, struct file *file);
 static unsigned int sock_poll(struct file *file,
 			      struct poll_table_struct *wait);
-static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int sock_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 #ifdef CONFIG_COMPAT
-static long compat_sock_ioctl(struct file *file,
+static int compat_sock_ioctl(struct inode *inode, struct file *file,
 			      unsigned int cmd, unsigned long arg);
 #endif
 static int sock_fasync(int fd, struct file *filp, int on);
@@ -850,7 +850,7 @@ EXPORT_SYMBOL(dlci_ioctl_set);
  *	what to do with it - that's up to the protocol still.
  */
 
-static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+static int sock_ioctl(struct inode *inode, struct file *file, unsigned cmd, unsigned long arg)
 {
 	struct socket *sock;
 	struct sock *sk;
@@ -2316,7 +2316,7 @@ void socket_seq_show(struct seq_file *seq)
 #endif				/* CONFIG_PROC_FS */
 
 #ifdef CONFIG_COMPAT
-static long compat_sock_ioctl(struct file *file, unsigned cmd,
+static int compat_sock_ioctl(struct inode *inode, struct file *file, unsigned cmd,
 			      unsigned long arg)
 {
 	struct socket *sock = file->private_data;
diff --git a/net/wanrouter/wanmain.c b/net/wanrouter/wanmain.c
index 7f07152..2974428 100644
--- a/net/wanrouter/wanmain.c
+++ b/net/wanrouter/wanmain.c
@@ -349,9 +349,8 @@ __be16 wanrouter_type_trans(struct sk_buff *skb, struct net_device *dev)
  *	o execute requested action or pass command to the device driver
  */
 
-long wanrouter_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+int wanrouter_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
-	struct inode *inode = file->f_path.dentry->d_inode;
 	int err = 0;
 	struct proc_dir_entry *dent;
 	struct wan_device *wandev;
diff --git a/sound/core/control.c b/sound/core/control.c
index 281b2e2..f10a3f0 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -1149,7 +1149,7 @@ static int snd_ctl_tlv_ioctl(struct snd_ctl_file *file,
 	return err;
 }
 
-static long snd_ctl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_ctl_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_ctl_file *ctl;
 	struct snd_card *card;
diff --git a/sound/core/control_compat.c b/sound/core/control_compat.c
index 6101259..0af5c5b 100644
--- a/sound/core/control_compat.c
+++ b/sound/core/control_compat.c
@@ -390,7 +390,7 @@ enum {
 	SNDRV_CTL_IOCTL_ELEM_REPLACE32 = _IOWR('U', 0x18, struct snd_ctl_elem_info32),
 };
 
-static inline long snd_ctl_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static inline int snd_ctl_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_ctl_file *ctl;
 	struct snd_kctl_ioctl *p;
@@ -412,7 +412,7 @@ static inline long snd_ctl_ioctl_compat(struct file *file, unsigned int cmd, uns
 	case SNDRV_CTL_IOCTL_TLV_READ:
 	case SNDRV_CTL_IOCTL_TLV_WRITE:
 	case SNDRV_CTL_IOCTL_TLV_COMMAND:
-		return snd_ctl_ioctl(file, cmd, (unsigned long)argp);
+		return snd_ctl_ioctl(inode, file, cmd, (unsigned long)argp);
 	case SNDRV_CTL_IOCTL_ELEM_LIST32:
 		return snd_ctl_elem_list_compat(ctl->card, argp);
 	case SNDRV_CTL_IOCTL_ELEM_INFO32:
diff --git a/sound/core/hwdep.c b/sound/core/hwdep.c
index 6d6589f..7518eaa 100644
--- a/sound/core/hwdep.c
+++ b/sound/core/hwdep.c
@@ -231,7 +231,7 @@ static int snd_hwdep_dsp_load(struct snd_hwdep *hw,
 	return 0;
 }
 
-static long snd_hwdep_ioctl(struct file * file, unsigned int cmd,
+static int snd_hwdep_ioctl(struct inode *inode, struct file * file, unsigned int cmd,
 			    unsigned long arg)
 {
 	struct snd_hwdep *hw = file->private_data;
diff --git a/sound/core/hwdep_compat.c b/sound/core/hwdep_compat.c
index 3827c0c..3c7cc2a 100644
--- a/sound/core/hwdep_compat.c
+++ b/sound/core/hwdep_compat.c
@@ -59,7 +59,7 @@ enum {
 	SNDRV_HWDEP_IOCTL_DSP_LOAD32   = _IOW('H', 0x03, struct snd_hwdep_dsp_image32)
 };
 
-static long snd_hwdep_ioctl_compat(struct file * file, unsigned int cmd,
+static int snd_hwdep_ioctl_compat(struct inode *inode, struct file * file, unsigned int cmd,
 				   unsigned long arg)
 {
 	struct snd_hwdep *hw = file->private_data;
@@ -68,7 +68,7 @@ static long snd_hwdep_ioctl_compat(struct file * file, unsigned int cmd,
 	case SNDRV_HWDEP_IOCTL_PVERSION:
 	case SNDRV_HWDEP_IOCTL_INFO:
 	case SNDRV_HWDEP_IOCTL_DSP_STATUS:
-		return snd_hwdep_ioctl(file, cmd, (unsigned long)argp);
+		return snd_hwdep_ioctl(inode, file, cmd, (unsigned long)argp);
 	case SNDRV_HWDEP_IOCTL_DSP_LOAD32:
 		return snd_hwdep_dsp_load_compat(hw, argp);
 	}
diff --git a/sound/core/info.c b/sound/core/info.c
index c67773a..5f8e1e9 100644
--- a/sound/core/info.c
+++ b/sound/core/info.c
@@ -465,7 +465,7 @@ static unsigned int snd_info_entry_poll(struct file *file, poll_table * wait)
 	return mask;
 }
 
-static long snd_info_entry_ioctl(struct file *file, unsigned int cmd,
+static int snd_info_entry_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				unsigned long arg)
 {
 	struct snd_info_private_data *data;
diff --git a/sound/core/init.c b/sound/core/init.c
index df46bbc..82323a2 100644
--- a/sound/core/init.c
+++ b/sound/core/init.c
@@ -275,7 +275,7 @@ static unsigned int snd_disconnect_poll(struct file * file, poll_table * wait)
 	return POLLERR | POLLNVAL;
 }
 
-static long snd_disconnect_ioctl(struct file *file,
+static int snd_disconnect_ioctl(struct inode *inode, struct file *file,
 				 unsigned int cmd, unsigned long arg)
 {
 	return -ENODEV;
diff --git a/sound/core/oss/mixer_oss.c b/sound/core/oss/mixer_oss.c
index 581aa2c..273f177 100644
--- a/sound/core/oss/mixer_oss.c
+++ b/sound/core/oss/mixer_oss.c
@@ -359,7 +359,7 @@ static int snd_mixer_oss_ioctl1(struct snd_mixer_oss_file *fmixer, unsigned int
 	return -ENXIO;
 }
 
-static long snd_mixer_oss_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_mixer_oss_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	return snd_mixer_oss_ioctl1((struct snd_mixer_oss_file *) file->private_data, cmd, arg);
 }
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index 4c601b1..229513c 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -2428,7 +2428,7 @@ static int snd_pcm_oss_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static long snd_pcm_oss_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_pcm_oss_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_pcm_oss_file *pcm_oss_file;
 	int __user *p = (int __user *)arg;
diff --git a/sound/core/pcm_compat.c b/sound/core/pcm_compat.c
index 49aa693..f480fda 100644
--- a/sound/core/pcm_compat.c
+++ b/sound/core/pcm_compat.c
@@ -460,7 +460,7 @@ enum {
 
 };
 
-static long snd_pcm_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_pcm_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_pcm_file *pcm_file;
 	struct snd_pcm_substream *substream;
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index c49b9d9..4c703aa 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2726,7 +2726,7 @@ static int snd_pcm_capture_ioctl1(struct file *file,
 	return snd_pcm_common_ioctl1(file, substream, cmd, arg);
 }
 
-static long snd_pcm_playback_ioctl(struct file *file, unsigned int cmd,
+static int snd_pcm_playback_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				   unsigned long arg)
 {
 	struct snd_pcm_file *pcm_file;
@@ -2740,7 +2740,7 @@ static long snd_pcm_playback_ioctl(struct file *file, unsigned int cmd,
 				       (void __user *)arg);
 }
 
-static long snd_pcm_capture_ioctl(struct file *file, unsigned int cmd,
+static int snd_pcm_capture_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				  unsigned long arg)
 {
 	struct snd_pcm_file *pcm_file;
diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
index f7ea728..8c103ca 100644
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -687,7 +687,7 @@ static int snd_rawmidi_input_status(struct snd_rawmidi_substream *substream,
 	return 0;
 }
 
-static long snd_rawmidi_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_rawmidi_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_rawmidi_file *rfile;
 	void __user *argp = (void __user *)arg;
diff --git a/sound/core/rawmidi_compat.c b/sound/core/rawmidi_compat.c
index 5268c1f..2764275 100644
--- a/sound/core/rawmidi_compat.c
+++ b/sound/core/rawmidi_compat.c
@@ -99,7 +99,7 @@ enum {
 	SNDRV_RAWMIDI_IOCTL_STATUS32 = _IOWR('W', 0x20, struct snd_rawmidi_status32),
 };
 
-static long snd_rawmidi_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_rawmidi_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_rawmidi_file *rfile;
 	void __user *argp = compat_ptr(arg);
@@ -110,7 +110,7 @@ static long snd_rawmidi_ioctl_compat(struct file *file, unsigned int cmd, unsign
 	case SNDRV_RAWMIDI_IOCTL_INFO:
 	case SNDRV_RAWMIDI_IOCTL_DROP:
 	case SNDRV_RAWMIDI_IOCTL_DRAIN:
-		return snd_rawmidi_ioctl(file, cmd, (unsigned long)argp);
+		return snd_rawmidi_ioctl(inode, file, cmd, (unsigned long)argp);
 	case SNDRV_RAWMIDI_IOCTL_PARAMS32:
 		return snd_rawmidi_ioctl_params_compat(rfile, argp);
 	case SNDRV_RAWMIDI_IOCTL_STATUS32:
diff --git a/sound/core/seq/oss/seq_oss.c b/sound/core/seq/oss/seq_oss.c
index 777796e..b1fd18e 100644
--- a/sound/core/seq/oss/seq_oss.c
+++ b/sound/core/seq/oss/seq_oss.c
@@ -63,7 +63,7 @@ static int odev_open(struct inode *inode, struct file *file);
 static int odev_release(struct inode *inode, struct file *file);
 static ssize_t odev_read(struct file *file, char __user *buf, size_t count, loff_t *offset);
 static ssize_t odev_write(struct file *file, const char __user *buf, size_t count, loff_t *offset);
-static long odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+static int odev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
 static unsigned int odev_poll(struct file *file, poll_table * wait);
 
 
@@ -178,8 +178,8 @@ odev_write(struct file *file, const char __user *buf, size_t count, loff_t *offs
 	return snd_seq_oss_write(dp, buf, count, file);
 }
 
-static long
-odev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int
+odev_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct seq_oss_devinfo *dp;
 	dp = file->private_data;
diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c
index 7a1545d..d9ebb9d 100644
--- a/sound/core/seq/seq_clientmgr.c
+++ b/sound/core/seq/seq_clientmgr.c
@@ -2191,7 +2191,7 @@ static int snd_seq_do_ioctl(struct snd_seq_client *client, unsigned int cmd,
 }
 
 
-static long snd_seq_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_seq_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_seq_client *client = file->private_data;
 
diff --git a/sound/core/seq/seq_compat.c b/sound/core/seq/seq_compat.c
index 9628c06..f1a7060 100644
--- a/sound/core/seq/seq_compat.c
+++ b/sound/core/seq/seq_compat.c
@@ -87,7 +87,7 @@ enum {
 	SNDRV_SEQ_IOCTL_QUERY_NEXT_PORT32 = _IOWR('S', 0x52, struct snd_seq_port_info32),
 };
 
-static long snd_seq_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_seq_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	struct snd_seq_client *client = file->private_data;
 	void __user *argp = compat_ptr(arg);
diff --git a/sound/core/timer.c b/sound/core/timer.c
index 0af337e..f505d69 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1756,7 +1756,7 @@ enum {
 	SNDRV_TIMER_IOCTL_PAUSE_OLD = _IO('T', 0x23),
 };
 
-static long snd_timer_user_ioctl(struct file *file, unsigned int cmd,
+static int snd_timer_user_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				 unsigned long arg)
 {
 	struct snd_timer_user *tu;
diff --git a/sound/core/timer_compat.c b/sound/core/timer_compat.c
index 5512f53..2dc4785 100644
--- a/sound/core/timer_compat.c
+++ b/sound/core/timer_compat.c
@@ -93,7 +93,7 @@ enum {
 	SNDRV_TIMER_IOCTL_STATUS32 = _IOW('T', 0x14, struct snd_timer_status32),
 };
 
-static long snd_timer_user_ioctl_compat(struct file *file, unsigned int cmd, unsigned long arg)
+static int snd_timer_user_ioctl_compat(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
 {
 	void __user *argp = compat_ptr(arg);
 
@@ -114,7 +114,7 @@ static long snd_timer_user_ioctl_compat(struct file *file, unsigned int cmd, uns
 	case SNDRV_TIMER_IOCTL_PAUSE:
 	case SNDRV_TIMER_IOCTL_PAUSE_OLD:
 	case SNDRV_TIMER_IOCTL_NEXT_DEVICE:
-		return snd_timer_user_ioctl(file, cmd, (unsigned long)argp);
+		return snd_timer_user_ioctl(inode, file, cmd, (unsigned long)argp);
 	case SNDRV_TIMER_IOCTL_INFO32:
 		return snd_timer_user_info_compat(file, argp);
 	case SNDRV_TIMER_IOCTL_STATUS32:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7dd9b0b..51368d7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,7 +66,7 @@ static __read_mostly struct preempt_ops kvm_preempt_ops;
 
 struct dentry *kvm_debugfs_dir;
 
-static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
+static int kvm_vcpu_ioctl(struct inode *inode, struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 
 bool kvm_rebooting;
@@ -1112,7 +1112,7 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 	return 0;
 }
 
-static long kvm_vcpu_ioctl(struct file *filp,
+static int kvm_vcpu_ioctl(struct inode *inode, struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
 	struct kvm_vcpu *vcpu = filp->private_data;
@@ -1295,7 +1295,7 @@ out:
 	return r;
 }
 
-static long kvm_vm_ioctl(struct file *filp,
+static int kvm_vm_ioctl(struct inode *inode, struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
@@ -1415,7 +1415,7 @@ static int kvm_dev_ioctl_create_vm(void)
 	return fd;
 }
 
-static long kvm_dev_ioctl(struct file *filp,
+static int kvm_dev_ioctl(struct inode *inode, struct file *filp,
 			  unsigned int ioctl, unsigned long arg)
 {
 	long r = -EINVAL;

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-08-28  0:32                                                                             ` Paul Mundt
@ 2008-08-28  0:46                                                                               ` David Miller
       [not found]                                                                                 ` <20080827.174605.85608276.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
       [not found]                                                                               ` <20080828003211.GA18893-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: David Miller @ 2008-08-28  0:46 UTC (permalink / raw)
  To: lethal
  Cc: bunk, torvalds, rusty, Alan.Brunelle, rjw, linux-kernel,
	kernel-testers, akpm, arjan, mingo, linux-embedded

From: Paul Mundt <lethal@linux-sh.org>
Date: Thu, 28 Aug 2008 09:32:13 +0900

> On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> > CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> > wanted with an arbitrary limit.
>
> In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
> only performed from do_IRQ(), which is sporadic at best, especially on
> tickless. While it catches some things, it's not a complete solution in
> and of itself.

BTW, on sparc64 we have a stack overflow checker that runs via
the profiling _mcount hook.  So every function call we check
if the stack is getting overused.

If so, we jump onto a special static debugging stack and print
the stack overflow message.

And yes it works with IRQ stacks which is all that sparc64 uses
nowadays.

Perhaps this is useful enough to make generic.
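
A minimal user-space sketch of the per-call check described above. This is
not the sparc64 code: the names fake_stack, stack_check and pick_stack, the
margin value and the debug stack size are all assumptions made up for
illustration. A real hook would be written against the architecture's own
stack layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: warn when fewer than this many bytes of stack remain. */
#define STACK_MARGIN 1024u

/* Hypothetical model of a downward-growing stack. */
struct fake_stack {
	uintptr_t base;	/* lowest valid address of the stack area */
};

/* Hypothetical static stack used only to report an overflow. */
static unsigned char debug_stack[4096];

/* Bytes left between the stack pointer and the bottom of the stack. */
static size_t stack_bytes_left(const struct fake_stack *s, uintptr_t sp)
{
	return sp > s->base ? (size_t)(sp - s->base) : 0;
}

/* The test a per-function-entry hook would run: 1 if overflow is near. */
static int stack_check(const struct fake_stack *s, uintptr_t sp)
{
	return stack_bytes_left(s, sp) < STACK_MARGIN;
}

/* Keep the current stack pointer, or move to the top of the debug stack. */
static uintptr_t pick_stack(const struct fake_stack *s, uintptr_t sp)
{
	assert(s != NULL);
	if (stack_check(s, sp))
		return (uintptr_t)(debug_stack + sizeof(debug_stack));
	return sp;
}
```

In this sketch stack_check() stands in for the test run from the profiling
hook on every call, and pick_stack() models the jump onto a separate static
stack so that the overflow message can still be printed safely.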

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                                 ` <20080827.174605.85608276.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-08-28  1:02                                                                                   ` Paul Mundt
  0 siblings, 0 replies; 318+ messages in thread
From: Paul Mundt @ 2008-08-28  1:02 UTC (permalink / raw)
  To: David Miller
  Cc: bunk-DgEjT+Ai2ygdnm+yROfE0A,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rusty-8n+1lVoiYb80n/F98K4Iww, Alan.Brunelle-VXdhtT5mjnY,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, mingo-X9Un+BFzKDI,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 27, 2008 at 05:46:05PM -0700, David Miller wrote:
> From: Paul Mundt <lethal-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
> Date: Thu, 28 Aug 2008 09:32:13 +0900
> 
> > On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> > > CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> > > wanted with an arbitrary limit.
> >
> > In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
> > only performed from do_IRQ(), which is sporadic at best, especially on
> > tickless. While it catches some things, it's not a complete solution in
> > and of itself.
> 
> BTW, on sparc64 we have a stack overflow checker that runs via
> the profiling _mcount hook.  So every function call we check
> if the stack is getting overused.
> 
> If so, we jump onto a special static debugging stack and print
> the stack overflow message.
> 
> And yes it works with IRQ stacks which is all that sparc64 uses
> nowadays.
> 
> Perhaps this is useful enough to make generic.

Thanks for the pointer, I'll take a look at it!

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                           ` <20080827160052.GA15968-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
  2008-08-27 17:35                                                                             ` Adrian Bunk
@ 2008-08-28  1:05                                                                             ` Greg Ungerer
  1 sibling, 0 replies; 318+ messages in thread
From: Greg Ungerer @ 2008-08-28  1:05 UTC (permalink / raw)
  To: Paul Mundt, Adrian Bunk, Linus Torvalds, Rusty Russell,
	"Alan D. Brunelle" <Alan.Brunel>


Paul Mundt wrote:
> On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
>> On Tue, Aug 26, 2008 at 05:28:37PM -0700, Linus Torvalds wrote:
>>> On Wed, 27 Aug 2008, Adrian Bunk wrote:
>>>> When did we get callpaths like nfs+xfs+md+scsi reliably 
>>>> working with 4kB stacks on x86-32?
>>> XFS may never have been usable, but the rest, sure.
>>>
>>> And you seem to be making this whole argument an excuse to SUCK, and an 
>>> excuse to let gcc crap even more on our stack space.
>>>
>>> Why?
>>>
>>> Why aren't you saying that we should be able to do better? Instead, you 
>>> seem to be asking us to do even worse than we do now?
>> My main point is:
>> - getting 4kB stacks working reliably is a hard task
>> - having an eye on gcc increasing the stack usage, and fixing it if
>>   required, is relatively easy
>>
>> If we should be able to do better at getting (and keeping) 4kB stacks 
>> working, then coping with possible inlining problems caused by gcc
>> should not be a big problem for us.
>>
> Out of the architectures you've mentioned for 4k stacks, they also tend
> to do IRQ stacks, which is something you seem to have overlooked.
> 
> In addition to that, debugging the runaway stack users on 4k tends to be
> easier anyways since you end up blowing the stack a lot sooner. On sh
> we've had pretty good luck with it, though most of our users are using
> fairly deterministic workloads and continually profiling the footprint.
> Anything that runs away or uses an insane amount of stack space needs to
> be fixed well before that anyways, so catching it sooner is always
> preferable. I imagine the same case is true for m68knommu (even sans IRQ
> stacks).

Yep, definitely true for m68knommu in my experience. I haven't had
any problems with 4k stacks recently. But yes the workloads do tend
to be constrained - and almost never use any of the more exotic
filesystems or drivers.



> Things might be more sensitive on x86, but it's certainly not something
> that's a huge problem for the various embedded platforms to wire up,
> whether they want to go the IRQ stack route or not.
> 
> In any event, lack of support for something on embedded architectures in
> the kernel is more often due to apathy/utter indifference on the part of
> the architecture maintainer rather than being indicative of any intrinsic
> difficulty in supporting the thing in question. Most new "features" on the
> lesser maintained architectures tend to end up there either out of peer
> pressure or copying-and-pasting accidents rather than any sort of design.
> ;-)

Indeed :-)

Regards
Greg


------------------------------------------------------------------------
Greg Ungerer  --  Chief Software Dude       EMAIL:     gerg-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org
Secure Computing Corporation                PHONE:       +61 7 3435 2888
825 Stanley St,                             FAX:         +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB: http://www.SnapGear.com

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]           ` <alpine.LFD.1.10.0808271335260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-27 20:45             ` Linus Torvalds
@ 2008-08-28 13:52             ` Christoph Hellwig
       [not found]               ` <20080828135245.GA12410-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Christoph Hellwig @ 2008-08-28 13:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Osterlund, Rafael J. Wysocki, Alan Cox, Jens Axboe,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Wed, Aug 27, 2008 at 01:40:10PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 27 Aug 2008, Peter Osterlund wrote:
> > 
> > Why not just revert the offending change and try again during the next
> > merge window, assuming someone has figured out an acceptable way to
> > handle this mess by then?
> 
> Well, for 2.6.27 that's what we'll have to do. But there's actually a 
> real problem here - the unlocked ioctl's (which we _should_ prefer) have a 
> strictly weaker and worse interface. I also wonder if any other 
> block_ioctl users were converted..

Actually both interfaces are a fscking disaster.  The right thing to
pass is neither an inode nor a file but a struct block_device.  Al had
all this work done a while ago and it just needs rebasing to a current tree:

	http://git.kernel.org/?p=linux/kernel/git/viro/bdev.git;a=summary
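
For reference, a user-space sketch of the two handler shapes being compared
in this thread. The struct definitions are cut-down stand-ins rather than the
kernel's, and old_style_ioctl and unlocked_style_ioctl are hypothetical names.
The only point shown is that a handler without an inode argument has to
recover the inode from the file, as the wanrouter_ioctl() hunk in the diff
earlier in the thread does.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumptions). */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path   { struct dentry *dentry; };
struct file   { struct path f_path; };

/* Old shape: the inode is handed to the handler explicitly. */
static int old_style_ioctl(struct inode *inode, struct file *file,
			   unsigned int cmd, unsigned long arg)
{
	(void)file; (void)cmd; (void)arg;
	return (int)inode->i_ino;	/* dummy result for the sketch */
}

/* Unlocked shape: only the file is passed, so look the inode up. */
static long unlocked_style_ioctl(struct file *file, unsigned int cmd,
				 unsigned long arg)
{
	struct inode *inode = file->f_path.dentry->d_inode;

	return old_style_ioctl(inode, file, cmd, arg);
}
```

Neither shape hands a block driver the struct block_device suggested above;
the sketch only illustrates what the two existing signatures give a handler.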

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                               ` <20080828003211.GA18893-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
@ 2008-08-28 16:16                                                                                 ` Adrian Bunk
  0 siblings, 0 replies; 318+ messages in thread
From: Adrian Bunk @ 2008-08-28 16:16 UTC (permalink / raw)
  To: Paul Mundt, Linus Torvalds, Rusty Russell, Alan D. Brunelle,
	Rafael 

On Thu, Aug 28, 2008 at 09:32:13AM +0900, Paul Mundt wrote:
> On Wed, Aug 27, 2008 at 08:35:44PM +0300, Adrian Bunk wrote:
> > On Thu, Aug 28, 2008 at 01:00:52AM +0900, Paul Mundt wrote:
> > > On Wed, Aug 27, 2008 at 02:58:30PM +0300, Adrian Bunk wrote:
> > > In addition to that, debugging the runaway stack users on 4k tends to be
> > > easier anyways since you end up blowing the stack a lot sooner. On sh
> > > we've had pretty good luck with it, though most of our users are using
> > > fairly deterministic workloads and continually profiling the footprint.
> > > Anything that runs away or uses an insane amount of stack space needs to
> > > be fixed well before that anyways, so catching it sooner is always
> > > preferable. I imagine the same case is true for m68knommu (even sans IRQ
> > > stacks).
> > 
> > CONFIG_DEBUG_STACKOVERFLOW should give you the same information, and if
> > wanted with an arbitrary limit.
> > 
> In some cases, yes. In the CONFIG_DEBUG_STACKOVERFLOW case the check is
> only performed from do_IRQ(), which is sporadic at best, especially on
> tickless. While it catches some things, it's not a complete solution in
> and of itself.
> 
> In addition to this, there are even fewer platforms that support it than
> there are platforms that do 4k stacks. At first glance, it looks like
> it's only m32r, powerpc, sh, x86, and xtensa.
>...

As far as I can see the only architectures that optionally offer 4kB 
stacks today are m68knommu, s390, sh and x86.

Did I miss some architectures or is 5 < 4 ;) ?

> Others support the Kconfig
> option, but don't seem to realize that it's not an option that the kernel
> does anything with by itself, and so don't actually do anything (ie,
> FRV).

Unless I miss anything these "others" include only FRV.

> > IMHO there seems to currently be a mismatch between its maintenance 
> > cost and the actual number of users. That's in my opinion the main 
> > problem with it, no matter in which direction it gets resolved.
> > 
> Perhaps that's true on x86, but in general I take issue with that. On sh
> we've had to do very little maintenance for it and most shipping products
> are using it today (at least on MMU-Linux, we don't bother with it on
> nommu). Most of the problems we ran in to with 4k stacks tended to be
> stuff that we wanted to fix for 8k anyways. I suspect that this case is
> true for the other embedded platforms also.
>...

Most stack issues are not platform or architecture specific.

The maintenance effort therefore mostly depends on whether a non-zero 
number of architectures uses 4kB stacks.

And if something is considered to be important for small embedded 
systems, but not supported on ARM, MIPS or PowerPC, then that's 
a bit strange.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                       ` <20080826.134535.193703558.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-08-29 12:42                                                         ` Jes Sorensen
       [not found]                                                           ` <48B7EEA2.7090300-sJ/iWh9BUns@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Jes Sorensen @ 2008-08-29 12:42 UTC (permalink / raw)
  To: David Miller
  Cc: travis-sJ/iWh9BUns, mingo-X9Un+BFzKDI,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

[-- Attachment #1: Type: text/plain, Size: 1638 bytes --]

David Miller wrote:
>>> Otherwise you have to modify cpumask_t objects and thus pluck
>>> them onto the stack where they take up silly amounts of space.
>> Yes, I had proposed either modifying, or supplementing a new
>> smp_call function to pass the cpumask_t as a pointer (similar
>> to set_cpus_allowed_ptr.)  But an ABI change such as this was
>> not well received at the time.
> 
> What it seems to come down to is that any cpumask_t not inside of
> a dynamically allocated object should be marked const.
> 
> And that is something we can enforce at compile time.
> 
> Linus has just suggested dynamically allocating cpumask_t's
> for such cases but I don't see that as the fix either.
> 
> Just mark them const and enforce that cpumask_t objects can only
> be modified when they appear in dynamically allocated objects.

Dave and others,

Sorry if I jump into the middle of the thread. Stopped subscribing to
lkml for a while, so this is through the archives.

Ran into some of these issues with KVM too, and noticed just how much
we pass cpumask_t around in the smp_call functions :-( In fact, the
only arch that did pretty well on this front was sparc64.

I totally agree that marking them const makes a ton of sense, but at
the same time I suggest we convert smp_call_function_mask() to take a
pointer to the cpumask_t. I whipped up the following patch, which cuts
down the amount of memcpy calls emitted quite a fair bit.
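
To illustrate how the const marking and the pointer argument fit together,
here is a standalone userspace sketch. It is not kernel code: struct
cpumask_sketch, SKETCH_NR_CPUS and the sketch_* helpers are hypothetical
names made up for this example. The input mask is taken as a const pointer,
and the "everyone but me" result is written into a separate destination,
which in the kernel would presumably live inside a dynamically allocated
object.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-ins for a large NR_CPUS configuration. */
#define SKETCH_NR_CPUS		4096
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS		(SKETCH_NR_CPUS / SKETCH_BITS_PER_LONG)

struct cpumask_sketch {
	unsigned long bits[SKETCH_LONGS];
};

/* Set one CPU bit in a mask the caller is allowed to modify. */
static void sketch_set_cpu(struct cpumask_sketch *m, unsigned int cpu)
{
	m->bits[cpu / SKETCH_BITS_PER_LONG] |=
		1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Read-only query, so a const pointer is enough. */
static int sketch_test_cpu(const struct cpumask_sketch *m, unsigned int cpu)
{
	return (m->bits[cpu / SKETCH_BITS_PER_LONG] >>
		(cpu % SKETCH_BITS_PER_LONG)) & 1UL;
}

/*
 * Build "src without self" in dst.  src is const, so the compiler
 * rejects any attempt to modify the caller's mask in place; the only
 * copy made is the single one into dst.
 */
static void sketch_mask_others(struct cpumask_sketch *dst,
			       const struct cpumask_sketch *src,
			       unsigned int self)
{
	memcpy(dst, src, sizeof(*dst));
	dst->bits[self / SKETCH_BITS_PER_LONG] &=
		~(1UL << (self % SKETCH_BITS_PER_LONG));
}
```

A caller would set a few bits in one mask, call sketch_mask_others() with
its own CPU number, and find the source mask unchanged afterwards.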

I have only tested this on ia64, but it boots, so it's obviously
perfect<tm> :-)

Comments, suggestions welcome.

I have a followup patch that makes virt/kvm/kvm_main.c use the new
interface.

Cheers,
Jes


[-- Attachment #2: 0040-smp-call-cpumask.patch --]
[-- Type: text/plain, Size: 14294 bytes --]

Change smp_call_function_mask() to take a pointer to the cpumask_t
rather than passing it by value. This avoids recursive copies of the
cpumask_t on the stack in the IPI call. For large NR_CPUS, this is
particularly bad, and the cost of doing this for
NR_CPUS < bits_per_long is negligible.

Signed-off-by: Jes Sorensen <jes-sJ/iWh9BUns@public.gmane.org>

---
 arch/alpha/include/asm/smp.h    |    2 +-
 arch/alpha/kernel/smp.c         |    4 ++--
 arch/arm/include/asm/smp.h      |    2 +-
 arch/arm/kernel/smp.c           |    4 ++--
 arch/ia64/include/asm/smp.h     |    2 +-
 arch/ia64/kernel/smp.c          |    6 +++---
 arch/m32r/kernel/smp.c          |    4 ++--
 arch/mips/kernel/smp.c          |    4 ++--
 arch/parisc/kernel/smp.c        |    6 +++---
 arch/powerpc/include/asm/smp.h  |    2 +-
 arch/powerpc/kernel/smp.c       |    4 ++--
 arch/sh/include/asm/smp.h       |    2 +-
 arch/sh/kernel/smp.c            |    4 ++--
 arch/sparc/include/asm/smp_64.h |    2 +-
 arch/sparc64/kernel/smp.c       |    4 ++--
 include/asm-m32r/smp.h          |    2 +-
 include/asm-mips/smp.h          |    2 +-
 include/asm-parisc/smp.h        |    2 +-
 include/asm-x86/smp.h           |    4 ++--
 include/linux/smp.h             |    2 +-
 kernel/smp.c                    |   15 ++++++++-------
 virt/kvm/kvm_main.c             |    4 ++--
 22 files changed, 42 insertions(+), 41 deletions(-)

Index: linux-2.6.git/arch/alpha/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/alpha/include/asm/smp.h
+++ linux-2.6.git/arch/alpha/include/asm/smp.h
@@ -48,7 +48,7 @@
 #define cpu_possible_map	cpu_present_map
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #else /* CONFIG_SMP */
 
Index: linux-2.6.git/arch/alpha/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/alpha/kernel/smp.c
+++ linux-2.6.git/arch/alpha/kernel/smp.c
@@ -637,9 +637,9 @@
 	send_ipi_message(to_whom, IPI_CPU_STOP);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	send_ipi_message(mask, IPI_CALL_FUNC);
+	send_ipi_message(*mask, IPI_CALL_FUNC);
 }
 
 void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/arm/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/arm/include/asm/smp.h
+++ linux-2.6.git/arch/arm/include/asm/smp.h
@@ -102,7 +102,7 @@
 extern void platform_cpu_enable(unsigned int cpu);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 /*
  * Local timer interrupt handling function (can be IPI'ed).
Index: linux-2.6.git/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/arm/kernel/smp.c
+++ linux-2.6.git/arch/arm/kernel/smp.c
@@ -356,9 +356,9 @@
 	local_irq_restore(flags);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	send_ipi_message(mask, IPI_CALL_FUNC);
+	send_ipi_message(*mask, IPI_CALL_FUNC);
 }
 
 void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/ia64/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/ia64/include/asm/smp.h
+++ linux-2.6.git/arch/ia64/include/asm/smp.h
@@ -127,7 +127,7 @@
 extern int is_multithreading_enabled(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #else /* CONFIG_SMP */
 
Index: linux-2.6.git/arch/ia64/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/ia64/kernel/smp.c
+++ linux-2.6.git/arch/ia64/kernel/smp.c
@@ -166,11 +166,11 @@
  * Called with preemption disabled.
  */
 static inline void
-send_IPI_mask(cpumask_t mask, int op)
+send_IPI_mask(cpumask_t *mask, int op)
 {
 	unsigned int cpu;
 
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask(cpu, *mask) {
 			send_IPI_single(cpu, op);
 	}
 }
@@ -316,7 +316,7 @@
 	send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	send_IPI_mask(mask, IPI_CALL_FUNC);
 }
Index: linux-2.6.git/arch/m32r/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/m32r/kernel/smp.c
+++ linux-2.6.git/arch/m32r/kernel/smp.c
@@ -546,9 +546,9 @@
 	for ( ; ; );
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	send_IPI_mask(mask, CALL_FUNCTION_IPI, 0);
+	send_IPI_mask(*mask, CALL_FUNCTION_IPI, 0);
 }
 
 void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/mips/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/mips/kernel/smp.c
+++ linux-2.6.git/arch/mips/kernel/smp.c
@@ -131,9 +131,9 @@
 	cpu_idle();
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	mp_ops->send_ipi_mask(mask, SMP_CALL_FUNCTION);
+	mp_ops->send_ipi_mask(*mask, SMP_CALL_FUNCTION);
 }
 
 /*
Index: linux-2.6.git/arch/parisc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/parisc/kernel/smp.c
+++ linux-2.6.git/arch/parisc/kernel/smp.c
@@ -228,11 +228,11 @@
 }
 
 static void
-send_IPI_mask(cpumask_t mask, enum ipi_message_type op)
+send_IPI_mask(cpumask_t *mask, enum ipi_message_type op)
 {
 	int cpu;
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask(cpu, *mask)
 		ipi_send(cpu, op);
 }
 
@@ -274,7 +274,7 @@
 	send_IPI_allbutself(IPI_NOP);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	send_IPI_mask(mask, IPI_CALL_FUNC);
 }
Index: linux-2.6.git/arch/powerpc/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/powerpc/include/asm/smp.h
+++ linux-2.6.git/arch/powerpc/include/asm/smp.h
@@ -119,7 +119,7 @@
 extern struct smp_ops_t *smp_ops;
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif /* __ASSEMBLY__ */
 
Index: linux-2.6.git/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.git/arch/powerpc/kernel/smp.c
@@ -135,11 +135,11 @@
 	smp_ops->message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	unsigned int cpu;
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask(cpu, *mask)
 		smp_ops->message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
Index: linux-2.6.git/arch/sh/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/sh/include/asm/smp.h
+++ linux-2.6.git/arch/sh/include/asm/smp.h
@@ -39,7 +39,7 @@
 int plat_register_ipi_handler(unsigned int message,
 			      void (*handler)(void *), void *arg);
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #else
 
Index: linux-2.6.git/arch/sh/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/sh/kernel/smp.c
+++ linux-2.6.git/arch/sh/kernel/smp.c
@@ -171,11 +171,11 @@
 	smp_call_function(stop_this_cpu, 0, 0);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	int cpu;
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask(cpu, *mask)
 		plat_send_ipi(cpu, SMP_MSG_FUNCTION);
 }
 
Index: linux-2.6.git/arch/sparc/include/asm/smp_64.h
===================================================================
--- linux-2.6.git.orig/arch/sparc/include/asm/smp_64.h
+++ linux-2.6.git/arch/sparc/include/asm/smp_64.h
@@ -35,7 +35,7 @@
 extern int sparc64_multi_core;
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 /*
  *	General functions that each host system must provide.
Index: linux-2.6.git/arch/sparc64/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/sparc64/kernel/smp.c
+++ linux-2.6.git/arch/sparc64/kernel/smp.c
@@ -810,9 +810,9 @@
 
 extern unsigned long xcall_call_function;
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	xcall_deliver((u64) &xcall_call_function, 0, 0, &mask);
+	xcall_deliver((u64) &xcall_call_function, 0, 0, mask);
 }
 
 extern unsigned long xcall_call_function_single;
Index: linux-2.6.git/include/asm-m32r/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-m32r/smp.h
+++ linux-2.6.git/include/asm-m32r/smp.h
@@ -90,7 +90,7 @@
 extern unsigned long send_IPI_mask_phys(cpumask_t, int, int);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif	/* not __ASSEMBLY__ */
 
Index: linux-2.6.git/include/asm-mips/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-mips/smp.h
+++ linux-2.6.git/include/asm-mips/smp.h
@@ -58,6 +58,6 @@
 extern asmlinkage void smp_call_function_interrupt(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif /* __ASM_SMP_H */
Index: linux-2.6.git/include/asm-parisc/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-parisc/smp.h
+++ linux-2.6.git/include/asm-parisc/smp.h
@@ -31,7 +31,7 @@
 extern void smp_send_all_nop(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif /* !ASSEMBLY */
 
Index: linux-2.6.git/include/asm-x86/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/smp.h
+++ linux-2.6.git/include/asm-x86/smp.h
@@ -101,9 +101,9 @@
 	smp_ops.send_call_func_single_ipi(cpu);
 }
 
-static inline void arch_send_call_function_ipi(cpumask_t mask)
+static inline void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	smp_ops.send_call_func_ipi(mask);
+	smp_ops.send_call_func_ipi(*mask);
 }
 
 void native_smp_prepare_boot_cpu(void);
Index: linux-2.6.git/include/linux/smp.h
===================================================================
--- linux-2.6.git.orig/include/linux/smp.h
+++ linux-2.6.git/include/linux/smp.h
@@ -62,7 +62,7 @@
  * Call a function on all other processors
  */
 int smp_call_function(void(*func)(void *info), void *info, int wait);
-int smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
+int smp_call_function_mask(cpumask_t *mask, void(*func)(void *info), void *info,
 				int wait);
 int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
 				int wait);
Index: linux-2.6.git/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/kernel/smp.c
+++ linux-2.6.git/kernel/smp.c
@@ -318,7 +318,7 @@
  * hardware interrupt handler or from a bottom half handler. Preemption
  * must be disabled when calling this function.
  */
-int smp_call_function_mask(cpumask_t mask, void (*func)(void *), void *info,
+int smp_call_function_mask(cpumask_t *mask, void (*func)(void *), void *info,
 			   int wait)
 {
 	struct call_function_data d;
@@ -334,8 +334,8 @@
 	cpu = smp_processor_id();
 	allbutself = cpu_online_map;
 	cpu_clear(cpu, allbutself);
-	cpus_and(mask, mask, allbutself);
-	num_cpus = cpus_weight(mask);
+	cpus_and(*mask, *mask, allbutself);
+	num_cpus = cpus_weight(*mask);
 
 	/*
 	 * If zero CPUs, return. If just a single CPU, turn this request
@@ -344,7 +344,7 @@
 	if (!num_cpus)
 		return 0;
 	else if (num_cpus == 1) {
-		cpu = first_cpu(mask);
+		cpu = first_cpu(*mask);
 		return smp_call_function_single(cpu, func, info, wait);
 	}
 
@@ -364,7 +364,7 @@
 	data->csd.func = func;
 	data->csd.info = info;
 	data->refs = num_cpus;
-	data->cpumask = mask;
+	data->cpumask = *mask;
 
 	spin_lock_irqsave(&call_function_lock, flags);
 	list_add_tail_rcu(&data->csd.list, &call_function_queue);
@@ -377,7 +377,7 @@
 	if (wait) {
 		csd_flag_wait(&data->csd);
 		if (unlikely(slowpath))
-			smp_call_function_mask_quiesce_stack(mask);
+			smp_call_function_mask_quiesce_stack(*mask);
 	}
 
 	return 0;
@@ -402,9 +402,10 @@
 int smp_call_function(void (*func)(void *), void *info, int wait)
 {
 	int ret;
+	cpumask_t tmp_online_map = cpu_online_map;
 
 	preempt_disable();
-	ret = smp_call_function_mask(cpu_online_map, func, info, wait);
+	ret = smp_call_function_mask(&tmp_online_map, func, info, wait);
 	preempt_enable();
 	return ret;
 }
Index: linux-2.6.git/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.git.orig/virt/kvm/kvm_main.c
+++ linux-2.6.git/virt/kvm/kvm_main.c
@@ -124,7 +124,7 @@
 	if (cpus_empty(cpus))
 		goto out;
 	++kvm->stat.remote_tlb_flush;
-	smp_call_function_mask(cpus, ack_flush, NULL, 1);
+	smp_call_function_mask(&cpus, ack_flush, NULL, 1);
 out:
 	put_cpu();
 }
@@ -149,7 +149,7 @@
 	}
 	if (cpus_empty(cpus))
 		goto out;
-	smp_call_function_mask(cpus, ack_flush, NULL, 1);
+	smp_call_function_mask(&cpus, ack_flush, NULL, 1);
 out:
 	put_cpu();
 }

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                           ` <48B7EEA2.7090300-sJ/iWh9BUns@public.gmane.org>
@ 2008-08-29 16:14                                                             ` Linus Torvalds
       [not found]                                                               ` <alpine.LFD.1.10.0808290909020.3300-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-08-29 16:14 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: David Miller, travis-sJ/iWh9BUns, mingo-X9Un+BFzKDI,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww



On Fri, 29 Aug 2008, Jes Sorensen wrote:
> 
> I have only tested this on ia64, but it boots, so it's obviously
> perfect<tm> :-)

Well, it probably boots because it doesn't really seem to _change_ much of 
anything.

Things like this:

	-static inline void arch_send_call_function_ipi(cpumask_t mask)
	+static inline void arch_send_call_function_ipi(cpumask_t *mask)
	 {
	-       smp_ops.send_call_func_ipi(mask);
	+       smp_ops.send_call_func_ipi(*mask);
	 }

will still do that stack allocation at the time of the call. You'd have to 
pass the thing all the way down as a pointer..
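
A standalone userspace sketch of that point, with hypothetical names
(struct mask_sketch, leaf_by_value(), leaf_by_ptr(), middle_deref(),
middle_ptr()) that are not kernel interfaces: a middle layer that takes a
pointer but dereferences it for a by-value callee still hands that callee a
private copy, while passing the pointer all the way down keeps everyone
looking at the caller's object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large mask: 64 longs, i.e. 4096 bits with 64-bit longs. */
struct mask_sketch {
	unsigned long bits[64];
};

/*
 * By-value leaf: m is a distinct object, so its address never matches
 * the caller's object.  Returns 0.
 */
static int leaf_by_value(struct mask_sketch m,
			 const struct mask_sketch *caller_obj)
{
	return &m == caller_obj;
}

/* Pointer leaf: no copy, the very same object is seen.  Returns 1. */
static int leaf_by_ptr(const struct mask_sketch *m,
		       const struct mask_sketch *caller_obj)
{
	return m == caller_obj;
}

/* Pointer in, but dereferenced at the next call: the copy still happens. */
static int middle_deref(const struct mask_sketch *m)
{
	return leaf_by_value(*m, m);
}

/* Pointer passed all the way down: nothing is copied. */
static int middle_ptr(const struct mask_sketch *m)
{
	return leaf_by_ptr(m, m);
}
```

Calling middle_deref() on a mask reports that the leaf saw a different
object, middle_ptr() reports that it saw the original.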

			Linus


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                               ` <alpine.LFD.1.10.0808290909020.3300-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-08-29 20:04                                                                 ` David Miller
  2008-09-01 11:53                                                                 ` Jes Sorensen
  2008-09-02 14:27                                                                 ` Jes Sorensen
  2 siblings, 0 replies; 318+ messages in thread
From: David Miller @ 2008-08-29 20:04 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: jes-sJ/iWh9BUns, travis-sJ/iWh9BUns, mingo-X9Un+BFzKDI,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Fri, 29 Aug 2008 09:14:44 -0700 (PDT)

> Well, it probably boots because it doesn't really seem to _change_ much of 
> anything.
> 
> Things like this:
> 
> 	-static inline void arch_send_call_function_ipi(cpumask_t mask)
> 	+static inline void arch_send_call_function_ipi(cpumask_t *mask)
> 	 {
> 	-       smp_ops.send_call_func_ipi(mask);
> 	+       smp_ops.send_call_func_ipi(*mask);
> 	 }
> 
> will still do that stack allocation at the time of the call. You'd have to 
> pass the thing all the way down as a pointer..

True, but we have to get there one step at a time.

BTW, sparc64 already wants a pointer here, so it's completely ready for
this:

void arch_send_call_function_ipi(cpumask_t mask)
{
	xcall_deliver((u64) &xcall_call_function, 0, 0, &mask);
}

^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-08-30 19:46 2.6.27-rc5-git2: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-08-30 19:50 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-08-30 19:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (20 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel
       [not found]     ` <200808242334.05993.rjw-KKrjLPT3xs0@public.gmane.org>
@ 2008-09-01  9:35       ` David Woodhouse
       [not found]         ` <1220261720.2982.51.camel-ZP4jZrcIevRpWr+L1FloEB2eb7JE58TQ@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Woodhouse @ 2008-09-01  9:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Larry Finger,
	Sam Ravnborg, Andrew Morton, Linus Torvalds

On Sun, 2008-08-24 at 23:34 +0200, Rafael J. Wysocki wrote:
> On Saturday, 23 of August 2008, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11355
> > Subject		: Regression in 2.6.27-rc2 when cross-building the kernel
> > Submitter	: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> > Date		: 2008-08-16 2:38 (8 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121885432118368&w=4
> 
> As I wrote in the Bugzilla, I'm seeing a related problem.
> 
> Namely, I build kernels on one box, with 'make O=<target>', then I mount
> <target> on another one over NFS, 'cd' to it and try to install the kernel
> modules with 'make modules_install'.  This results in 'HOSTCC firmware/ihex2fw'
> and 'fatal error: ...: Read-only file system'.  It's readily reproducible.
> 
> Commenting out line 1130 of Makefile
> ("$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.fwinst obj=firmware __fw_modinst")
> obviously helps, so it looks like Makefile.fwinst needs fixing.

I don't like this much, but it should do the trick... please confirm.

diff --git a/firmware/Makefile b/firmware/Makefile
index aab12bf..e7130b5 100644
--- a/firmware/Makefile
+++ b/firmware/Makefile
@@ -166,15 +166,28 @@ $(patsubst %,$(obj)/%.gen.o, $(fw-external-y)): $(obj)/%.gen.o: $(fwdir)/%
 $(obj)/%: $(obj)/%.ihex | $(objtree)/$(obj)/$$(dir %)
 	$(call cmd,ihex)
 
+# Don't depend on ihex2fw if we're installing and it already exists.
+# Putting it after | in the dependencies doesn't seem sufficient when
+# we're installing after a cross-compile, because ihex2fw has dependencies
+# on stuff like /usr/lib/gcc/ppc64-redhat-linux/4.3.0/include/stddef.h and 
+# thus wants to be rebuilt. Which it can't be, if the prebuilt kernel tree
+# is exported read-only for someone to run 'make install'.
+ifeq ($(INSTALL):$(wildcard $(obj)/ihex2fw),install:$(obj)/ihex2fw)
+ihex2fw_dep :=
+else
+ihex2fw_dep := $(obj)/ihex2fw
+endif
+
+
 # .HEX is also Intel HEX, but where the offset and length in each record
 # is actually meaningful, because the firmware has to be loaded in a certain
 # order rather than as a single binary blob. Thus, we convert them into our
 # more compact binary representation of ihex records (<linux/ihex.h>)
-$(obj)/%.fw: $(obj)/%.HEX $(obj)/ihex2fw | $(objtree)/$(obj)/$$(dir %)
+$(obj)/%.fw: $(obj)/%.HEX $(ihex2fw_dep) | $(objtree)/$(obj)/$$(dir %)
 	$(call cmd,ihex2fw)
 
 # .H16 is our own modified form of Intel HEX, with 16-bit length for records.
-$(obj)/%.fw: $(obj)/%.H16 $(obj)/ihex2fw | $(objtree)/$(obj)/$$(dir %)
+$(obj)/%.fw: $(obj)/%.H16 $(ihex2fw_dep) | $(objtree)/$(obj)/$$(dir %)
 	$(call cmd,h16tofw)
 
 $(firmware-dirs):

-- 
dwmw2

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                               ` <alpine.LFD.1.10.0808290909020.3300-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-29 20:04                                                                 ` David Miller
@ 2008-09-01 11:53                                                                 ` Jes Sorensen
  2008-09-02 14:27                                                                 ` Jes Sorensen
  2 siblings, 0 replies; 318+ messages in thread
From: Jes Sorensen @ 2008-09-01 11:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, travis-sJ/iWh9BUns, mingo-X9Un+BFzKDI,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

Linus Torvalds wrote:
> 
> On Fri, 29 Aug 2008, Jes Sorensen wrote:
>> I have only tested this on ia64, but it boots, so it's obviously
>> perfect<tm> :-)
> 
> Well, it probably boots because it doesn't really seem to _change_ much of 
> anything.

Hi Linus,

I realize that, but as I have been doing this work on ia64, I didn't
want to mess too much with the x86 code. The ia64 part of the patch does
change things :-)

If someone with more x86 knowledge would want to try and make that part
of the patch better, and more importantly test it, I'd be quite keen on
helping out.

Cheers,
Jes


> Things like this:
> 
> 	-static inline void arch_send_call_function_ipi(cpumask_t mask)
> 	+static inline void arch_send_call_function_ipi(cpumask_t *mask)
> 	 {
> 	-       smp_ops.send_call_func_ipi(mask);
> 	+       smp_ops.send_call_func_ipi(*mask);
> 	 }
> 
> will still do that stack allocation at the time of the call. You'd have to 
> pass the thing all the way down as a pointer..
> 
> 			Linus
> 

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel
       [not found]         ` <1220261720.2982.51.camel-ZP4jZrcIevRpWr+L1FloEB2eb7JE58TQ@public.gmane.org>
@ 2008-09-01 16:51           ` Larry Finger
       [not found]             ` <48BC1D8E.9050608-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Larry Finger @ 2008-09-01 16:51 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Sam Ravnborg, Andrew Morton, Linus Torvalds

David Woodhouse wrote:
> On Sun, 2008-08-24 at 23:34 +0200, Rafael J. Wysocki wrote:
>> On Saturday, 23 of August 2008, Rafael J. Wysocki wrote:
>>> This message has been generated automatically as a part of a report
>>> of recent regressions.
>>>
>>> The following bug entry is on the current list of known regressions
>>> from 2.6.26.  Please verify if it still should be listed and let me know
>>> (either way).
>>>
>>>
>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11355
>>> Subject		: Regression in 2.6.27-rc2 when cross-building the kernel
>>> Submitter	: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
>>> Date		: 2008-08-16 2:38 (8 days old)
>>> References	: http://marc.info/?l=linux-kernel&m=121885432118368&w=4
>> As I wrote in the Bugzilla, I'm seeing a related problem.
>>
>> Namely, I build kernels on one box, with 'make O=<target>', then I mount
>> <target> on another one over NFS, 'cd' to it and try to install the kernel
>> modules with 'make modules_install'.  This results in 'HOSTCC firmware/ihex2fw'
>> and 'fatal error: ...: Read-only file system'.  It's readily reproducible.
>>
>> Commenting out line 1130 of Makefile
>> ("$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.fwinst obj=firmware __fw_modinst")
>> obviously helps, so it looks like Makefile.fwinst needs fixing.
> 
> I don't like this much, but it should do the trick... please confirm.

Yes, the patch fixes the problem. Thanks.

Larry

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel
       [not found]             ` <48BC1D8E.9050608-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
@ 2008-09-01 17:37               ` David Woodhouse
  0 siblings, 0 replies; 318+ messages in thread
From: David Woodhouse @ 2008-09-01 17:37 UTC (permalink / raw)
  To: Larry Finger
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Sam Ravnborg, Andrew Morton, Linus Torvalds

On Mon, 2008-09-01 at 11:51 -0500, Larry Finger wrote:
> David Woodhouse wrote:
> > On Sun, 2008-08-24 at 23:34 +0200, Rafael J. Wysocki wrote:
> >> On Saturday, 23 of August 2008, Rafael J. Wysocki wrote:
> >>> This message has been generated automatically as a part of a report
> >>> of recent regressions.
> >>>
> >>> The following bug entry is on the current list of known regressions
> >>> from 2.6.26.  Please verify if it still should be listed and let me know
> >>> (either way).
> >>>
> >>>
> >>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11355
> >>> Subject		: Regression in 2.6.27-rc2 when cross-building the kernel
> >>> Submitter	: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> >>> Date		: 2008-08-16 2:38 (8 days old)
> >>> References	: http://marc.info/?l=linux-kernel&m=121885432118368&w=4
> >> As I wrote in the Bugzilla, I'm seeing a related problem.
> >>
> >> Namely, I build kernels on one box, with 'make O=<target>', then I mount
> >> <target> on another one over NFS, 'cd' to it and try to install the kernel
> >> modules with 'make modules_install'.  This results in 'HOSTCC firmware/ihex2fw'
> >> and 'fatal error: ...: Read-only file system'.  It's readily reproducible.
> >>
> >> Commenting out line 1130 of Makefile
> >> ("$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.fwinst obj=firmware __fw_modinst")
> >> obviously helps, so it looks like Makefile.fwinst needs fixing.
> > 
> > I don't like this much, but it should do the trick... please confirm.
> 
> Yes, the patch fixes the problem. Thanks.

Ok, I have a small handful of fixes to send to Linus for 2.6.27; unless
Sam (or someone else) comes up with a better answer, I'll make sure that
goes with them.

Thanks.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]               ` <20080828135245.GA12410-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2008-09-02  7:26                 ` Jens Axboe
       [not found]                   ` <20080902072642.GX20055-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Jens Axboe @ 2008-09-02  7:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Peter Osterlund, Rafael J. Wysocki, Alan Cox,
	Linux Kernel Mailing List, Adrian Bunk, Andrew Morton,
	Natalie Protasevich, Kernel Testers List,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Thu, Aug 28 2008, Christoph Hellwig wrote:
> On Wed, Aug 27, 2008 at 01:40:10PM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Wed, 27 Aug 2008, Peter Osterlund wrote:
> > > 
> > > Why not just revert the offending change and try again during the next
> > > merge window, assuming someone has figured out an acceptable way to
> > > handle this mess by then?
> > 
> > Well, for 2.6.27 that's what we'll have to do. But there's actually a 
> > real problem here - the unlocked ioctl's (which we _should_ prefer) have a 
> > strictly weaker and worse interface. I also wonder if any other 
> > block_ioctl users were converted..
> 
> Actually both interfaces are a fscking disaster.  The right thing to
> pass is neither an inode nor a file but a struct block_device.  Al had
> all this work done a while ago and it just needs rebasing to a current tree:
> 
> 	http://git.kernel.org/?p=linux/kernel/git/viro/bdev.git;a=summary

Completely agreed. Al, I remember talking to you about this at the
storage summit back in February. What are your current plans wrt moving
this forward?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                               ` <alpine.LFD.1.10.0808290909020.3300-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-29 20:04                                                                 ` David Miller
  2008-09-01 11:53                                                                 ` Jes Sorensen
@ 2008-09-02 14:27                                                                 ` Jes Sorensen
  2 siblings, 0 replies; 318+ messages in thread
From: Jes Sorensen @ 2008-09-02 14:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, travis-sJ/iWh9BUns, mingo-X9Un+BFzKDI,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	arjan-VuQAYsv1563Yd54FQh9/CA, rusty-8n+1lVoiYb80n/F98K4Iww

[-- Attachment #1: Type: text/plain, Size: 834 bytes --]

Linus Torvalds wrote:
> 
> On Fri, 29 Aug 2008, Jes Sorensen wrote:
>> I have only tested this on ia64, but it boots, so it's obviously
>> perfect<tm> :-)
> 
> Well, it probably boots because it doesn't really seem to _change_ much of 
> anything.
> 
> Things like this:
> 
> 	-static inline void arch_send_call_function_ipi(cpumask_t mask)
> 	+static inline void arch_send_call_function_ipi(cpumask_t *mask)
> 	 {
> 	-       smp_ops.send_call_func_ipi(mask);
> 	+       smp_ops.send_call_func_ipi(*mask);
> 	 }
> 
> will still do that stack allocation at the time of the call. You'd have to 
> pass the thing all the way down as a pointer..

Linus,

Ok, so here's a version which tries to do the right thing on x86 as
well. Build tested on x86_64, but I don't have an easy way to test it
there right now. It's booting on ia64.

Cheers,
Jes


[-- Attachment #2: 0040-smp-call-cpumask.patch --]
[-- Type: text/plain, Size: 26221 bytes --]

Change smp_call_function_mask() to take a pointer to the cpumask_t
rather than passing it by value. This avoids recursive copies of the
cpumask_t on the stack in the IPI call. For large NR_CPUS those copies
are particularly bad, and the cost of passing a pointer instead when
NR_CPUS < BITS_PER_LONG is negligible.
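
As a rough illustration of the cost described above (a sketch only, not
part of the patch): the demo_* names below are hypothetical stand-ins
for NR_CPUS and cpumask_t, not kernel code. With 4096 possible CPUs the
mask is 512 bytes on typical platforms, so every by-value call copies
that much onto the stack, while a by-pointer call passes one address.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's NR_CPUS and cpumask_t. */
#define DEMO_NR_CPUS		4096
#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MASK_LONGS \
	((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long bits[DEMO_MASK_LONGS];
} demo_cpumask_t;

/* Mark one CPU in the mask. */
void demo_cpu_set(unsigned int cpu, demo_cpumask_t *mask)
{
	assert(cpu < DEMO_NR_CPUS);
	mask->bits[cpu / DEMO_BITS_PER_LONG] |=
		1UL << (cpu % DEMO_BITS_PER_LONG);
}

/* By pointer: only an address crosses the call boundary. */
unsigned int demo_weight_by_pointer(const demo_cpumask_t *mask)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < DEMO_MASK_LONGS; i++) {
		unsigned long word = mask->bits[i];

		/* Clear the lowest set bit until the word is empty. */
		for (; word; word &= word - 1)
			count++;
	}
	return count;
}

/*
 * By value: the caller copies sizeof(demo_cpumask_t) bytes for this
 * call. If the callee passed "mask" on by value again, each further
 * level would add another full copy.
 */
unsigned int demo_weight_by_value(demo_cpumask_t mask)
{
	return demo_weight_by_pointer(&mask);
}
```

Both functions return the same count; the difference is only in how
much data each call has to place on the stack.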

Signed-off-by: Jes Sorensen <jes-sJ/iWh9BUns@public.gmane.org>

---
 arch/alpha/include/asm/smp.h            |    2 +-
 arch/alpha/kernel/smp.c                 |    4 ++--
 arch/arm/include/asm/smp.h              |    2 +-
 arch/arm/kernel/smp.c                   |    4 ++--
 arch/ia64/include/asm/smp.h             |    2 +-
 arch/ia64/kernel/smp.c                  |    6 +++---
 arch/m32r/kernel/smp.c                  |    4 ++--
 arch/mips/kernel/smp.c                  |    4 ++--
 arch/parisc/kernel/smp.c                |    6 +++---
 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/kernel/smp.c               |    4 ++--
 arch/sh/include/asm/smp.h               |    2 +-
 arch/sh/kernel/smp.c                    |    4 ++--
 arch/sparc/include/asm/smp_64.h         |    2 +-
 arch/sparc64/kernel/smp.c               |    4 ++--
 arch/x86/kernel/apic_32.c               |    2 +-
 arch/x86/kernel/apic_64.c               |    2 +-
 arch/x86/kernel/crash.c                 |    2 +-
 arch/x86/kernel/genapic_flat_64.c       |   20 ++++++++++++--------
 arch/x86/kernel/genx2apic_uv_x.c        |   10 ++++++----
 arch/x86/kernel/io_apic_64.c            |    6 ++++--
 arch/x86/kernel/smp.c                   |   12 ++++++++----
 arch/x86/kernel/tlb_32.c                |    2 +-
 arch/x86/kernel/tlb_64.c                |    2 +-
 arch/x86/xen/smp.c                      |   13 +++++++------
 include/asm-m32r/smp.h                  |    2 +-
 include/asm-mips/smp.h                  |    2 +-
 include/asm-parisc/smp.h                |    2 +-
 include/asm-x86/genapic_32.h            |    2 +-
 include/asm-x86/genapic_64.h            |    2 +-
 include/asm-x86/mach-default/mach_ipi.h |   10 ++++++----
 include/asm-x86/smp.h                   |    6 +++---
 include/linux/smp.h                     |    2 +-
 kernel/smp.c                            |   15 ++++++++-------
 virt/kvm/kvm_main.c                     |    4 ++--
 35 files changed, 93 insertions(+), 77 deletions(-)

Index: linux-2.6.git/arch/alpha/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/alpha/include/asm/smp.h
+++ linux-2.6.git/arch/alpha/include/asm/smp.h
@@ -48,7 +48,7 @@
 #define cpu_possible_map	cpu_present_map
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #else /* CONFIG_SMP */
 
Index: linux-2.6.git/arch/alpha/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/alpha/kernel/smp.c
+++ linux-2.6.git/arch/alpha/kernel/smp.c
@@ -637,9 +637,9 @@
 	send_ipi_message(to_whom, IPI_CPU_STOP);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	send_ipi_message(mask, IPI_CALL_FUNC);
+	send_ipi_message(*mask, IPI_CALL_FUNC);
 }
 
 void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/arm/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/arm/include/asm/smp.h
+++ linux-2.6.git/arch/arm/include/asm/smp.h
@@ -102,7 +102,7 @@
 extern void platform_cpu_enable(unsigned int cpu);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 /*
  * Local timer interrupt handling function (can be IPI'ed).
Index: linux-2.6.git/arch/arm/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/arm/kernel/smp.c
+++ linux-2.6.git/arch/arm/kernel/smp.c
@@ -356,9 +356,9 @@
 	local_irq_restore(flags);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	send_ipi_message(mask, IPI_CALL_FUNC);
+	send_ipi_message(*mask, IPI_CALL_FUNC);
 }
 
 void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/ia64/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/ia64/include/asm/smp.h
+++ linux-2.6.git/arch/ia64/include/asm/smp.h
@@ -127,7 +127,7 @@
 extern int is_multithreading_enabled(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #else /* CONFIG_SMP */
 
Index: linux-2.6.git/arch/ia64/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/ia64/kernel/smp.c
+++ linux-2.6.git/arch/ia64/kernel/smp.c
@@ -166,11 +166,11 @@
  * Called with preemption disabled.
  */
 static inline void
-send_IPI_mask(cpumask_t mask, int op)
+send_IPI_mask(cpumask_t *mask, int op)
 {
 	unsigned int cpu;
 
-	for_each_cpu_mask(cpu, mask) {
+	for_each_cpu_mask(cpu, *mask) {
 			send_IPI_single(cpu, op);
 	}
 }
@@ -316,7 +316,7 @@
 	send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	send_IPI_mask(mask, IPI_CALL_FUNC);
 }
Index: linux-2.6.git/arch/m32r/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/m32r/kernel/smp.c
+++ linux-2.6.git/arch/m32r/kernel/smp.c
@@ -546,9 +546,9 @@
 	for ( ; ; );
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	send_IPI_mask(mask, CALL_FUNCTION_IPI, 0);
+	send_IPI_mask(*mask, CALL_FUNCTION_IPI, 0);
 }
 
 void arch_send_call_function_single_ipi(int cpu)
Index: linux-2.6.git/arch/mips/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/mips/kernel/smp.c
+++ linux-2.6.git/arch/mips/kernel/smp.c
@@ -131,9 +131,9 @@
 	cpu_idle();
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	mp_ops->send_ipi_mask(mask, SMP_CALL_FUNCTION);
+	mp_ops->send_ipi_mask(*mask, SMP_CALL_FUNCTION);
 }
 
 /*
Index: linux-2.6.git/arch/parisc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/parisc/kernel/smp.c
+++ linux-2.6.git/arch/parisc/kernel/smp.c
@@ -228,11 +228,11 @@
 }
 
 static void
-send_IPI_mask(cpumask_t mask, enum ipi_message_type op)
+send_IPI_mask(cpumask_t *mask, enum ipi_message_type op)
 {
 	int cpu;
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask(cpu, *mask)
 		ipi_send(cpu, op);
 }
 
@@ -274,7 +274,7 @@
 	send_IPI_allbutself(IPI_NOP);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	send_IPI_mask(mask, IPI_CALL_FUNC);
 }
Index: linux-2.6.git/arch/powerpc/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/powerpc/include/asm/smp.h
+++ linux-2.6.git/arch/powerpc/include/asm/smp.h
@@ -119,7 +119,7 @@
 extern struct smp_ops_t *smp_ops;
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif /* __ASSEMBLY__ */
 
Index: linux-2.6.git/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.git/arch/powerpc/kernel/smp.c
@@ -135,11 +135,11 @@
 	smp_ops->message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	unsigned int cpu;
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask(cpu, *mask)
 		smp_ops->message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
Index: linux-2.6.git/arch/sh/include/asm/smp.h
===================================================================
--- linux-2.6.git.orig/arch/sh/include/asm/smp.h
+++ linux-2.6.git/arch/sh/include/asm/smp.h
@@ -39,7 +39,7 @@
 int plat_register_ipi_handler(unsigned int message,
 			      void (*handler)(void *), void *arg);
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #else
 
Index: linux-2.6.git/arch/sh/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/sh/kernel/smp.c
+++ linux-2.6.git/arch/sh/kernel/smp.c
@@ -171,11 +171,11 @@
 	smp_call_function(stop_this_cpu, 0, 0);
 }
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	int cpu;
 
-	for_each_cpu_mask(cpu, mask)
+	for_each_cpu_mask(cpu, *mask)
 		plat_send_ipi(cpu, SMP_MSG_FUNCTION);
 }
 
Index: linux-2.6.git/arch/sparc/include/asm/smp_64.h
===================================================================
--- linux-2.6.git.orig/arch/sparc/include/asm/smp_64.h
+++ linux-2.6.git/arch/sparc/include/asm/smp_64.h
@@ -35,7 +35,7 @@
 extern int sparc64_multi_core;
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 /*
  *	General functions that each host system must provide.
Index: linux-2.6.git/arch/sparc64/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/sparc64/kernel/smp.c
+++ linux-2.6.git/arch/sparc64/kernel/smp.c
@@ -810,9 +810,9 @@
 
 extern unsigned long xcall_call_function;
 
-void arch_send_call_function_ipi(cpumask_t mask)
+void arch_send_call_function_ipi(cpumask_t *mask)
 {
-	xcall_deliver((u64) &xcall_call_function, 0, 0, &mask);
+	xcall_deliver((u64) &xcall_call_function, 0, 0, mask);
 }
 
 extern unsigned long xcall_call_function_single;
Index: linux-2.6.git/arch/x86/kernel/apic_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/apic_32.c
+++ linux-2.6.git/arch/x86/kernel/apic_32.c
@@ -291,7 +291,7 @@
 static void lapic_timer_broadcast(cpumask_t mask)
 {
 #ifdef CONFIG_SMP
-	send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
+	send_IPI_mask(&mask, LOCAL_TIMER_VECTOR);
 #endif
 }
 
Index: linux-2.6.git/arch/x86/kernel/apic_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/apic_64.c
+++ linux-2.6.git/arch/x86/kernel/apic_64.c
@@ -280,7 +280,7 @@
 static void lapic_timer_broadcast(cpumask_t mask)
 {
 #ifdef CONFIG_SMP
-	send_IPI_mask(mask, LOCAL_TIMER_VECTOR);
+	send_IPI_mask(&mask, LOCAL_TIMER_VECTOR);
 #endif
 }
 
Index: linux-2.6.git/arch/x86/kernel/crash.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/crash.c
+++ linux-2.6.git/arch/x86/kernel/crash.c
@@ -80,7 +80,7 @@
 	cpumask_t mask = cpu_online_map;
 	cpu_clear(safe_smp_processor_id(), mask);
 	if (!cpus_empty(mask))
-		send_IPI_mask(mask, NMI_VECTOR);
+		send_IPI_mask(&mask, NMI_VECTOR);
 }
 
 static struct notifier_block crash_nmi_nb = {
Index: linux-2.6.git/arch/x86/kernel/genapic_flat_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/genapic_flat_64.c
+++ linux-2.6.git/arch/x86/kernel/genapic_flat_64.c
@@ -58,9 +58,9 @@
 	apic_write(APIC_LDR, val);
 }
 
-static void flat_send_IPI_mask(cpumask_t cpumask, int vector)
+static void flat_send_IPI_mask(cpumask_t *cpumask, int vector)
 {
-	unsigned long mask = cpus_addr(cpumask)[0];
+	unsigned long mask = cpus_addr(*cpumask)[0];
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -81,7 +81,7 @@
 		cpu_clear(smp_processor_id(), allbutme);
 
 		if (!cpus_empty(allbutme))
-			flat_send_IPI_mask(allbutme, vector);
+			flat_send_IPI_mask(&allbutme, vector);
 	} else if (num_online_cpus() > 1) {
 		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector,APIC_DEST_LOGICAL);
 	}
@@ -89,8 +89,10 @@
 
 static void flat_send_IPI_all(int vector)
 {
+	cpumask_t tmp_online_map = cpu_online_map;
+
 	if (vector == NMI_VECTOR)
-		flat_send_IPI_mask(cpu_online_map, vector);
+		flat_send_IPI_mask(&tmp_online_map, vector);
 	else
 		__send_IPI_shortcut(APIC_DEST_ALLINC, vector, APIC_DEST_LOGICAL);
 }
@@ -141,9 +143,9 @@
 	return cpumask_of_cpu(cpu);
 }
 
-static void physflat_send_IPI_mask(cpumask_t cpumask, int vector)
+static void physflat_send_IPI_mask(cpumask_t *cpumask, int vector)
 {
-	send_IPI_mask_sequence(cpumask, vector);
+	send_IPI_mask_sequence(*cpumask, vector);
 }
 
 static void physflat_send_IPI_allbutself(int vector)
@@ -151,12 +153,14 @@
 	cpumask_t allbutme = cpu_online_map;
 
 	cpu_clear(smp_processor_id(), allbutme);
-	physflat_send_IPI_mask(allbutme, vector);
+	physflat_send_IPI_mask(&allbutme, vector);
 }
 
 static void physflat_send_IPI_all(int vector)
 {
-	physflat_send_IPI_mask(cpu_online_map, vector);
+	cpumask_t tmp_online_map = cpu_online_map;
+
+	physflat_send_IPI_mask(&tmp_online_map, vector);
 }
 
 static unsigned int physflat_cpu_mask_to_apicid(cpumask_t cpumask)
Index: linux-2.6.git/arch/x86/kernel/genx2apic_uv_x.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/genx2apic_uv_x.c
+++ linux-2.6.git/arch/x86/kernel/genx2apic_uv_x.c
@@ -94,12 +94,12 @@
 	uv_write_global_mmr64(pnode, UVH_IPI_INT, val);
 }
 
-static void uv_send_IPI_mask(cpumask_t mask, int vector)
+static void uv_send_IPI_mask(cpumask_t *mask, int vector)
 {
 	unsigned int cpu;
 
 	for_each_possible_cpu(cpu)
-		if (cpu_isset(cpu, mask))
+		if (cpu_isset(cpu, *mask))
 			uv_send_IPI_one(cpu, vector);
 }
 
@@ -110,12 +110,14 @@
 	cpu_clear(smp_processor_id(), mask);
 
 	if (!cpus_empty(mask))
-		uv_send_IPI_mask(mask, vector);
+		uv_send_IPI_mask(&mask, vector);
 }
 
 static void uv_send_IPI_all(int vector)
 {
-	uv_send_IPI_mask(cpu_online_map, vector);
+	cpumask_t mask = cpu_online_map;
+
+	uv_send_IPI_mask(&mask, vector);
 }
 
 static int uv_apic_id_registered(void)
Index: linux-2.6.git/arch/x86/kernel/io_apic_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6.git/arch/x86/kernel/io_apic_64.c
@@ -1379,9 +1379,11 @@
 {
 	struct irq_cfg *cfg = &irq_cfg[irq];
 	unsigned long flags;
+	cpumask_t mask;
 
 	spin_lock_irqsave(&vector_lock, flags);
-	send_IPI_mask(cpumask_of_cpu(first_cpu(cfg->domain)), cfg->vector);
+	mask = cpumask_of_cpu(first_cpu(cfg->domain));
+	send_IPI_mask(&mask, cfg->vector);
 	spin_unlock_irqrestore(&vector_lock, flags);
 
 	return 1;
@@ -1446,7 +1448,7 @@
 
 		cpus_and(cleanup_mask, cfg->old_domain, cpu_online_map);
 		cfg->move_cleanup_count = cpus_weight(cleanup_mask);
-		send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+		send_IPI_mask(&cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
 		cfg->move_in_progress = 0;
 	}
 }
Index: linux-2.6.git/arch/x86/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/smp.c
+++ linux-2.6.git/arch/x86/kernel/smp.c
@@ -114,26 +114,30 @@
  */
 static void native_smp_send_reschedule(int cpu)
 {
+	cpumask_t mask;
+
 	if (unlikely(cpu_is_offline(cpu))) {
 		WARN_ON(1);
 		return;
 	}
-	send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
+	mask = cpumask_of_cpu(cpu);
+	send_IPI_mask(&mask, RESCHEDULE_VECTOR);
 }
 
 void native_send_call_func_single_ipi(int cpu)
 {
-	send_IPI_mask(cpumask_of_cpu(cpu), CALL_FUNCTION_SINGLE_VECTOR);
+	cpumask_t mask = cpumask_of_cpu(cpu);
+	send_IPI_mask(&mask, CALL_FUNCTION_SINGLE_VECTOR);
 }
 
-void native_send_call_func_ipi(cpumask_t mask)
+void native_send_call_func_ipi(cpumask_t *mask)
 {
 	cpumask_t allbutself;
 
 	allbutself = cpu_online_map;
 	cpu_clear(smp_processor_id(), allbutself);
 
-	if (cpus_equal(mask, allbutself) &&
+	if (cpus_equal(*mask, allbutself) &&
 	    cpus_equal(cpu_online_map, cpu_callout_map))
 		send_IPI_allbutself(CALL_FUNCTION_VECTOR);
 	else
Index: linux-2.6.git/arch/x86/kernel/tlb_32.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/tlb_32.c
+++ linux-2.6.git/arch/x86/kernel/tlb_32.c
@@ -158,7 +158,7 @@
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
+	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR);
 
 	while (!cpus_empty(flush_cpumask))
 		/* nothing. lockup detection does not belong here */
Index: linux-2.6.git/arch/x86/kernel/tlb_64.c
===================================================================
--- linux-2.6.git.orig/arch/x86/kernel/tlb_64.c
+++ linux-2.6.git/arch/x86/kernel/tlb_64.c
@@ -186,7 +186,7 @@
 	 * We have to send the IPI only to
 	 * CPUs affected.
 	 */
-	send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR_START + sender);
+	send_IPI_mask(&cpumask, INVALIDATE_TLB_VECTOR_START + sender);
 
 	while (!cpus_empty(f->flush_cpumask))
 		cpu_relax();
Index: linux-2.6.git/arch/x86/xen/smp.c
===================================================================
--- linux-2.6.git.orig/arch/x86/xen/smp.c
+++ linux-2.6.git/arch/x86/xen/smp.c
@@ -361,24 +361,25 @@
 	xen_send_IPI_one(cpu, XEN_RESCHEDULE_VECTOR);
 }
 
-static void xen_send_IPI_mask(cpumask_t mask, enum ipi_vector vector)
+static void xen_send_IPI_mask(cpumask_t *mask, enum ipi_vector vector)
 {
 	unsigned cpu;
+	cpumask_t newmask;
 
-	cpus_and(mask, mask, cpu_online_map);
+	cpus_and(newmask, *mask, cpu_online_map);
 
-	for_each_cpu_mask_nr(cpu, mask)
+	for_each_cpu_mask_nr(cpu, newmask)
 		xen_send_IPI_one(cpu, vector);
 }
 
-static void xen_smp_send_call_function_ipi(cpumask_t mask)
+static void xen_smp_send_call_function_ipi(cpumask_t *mask)
 {
 	int cpu;
 
 	xen_send_IPI_mask(mask, XEN_CALL_FUNCTION_VECTOR);
 
 	/* Make sure other vcpus get a chance to run if they need to. */
-	for_each_cpu_mask_nr(cpu, mask) {
+	for_each_cpu_mask_nr(cpu, *mask) {
 		if (xen_vcpu_stolen(cpu)) {
 			HYPERVISOR_sched_op(SCHEDOP_yield, 0);
 			break;
@@ -388,7 +389,7 @@
 
 static void xen_smp_send_call_function_single_ipi(int cpu)
 {
-	xen_send_IPI_mask(cpumask_of_cpu(cpu), XEN_CALL_FUNCTION_SINGLE_VECTOR);
+	xen_send_IPI_mask(&cpumask_of_cpu(cpu), XEN_CALL_FUNCTION_SINGLE_VECTOR);
 }
 
 static irqreturn_t xen_call_function_interrupt(int irq, void *dev_id)
Index: linux-2.6.git/include/asm-m32r/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-m32r/smp.h
+++ linux-2.6.git/include/asm-m32r/smp.h
@@ -90,7 +90,7 @@
 extern unsigned long send_IPI_mask_phys(cpumask_t, int, int);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif	/* not __ASSEMBLY__ */
 
Index: linux-2.6.git/include/asm-mips/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-mips/smp.h
+++ linux-2.6.git/include/asm-mips/smp.h
@@ -58,6 +58,6 @@
 extern asmlinkage void smp_call_function_interrupt(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif /* __ASM_SMP_H */
Index: linux-2.6.git/include/asm-parisc/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-parisc/smp.h
+++ linux-2.6.git/include/asm-parisc/smp.h
@@ -31,7 +31,7 @@
 extern void smp_send_all_nop(void);
 
 extern void arch_send_call_function_single_ipi(int cpu);
-extern void arch_send_call_function_ipi(cpumask_t mask);
+extern void arch_send_call_function_ipi(cpumask_t *mask);
 
 #endif /* !ASSEMBLY */
 
Index: linux-2.6.git/include/asm-x86/genapic_32.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/genapic_32.h
+++ linux-2.6.git/include/asm-x86/genapic_32.h
@@ -60,7 +60,7 @@
 
 #ifdef CONFIG_SMP
 	/* ipi */
-	void (*send_IPI_mask)(cpumask_t mask, int vector);
+	void (*send_IPI_mask)(cpumask_t *mask, int vector);
 	void (*send_IPI_allbutself)(int vector);
 	void (*send_IPI_all)(int vector);
 #endif
Index: linux-2.6.git/include/asm-x86/genapic_64.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/genapic_64.h
+++ linux-2.6.git/include/asm-x86/genapic_64.h
@@ -21,7 +21,7 @@
 	cpumask_t (*vector_allocation_domain)(int cpu);
 	void (*init_apic_ldr)(void);
 	/* ipi */
-	void (*send_IPI_mask)(cpumask_t mask, int vector);
+	void (*send_IPI_mask)(cpumask_t *mask, int vector);
 	void (*send_IPI_allbutself)(int vector);
 	void (*send_IPI_all)(int vector);
 	/* */
Index: linux-2.6.git/include/asm-x86/mach-default/mach_ipi.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/mach-default/mach_ipi.h
+++ linux-2.6.git/include/asm-x86/mach-default/mach_ipi.h
@@ -13,9 +13,9 @@
 #include <asm/genapic.h>
 #define send_IPI_mask (genapic->send_IPI_mask)
 #else
-static inline void send_IPI_mask(cpumask_t mask, int vector)
+static inline void send_IPI_mask(cpumask_t *mask, int vector)
 {
-	send_IPI_mask_bitmask(mask, vector);
+	send_IPI_mask_bitmask(*mask, vector);
 }
 #endif
 
@@ -25,15 +25,17 @@
 		cpumask_t mask = cpu_online_map;
 
 		cpu_clear(smp_processor_id(), mask);
-		send_IPI_mask(mask, vector);
+		send_IPI_mask(&mask, vector);
 	} else
 		__send_IPI_shortcut(APIC_DEST_ALLBUT, vector);
 }
 
 static inline void __local_send_IPI_all(int vector)
 {
+	cpumask_t mask = cpu_online_map;
+
 	if (no_broadcast || vector == NMI_VECTOR)
-		send_IPI_mask(cpu_online_map, vector);
+		send_IPI_mask(&mask, vector);
 	else
 		__send_IPI_shortcut(APIC_DEST_ALLINC, vector);
 }
Index: linux-2.6.git/include/asm-x86/smp.h
===================================================================
--- linux-2.6.git.orig/include/asm-x86/smp.h
+++ linux-2.6.git/include/asm-x86/smp.h
@@ -53,7 +53,7 @@
 	void (*smp_send_stop)(void);
 	void (*smp_send_reschedule)(int cpu);
 
-	void (*send_call_func_ipi)(cpumask_t mask);
+	void (*send_call_func_ipi)(cpumask_t *mask);
 	void (*send_call_func_single_ipi)(int cpu);
 };
 
@@ -101,7 +101,7 @@
 	smp_ops.send_call_func_single_ipi(cpu);
 }
 
-static inline void arch_send_call_function_ipi(cpumask_t mask)
+static inline void arch_send_call_function_ipi(cpumask_t *mask)
 {
 	smp_ops.send_call_func_ipi(mask);
 }
@@ -110,7 +110,7 @@
 void native_smp_prepare_cpus(unsigned int max_cpus);
 void native_smp_cpus_done(unsigned int max_cpus);
 int native_cpu_up(unsigned int cpunum);
-void native_send_call_func_ipi(cpumask_t mask);
+void native_send_call_func_ipi(cpumask_t *mask);
 void native_send_call_func_single_ipi(int cpu);
 
 extern int __cpu_disable(void);
Index: linux-2.6.git/include/linux/smp.h
===================================================================
--- linux-2.6.git.orig/include/linux/smp.h
+++ linux-2.6.git/include/linux/smp.h
@@ -62,7 +62,7 @@
  * Call a function on all other processors
  */
 int smp_call_function(void(*func)(void *info), void *info, int wait);
-int smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
+int smp_call_function_mask(cpumask_t *mask, void(*func)(void *info), void *info,
 				int wait);
 int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
 				int wait);
Index: linux-2.6.git/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/kernel/smp.c
+++ linux-2.6.git/kernel/smp.c
@@ -318,7 +318,7 @@
  * hardware interrupt handler or from a bottom half handler. Preemption
  * must be disabled when calling this function.
  */
-int smp_call_function_mask(cpumask_t mask, void (*func)(void *), void *info,
+int smp_call_function_mask(cpumask_t *mask, void (*func)(void *), void *info,
 			   int wait)
 {
 	struct call_function_data d;
@@ -334,8 +334,8 @@
 	cpu = smp_processor_id();
 	allbutself = cpu_online_map;
 	cpu_clear(cpu, allbutself);
-	cpus_and(mask, mask, allbutself);
-	num_cpus = cpus_weight(mask);
+	cpus_and(*mask, *mask, allbutself);
+	num_cpus = cpus_weight(*mask);
 
 	/*
 	 * If zero CPUs, return. If just a single CPU, turn this request
@@ -344,7 +344,7 @@
 	if (!num_cpus)
 		return 0;
 	else if (num_cpus == 1) {
-		cpu = first_cpu(mask);
+		cpu = first_cpu(*mask);
 		return smp_call_function_single(cpu, func, info, wait);
 	}
 
@@ -364,7 +364,7 @@
 	data->csd.func = func;
 	data->csd.info = info;
 	data->refs = num_cpus;
-	data->cpumask = mask;
+	data->cpumask = *mask;
 
 	spin_lock_irqsave(&call_function_lock, flags);
 	list_add_tail_rcu(&data->csd.list, &call_function_queue);
@@ -377,7 +377,7 @@
 	if (wait) {
 		csd_flag_wait(&data->csd);
 		if (unlikely(slowpath))
-			smp_call_function_mask_quiesce_stack(mask);
+			smp_call_function_mask_quiesce_stack(*mask);
 	}
 
 	return 0;
@@ -402,9 +402,10 @@
 int smp_call_function(void (*func)(void *), void *info, int wait)
 {
 	int ret;
+	cpumask_t tmp_online_map = cpu_online_map;
 
 	preempt_disable();
-	ret = smp_call_function_mask(cpu_online_map, func, info, wait);
+	ret = smp_call_function_mask(&tmp_online_map, func, info, wait);
 	preempt_enable();
 	return ret;
 }
Index: linux-2.6.git/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.git.orig/virt/kvm/kvm_main.c
+++ linux-2.6.git/virt/kvm/kvm_main.c
@@ -124,7 +124,7 @@
 	if (cpus_empty(cpus))
 		goto out;
 	++kvm->stat.remote_tlb_flush;
-	smp_call_function_mask(cpus, ack_flush, NULL, 1);
+	smp_call_function_mask(&cpus, ack_flush, NULL, 1);
 out:
 	put_cpu();
 }
@@ -149,7 +149,7 @@
 	}
 	if (cpus_empty(cpus))
 		goto out;
-	smp_call_function_mask(cpus, ack_flush, NULL, 1);
+	smp_call_function_mask(&cpus, ack_flush, NULL, 1);
 out:
 	put_cpu();
 }

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]                   ` <20080902072642.GX20055-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
@ 2008-09-03  2:06                     ` Al Viro
       [not found]                       ` <20080903020629.GS28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Al Viro @ 2008-09-03  2:06 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Linus Torvalds, Peter Osterlund,
	Rafael J. Wysocki, Alan Cox, Linux Kernel Mailing List,
	Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Kernel Testers List

On Tue, Sep 02, 2008 at 09:26:43AM +0200, Jens Axboe wrote:
> > Actually both interfaces are a fscking disaster.  The right thing to
> > pass is neither an inode nor a file but a struct block_device.  Al had
> > all this work done a while ago and it just needs rebasing to a current tree:
> > 
> > 	http://git.kernel.org/?p=linux/kernel/git/viro/bdev.git;a=summary
> 
> Completely agreed. Al, I remember talking to you about this at the
> storage summit back in February. What are your current plans wrt moving
> this forward?

Rebased, with nfs parts of fmode_t patch taken out (irrelevant for
bdev anyway and really better off in intent-killing queue).  Other
than that, it's a straight port...  Same place, same branch.

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]                       ` <20080903020629.GS28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2008-09-04 10:13                         ` Jens Axboe
       [not found]                           ` <20080904101326.GD20055-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Jens Axboe @ 2008-09-04 10:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, Peter Osterlund,
	Rafael J. Wysocki, Alan Cox, Linux Kernel Mailing List,
	Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Kernel Testers List

On Wed, Sep 03 2008, Al Viro wrote:
> On Tue, Sep 02, 2008 at 09:26:43AM +0200, Jens Axboe wrote:
> > > Actually both interfaces are a fscking disaster.  The right thing to
> > > pass is neither an inode nor a file but a struct block_device.  Al had
> > > all this work done a while ago and it just needs rebasing to a current tree:
> > > 
> > > 	http://git.kernel.org/?p=linux/kernel/git/viro/bdev.git;a=summary
> > 
> > Completely agreed. Al, I remember talking to you about this at the
> > storage summit back in February. What are your current plans wrt moving
> > this forward?
> 
> Rebased, with nfs parts of fmode_t patch taken out (irrelevant for
> bdev anyway and really better off in intent-killing queue).  Other
> than that, it's a straight port...  Same place, same branch.

So what's your plan with this - 2.6.28?

-- 
Jens Axboe

* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-09-06 21:24 2.6.27-rc5-git8: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-09-06 21:30 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-09-06 21:30 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (27 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4


* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-09-12 18:59 2.6.27-rc6-git2: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-09-12 19:06 ` Rafael J. Wysocki
  2008-09-12 22:05   ` Christoph Lameter
  0 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-09-12 19:06 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (33 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-09-12 19:06 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
@ 2008-09-12 22:05   ` Christoph Lameter
       [not found]     ` <48CAE7A0.8000004-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-09-12 22:05 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List

Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
> 
> The following bug entry is on the current list of known regressions
> from 2.6.26.  Please verify if it still should be listed and let me know
> (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
> Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
> Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Date		: 2008-08-11 18:36 (33 days old)
> References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
> 
> 
> 

tbench

2.6.27-rc6	2760 MB/sec
2.6.22 		3235.47 MB/sec

diff on the .config files for each (took .22 config and did a make oldconfig)

--- /boot/config-2.6.22.1-4U4JUMP1.12	2008-01-22 08:06:38.000000000 -0600
+++ .config	2008-09-12 16:33:52.000000000 -0500
@@ -1,55 +1,89 @@
 #
 # Automatically generated make config: don't edit
-# Linux kernel version: 2.6.22.1-4U4JUMP1.12
-# Mon Jan 21 16:05:52 2008
+# Linux kernel version: 2.6.27-rc6
+# Fri Sep 12 16:33:52 2008
 #
+# CONFIG_64BIT is not set
 CONFIG_X86_32=y
+# CONFIG_X86_64 is not set
+CONFIG_X86=y
+CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
+# CONFIG_GENERIC_LOCKBREAK is not set
 CONFIG_GENERIC_TIME=y
+CONFIG_GENERIC_CMOS_UPDATE=y
 CONFIG_CLOCKSOURCE_WATCHDOG=y
 CONFIG_GENERIC_CLOCKEVENTS=y
 CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
 CONFIG_LOCKDEP_SUPPORT=y
 CONFIG_STACKTRACE_SUPPORT=y
-CONFIG_SEMAPHORE_SLEEPERS=y
-CONFIG_X86=y
+CONFIG_HAVE_LATENCYTOP_SUPPORT=y
+CONFIG_FAST_CMPXCHG_LOCAL=y
 CONFIG_MMU=y
 CONFIG_ZONE_DMA=y
-CONFIG_QUICKLIST=y
 CONFIG_GENERIC_ISA_DMA=y
 CONFIG_GENERIC_IOMAP=y
 CONFIG_GENERIC_BUG=y
 CONFIG_GENERIC_HWEIGHT=y
+# CONFIG_GENERIC_GPIO is not set
 CONFIG_ARCH_MAY_HAVE_PC_FDC=y
-CONFIG_DMI=y
+# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
+CONFIG_RWSEM_XCHGADD_ALGORITHM=y
+# CONFIG_ARCH_HAS_ILOG2_U32 is not set
+# CONFIG_ARCH_HAS_ILOG2_U64 is not set
+CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
+CONFIG_GENERIC_CALIBRATE_DELAY=y
+# CONFIG_GENERIC_TIME_VSYSCALL is not set
+CONFIG_ARCH_HAS_CPU_RELAX=y
+CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
+CONFIG_HAVE_SETUP_PER_CPU_AREA=y
+# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
+CONFIG_ARCH_HIBERNATION_POSSIBLE=y
+CONFIG_ARCH_SUSPEND_POSSIBLE=y
+# CONFIG_ZONE_DMA32 is not set
+CONFIG_ARCH_POPULATES_NODE_MAP=y
+# CONFIG_AUDIT_ARCH is not set
+CONFIG_ARCH_SUPPORTS_AOUT=y
+CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
+CONFIG_GENERIC_HARDIRQS=y
+CONFIG_GENERIC_IRQ_PROBE=y
+CONFIG_GENERIC_PENDING_IRQ=y
+CONFIG_X86_SMP=y
+CONFIG_X86_32_SMP=y
+CONFIG_X86_HT=y
+CONFIG_X86_BIOS_REBOOT=y
+CONFIG_X86_TRAMPOLINE=y
+CONFIG_KTIME_SCALAR=y
 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

 #
-# Code maturity level options
+# General setup
 #
 CONFIG_EXPERIMENTAL=y
 CONFIG_LOCK_KERNEL=y
 CONFIG_INIT_ENV_ARG_LIMIT=32
-
-#
-# General setup
-#
 CONFIG_LOCALVERSION=""
 CONFIG_LOCALVERSION_AUTO=y
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
-# CONFIG_IPC_NS is not set
 CONFIG_SYSVIPC_SYSCTL=y
 CONFIG_POSIX_MQUEUE=y
 # CONFIG_BSD_PROCESS_ACCT is not set
 # CONFIG_TASKSTATS is not set
-# CONFIG_UTS_NS is not set
 # CONFIG_AUDIT is not set
 CONFIG_IKCONFIG=y
 CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=18
-# CONFIG_CPUSETS is not set
+# CONFIG_CGROUPS is not set
+CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
+# CONFIG_GROUP_SCHED is not set
 CONFIG_SYSFS_DEPRECATED=y
+CONFIG_SYSFS_DEPRECATED_V2=y
 # CONFIG_RELAY is not set
+CONFIG_NAMESPACES=y
+# CONFIG_UTS_NS is not set
+# CONFIG_IPC_NS is not set
+# CONFIG_USER_NS is not set
+# CONFIG_PID_NS is not set
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_INITRAMFS_SOURCE=""
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
@@ -64,6 +98,8 @@
 CONFIG_PRINTK=y
 CONFIG_BUG=y
 CONFIG_ELF_CORE=y
+CONFIG_PCSPKR_PLATFORM=y
+CONFIG_COMPAT_BRK=y
 CONFIG_BASE_FULL=y
 CONFIG_FUTEX=y
 CONFIG_ANON_INODES=y
@@ -76,28 +112,40 @@
 CONFIG_SLAB=y
 # CONFIG_SLUB is not set
 # CONFIG_SLOB is not set
+CONFIG_PROFILING=y
+# CONFIG_MARKERS is not set
+CONFIG_OPROFILE=y
+CONFIG_HAVE_OPROFILE=y
+CONFIG_KPROBES=y
+CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
+CONFIG_KRETPROBES=y
+CONFIG_HAVE_IOREMAP_PROT=y
+CONFIG_HAVE_KPROBES=y
+CONFIG_HAVE_KRETPROBES=y
+# CONFIG_HAVE_ARCH_TRACEHOOK is not set
+# CONFIG_HAVE_DMA_ATTRS is not set
+CONFIG_USE_GENERIC_SMP_HELPERS=y
+# CONFIG_HAVE_CLK is not set
+CONFIG_PROC_PAGE_MONITOR=y
+CONFIG_HAVE_GENERIC_DMA_COHERENT=y
+CONFIG_SLABINFO=y
 CONFIG_RT_MUTEXES=y
 # CONFIG_TINY_SHMEM is not set
 CONFIG_BASE_SMALL=0
-
-#
-# Loadable module support
-#
 CONFIG_MODULES=y
+# CONFIG_MODULE_FORCE_LOAD is not set
 CONFIG_MODULE_UNLOAD=y
 CONFIG_MODULE_FORCE_UNLOAD=y
 # CONFIG_MODVERSIONS is not set
 # CONFIG_MODULE_SRCVERSION_ALL is not set
-# CONFIG_KMOD is not set
+CONFIG_KMOD=y
 CONFIG_STOP_MACHINE=y
-
-#
-# Block layer
-#
 CONFIG_BLOCK=y
 CONFIG_LBD=y
 # CONFIG_BLK_DEV_IO_TRACE is not set
 # CONFIG_LSF is not set
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_BLK_DEV_INTEGRITY is not set

 #
 # IO Schedulers
@@ -111,6 +159,7 @@
 # CONFIG_DEFAULT_CFQ is not set
 # CONFIG_DEFAULT_NOOP is not set
 CONFIG_DEFAULT_IOSCHED="anticipatory"
+CONFIG_CLASSIC_RCU=y

 #
 # Processor type and features
@@ -118,17 +167,23 @@
 CONFIG_TICK_ONESHOT=y
 CONFIG_NO_HZ=y
 CONFIG_HIGH_RES_TIMERS=y
+CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
 CONFIG_SMP=y
+CONFIG_X86_FIND_SMP_CONFIG=y
+CONFIG_X86_MPPARSE=y
 # CONFIG_X86_PC is not set
 # CONFIG_X86_ELAN is not set
 # CONFIG_X86_VOYAGER is not set
+CONFIG_X86_GENERICARCH=y
 # CONFIG_X86_NUMAQ is not set
 # CONFIG_X86_SUMMIT is not set
-# CONFIG_X86_BIGSMP is not set
-# CONFIG_X86_VISWS is not set
-CONFIG_X86_GENERICARCH=y
 # CONFIG_X86_ES7000 is not set
-# CONFIG_PARAVIRT is not set
+# CONFIG_X86_BIGSMP is not set
+# CONFIG_X86_VSMP is not set
+# CONFIG_X86_RDC321X is not set
+CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
+# CONFIG_PARAVIRT_GUEST is not set
+# CONFIG_MEMTEST is not set
 CONFIG_X86_CYCLONE_TIMER=y
 # CONFIG_M386 is not set
 # CONFIG_M486 is not set
@@ -139,7 +194,6 @@
 # CONFIG_MPENTIUMII is not set
 # CONFIG_MPENTIUMIII is not set
 # CONFIG_MPENTIUMM is not set
-CONFIG_MCORE2=y
 # CONFIG_MPENTIUM4 is not set
 # CONFIG_MK6 is not set
 # CONFIG_MK7 is not set
@@ -154,33 +208,34 @@
 # CONFIG_MCYRIXIII is not set
 # CONFIG_MVIAC3_2 is not set
 # CONFIG_MVIAC7 is not set
+# CONFIG_MPSC is not set
+CONFIG_MCORE2=y
+# CONFIG_GENERIC_CPU is not set
 CONFIG_X86_GENERIC=y
+CONFIG_X86_CPU=y
 CONFIG_X86_CMPXCHG=y

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
       [not found]     ` <48CAE7A0.8000004-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2008-09-13 11:44       ` Mike Galbraith
       [not found]         ` <1221306287.5213.111.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-13 11:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Fri, 2008-09-12 at 17:05 -0500, Christoph Lameter wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
> > Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
> > Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > Date		: 2008-08-11 18:36 (33 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
> > 
> > 
> > 
> 
> tbench
> 
> 2.6.27-rc6	2760 MB/sec
> 2.6.22 		3235.47 MB/sec

Numbers from my Q6600 Aldi supermarket box (hm, your box is from different shelf)

tbench -t 60 4 localhost followed by 4 60s netperf TCP_RR pairs, each pair
jabbering on a separate port and affine to a separate CPU.  Configs are
as close as I can make them, all kernels built and tested today by
identical userland.

2.6.22.19
Throughput 1136.02 MB/sec 4 procs

16384  87380  1        1       60.01    94179.12
16384  87380  1        1       60.01    88780.61
16384  87380  1        1       60.01    91057.72
16384  87380  1        1       60.01    94242.16

2.6.22.19-cfs-v24.1  (identical config)
Throughput 1126.79 MB/sec 4 procs

16384  87380  1        1       60.00    88809.14
16384  87380  1        1       60.00    89971.25
16384  87380  1        1       60.01    89452.91
16384  87380  1        1       60.01    89478.63

2.6.23.17
Throughput 1073.2 MB/sec 4 procs

16384  87380  1        1       60.00    83635.61
16384  87380  1        1       60.00    82754.36
16384  87380  1        1       60.00    84594.59
16384  87380  1        1       60.00    82995.81

2.6.23.17-cfs-v24.1  (identical config)
Throughput 1145.28 MB/sec 4 procs

16384  87380  1        1       60.00    90278.55
16384  87380  1        1       60.01    90579.31
16384  87380  1        1       60.01    89412.14
16384  87380  1        1       60.00    90270.97

2.6.24.7
Throughput 1119.28 MB/sec 4 procs

16384  87380  1        1       60.00    84092.78
16384  87380  1        1       60.00    84120.68
16384  87380  1        1       60.00    84076.73
16384  87380  1        1       60.00    83995.07

2.6.25.17
Throughput 1113.82 MB/sec 4 procs

16384  87380  1        1       60.00    84629.98
16384  87380  1        1       60.00    84776.38
16384  87380  1        1       60.00    84356.49
16384  87380  1        1       60.00    84469.71

2.6.26.5
Throughput 1095.26 MB/sec 4 procs

16384  87380  1        1       60.00    84481.11
16384  87380  1        1       60.00    84604.38
16384  87380  1        1       60.01    86526.84
16384  87380  1        1       60.01    84478.01

2.6.27-rc6
Throughput 1037.98 MB/sec 4 procs

16384  87380  1        1       60.00    80293.80
16384  87380  1        1       60.00    80266.60
16384  87380  1        1       60.00    80394.83
16384  87380  1        1       60.01    80397.27

I spent two weeks chasing various and sundry netperf numbers recently,
only learning in the process that netperf is _utterly immune_ to
bisection.  Tbench numbers don't look promising for bisection from here.

Note to quixotic self: destroy log immediately lest you be tempted.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
       [not found]         ` <1221306287.5213.111.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-13 11:57           ` Mike Galbraith
  2008-09-14  6:24           ` Mike Galbraith
  2008-09-14 14:18           ` Christoph Lameter
  2 siblings, 0 replies; 318+ messages in thread
From: Mike Galbraith @ 2008-09-13 11:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Sat, 2008-09-13 at 13:44 +0200, Mike Galbraith wrote:

> 2.6.23.17-cfs-v24.1  (identical config)
> Throughput 1145.28 MB/sec 4 procs
> 
> 16384  87380  1        1       60.00    90278.55
> 16384  87380  1        1       60.01    90579.31
> 16384  87380  1        1       60.01    89412.14
> 16384  87380  1        1       60.00    90270.97
> 
> 2.6.24.7
> Throughput 1119.28 MB/sec 4 procs
> 
> 16384  87380  1        1       60.00    84092.78
> 16384  87380  1        1       60.00    84120.68
> 16384  87380  1        1       60.00    84076.73
> 16384  87380  1        1       60.00    83995.07

P.S. fwiw, scheduler difference between these two kernels is practically
nil.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
       [not found]         ` <1221306287.5213.111.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  2008-09-13 11:57           ` Mike Galbraith
@ 2008-09-14  6:24           ` Mike Galbraith
       [not found]             ` <1221373494.4979.1.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  2008-09-14 14:18           ` Christoph Lameter
  2 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-14  6:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Sat, 2008-09-13 at 13:44 +0200, Mike Galbraith wrote:

> 2.6.27-rc6
> Throughput 1037.98 MB/sec 4 procs
> 
> 16384  87380  1        1       60.00    80293.80
> 16384  87380  1        1       60.00    80266.60
> 16384  87380  1        1       60.00    80394.83
> 16384  87380  1        1       60.01    80397.27

<snip... sigh>

goes back to current real .27 config

Throughput 1022.52 MB/sec 4 procs

16384  87380  1        1       60.00    75941.95
16384  87380  1        1       60.01    76002.46
16384  87380  1        1       60.01    76367.55
16384  87380  1        1       60.00    76188.66

...

demodularizes seriously over-configured network

Throughput 750.175 MB/sec 4 procs

16384  87380  1        1       60.00    49270.35
16384  87380  1        1       60.01    49233.86
16384  87380  1        1       60.00    49265.72
16384  87380  1        1       60.00    49227.00

Very un-good thing to try.  Stupid thing too?

-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
       [not found]             ` <1221373494.4979.1.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-14  7:02               ` Mike Galbraith
  0 siblings, 0 replies; 318+ messages in thread
From: Mike Galbraith @ 2008-09-14  7:02 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Sun, 2008-09-14 at 08:24 +0200, Mike Galbraith wrote:
> On Sat, 2008-09-13 at 13:44 +0200, Mike Galbraith wrote:
> 
> > 2.6.27-rc6
> > Throughput 1037.98 MB/sec 4 procs
> > 
> > 16384  87380  1        1       60.00    80293.80
> > 16384  87380  1        1       60.00    80266.60
> > 16384  87380  1        1       60.00    80394.83
> > 16384  87380  1        1       60.01    80397.27
> 
> <snip... sigh>
> 
> goes back to current real .27 config
> 
> Throughput 1022.52 MB/sec 4 procs
> 
> 16384  87380  1        1       60.00    75941.95
> 16384  87380  1        1       60.01    76002.46
> 16384  87380  1        1       60.01    76367.55
> 16384  87380  1        1       60.00    76188.66
> 
> ...
> 
> demodularizes seriously over-configured network
> 
> Throughput 750.175 MB/sec 4 procs
> 
> 16384  87380  1        1       60.00    49270.35
> 16384  87380  1        1       60.01    49233.86
> 16384  87380  1        1       60.00    49265.72
> 16384  87380  1        1       60.00    49227.00
> 
> Very un-good thing to try.  Stupid thing too?

Apparently stupid for netfilter.  Uninteresting to this thread I
suppose, so disregard.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
       [not found]         ` <1221306287.5213.111.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  2008-09-13 11:57           ` Mike Galbraith
  2008-09-14  6:24           ` Mike Galbraith
@ 2008-09-14 14:18           ` Christoph Lameter
       [not found]             ` <48CD1D25.9080301-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2 siblings, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-09-14 14:18 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

Mike Galbraith wrote:
> Numbers from my Q6600 Aldi supermarket box (hm, your box is from different shelf)
>   
My box is an 8p with recent quad core processors. 8G, 32bit Linux.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]             ` <48CD1D25.9080301-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2008-09-14 19:51               ` Mike Galbraith
       [not found]                 ` <1221421907.4597.24.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-14 19:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Sun, 2008-09-14 at 09:18 -0500, Christoph Lameter wrote:
> Mike Galbraith wrote:
> > Numbers from my Q6600 Aldi supermarket box (hm, your box is from different shelf)
> >   
> My box is an 8p with recent quad core processors. 8G, 32bit Linux.

Don't hold your breath, but after putting my network config on a very
severe diet, I'm starting to see something resembling sensible results.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: 2.6.27-rc4-git1: Reported regressions from 2.6.26
       [not found]                           ` <20080904101326.GD20055-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
@ 2008-09-15  5:30                             ` Al Viro
  0 siblings, 0 replies; 318+ messages in thread
From: Al Viro @ 2008-09-15  5:30 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Linus Torvalds, Peter Osterlund,
	Rafael J. Wysocki, Alan Cox, Linux Kernel Mailing List,
	Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Kernel Testers List

On Thu, Sep 04, 2008 at 12:13:27PM +0200, Jens Axboe wrote:
> On Wed, Sep 03 2008, Al Viro wrote:
> > On Tue, Sep 02, 2008 at 09:26:43AM +0200, Jens Axboe wrote:
> > > > Actually both interfaces are a fscking disaster.  The right things to
> > > > pass is neither and inode nor a file but a struct block_device.  Al had
> > > > all this work done a while and it just needs rebasing to a current tree:
> > > > 
> > > > 	http://git.kernel.org/?p=linux/kernel/git/viro/bdev.git;a=summary
> > > 
> > > Completely agreed. Al, I remember talking to you about this at the
> > > storage summit back in february. What are your current plans wrt moving
> > > this forward?
> > 
> > Rebased, with nfs parts of fmode_t patch taken out (irrelevant for
> > bdev anyway and really better off in intent-killing queue).  Other
> > than that, it's a straight port...  Same place, same branch.
> 
> So what's your plan with this - 2.6.28?

Yes.  The only nastiness is around drivers/ide - there it gets a bunch of
annoying conflicts from the ide-{disk,floppy}_ioctl.c splitoff.  Other than
that, it's trivially ported on top of current linux-next.  Merge order is
going to be interesting - depending on whether block merge happens before
or after ide one.

I'm going to put linux-next-based series on kernel.org tonight, before
going to Portland...

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                 ` <1221421907.4597.24.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-15 10:44                   ` Mike Galbraith
       [not found]                     ` <1221475440.4784.39.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-15 10:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Sun, 2008-09-14 at 21:51 +0200, Mike Galbraith wrote:
> On Sun, 2008-09-14 at 09:18 -0500, Christoph Lameter wrote:
> > Mike Galbraith wrote:
> > > Numbers from my Q6600 Aldi supermarket box (hm, your box is from different shelf)
> > >   
> > My box is an 8p with recent quad core processors. 8G, 32bit Linux.
> 
> Don't hold your breath, but after putting my network config on a very
> severe diet, I'm starting to see something resembling sensible results.

Turns off all netfilter options except tables, etc.

Since 2.6.22.19-cfs-v24.1 and 2.6.23.17-cfs-v24.1 schedulers are
identical, and these are essentially identical with 2.6.24.7, what I
read from numbers below is that cfs in 2.6.23 was somewhat less than
wonderful for either netperf or tbench.  Something happened somewhere
other than the scheduler at 23->24 which cost us some performance, and
another something happened at 26->27.  I'll likely go looking again..
and likely regret it again ;-)

Math ain't free is part of it, though apparently not much.  For me,
tbench regression 22->27 is ~10%, and netperf regression is ~16%.

Data:

2.6.22.19

Throughput 1250.73 MB/sec 4 procs                  1.00

16384  87380  1        1       60.01    111272.55  1.00
16384  87380  1        1       60.00    104689.58
16384  87380  1        1       60.00    110733.05
16384  87380  1        1       60.00    110748.88

2.6.22.19-cfs-v24.1

Throughput 1204.14 MB/sec 4 procs                  .962

16384  87380  1        1       60.01    101799.85  .929
16384  87380  1        1       60.01    101659.41
16384  87380  1        1       60.01    101628.78
16384  87380  1        1       60.01    101700.53

wakeup granularity = 0 (make scheduler as preempt happy as 2.6.22 is)

Throughput 1213.21 MB/sec 4 procs                  .970

16384  87380  1        1       60.01    108569.27  .992
16384  87380  1        1       60.01    108541.04
16384  87380  1        1       60.00    108579.63
16384  87380  1        1       60.01    108519.09

2.6.23.17

Throughput 1192.49 MB/sec 4 procs                  .953

16384  87380  1        1       60.00    91124.67   .866
16384  87380  1        1       60.00    93124.38
16384  87380  1        1       60.01    92249.69
16384  87380  1        1       60.01    91103.12

wakeup granularity = 0

Throughput 1200.46 MB/sec 4 procs                  .959

16384  87380  1        1       60.01    95987.66   .866
16384  87380  1        1       60.01    92819.98
16384  87380  1        1       60.01    95454.00
16384  87380  1        1       60.01    94834.84

2.6.23.17-cfs-v24.1

Throughput 1242.47 MB/sec 4 procs                  .993

16384  87380  1        1       60.00    101728.34  .931
16384  87380  1        1       60.00    101930.23
16384  87380  1        1       60.00    101803.15
16384  87380  1        1       60.00    101908.29

wakeup granularity = 0

Throughput 1238.68 MB/sec 4 procs                  .990

16384  87380  1        1       60.01    105871.52  .969
16384  87380  1        1       60.01    105813.11
16384  87380  1        1       60.01    106106.31
16384  87380  1        1       60.01    106310.20

2.6.24.7

Throughput 1202.49 MB/sec 4 procs                  .961

16384  87380  1        1       60.00    94643.23   .868
16384  87380  1        1       60.00    94754.37
16384  87380  1        1       60.00    94909.77
16384  87380  1        1       60.00    95457.41

wakeup granularity = 0

Throughput 1204 MB/sec 4 procs                     .962

16384  87380  1        1       60.00    99599.27   .910
16384  87380  1        1       60.00    99439.95
16384  87380  1        1       60.00    99556.38
16384  87380  1        1       60.00    99500.45

2.6.25.17

Throughput 1220.47 MB/sec 4 procs                  .975

16384  87380  1        1       60.00    94641.06   .867
16384  87380  1        1       60.00    94864.87
16384  87380  1        1       60.01    95033.81
16384  87380  1        1       60.00    94863.49

wakeup granularity = 0

Throughput 1223.16 MB/sec 4 procs                  .977

16384  87380  1        1       60.00    101768.95  .930
16384  87380  1        1       60.00    101888.46
16384  87380  1        1       60.01    101608.21
16384  87380  1        1       60.01    101833.05

2.6.26.5

Throughput 1182.24 MB/sec 4 procs                  .945

16384  87380  1        1       60.00    93814.75   .854
16384  87380  1        1       60.00    94173.41
16384  87380  1        1       60.00    92925.24
16384  87380  1        1       60.00    93002.51

wakeup granularity = 0

Throughput 1183.47 MB/sec 4 procs                  .945

16384  87380  1        1       60.00    100837.12  .922
16384  87380  1        1       60.00    101230.12
16384  87380  1        1       60.00    100868.45
16384  87380  1        1       60.00    100491.41

2.6.27

Throughput 1088.17 MB/sec 4 procs                  .870

16384  87380  1        1       60.00    84225.59   .766
16384  87380  1        1       60.00    83362.65
16384  87380  1        1       60.00    84060.73
16384  87380  1        1       60.00    83462.72

wakeup granularity = 0

Throughput 1116.22 MB/sec 4 procs                  .892

16384  87380  1        1       60.00    92502.44   .841
16384  87380  1        1       60.01    92213.72
16384  87380  1        1       60.00    91445.86
16384  87380  1        1       60.00    91832.84

revert sched weight/asym changes, gran = 0

Throughput 1149.16 MB/sec 4 procs                  .918

16384  87380  1        1       60.00    94824.92   .868
16384  87380  1        1       60.01    94579.45
16384  87380  1        1       60.01    95284.94
16384  87380  1        1       60.01    95228.22

Weight/asym changes cost ~3%.  Mysql+oltp agrees.  Preempt happy loads
lose a bit, preempt haters gain a bit.  Performance shift.
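
The trailing column in the data above appears to be each result divided by
the 2.6.22.19 baseline, with the four netperf rates averaged before dividing.
That reading is an inference from the numbers, not something stated in the
message. A minimal sketch under that assumption, using only figures copied
from the tables (the function and variable names are illustrative):

```python
from statistics import mean

# Figures copied from the data above; the 2.6.22.19 run is the 1.00 baseline.
BASE_TBENCH = 1250.73  # tbench throughput, MB/sec
# Last netperf column for the four TCP_RR pairs (assumed to be a rate per second).
BASE_NETPERF = [111272.55, 104689.58, 110733.05, 110748.88]


def tbench_ratio(throughput, baseline=BASE_TBENCH):
    """Throughput relative to the baseline kernel."""
    return throughput / baseline


def netperf_ratio(rates, baseline=BASE_NETPERF):
    """Mean of the four TCP_RR pairs relative to the baseline mean."""
    return mean(rates) / mean(baseline)


# 2.6.27 with the default wakeup granularity
tbench_27 = tbench_ratio(1088.17)
netperf_27 = netperf_ratio([84225.59, 83362.65, 84060.73, 83462.72])

print(f"tbench  2.6.27 vs 2.6.22.19: {tbench_27:.3f}")
print(f"netperf 2.6.27 vs 2.6.22.19: {netperf_27:.3f}")
```

For the 2.6.27 rows this gives roughly .870 and .766, which agrees with the
annotations in the table.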



^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                     ` <1221475440.4784.39.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-16 12:28                       ` Mike Galbraith
       [not found]                         ` <1221568105.5020.17.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-16 12:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List

On Mon, 2008-09-15 at 12:44 +0200, Mike Galbraith wrote:
> On Sun, 2008-09-14 at 21:51 +0200, Mike Galbraith wrote:
> > On Sun, 2008-09-14 at 09:18 -0500, Christoph Lameter wrote:
> > > Mike Galbraith wrote:
> > > > Numbers from my Q6600 Aldi supermarket box (hm, your box is from different shelf)
> > > >   
> > > My box is an 8p with recent quad core processors. 8G, 32bit Linux.
> > 
> > Don't hold your breath, but after putting my network config on a very
> > severe diet, I'm starting to see something resembling sensible results.
> 
> Turns off all netfilter options except tables, etc.
> 
> Since 2.6.22.19-cfs-v24.1 and 2.6.23.17-cfs-v24.1 schedulers are
> identical, and these are essentially identical with 2.6.24.7, what I
> read from numbers below is that cfs in 2.6.23 was somewhat less than
> wonderful for either netperf or tbench.  Something happened somewhere
> other than the scheduler at 23->24 which cost us some performance, and
> another something happened at 26->27.  I'll likely go looking again..
> and likely regret it again ;-)

Bisecting 26->27 yet again turned up a repeatable downturn in netperf
throughput.  There is no difference at this point with tbench. 

Bisect says first bad commit is 847106f, a security merge.  Post
bisection sanity checkouts say...

v2.6.26-21-g2069f45
16384  87380  1        1       60.00    98435.13
16384  87380  1        1       60.01    99259.90
16384  87380  1        1       60.01    99325.61
16384  87380  1        1       60.00    99039.84

v2.6.26-343-g847106f
16384  87380  1        1       60.00    94764.59
16384  87380  1        1       60.00    94909.89
16384  87380  1        1       60.00    94858.63
16384  87380  1        1       60.00    94801.12

...every time.  I knew I'd regret doing this.

	-Mike
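
One way to judge whether a difference like the one above is usable as a
good/bad signal for bisection is to check both the size of the drop in the
mean and whether the two sample sets overlap at all. The sketch below is only
an illustration of that check; the helper names are hypothetical, and the
numbers are the netperf rates quoted in this message.

```python
from statistics import mean

# netperf rates copied from the two sanity checkouts above.
good = [98435.13, 99259.90, 99325.61, 99039.84]  # v2.6.26-21-g2069f45
bad = [94764.59, 94909.89, 94858.63, 94801.12]   # v2.6.26-343-g847106f


def relative_drop(before, after):
    """Fractional loss in mean rate going from `before` to `after`."""
    return 1.0 - mean(after) / mean(before)


def clearly_separated(before, after):
    """True when every `after` sample is below every `before` sample."""
    return max(after) < min(before)


drop = relative_drop(good, bad)
separated = clearly_separated(good, bad)
print(f"mean drop: {drop:.1%}, ranges disjoint: {separated}")
```

On these samples the mean drops by about 4.2% and the two ranges do not
overlap, which is consistent with the "every time" remark.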

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                         ` <1221568105.5020.17.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-16 14:07                           ` Ilpo Järvinen
  2008-09-17  4:39                             ` Mike Galbraith
  0 siblings, 1 reply; 318+ messages in thread
From: Ilpo Järvinen @ 2008-09-16 14:07 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Tue, 16 Sep 2008, Mike Galbraith wrote:

> On Mon, 2008-09-15 at 12:44 +0200, Mike Galbraith wrote:
> > On Sun, 2008-09-14 at 21:51 +0200, Mike Galbraith wrote:
> > 
> > Since 2.6.22.19-cfs-v24.1 and 2.6.23.17-cfs-v24.1 schedulers are
> > identical, and these are essentially identical with 2.6.24.7, what I
> > read from numbers below is that cfs in 2.6.23 was somewhat less than
> > wonderful for either netperf or tbench.  Something happened somewhere
> > other than the scheduler at 23->24 which cost us some performance, and
> > another something happened at 26->27.  I'll likely go looking again..
> > and likely regret it again ;-)
> 
> Bisecting 26->27 yet again turned up a repeatable downturn in netperf
> throughput.  There is no difference at this point with tbench. 
> 
> Bisect says first bad commit is 847106f, a security merge.  Post
> bisection sanity checkouts say...
> 
> v2.6.26-21-g2069f45
> 16384  87380  1        1       60.00    98435.13
> 16384  87380  1        1       60.01    99259.90
> 16384  87380  1        1       60.01    99325.61
> 16384  87380  1        1       60.00    99039.84
> 
> v2.6.26-343-g847106f
> 16384  87380  1        1       60.00    94764.59
> 16384  87380  1        1       60.00    94909.89
> 16384  87380  1        1       60.00    94858.63
> 16384  87380  1        1       60.00    94801.12
> 
> ...every time.  I knew I'd regret doing this.

I assume that c142bda458a gave good results as well...

One additional sanity check could be to rebase security 6f0f0fd4963 on top 
of the c142bda458a and then see if bisection among those security commits 
on top yields the same result... Though I doubt it can change much
because there was not that much relevant non-security things in the merge 
in question.

-- 
 i.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-09-16 14:07                           ` Ilpo Järvinen
@ 2008-09-17  4:39                             ` Mike Galbraith
  2008-09-17  5:01                               ` Mike Galbraith
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-17  4:39 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Christoph Lameter, Rafael J. Wysocki, LKML, kernel-testers

On Tue, 2008-09-16 at 17:07 +0300, Ilpo Järvinen wrote:
> On Tue, 16 Sep 2008, Mike Galbraith wrote:
> 
> > On Mon, 2008-09-15 at 12:44 +0200, Mike Galbraith wrote:
> > > On Sun, 2008-09-14 at 21:51 +0200, Mike Galbraith wrote:
> > > 
> > > Since 2.6.22.19-cfs-v24.1 and 2.6.23.17-cfs-v24.1 schedulers are
> > > identical, and these are essentially identical with 2.6.24.7, what I
> > > read from numbers below is that cfs in 2.6.23 was somewhat less than
> > > wonderful for either netperf or tbench.  Something happened somewhere
> > > other than the scheduler at 23->24 which cost us some performance, and
> > > another something happened at 26->27.  I'll likely go looking again..
> > > and likely regret it again ;-)
> > 
> > Bisecting 26->27 yet again turned up a repeatable downturn in netperf
> > throughput.  There is no difference at this point with tbench. 
> > 
> > Bisect says first bad commit is 847106f, a security merge.  Post
> > bisection sanity checkouts say...
> > 
> > v2.6.26-21-g2069f45
> > 16384  87380  1        1       60.00    98435.13
> > 16384  87380  1        1       60.01    99259.90
> > 16384  87380  1        1       60.01    99325.61
> > 16384  87380  1        1       60.00    99039.84
> > 
> > v2.6.26-343-g847106f
> > 16384  87380  1        1       60.00    94764.59
> > 16384  87380  1        1       60.00    94909.89
> > 16384  87380  1        1       60.00    94858.63
> > 16384  87380  1        1       60.00    94801.12
> > 
> > ...every time.  I knew I'd regret doing this.
> 
> I assume that c142bda458a gave good results as well...

Yes, just tried it.

> One additional sanity check could be to rebase security 6f0f0fd4963 on top 
> of the c142bda458a and then see if bisection among those security commits 
> on top yields the same result... Though I doubt it can change much
> because there were not that many relevant non-security things in the merge
> in question.

I'm not a master of git-foo, so that is not an option.  However, a dinky
bisection c142bda4..847106f very clearly says...

marge:..kernel/linux-2.6.27.git # git bisect bad
6f0f0fd496333777d53daff21a4e3b28c4d03a6d is first bad commit
commit 6f0f0fd496333777d53daff21a4e3b28c4d03a6d
Author: James Morris <jmorris@namei.org>
Date:   Thu Jul 10 17:02:07 2008 +0900

    security: remove register_security hook

    The register security hook is no longer required, as the capability
    module is always registered.  LSMs wishing to stack capability as
    a secondary module should do so explicitly.

    Signed-off-by: James Morris <jmorris@namei.org>
    Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
    Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 0177ef46d305e51e27bfcc4350a40577f8ba8d3d 64b64c10a424df4539653a8ee34f1a2329300931 M      include
:040000 040000 e318891e514de674fd064f6bfad70d5633b1aff1 0dbb38d5aa7fc3e4b2e09dc65796ce7cd5faeb26 M      security

git bisect start
# good: [c142bda458a9c81097238800e1bd8eeeea09913d] Merge branch 'drm-reorg' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect good c142bda458a9c81097238800e1bd8eeeea09913d
# bad: [847106ff628805e1a0aa91e7f53381f3fdfcd839] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
git bisect bad 847106ff628805e1a0aa91e7f53381f3fdfcd839
# good: [cea78dc4ca044e9666e8f5d797ec50ab85253e49] SELinux: fix off by 1 reference of class_to_string in context_struct_compute_av
git bisect good cea78dc4ca044e9666e8f5d797ec50ab85253e49
# good: [65fc7668006b537f7ae8451990c0ed9ec882544e] security: fix return of void-valued expressions
git bisect good 65fc7668006b537f7ae8451990c0ed9ec882544e
# good: [b478a9f9889c81e88077d1495daadee64c0af541] security: remove unused sb_get_mnt_opts hook
git bisect good b478a9f9889c81e88077d1495daadee64c0af541
# good: [93cbace7a058bce7f99319ef6ceff4b78cf45051] security: remove dummy module fix
git bisect good 93cbace7a058bce7f99319ef6ceff4b78cf45051
# bad: [6f0f0fd496333777d53daff21a4e3b28c4d03a6d] security: remove register_security hook
git bisect bad 6f0f0fd496333777d53daff21a4e3b28c4d03a6d
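For anyone repeating this bisection by hand, the good/bad call at each step can be made mechanical. The following is only a sketch: the `classify` helper and the 97000 transactions/sec cut-off are assumptions chosen to sit between the two result clusters quoted above, not something taken from this thread.

```shell
#!/bin/sh
# classify RATE THRESHOLD
# Prints "good" when RATE (transactions/sec) is at or above THRESHOLD,
# otherwise "bad". awk does the comparison because sh has no float math.
classify() {
    awk -v r="$1" -v t="$2" 'BEGIN {
        verdict = (r + 0 >= t + 0) ? "good" : "bad"
        print verdict
    }'
}

# Hypothetical use after booting the kernel under test and measuring $rate:
#   git bisect "$(classify "$rate" 97000)"
```

Since each step needs a reboot into the freshly built kernel, this feeds `git bisect good` / `git bisect bad` manually rather than through an automated run.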


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-09-17  4:39                             ` Mike Galbraith
@ 2008-09-17  5:01                               ` Mike Galbraith
       [not found]                                 ` <1221627676.5125.3.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-17  5:01 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Christoph Lameter, Rafael J. Wysocki, LKML, kernel-testers

On Wed, 2008-09-17 at 06:40 +0200, Mike Galbraith wrote:
> On Tue, 2008-09-16 at 17:07 +0300, Ilpo Järvinen wrote:

> > One additional sanity check could be to rebase security 6f0f0fd4963 on top 
> > of the c142bda458a and then see if bisection among those security commits 
> > on top yields the same result... Though I doubt it can change much
> > because there were not that many relevant non-security things in the merge
> > in question.
> 
> I'm not a master of git-foo, so that is not an option.  However, a dinky
> bisection c142bda4..847106f very clearly says...
> 
> marge:..kernel/linux-2.6.27.git # git bisect bad
> 6f0f0fd496333777d53daff21a4e3b28c4d03a6d is first bad commit
> commit 6f0f0fd496333777d53daff21a4e3b28c4d03a6d
> Author: James Morris <jmorris@namei.org>
> Date:   Thu Jul 10 17:02:07 2008 +0900
> 
>     security: remove register_security hook
> 
>     The register security hook is no longer required, as the capability
>     module is always registered.  LSMs wishing to stack capability as
>     a secondary module should do so explicitly.
> 
>     Signed-off-by: James Morris <jmorris@namei.org>
>     Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
>     Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> :040000 040000 0177ef46d305e51e27bfcc4350a40577f8ba8d3d 64b64c10a424df4539653a8ee34f1a2329300931 M      include
> :040000 040000 e318891e514de674fd064f6bfad70d5633b1aff1 0dbb38d5aa7fc3e4b2e09dc65796ce7cd5faeb26 M      security

Which is high grade horse-pookey.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                 ` <1221627676.5125.3.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-17 10:40                                   ` Ingo Molnar
       [not found]                                     ` <20080917104044.GC18764-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-09-17 10:40 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ilpo Järvinen, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA


* Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org> wrote:

> On Wed, 2008-09-17 at 06:40 +0200, Mike Galbraith wrote:
> > On Tue, 2008-09-16 at 17:07 +0300, Ilpo Järvinen wrote:
> 
> > > One additional sanity check could be to rebase security 6f0f0fd4963 on top 
> > > of the c142bda458a and then see if bisection among those security commits 
> > > on top yields the same result... Though I doubt it can change much
> > > because there were not that many relevant non-security things in the merge
> > > in question.
> > 
> > I'm not a master of git-foo, so that is not an option.  However, a dinky
> > bisection c142bda4..847106f very clearly says...
> > 
> > marge:..kernel/linux-2.6.27.git # git bisect bad
> > 6f0f0fd496333777d53daff21a4e3b28c4d03a6d is first bad commit
> > commit 6f0f0fd496333777d53daff21a4e3b28c4d03a6d
> > Author: James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>
> > Date:   Thu Jul 10 17:02:07 2008 +0900
> > 
> >     security: remove register_security hook
> > 
> >     The register security hook is no longer required, as the capability
> >     module is always registered.  LSMs wishing to stack capability as
> >     a secondary module should do so explicitly.
> > 
> >     Signed-off-by: James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>
> >     Acked-by: Stephen Smalley <sds-+05T5uksL2qpZYMLLGbcSA@public.gmane.org>
> >     Acked-by: Greg Kroah-Hartman <gregkh-l3A5Bk7waGM@public.gmane.org>
> > 
> > :040000 040000 0177ef46d305e51e27bfcc4350a40577f8ba8d3d 64b64c10a424df4539653a8ee34f1a2329300931 M      include
> > :040000 040000 e318891e514de674fd064f6bfad70d5633b1aff1 0dbb38d5aa7fc3e4b2e09dc65796ce7cd5faeb26 M      security
> 
> Which is high grade horse-pookey.

perhaps re-test commit 6f0f0fd49 and its parent commit, 93cbace7a0.

It looks like a potentially bogus bisection result, but _maybe_ it has 
relevance: changes the size of "struct security_operations", which could 
have alignment and layout effects on all sorts of kernel variables, 
kmalloc sizes, etc.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                     ` <20080917104044.GC18764-X9Un+BFzKDI@public.gmane.org>
@ 2008-09-17 11:41                                       ` Mike Galbraith
       [not found]                                         ` <1221651701.5102.17.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-17 11:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ilpo Järvinen, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 2008-09-17 at 12:40 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org> wrote:
> 
> > On Wed, 2008-09-17 at 06:40 +0200, Mike Galbraith wrote:
> > > On Tue, 2008-09-16 at 17:07 +0300, Ilpo Järvinen wrote:
> > 
> > > > One additional sanity check could be to rebase security 6f0f0fd4963 on top 
> > > > of the c142bda458a and then see if bisection among those security commits 
> > > > on top yields the same result... Though I doubt it can change much
> > > > because there were not that many relevant non-security things in the merge
> > > > in question.
> > > 
> > > I'm not a master of git-foo, so that is not an option.  However, a dinky
> > > bisection c142bda4..847106f very clearly says...
> > > 
> > > marge:..kernel/linux-2.6.27.git # git bisect bad
> > > 6f0f0fd496333777d53daff21a4e3b28c4d03a6d is first bad commit
> > > commit 6f0f0fd496333777d53daff21a4e3b28c4d03a6d
> > > Author: James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>
> > > Date:   Thu Jul 10 17:02:07 2008 +0900
> > > 
> > >     security: remove register_security hook
> > > 
> > >     The register security hook is no longer required, as the capability
> > >     module is always registered.  LSMs wishing to stack capability as
> > >     a secondary module should do so explicitly.
> > > 
> > >     Signed-off-by: James Morris <jmorris-gx6/JNMH7DfYtjvyW6yDsg@public.gmane.org>
> > >     Acked-by: Stephen Smalley <sds-+05T5uksL2qpZYMLLGbcSA@public.gmane.org>
> > >     Acked-by: Greg Kroah-Hartman <gregkh-l3A5Bk7waGM@public.gmane.org>
> > > 
> > > :040000 040000 0177ef46d305e51e27bfcc4350a40577f8ba8d3d 64b64c10a424df4539653a8ee34f1a2329300931 M      include
> > > :040000 040000 e318891e514de674fd064f6bfad70d5633b1aff1 0dbb38d5aa7fc3e4b2e09dc65796ce7cd5faeb26 M      security
> > 
> > Which is high grade horse-pookey.
> 
> perhaps re-test commit 6f0f0fd49 and its parent commit, 93cbace7a0.

Will do.

> It looks like a potentially bogus bisection result, but _maybe_ it has 
> relevance: changes the size of "struct security_operations", which could 
> have alignment and layout effects on all sorts of kernel variables, 
> kmalloc sizes, etc.

This may well be a mythical creature infestation for all I know ;-), but
its address is somewhere in the 2069f45..847106f block, 316 commits,
none of which look like they should be the least bit interesting to
netperf.  I reverted this particular commit in 27.git, got the expected
result.  Looks like I'll keep poking at it, can't seem to resist.  Grr.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                         ` <1221651701.5102.17.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-17 12:49                                           ` Ingo Molnar
       [not found]                                             ` <20080917124943.GA7738-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-09-17 12:49 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ilpo Järvinen, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA


* Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org> wrote:

> > It looks like a potentially bogus bisection result, but _maybe_ it 
> > has relevance: changes the size of "struct security_operations", 
> > which could have alignment and layout effects on all sorts of kernel 
> > variables, kmalloc sizes, etc.
> 
> This may well be a mythical creature infestation for all I know ;-), 
> but its address is somewhere in the 2069f45..847106f block, 316
> commits, none of which look like they should be the least bit 
> interesting to netperf.  I reverted this particular commit in 27.git, 
> got the expected result.  Looks like I'll keep poking at it, can't 
> seem to resist.  Grr.

are you sure it's 2069f45..847106f? Filtering out the 
likely-uninteresting commits:

git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'

gives us:

 7daf705: Start using the new '%pS' infrastructure to print symbols
 6f0f0fd: security: remove register_security hook
 93cbace: security: remove dummy module fix
 5915eb5: security: remove dummy module
 b478a9f: security: remove unused sb_get_mnt_opts hook
 32502b8: splice: fix generic_file_splice_read() race with page invalidation
 8b3d356: ramfs: enable splice write
 a144ff0: xen: Avoid allocations causing swap activity on the resume path

which really only leaves that security commit your bisection fingered. 
Which _slightly_ raises its likelihood of being implicated. Structure
size changes can move two formerly far-apart netperf-relevant symbols on 
the same cacheline, which can start cache ping-pong-ing badly.

It wouldn't be the first such incident - alignment changes impacting
macro benchmarks. (and it's hard to find it as the thing that changes 
alignment/size/sharedness might be something totally unrelated)

It's still a bit too early to say this for sure though ...

	Ingo
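One way to look for the kind of accidental sharing described above is to list which symbols land in the same cache line as a symbol of interest in each kernel's System.map. This is only a sketch under stated assumptions: the `same_line` helper is hypothetical, it assumes 64-byte cache lines, and it assumes the usual "address type name" System.map layout.

```shell
#!/bin/sh
# same_line SYMBOL < System.map
# Prints "address name" for every symbol whose address falls in the same
# 64-byte cache line as SYMBOL. Exits non-zero if SYMBOL is not listed.
same_line() {
    awk -v sym="$1" '
    # Cache-line key: the address with its low 6 bits dropped, i.e. every hex
    # digit except the last, with the second-to-last digit reduced to its top
    # two bits. Done on the string to avoid 64-bit integer overflow in awk.
    function key(a,    n, d) {
        a = tolower(a)
        n = length(a)
        d = index("0123456789abcdef", substr(a, n - 1, 1)) - 1
        return substr(a, 1, n - 2) "" int(d / 4)
    }
    NF >= 3 { line[$3] = key($1); addr[$3] = $1; name[++cnt] = $3 }
    END {
        if (!(sym in line)) exit 1
        for (i = 1; i <= cnt; i++)
            if (line[name[i]] == line[sym]) print addr[name[i]], name[i]
    }'
}

# Hypothetical use: same_line security_ops < System.map
```

Comparing the output for the good and the bad build would show whether a frequently written variable became a neighbour of a read-mostly one. It only covers statically allocated symbols, not kmalloc'ed objects.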

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                             ` <20080917124943.GA7738-X9Un+BFzKDI@public.gmane.org>
@ 2008-09-17 13:11                                               ` Mike Galbraith
       [not found]                                                 ` <1221657111.5511.14.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-17 13:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ilpo Järvinen, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 2008-09-17 at 14:49 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org> wrote:
> 
> > > It looks like a potentially bogus bisection result, but _maybe_ it 
> > > has relevance: changes the size of "struct security_operations", 
> > > which could have alignment and layout effects on all sorts of kernel 
> > > variables, kmalloc sizes, etc.
> > 
> > This may well be a mythical creature infestation for all I know ;-), 
> > but its address is somewhere in the 2069f45..847106f block, 316
> > commits, none of which look like they should be the least bit 
> > interesting to netperf.  I reverted this particular commit in 27.git, 
> > got the expected result.  Looks like I'll keep poking at it, can't 
> > seem to resist.  Grr.
> 
> are you sure it's 2069f45..847106f? Filtering out the 
> likely-uninteresting commits:

Yeah, as sure as I can be.  I've built both (et al) kernels several
times, and it has repeated every time.  Would be nice if someone would
try to confirm/deny though.  For my little quad, I do..

#!/bin/sh

echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns 

netserver -p 12865
netserver -p 12866
netserver -p 12867
netserver -p 12868

netperf -p 12865 -t TCP_RR -l 60 -H 127.0.0.1 -T 0,0 -- -r 1,1&
netperf -p 12866 -t TCP_RR -l 60 -H 127.0.0.1 -T 1,1 -- -r 1,1&
netperf -p 12867 -t TCP_RR -l 60 -H 127.0.0.1 -T 2,2 -- -r 1,1&
netperf -p 12868 -t TCP_RR -l 60 -H 127.0.0.1 -T 3,3 -- -r 1,1&

wait
killall netserver
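
For boxes with a different CPU count, the same command lines can be generated instead of hard-coded. A minimal sketch, assuming the hypothetical helper name `gen_cmds`; the ports and netperf flags are copied from the script above, and the helper only prints the commands rather than running them.

```shell
#!/bin/sh
# gen_cmds NCPUS
# Prints (does not run) the netserver and netperf command lines for NCPUS
# pinned TCP_RR pairs, using the same ports and flags as the script above.
gen_cmds() {
    ncpus=$1
    base_port=12865
    cpu=0
    # One netserver per CPU, on consecutive ports.
    while [ "$cpu" -lt "$ncpus" ]; do
        echo "netserver -p $((base_port + cpu))"
        cpu=$((cpu + 1))
    done
    cpu=0
    # One backgrounded netperf per CPU, client and server pinned to that CPU.
    while [ "$cpu" -lt "$ncpus" ]; do
        echo "netperf -p $((base_port + cpu)) -t TCP_RR -l 60 -H 127.0.0.1 -T $cpu,$cpu -- -r 1,1 &"
        cpu=$((cpu + 1))
    done
    echo "wait"
}
```

`gen_cmds 4` prints the four netserver and four netperf lines used above; the sched_wakeup_granularity_ns setting and the final killall are left to the caller.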


> git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
> 'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'
> 
> gives us:
> 
>  7daf705: Start using the new '%pS' infrastructure to print symbols
>  6f0f0fd: security: remove register_security hook
>  93cbace: security: remove dummy module fix
>  5915eb5: security: remove dummy module
>  b478a9f: security: remove unused sb_get_mnt_opts hook
>  32502b8: splice: fix generic_file_splice_read() race with page invalidation
>  8b3d356: ramfs: enable splice write
>  a144ff0: xen: Avoid allocations causing swap activity on the resume path
> 
> which really only leaves that security commit your bisection fingered. 
> Which _slightly_ raises its likelihood of being implicated. Structure
> size changes can move two formerly far-apart netperf-relevant symbols on 
> the same cacheline, which can start cache ping-pong-ing badly.

I sure hope it's something like ping-pong, it's driving me NUTS.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                 ` <1221657111.5511.14.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-17 13:36                                                   ` Ilpo Järvinen
       [not found]                                                     ` <Pine.LNX.4.64.0809171629550.1034-x/A8LOkYjdVsRR2hCrRKtT03IgOmwywn@public.gmane.org>
  2008-09-17 14:47                                                   ` Eric Dumazet
  1 sibling, 1 reply; 318+ messages in thread
From: Ilpo Järvinen @ 2008-09-17 13:36 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 17 Sep 2008, Mike Galbraith wrote:

> On Wed, 2008-09-17 at 14:49 +0200, Ingo Molnar wrote:
> 
> > git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
> > 'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'
> > 
> > gives us:
> > 
> >  7daf705: Start using the new '%pS' infrastructure to print symbols
> >  6f0f0fd: security: remove register_security hook
> >  93cbace: security: remove dummy module fix
> >  5915eb5: security: remove dummy module
> >  b478a9f: security: remove unused sb_get_mnt_opts hook
> >  32502b8: splice: fix generic_file_splice_read() race with page invalidation
> >  8b3d356: ramfs: enable splice write
> >  a144ff0: xen: Avoid allocations causing swap activity on the resume path
> > 
> > which really only leaves that security commit your bisection fingered. 
> > Which _slightly_ raises its likelihood of being implicated. Structure
> > size changes can move two formerly far-apart netperf-relevant symbols on 
> > the same cacheline, which can start cache ping-pong-ing badly.
> 
> I sure hope it's something like ping-pong, it's driving me NUTS.

How about dividing the problem to smaller blocks then by restoring 
parts of the change...


-- 
 i.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                     ` <Pine.LNX.4.64.0809171629550.1034-x/A8LOkYjdVsRR2hCrRKtT03IgOmwywn@public.gmane.org>
@ 2008-09-17 13:57                                                       ` Mike Galbraith
       [not found]                                                         ` <1221659858.8818.13.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-17 13:57 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Ingo Molnar, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 2008-09-17 at 16:36 +0300, Ilpo Järvinen wrote:
> On Wed, 17 Sep 2008, Mike Galbraith wrote:
> 
> > On Wed, 2008-09-17 at 14:49 +0200, Ingo Molnar wrote:
> > 
> > > git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
> > > 'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'
> > > 
> > > gives us:
> > > 
> > >  7daf705: Start using the new '%pS' infrastructure to print symbols
> > >  6f0f0fd: security: remove register_security hook
> > >  93cbace: security: remove dummy module fix
> > >  5915eb5: security: remove dummy module
> > >  b478a9f: security: remove unused sb_get_mnt_opts hook
> > >  32502b8: splice: fix generic_file_splice_read() race with page invalidation
> > >  8b3d356: ramfs: enable splice write
> > >  a144ff0: xen: Avoid allocations causing swap activity on the resume path
> > > 
> > > which really only leaves that security commit your bisection fingered. 
> > > Which _slightly_ raises its likelihood of being implicated. Structure
> > > size changes can move two formerly far-apart netperf-relevant symbols on 
> > > the same cacheline, which can start cache ping-pong-ing badly.
> > 
> > I sure hope it's something like ping-pong, it's driving me NUTS.
> 
> How about dividing the problem to smaller blocks then by restoring 
> parts of the change...

Well, what I've done is check out the "bad" tree, reverted every darn
commit between there and the "good" tree, and then reverted the reverts
so I have a nice merge-free line and don't have to remember to think
backward.  (probably sounds silly to git-foo masters)  I'll try
bisecting that in the a.m. and see what happens.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                 ` <1221657111.5511.14.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  2008-09-17 13:36                                                   ` Ilpo Järvinen
@ 2008-09-17 14:47                                                   ` Eric Dumazet
       [not found]                                                     ` <48D11871.4090805-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-09-17 14:47 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Ilpo Järvinen, Christoph Lameter,
	Rafael J. Wysocki, LKML, kernel-testers-u79uwXL29TY76Z2rM5mHXA

Mike Galbraith wrote:
> On Wed, 2008-09-17 at 14:49 +0200, Ingo Molnar wrote:
>> * Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org> wrote:
>>
>>>> It looks like a potentially bogus bisection result, but _maybe_ it 
>>>> has relevance: changes the size of "struct security_operations", 
>>>> which could have alignment and layout effects on all sorts of kernel 
>>>> variables, kmalloc sizes, etc.
>>> This may well be a mythical creature infestation for all I know ;-), 
>>> but its address is somewhere in the 2069f45..847106f block, 316
>>> commits, none of which look like they should be the least bit 
>>> interesting to netperf.  I reverted this particular commit in 27.git, 
>>> got the expected result.  Looks like I'll keep poking at it, can't 
>>> seem to resist.  Grr.
>> are you sure it's 2069f45..847106f? Filtering out the 
>> likely-uninteresting commits:
> 
> Yeah, as sure as I can be.  I've built both (et al) kernels several
> times, and it has repeated every time.  Would be nice if someone would
> try to confirm/deny though.  For my little quad, I do..
> 
> #!/bin/sh
> 
> echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns 
> 
> netserver -p 12865
> netserver -p 12866
> netserver -p 12867
> netserver -p 12868
> 
> netperf -p 12865 -t TCP_RR -l 60 -H 127.0.0.1 -T 0,0 -- -r 1,1&
> netperf -p 12866 -t TCP_RR -l 60 -H 127.0.0.1 -T 1,1 -- -r 1,1&
> netperf -p 12867 -t TCP_RR -l 60 -H 127.0.0.1 -T 2,2 -- -r 1,1&
> netperf -p 12868 -t TCP_RR -l 60 -H 127.0.0.1 -T 3,3 -- -r 1,1&
> 
> wait
> killall netserver
> 
> 
>> git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
>> 'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'
>>
>> gives us:
>>
>>  7daf705: Start using the new '%pS' infrastructure to print symbols
>>  6f0f0fd: security: remove register_security hook
>>  93cbace: security: remove dummy module fix
>>  5915eb5: security: remove dummy module
>>  b478a9f: security: remove unused sb_get_mnt_opts hook
>>  32502b8: splice: fix generic_file_splice_read() race with page invalidation
>>  8b3d356: ramfs: enable splice write
>>  a144ff0: xen: Avoid allocations causing swap activity on the resume path
>>
>> which really only leaves that security commit your bisection fingered. 
>> Which _slightly_ raises its likelihood of being implicated. Structure
>> size changes can move two formerly far-apart netperf-relevant symbols on 
>> the same cacheline, which can start cache ping-pong-ing badly.
> 
> I sure hope it's something like ping-pong, it's driving me NUTS.

Could you please try following patch ?

[PATCH] security_ops moved to read_mostly section

"struct security_operations *security_ops" should be moved to read_mostly 
section in order to NOT let it share a cache line with highly modified variables.

Signed-off-by: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>

diff --git a/security/security.c b/security/security.c
index 3a4b4f5..0b13d65 100644
--- a/security/security.c
+++ b/security/security.c
@@ -24,7 +24,7 @@ static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1];
 extern struct security_operations default_security_ops;
 extern void security_fixup_ops(struct security_operations *ops);
 
-struct security_operations *security_ops;	/* Initialized to NULL */
+struct security_operations *security_ops __read_mostly;e
 
 /* amount of vm to protect from userspace access */
 unsigned long mmap_min_addr = CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR;



^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                     ` <48D11871.4090805-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-09-17 14:50                                                       ` Eric Dumazet
  2008-09-17 18:16                                                       ` Mike Galbraith
  1 sibling, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-09-17 14:50 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Ilpo Järvinen, Christoph Lameter,
	Rafael J. Wysocki, LKML, kernel-testers-u79uwXL29TY76Z2rM5mHXA

Eric Dumazet wrote:
> Mike Galbraith wrote:

>> I sure hope it's something like ping-pong, it's driving me NUTS.
> 
> Could you please try following patch ?
> 
> [PATCH] security_ops moved to read_mostly section
> 
> "struct security_operations *security_ops" should be moved to 
> read_mostly section in order to NOT let it share a cache line with highly
> modified variables.
> 
> Signed-off-by: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
> 
> diff --git a/security/security.c b/security/security.c
> index 3a4b4f5..0b13d65 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -24,7 +24,7 @@ static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1];
> extern struct security_operations default_security_ops;
> extern void security_fixup_ops(struct security_operations *ops);
> 
> -struct security_operations *security_ops;    /* Initialized to NULL */
> +struct security_operations *security_ops __read_mostly;e

Sorry for the extra 'e' at the end of this line, please remove it :)

> 
> /* amount of vm to protect from userspace access */
> unsigned long mmap_min_addr = CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR;
> 



^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                         ` <1221659858.8818.13.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-17 17:04                                                           ` Ilpo Järvinen
  2008-09-18  7:12                                                           ` Mike Galbraith
  1 sibling, 0 replies; 318+ messages in thread
From: Ilpo Järvinen @ 2008-09-17 17:04 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2319 bytes --]

On Wed, 17 Sep 2008, Mike Galbraith wrote:

> On Wed, 2008-09-17 at 16:36 +0300, Ilpo Järvinen wrote:
> > On Wed, 17 Sep 2008, Mike Galbraith wrote:
> > 
> > > On Wed, 2008-09-17 at 14:49 +0200, Ingo Molnar wrote:
> > > 
> > > > git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
> > > > 'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'
> > > > 
> > > > gives us:
> > > > 
> > > >  7daf705: Start using the new '%pS' infrastructure to print symbols
> > > >  6f0f0fd: security: remove register_security hook
> > > >  93cbace: security: remove dummy module fix
> > > >  5915eb5: security: remove dummy module
> > > >  b478a9f: security: remove unused sb_get_mnt_opts hook
> > > >  32502b8: splice: fix generic_file_splice_read() race with page invalidation
> > > >  8b3d356: ramfs: enable splice write
> > > >  a144ff0: xen: Avoid allocations causing swap activity on the resume path
> > > > 
> > > > which really only leaves that security commit your bisection fingered. 
> > > > Which _slightly_ raises its likelihood of being implicated. Structure
> > > > size changes can move two formerly far-apart netperf-relevant symbols on 
> > > > the same cacheline, which can start cache ping-pong-ing badly.
> > > 
> > > I sure hope it's something like ping-pong, it's driving me NUTS.
> > 
> > How about dividing the problem to smaller blocks then by restoring 
> > parts of the change...
> 
> Well, what I've done is check out the "bad" tree, reverted every darn
> commit between there and the "good" tree, and then reverted the reverts
> so I have a nice merge-free line and don't have to remember to think
> backward.  (probably sounds silly to git-foo masters)  I'll try
> bisecting that in the a.m. and see what happens.

This was my initial idea (which was mainly an error on my part, as I
misread some SHA IDs and misunderstood the first regressing commit to be
the merge instead of the actual change), but here I meant taking parts
of 6f0f0fd on top of 6f0f0fd^. The easiest way to do that might
in fact be the opposite, i.e., put some of the
datastructure/layout changes back on top of 6f0f0fd and see if the
performance gets restored (besides testing Eric's patch).

-- 
 i.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                     ` <48D11871.4090805-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  2008-09-17 14:50                                                       ` Eric Dumazet
@ 2008-09-17 18:16                                                       ` Mike Galbraith
  1 sibling, 0 replies; 318+ messages in thread
From: Mike Galbraith @ 2008-09-17 18:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Ilpo Järvinen, Christoph Lameter,
	Rafael J. Wysocki, LKML, kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 2008-09-17 at 16:47 +0200, Eric Dumazet wrote:

> Could you please try following patch ?
> 
> [PATCH] security_ops moved to read_mostly section
> 
> "struct security_operations *security_ops" should be moved to read_mostly 
> section in order to NOT let it share a cache line with highly modified variables.

v2.6.26-974-g2846693 (tip of revert reverts tree, == 847106f)
16384  87380  1        1       60.00    94350.45
16384  87380  1        1       60.01    95857.25
16384  87380  1        1       60.00    95334.84
16384  87380  1        1       60.00    95052.11

v2.6.26-659-g7804ad8 (first commit prior, == 2069f45)
16384  87380  1        1       60.00    98630.64
16384  87380  1        1       60.00    98653.14
16384  87380  1        1       60.00    99162.65
16384  87380  1        1       60.00    98652.38

v2.6.26-974-g2846693 patched
16384  87380  1        1       60.00    95877.41
16384  87380  1        1       60.00    95810.27
16384  87380  1        1       60.00    95530.03
16384  87380  1        1       60.00    94968.12

(poo, "it" didn't die)
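
The cache-line argument behind the patch can be illustrated outside the
kernel.  The following is only a userspace sketch, not the kernel
mechanism: the struct names, the helper functions and the 64-byte line
size are all assumptions made for the illustration.  In the kernel the
separation is done by annotating the variable with __read_mostly, which
places it in a separate data section; here C11 alignment imitates the
effect.

```c
#include <stddef.h>

#define CACHELINE 64	/* assumed line size; real hardware may differ */

/* Layout A: a frequently written counter sits right next to a
 * pointer that is only read after initialisation. */
struct packed_layout {
	unsigned long hot_counter;	/* written all the time */
	const void *ops;		/* read-mostly, like security_ops */
};

/* Layout B: the read-mostly pointer is pushed onto its own line. */
struct split_layout {
	unsigned long hot_counter;
	_Alignas(CACHELINE) const void *ops;
};

/* Return 1 if two byte offsets within one object fall on the same
 * line, assuming the object itself starts on a line boundary. */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE == off_b / CACHELINE;
}

/* Do the two fields of layout A share a line? */
static int packed_shares_line(void)
{
	return same_cacheline(offsetof(struct packed_layout, hot_counter),
			      offsetof(struct packed_layout, ops));
}

/* Do the two fields of layout B share a line? */
static int split_shares_line(void)
{
	return same_cacheline(offsetof(struct split_layout, hot_counter),
			      offsetof(struct split_layout, ops));
}
```

In layout A every write to hot_counter by one CPU invalidates the line
that other CPUs need just to read ops; layout B avoids that at the cost
of padding.  As the numbers above show, this particular separation did
not recover the lost throughput.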

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                         ` <1221659858.8818.13.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  2008-09-17 17:04                                                           ` Ilpo Järvinen
@ 2008-09-18  7:12                                                           ` Mike Galbraith
       [not found]                                                             ` <1221721970.5003.9.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-18  7:12 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Ingo Molnar, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Wed, 2008-09-17 at 15:57 +0200, Mike Galbraith wrote:
> On Wed, 2008-09-17 at 16:36 +0300, Ilpo Järvinen wrote:
> > On Wed, 17 Sep 2008, Mike Galbraith wrote:
> > 
> > > On Wed, 2008-09-17 at 14:49 +0200, Ingo Molnar wrote:
> > > 
> > > > git log --pretty=format:"%h: %s" 2069f45..847106f | grep -viE \
> > > > 'block|alsa|pcmcia|sound|Merge|iosched|blk|DAC960|scsi|s390|paride|pktcdvd|filter|cdrom|drm'
> > > > 
> > > > gives us:
> > > > 
> > > >  7daf705: Start using the new '%pS' infrastructure to print symbols
> > > >  6f0f0fd: security: remove register_security hook
> > > >  93cbace: security: remove dummy module fix
> > > >  5915eb5: security: remove dummy module
> > > >  b478a9f: security: remove unused sb_get_mnt_opts hook
> > > >  32502b8: splice: fix generic_file_splice_read() race with page invalidation
> > > >  8b3d356: ramfs: enable splice write
> > > >  a144ff0: xen: Avoid allocations causing swap activity on the resume path
> > > > 
> > > > which really only leaves that security commit your bisection fingered. 
> > > > Which _slightly_ raises its likelihood of being implicated. Structure 
> > > > size changes can move two formerly far-apart netperf-relevant symbols on 
> > > > the same cacheline, which can start cache ping-pong-ing badly.
> > > 
> > > I sure hope it's something like ping-pong, it's driving me NUTS.
> > 
> > How about dividing the problem to smaller blocks then by restoring 
> > parts of the change...
> 
> Well, what I've done is check out the "bad" tree, reverted every darn
> commit between there and the "good" tree, and then reverted the reverts
> so I have a nice merge-free line and don't have to remember to think
> backward.  (probably sounds silly to git-foo masters)  I'll try
> bisecting that in the a.m. and see what happens.

It bisected to 1c9ce52.  Reverting that in 27.git had the expected
result, nada.  Full bisection/test log below - you can jump straight to
post run sanity checks.  I'm torn between building a straight line tree
from v2.6.26 through git.today and bisecting that sucker, or exorcising
netperf from my box and swearing a sacred oath to never download the
damned thing again.  Nuking netperf is the most attractive option.

1e65e841bb5584136ed6047c55cf77532afbbb55 is first bad commit
commit 1e65e841bb5584136ed6047c55cf77532afbbb55
Author: Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org>
Date:   Wed Sep 17 14:55:50 2008 +0200

    Revert "Revert "block: export "ro" attribute""

    This reverts commit 2c8803af5c1bf41200167f29349f7f1396683a51.

:040000 040000 08ca8ba7ff3f9506a5462b4122256356cae28ceb bef679485bc924ad1dc867858ebda1b68196b5a8 M      block


git bisect start
# good: [7804ad865f7d0cd9bdc51da601772ce4d2e252ca] Revert "[ALSA] soc - tlv320aic3x - revisit clock setup"
git bisect good 7804ad865f7d0cd9bdc51da601772ce4d2e252ca
# bad: [2846693a63a34ac6d582dd55a7e00605b49b1cec] Revert "Revert "[S390] sclp_tty: Fix scheduling while atomic bug.""
git bisect bad 2846693a63a34ac6d582dd55a7e00605b49b1cec
# bad: [70477f86f63640be2dd1d8968aeb47870a5c21c6] Revert "Revert "xen/blkfront: Make sure we don't use bounce buffers, we don't need them.""
git bisect bad 70477f86f63640be2dd1d8968aeb47870a5c21c6
# good: [7c6ccb520424939deff0a50f3fae621c6477dbbe] Revert "Revert "ALSA: hda - Add bdl_pos_adj option""
git bisect good 7c6ccb520424939deff0a50f3fae621c6477dbbe
# good: [66036beae94f043f99044e285935486126a9c4bd] Revert "Revert "pcmcia: fix Alchemy warnings""
git bisect good 66036beae94f043f99044e285935486126a9c4bd
# good: [6b7b5ef18871c8f7c15eedc6eb53270eaf8bc613] Revert "Revert "ALSA: hda - Added SSID for 'Fujitsu Siemens Amilo M1451G' laptop""
git bisect good 6b7b5ef18871c8f7c15eedc6eb53270eaf8bc613
# good: [024905ea4b2d1ab5d6b845ba84ddfc0857fb2d2a] Revert "Revert "ALSA: hda - Add MacBook 3.1 support""
git bisect good 024905ea4b2d1ab5d6b845ba84ddfc0857fb2d2a
# good: [4bbe3501e06eddaa4900894efa519f1167cb8624] Revert "Revert "as-iosched: properly protect ioc_gone and ioc count""
git bisect good 4bbe3501e06eddaa4900894efa519f1167cb8624
# good: [e124683c1cd5c73c494f9ea6cee8a71e68bb70ca] Revert "Revert "Added in user-injected messages into blk traces""
git bisect good e124683c1cd5c73c494f9ea6cee8a71e68bb70ca
# bad: [f148fae0bc009aa23122c5762368a1a16bb55b86] Revert "Revert "block: kill request_queue_t""
git bisect bad f148fae0bc009aa23122c5762368a1a16bb55b86
# bad: [1e65e841bb5584136ed6047c55cf77532afbbb55] Revert "Revert "block: export "ro" attribute""
git bisect bad 1e65e841bb5584136ed6047c55cf77532afbbb55


v2.6.26-974-g2846693 (847106f)
16384  87380  1        1       60.00    94350.45
16384  87380  1        1       60.01    95857.25
16384  87380  1        1       60.00    95334.84
16384  87380  1        1       60.00    95052.11

v2.6.26-659-g7804ad8 (2069f45)
16384  87380  1        1       60.00    98630.64
16384  87380  1        1       60.00    98653.14
16384  87380  1        1       60.00    99162.65
16384  87380  1        1       60.00    98652.38

v2.6.26-816-g70477f8 (call it bad)
16384  87380  1        1       60.00    95532.19
16384  87380  1        1       60.01    96211.39
16384  87380  1        1       60.01    96246.73
16384  87380  1        1       60.01    96286.40

v2.6.26-737-g7c6ccb5
16384  87380  1        1       60.00    98478.00
16384  87380  1        1       60.00    99221.33
16384  87380  1        1       60.00    98930.70
16384  87380  1        1       60.00    98958.73

v2.6.26-776-g66036be
16384  87380  1        1       60.00    97958.10
16384  87380  1        1       60.01    98683.80
16384  87380  1        1       60.01    98515.34
16384  87380  1        1       60.00    98396.11

v2.6.26-796-g6b7b5ef
16384  87380  1        1       60.00    99047.21
16384  87380  1        1       60.00    98095.23
16384  87380  1        1       60.00    99811.18
16384  87380  1        1       60.00    98651.32

v2.6.26-806-g024905e
16384  87380  1        1       60.00    98823.46
16384  87380  1        1       60.00    98959.11
16384  87380  1        1       60.00    98709.95
16384  87380  1        1       60.00    99042.13

v2.6.26-811-g4bbe350
16384  87380  1        1       60.00    98144.99
16384  87380  1        1       60.01    99023.30
16384  87380  1        1       60.01    98685.45
16384  87380  1        1       60.01    98606.18

v2.6.26-813-ge124683
16384  87380  1        1       60.00    98458.18
16384  87380  1        1       60.00    98163.92
16384  87380  1        1       60.00    98115.62
16384  87380  1        1       60.00    98633.62

v2.6.26-815-gf148fae
16384  87380  1        1       60.00    95649.91
16384  87380  1        1       60.00    96292.34
16384  87380  1        1       60.00    96043.82
16384  87380  1        1       60.00    96093.81

v2.6.26-814-g1e65e84
16384  87380  1        1       60.00    94906.81
16384  87380  1        1       60.00    95445.05
16384  87380  1        1       60.01    94698.68
16384  87380  1        1       60.00    94938.65

Post bisection checkouts
------------------------------------------------
v2.6.26-rc8-208-g02c6230
16384  87380  1        1       60.00    98392.94
16384  87380  1        1       60.00    98199.96
16384  87380  1        1       60.00    98534.27
16384  87380  1        1       60.00    98501.02

v2.6.26-rc8-209-g1c9ce52
16384  87380  1        1       60.00    97583.88
16384  87380  1        1       60.00    97326.23
16384  87380  1        1       60.00    97582.80
16384  87380  1        1       60.00    97568.63

v2.6.26-814-g1e65e84
16384  87380  1        1       60.00    94856.33
16384  87380  1        1       60.00    94594.03
16384  87380  1        1       60.00    94751.74
16384  87380  1        1       60.01    96825.28

v2.6.26-813-ge124683
16384  87380  1        1       60.00    97550.64
16384  87380  1        1       60.00    98024.28
16384  87380  1        1       60.00    98486.85
16384  87380  1        1       60.00    98493.41


marge:..git/linux-2.6 # git rev-list v2.6.26-813-ge124683..v2.6.26-814-g1e65e84
1e65e841bb5584136ed6047c55cf77532afbbb55
marge:..git/linux-2.6 # git show 1e65e841bb5584136ed6047c55cf77532afbbb55

commit 1e65e841bb5584136ed6047c55cf77532afbbb55
Author: Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org>
Date:   Wed Sep 17 14:55:50 2008 +0200

    Revert "Revert "block: export "ro" attribute""
    
    This reverts commit 2c8803af5c1bf41200167f29349f7f1396683a51.

diff --git a/block/genhd.c b/block/genhd.c
index b922d48..43e468e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -400,6 +400,14 @@ static ssize_t disk_removable_show(struct device *dev,
 		       (disk->flags & GENHD_FL_REMOVABLE ? 1 : 0));
 }
 
+static ssize_t disk_ro_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	return sprintf(buf, "%d\n", disk->policy ? 1 : 0);
+}
+
 static ssize_t disk_size_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
@@ -472,6 +480,7 @@ static ssize_t disk_fail_store(struct device *dev,
 
 static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
 static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
+static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL);
 static DEVICE_ATTR(size, S_IRUGO, disk_size_show, NULL);
 static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL);
 static DEVICE_ATTR(stat, S_IRUGO, disk_stat_show, NULL);
@@ -483,6 +492,7 @@ static struct device_attribute dev_attr_fail =
 static struct attribute *disk_attrs[] = {
 	&dev_attr_range.attr,
 	&dev_attr_removable.attr,
+	&dev_attr_ro.attr,
 	&dev_attr_size.attr,
 	&dev_attr_capability.attr,
 	&dev_attr_stat.attr,


^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                             ` <1221721970.5003.9.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-18  7:25                                                               ` Mike Galbraith
       [not found]                                                                 ` <1221722733.5149.6.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Galbraith @ 2008-09-18  7:25 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Ingo Molnar, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Thu, 2008-09-18 at 09:12 +0200, Mike Galbraith wrote:

> 1e65e841bb5584136ed6047c55cf77532afbbb55 is first bad commit
> commit 1e65e841bb5584136ed6047c55cf77532afbbb55
> Author: Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org>
> Date:   Wed Sep 17 14:55:50 2008 +0200
> 
>     Revert "Revert "block: export "ro" attribute""
> 
>     This reverts commit 2c8803af5c1bf41200167f29349f7f1396683a51.

BTW, the reason it's revert revert is that I reverse bisected the revert
tree yesterday, and it emitted the same darn result.  I immediately said
"yeah right, ya screwed up", and created the revert revert tree to make
sure I couldn't fumble negation.

	-Mike

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                                 ` <1221722733.5149.6.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
@ 2008-09-18  7:58                                                                   ` Ilpo Järvinen
  0 siblings, 0 replies; 318+ messages in thread
From: Ilpo Järvinen @ 2008-09-18  7:58 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Christoph Lameter, Rafael J. Wysocki, LKML,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA

On Thu, 18 Sep 2008, Mike Galbraith wrote:

> On Thu, 2008-09-18 at 09:12 +0200, Mike Galbraith wrote:
> 
> > 1e65e841bb5584136ed6047c55cf77532afbbb55 is first bad commit
> > commit 1e65e841bb5584136ed6047c55cf77532afbbb55
> > Author: Mike Galbraith <efault-Mmb7MZpHnFY@public.gmane.org>
> > Date:   Wed Sep 17 14:55:50 2008 +0200
> > 
> >     Revert "Revert "block: export "ro" attribute""
> > 
> >     This reverts commit 2c8803af5c1bf41200167f29349f7f1396683a51.
> 
> BTW, the reason it's revert revert is that I reverse bisected the revert
> tree yesterday, and it emitted the same darn result.  I immediately said
> "yeah right, ya screwed up", and created the revert revert tree to make
> sure I couldn't fumble negation.

:-)

gcc compiling something slightly differently would be a nice theory, but 
it sort of breaks down now, as this commit touches only one .c file...
In the past, when I did some sizing tests that uninlined static inline 
functions from .h into .c files, I noticed that some changes also showed 
up in places which should have been pretty much unrelated. All of those 
changes were minor, though (in the places I did look): e.g., the 
conditional paths were routed slightly differently and one xor to clear 
a register was added.

-- 
 i.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-09-21 18:52 2.6.27-rc6-git6: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-09-21 18:54 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-09-21 18:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (42 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                             ` <alpine.LFD.1.10.0808260939070.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-08-26 17:08                                                               ` Yinghai Lu
@ 2008-09-25  1:50                                                               ` Rusty Russell
       [not found]                                                                 ` <200809251150.26760.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Rusty Russell @ 2008-09-25  1:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yinghai Lu, Ingo Molnar, David Miller, Alan.Brunelle-VXdhtT5mjnY,
	travis-sJ/iWh9BUns, tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	Linux Kernel Mailing List, kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

On Wednesday 27 August 2008 02:51:46 Linus Torvalds wrote:
> On Tue, 26 Aug 2008, Yinghai Lu wrote:
> > wonder if could use "unsigned long *" directly.
>
> I would actually suggest something like this:
>
>  - we continue to have a magic "cpumask_t".
>
>  - we do different cases for big and small NR_CPUS:
>
> 	#if NR_CPUS <= BITS_PER_LONG
>
> 	/*
> 	 * Make it an array - that way passing it as an argument will
> 	 * always pass it as a pointer!
> 	 */
> 	typedef unsigned long cpumask_t[1];
>
> 	static inline void create_cpumask(cpumask_t *p)
> 	{
> 		*p = 0;
> 	}
> 	static inline void free_cpumask(cpumask_t *p)
> 	{
> 	}
>
> 	#else
>
> 	typedef unsigned long *cpumask_t;
>
> 	static inline void create_cpumask(cpumask_t *p)
> 	{
> 		*p = kcalloc(..);
> 	}
>
> 	static inline void free_cpumask(cpumask_t *p)
> 	{
> 		kfree(*p);
> 	}
>
> 	#endif
>
> and now after you do this, you can just do something like
>
> 	cpumask_t mycpu;
>
> 	create_cpumask(&mycpu);
> 	..
> 	free_cpumask(&mycpu);
>
> and in between, you can use 'cpumask' as a pointer, because even when it
> is an array directly allocated on the stack, the array can always
> degenerate into a pointer by C type rules!

Hi Linus,

    This turns out to be awful in practice, mainly due to const.  Consider:

	#ifdef CONFIG_CPUMASK_OFFSTACK
	typedef unsigned long *cpumask_t;
	#else
	typedef unsigned long cpumask_t[1];
	#endif

	cpumask_t returns_cpumask(void);

That's obviously illegal if cpumask_t is an array.  So we need a typedef which 
says "really always a pointer".

	typedef unsigned long *cpumask_return_t;
	cpumask_return_t returns_cpumask(void);

But we usually want it to return a const ptr, and this doesn't work:

	const cpumask_return_t returns_cpumask(void);
	foo.c:12: warning: type qualifiers ignored on function return type

So now we need:
	typedef const unsigned long *cpumask_const_return_t;
	cpumask_const_return_t returns_cpumask(void);

OK, now consider a function which wants to take a const cpu bitmap:

	void cpus_copy(cpumask_t dst, const cpumask_t src);
	...
	cpus_copy(cpus, returns_cpumask());
	foo.c:34: warning: passing argument 2 of ‘cpus_copy’ discards qualifiers from pointer target type

Oops, that didn't work with the pointer version.  So we need another typedef:
	#ifdef CONFIG_CPUMASK_OFFSTACK
	typedef const unsigned long *cpumask_const_t;
	#else
	typedef const unsigned long cpumask_const_t[1];
	#endif

	void cpus_copy(cpumask_t dst, cpumask_const_t src);

We end up with this:
	#ifdef CONFIG_CPUMASK_OFFSTACK
	typedef unsigned long *cpumask_t;
	typedef const unsigned long *cpumask_const_t;
	#else
	typedef unsigned long cpumask_t[1];
	typedef const unsigned long cpumask_const_t[1];
	#endif
	typedef unsigned long *cpumask_return_t;
	typedef const unsigned long *cpumask_const_return_t;
	typedef unsigned long cpumask_data_t[1];

I can't see a neater way down this path, and I don't want to lose const.

I can see three alternatives:
1) An ONSTACK_CPUMASK(name) macro which declares "struct cpumask name[1]" or
   "struct cpumask *name".  Same idea as yours, without the typedef.
2) Use a normal struct for cpumask, make everyone use pointers, but have a
   struct cpumask *alloc_stack_cpumask() which uses alloca() for small
   NR_CPUS.
3) Same, but just use kmalloc everywhere.  Optimize important cases by hand.

Anyone see a better way?
Rusty.
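
The root of the const trouble is that a qualifier applied to a pointer
typedef binds to the pointer, not to what it points at.  The sketch
below shows this in plain C; mask_t, mask_const_t and both functions are
hypothetical stand-ins for the pointer flavour of cpumask_t, not names
from any patch.

```c
/* Hypothetical stand-ins for the pointer flavour of cpumask_t. */
typedef unsigned long *mask_t;			/* like cpumask_t */
typedef const unsigned long *mask_const_t;	/* like cpumask_const_t */

/* "const mask_t" means "unsigned long *const": the pointer itself is
 * const, the bits it points to are still writable. */
static void clear_via_const_typedef(const mask_t m)
{
	m[0] = 0;	/* accepted by the compiler */
}

/* With a dedicated const typedef the pointee is read-only, so the
 * function can only inspect the bits. */
static unsigned long first_word(mask_const_t m)
{
	return m[0];	/* an assignment to m[0] would not compile here */
}
```

This is why "const cpumask_t" cannot stand in for a read-only mask once
cpumask_t is a pointer, and why a separate const typedef is needed for
every flavour.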

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                 ` <200809251150.26760.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
@ 2008-09-25  8:55                                                                   ` Ingo Molnar
  2008-09-25 15:42                                                                   ` Linus Torvalds
  1 sibling, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-09-25  8:55 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, travis-sJ/iWh9BUns,
	tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	Linux Kernel Mailing List, kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner


* Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org> wrote:

> I can't see a neater way down this path, and I don't want to lose 
> const.
> 
> I can see three alternatives:
> 1) An ONSTACK_CPUMASK(name) macro which declares "struct cpumask name[1]" or
>    "struct cpumask *name".  Same idea as yours, without the typedef.
> 2) Use a normal struct for cpumask, make everyone use pointers, but have a
>    struct cpumask *alloc_stack_cpumask() which uses alloca() for small
>    NR_CPUS.
> 3) Same, but just use kmalloc everywhere.  Optimize important cases by hand.
> 
> Anyone see a better way?

since most of the important cpumasks in high-perf codepaths are already 
pre-constructed and embedded in some existing object (say task_struct), 
i think a variant of #3 is the best approach:

 - get rid of cpumask_t and use 'struct cpumask' everywhere.

 - do not expose normal kernel code to struct cpumask's definition, only
   declare the type via 'struct cpumask;' in sched.h - a'la 
   kmem_cache_t.

 - even hide the structure from sched.h - use an extra indirection for 
   struct cpumask *cpus_allowed in task_struct and be done with it.

 - have normal cpumask object alloc/free codepaths.

 - optimize any remaining important cases by hand, if needed. (the 
   scheduler mostly)

(wrt. #2, alloca() is a nightmare i think.)

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                 ` <200809251150.26760.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
  2008-09-25  8:55                                                                   ` Ingo Molnar
@ 2008-09-25 15:42                                                                   ` Linus Torvalds
       [not found]                                                                     ` <alpine.LFD.1.10.0809250836270.3265-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-09-25 15:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Yinghai Lu, Ingo Molnar, David Miller, Alan.Brunelle-VXdhtT5mjnY,
	travis-sJ/iWh9BUns, tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	Linux Kernel Mailing List, kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

On Thu, 25 Sep 2008, Rusty Russell wrote:
> 
>     This turns out to be awful in practice, mainly due to const.  Consider:
> 
> 	#ifdef CONFIG_CPUMASK_OFFSTACK
> 	typedef unsigned long *cpumask_t;
> 	#else
> 	typedef unsigned long cpumask_t[1];
> 	#endif
> 
> 	cpumask_t returns_cpumask(void);

No. That's already broken. You cannot return a cpumask_t, regardless of 
interface. We must not do it regardless of how we pass those things 
around, since it generates _yet_ another temporary on the stack for the 
return slot for any kind of structure.

So all cpumask functions should always return pointers and/or take 
pointers to be filled in. That's true *regardless* of how we actually are 
to then allocate them.

So forget returning cpumasks. It's irrelevant.

What _is_ relevant is how we allocate them when we need temporary CPU 
masks. And _that_ is where my suggestion comes in. For small NR_CPUS, we 
really do want to allocate them on the stack, because calling kmalloc for 
a 4- or 8-byte allocation is just _stupid_.

So all your arguments are invalid, because you're looking at the wrong 
thing. The thing that I was talking about is converting current code that 
has

   random_function(..)
   {
	cpumask_t mask;

	.. do something with mask ...
   }

which has to be converted some way. And I think it needs to be converted 
in a way that does *not* force us to call kmalloc() for idiotically small 
values.

			Linus
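
A userspace sketch of the conversion described above might look as
follows.  Everything named demo_* and the DEMO_MASK_OFFSTACK switch are
assumptions made for the illustration; calloc()/free() stand in for
kcalloc()/kfree(), and unlike the earlier outline the create helper
reports allocation failure.

```c
#include <stdlib.h>

#ifdef DEMO_MASK_OFFSTACK
/* Large configuration: the mask lives on the heap. */
typedef unsigned long *demo_mask_t;

static int demo_mask_create(demo_mask_t *p, size_t nlongs)
{
	*p = calloc(nlongs, sizeof(unsigned long));
	return *p ? 0 : -1;
}

static void demo_mask_free(demo_mask_t *p)
{
	free(*p);
}
#else
/* Small configuration: one word, directly on the caller's stack. */
typedef unsigned long demo_mask_t[1];

static int demo_mask_create(demo_mask_t *p, size_t nlongs)
{
	(void)nlongs;
	(*p)[0] = 0;	/* p points at the array, so index through it */
	return 0;
}

static void demo_mask_free(demo_mask_t *p)
{
	(void)p;	/* nothing to release */
}
#endif

/* Caller code is identical in both configurations: "mask" can be
 * indexed like a pointer either way.  cpu must be smaller than the
 * number of bits in an unsigned long. */
static int demo_set_and_test(unsigned int cpu)
{
	demo_mask_t mask;
	int set;

	if (demo_mask_create(&mask, 1) != 0)
		return -1;
	mask[0] |= 1UL << cpu;
	set = (int)((mask[0] >> cpu) & 1UL);
	demo_mask_free(&mask);
	return set;
}
```

In the default (on-stack) build no allocator call is made at all, which
is the point of the suggestion; building with -DDEMO_MASK_OFFSTACK
switches the same caller over to heap storage.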

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Subject: [RFC 1/1] cpumask: Provide new cpumask API
       [not found]                                                                     ` <alpine.LFD.1.10.0809250836270.3265-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-09-25 20:59                                                                       ` Mike Travis
  2008-09-26  5:25                                                                       ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rusty Russell
  1 sibling, 0 replies; 318+ messages in thread
From: Mike Travis @ 2008-09-25 20:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rusty Russell, Yinghai Lu, Ingo Molnar, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

Linus Torvalds wrote:
> 
> On Thu, 25 Sep 2008, Rusty Russell wrote:
>>     This turns out to be awful in practice, mainly due to const.  Consider:
>>
>> 	#ifdef CONFIG_CPUMASK_OFFSTACK
>> 	typedef unsigned long *cpumask_t;
>> 	#else
>> 	typedef unsigned long cpumask_t[1];
>> 	#endif
>>
>> 	cpumask_t returns_cpumask(void);
> 
> No. That's already broken. You cannot return a cpumask_t, regardless of 
> interface. We must not do it regardless of how we pass those things 
> around, since it generates _yet_ another temporary on the stack for the 
> return slot for any kind of structure.
> 
> So all cpumask functions should always return pointers and/or take 
> pointers to be filled in. That's true *regardless* of how we actually are 
> to then allocate them.
> 
> So forget returning cpumasks. It's irrelevant.
> 
> What _is_ relevant is how we allocate them when we need temporary CPU 
> masks. And _that_ is where my suggestion comes in. For small NR_CPUS, we 
> really do want to allocate them on the stack, because calling kmalloc for 
> a 4- or 8-byte allocation is just _stupid_.
> 
> So all your arguments are invalid, because you're looking at the wrong 
> thing. The thing that I was talking about is converting current code that 
> has
> 
>    random_function(..)
>    {
> 	cpumask_t mask;
> 
> 	.. do something with mask ...
>    }
> 
> which has to be converted some way. And I think it needs to be converted 
> in a way that does *not* force us to call kmalloc() for idiotically small 
> values.
> 
> 			Linus


Subject: [RFC 1/1] cpumask: Provide new cpumask API

Provide new cpumask interface API.  The relevant change is basically
cpumask becomes an opaque object.  I believe this results in the
minimum amount of editing while still allowing the inline cpumask
functions, and the ability to declare static cpumask objects.


    /* raw declaration */
    struct __cpumask_data_s { DECLARE_BITMAP(bits, NR_CPUS); };

    /* cpumask_map_t used for declaring static cpumask maps */
    typedef struct __cpumask_data_s cpumask_map_t[1];

    /* cpumask_t used for function args and return pointers */
    typedef struct __cpumask_data_s *cpumask_t;

    /* cpumask_var_t used for local variable */
    typedef struct __cpumask_data_s	cpumask_var_t[1]; /* SMALL NR_CPUS */
    typedef struct __cpumask_data_s	*cpumask_var_t;	  /* LARGE NR_CPUS */

    /* replaces cpumask_t dst = (cpumask_t)src */
    void cpus_copy(cpumask_t dst, const cpumask_t src);

Remove the '*' indirection in all references to cpumask_t objects.  You can
change the reference to the cpumask object but not the cpumask object itself
without using the functions that operate on cpumask objects (f.e. the cpu_*
operators).  Functions can return a cpumask_t (which is a pointer to the
cpumask object) and only be passed a cpumask_t.
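
As an illustration of the storage/handle split, here is a minimal
userspace sketch.  All demo_* names are hypothetical and the layout is
simplified; it is not the code in this patch.  Note that the source
argument is spelled as a pointer to const data rather than as a const
handle, so that the bits themselves are read-only.

```c
#include <string.h>

#define DEMO_NR_CPUS 64	/* hypothetical stand-in for NR_CPUS */
#define DEMO_LONGS ((DEMO_NR_CPUS + 8 * sizeof(unsigned long) - 1) / \
		    (8 * sizeof(unsigned long)))

struct demo_cpumask_data { unsigned long bits[DEMO_LONGS]; };

/* Storage type, like cpumask_map_t: an array of one object. */
typedef struct demo_cpumask_data demo_map_t[1];

/* Handle type, like cpumask_t: a pointer to the object. */
typedef struct demo_cpumask_data *demo_mask_t;

/* Copy the bits; the handles themselves are passed by value. */
static void demo_cpus_copy(demo_mask_t dst,
			   const struct demo_cpumask_data *src)
{
	memcpy(dst->bits, src->bits, sizeof(dst->bits));
}

static demo_map_t demo_static_map;	/* static storage, zeroed */

/* A function can hand back a handle to static storage. */
static demo_mask_t demo_static_mask(void)
{
	return demo_static_map;	/* array of one decays to a pointer */
}
```

Because the storage type is an array of one, a demo_map_t can be passed
wherever a handle is expected without writing '&', and it is the handle,
never the object, that is returned or passed by value.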

All uses of cpumask_t on the stack are changed to be cpumask_var_t. 
Allocation of local cpumask objects will follow...

All cpumask operators now operate using nr_cpu_ids instead of NR_CPUS.  All
variants of the cpumask operators which used nr_cpu_ids instead of NR_CPUS
are deleted.

All variants of functions which use the (old cpumask_t *) pointer are deleted
(f.e. set_cpus_allowed_ptr()).

Based on code from Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org> (THANKS!!)

Signed-off-by: Mike Travis <travis-sJ/iWh9BUns@public.gmane.org>

---
 include/linux/cpumask.h |  340 ++++++++++++++++++++++++------------------------
 1 file changed, 174 insertions(+), 166 deletions(-)

--- struct-cpumasks.orig/include/linux/cpumask.h
+++ struct-cpumasks/include/linux/cpumask.h
@@ -3,7 +3,8 @@
 
 /*
  * Cpumasks provide a bitmap suitable for representing the
- * set of CPU's in a system, one bit position per CPU number.
+ * set of CPU's in a system, one bit position per CPU number up to
+ * nr_cpu_ids (<= NR_CPUS).
  *
  * See detailed comments in the file linux/bitmap.h describing the
  * data type on which these cpumasks are based.
@@ -18,18 +19,6 @@
  * For details of cpus_fold(), see bitmap_fold in lib/bitmap.c.
  *
  * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- * Note: The alternate operations with the suffix "_nr" are used
- *       to limit the range of the loop to nr_cpu_ids instead of
- *       NR_CPUS when NR_CPUS > 64 for performance reasons.
- *       If NR_CPUS is <= 64 then most assembler bitmask
- *       operators execute faster with a constant range, so
- *       the operator will continue to use NR_CPUS.
- *
- *       Another consideration is that nr_cpu_ids is initialized
- *       to NR_CPUS and isn't lowered until the possible cpus are
- *       discovered (including any disabled cpus).  So early uses
- *       will span the entire range of NR_CPUS.
- * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  *
  * The available cpumask operations are:
  *
@@ -37,6 +26,7 @@
  * void cpu_clear(cpu, mask)		turn off bit 'cpu' in mask
  * void cpus_setall(mask)		set all bits
  * void cpus_clear(mask)		clear all bits
+ * void cpus_copy(dst, src)		copies cpumask bits from src to dst
  * int cpu_isset(cpu, mask)		true iff bit 'cpu' set in mask
  * int cpu_test_and_set(cpu, mask)	test and set bit 'cpu' in mask
  *
@@ -52,17 +42,17 @@
  * int cpus_empty(mask)			Is mask empty (no bits sets)?
  * int cpus_full(mask)			Is mask full (all bits sets)?
  * int cpus_weight(mask)		Hamming weigh - number of set bits
- * int cpus_weight_nr(mask)		Same using nr_cpu_ids instead of NR_CPUS
  *
  * void cpus_shift_right(dst, src, n)	Shift right
  * void cpus_shift_left(dst, src, n)	Shift left
  *
- * int first_cpu(mask)			Number lowest set bit, or NR_CPUS
- * int next_cpu(cpu, mask)		Next cpu past 'cpu', or NR_CPUS
- * int next_cpu_nr(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
+ * int first_cpu(mask)			Number lowest set bit, or nr_cpu_ids
+ * int next_cpu(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
+ *
+ * cpumask_t cpumask_of_cpu(cpu)	Return pointer to cpumask with bit
+ *					'cpu' set
  *
- * cpumask_t cpumask_of_cpu(cpu)	Return cpumask with bit 'cpu' set
- *					(can be used as an lvalue)
+ * cpu_mask_all				cpumask_map_t of all bits set
  * CPU_MASK_ALL				Initializer - all bits set
  * CPU_MASK_NONE			Initializer - no bits set
  * unsigned long *cpus_addr(mask)	Array of unsigned long's in mask
@@ -76,8 +66,7 @@
  * void cpus_onto(dst, orig, relmap)	*dst = orig relative to relmap
  * void cpus_fold(dst, orig, sz)	dst bits = orig bits mod sz
  *
- * for_each_cpu_mask(cpu, mask)		for-loop cpu over mask using NR_CPUS
- * for_each_cpu_mask_nr(cpu, mask)	for-loop cpu over mask using nr_cpu_ids
+ * for_each_cpu_mask(cpu, mask)		for-loop cpu over mask
  *
  * int num_online_cpus()		Number of online CPUs
  * int num_possible_cpus()		Number of all possible CPUs
@@ -107,129 +96,209 @@
 #include <linux/threads.h>
 #include <linux/bitmap.h>
 
-typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
-extern cpumask_t _unused_cpumask_arg_;
+/* raw declaration */
+struct __cpumask_data_s { DECLARE_BITMAP(bits, NR_CPUS); };
+
+/* cpumask_map_t used for declaring static cpumask maps */
+typedef struct __cpumask_data_s cpumask_map_t[1];
+
+/* cpumask_t used for function args and return pointers */
+typedef struct __cpumask_data_s *cpumask_t;
+
+/* cpumask_var_t used for local variable, definition follows */
+
+#if NR_CPUS == 1
+
+/* cpumask_var_t used for local variable */
+typedef struct __cpumask_data_s	cpumask_var_t[1];
+
+#define nr_cpu_ids		1
+#define first_cpu(src)		({ (void)(src); 0; })
+#define next_cpu(n, src)	({ (void)(src); 1; })
+#define any_online_cpu(mask)	0
+#define for_each_cpu_mask(cpu, mask)	\
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
+
+#define num_online_cpus()	1
+#define num_possible_cpus()	1
+#define num_present_cpus()	1
+#define cpu_online(cpu)		((cpu) == 0)
+#define cpu_possible(cpu)	((cpu) == 0)
+#define cpu_present(cpu)	((cpu) == 0)
+#define cpu_active(cpu)		((cpu) == 0)
+
+#else /* ... NR_CPUS > 1 */
+
+#ifdef CONFIG_CPUMASKS_ONSTACK
+
+/* Constant is usually more efficient than a variable for small NR_CPUS */
+#define nr_cpu_ids		NR_CPUS
+typedef struct __cpumask_data_s	cpumask_var_t[1];
+static inline int cpumask_size(void)
+{
+	return sizeof(struct __cpumask_data_s);
+}
+
+#else
+
+/* Starts at NR_CPUS until acpi code discovers actual number. */
+extern int nr_cpu_ids;
+typedef struct __cpumask_data_s	*cpumask_var_t;
+static inline int cpumask_size(void)
+{
+	return BITS_TO_LONGS(nr_cpu_ids) * sizeof(long);
+}
+
+#endif /* CONFIG_CPUMASKS_ONSTACK */
+
+int __first_cpu(const cpumask_t srcp);
+int __next_cpu(int n, const cpumask_t srcp);
+int __any_online_cpu(const cpumask_t mask);
+
+#define first_cpu(src)		__first_cpu((src))
+#define next_cpu(n, src)	__next_cpu((n), (src))
+#define any_online_cpu(mask) __any_online_cpu((mask))
+
+#define for_each_cpu_mask(cpu, mask)			\
+	for ((cpu) = -1;				\
+		(cpu) = next_cpu((cpu), (mask)),	\
+		(cpu) < nr_cpu_ids; )
+
+#define num_online_cpus()	cpus_weight(cpu_online_map)
+#define num_possible_cpus()	cpus_weight(cpu_possible_map)
+#define num_present_cpus()	cpus_weight(cpu_present_map)
+#define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
+#define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
+#define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
+#define cpu_active(cpu)		cpu_isset((cpu), cpu_active_map)
+#endif /* NR_CPUS > 1 */
 
-#define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
-static inline void __cpu_set(int cpu, volatile cpumask_t *dstp)
+#define cpu_set(cpu, dst) __cpu_set((cpu), (dst))
+static inline void __cpu_set(int cpu, volatile cpumask_t dstp)
 {
 	set_bit(cpu, dstp->bits);
 }
 
-#define cpu_clear(cpu, dst) __cpu_clear((cpu), &(dst))
-static inline void __cpu_clear(int cpu, volatile cpumask_t *dstp)
+#define cpu_clear(cpu, dst) __cpu_clear((cpu), (dst))
+static inline void __cpu_clear(int cpu, volatile cpumask_t dstp)
 {
 	clear_bit(cpu, dstp->bits);
 }
 
-#define cpus_setall(dst) __cpus_setall(&(dst), NR_CPUS)
-static inline void __cpus_setall(cpumask_t *dstp, int nbits)
+#define cpus_setall(dst) __cpus_setall((dst), nr_cpu_ids)
+static inline void __cpus_setall(cpumask_t dstp, int nbits)
 {
 	bitmap_fill(dstp->bits, nbits);
 }
 
-#define cpus_clear(dst) __cpus_clear(&(dst), NR_CPUS)
-static inline void __cpus_clear(cpumask_t *dstp, int nbits)
+#define cpus_clear(dst) __cpus_clear((dst), nr_cpu_ids)
+static inline void __cpus_clear(cpumask_t dstp, int nbits)
 {
 	bitmap_zero(dstp->bits, nbits);
 }
 
+#define cpus_copy(dst, src) __cpus_copy((dst), (src), nr_cpu_ids)
+static inline void __cpus_copy(cpumask_t dstp, const cpumask_t srcp, int nbits)
+{
+	bitmap_copy(dstp->bits, srcp->bits, nbits);
+}
+
 /* No static inline type checking - see Subtlety (1) above. */
-#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask).bits)
+#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask)->bits)
 
-#define cpu_test_and_set(cpu, cpumask) __cpu_test_and_set((cpu), &(cpumask))
-static inline int __cpu_test_and_set(int cpu, cpumask_t *addr)
+#define cpu_test_and_set(cpu, cpumask) __cpu_test_and_set((cpu), (cpumask))
+static inline int __cpu_test_and_set(int cpu, cpumask_t addr)
 {
 	return test_and_set_bit(cpu, addr->bits);
 }
 
-#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_and(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_and(dst, src1, src2) __cpus_and((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_and(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_or(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_or(dst, src1, src2) __cpus_or((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_or(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_xor(dst, src1, src2) __cpus_xor((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_xor(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_xor(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
 #define cpus_andnot(dst, src1, src2) \
-				__cpus_andnot(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_andnot(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+				__cpus_andnot((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_andnot(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_complement(dst, src) __cpus_complement(&(dst), &(src), NR_CPUS)
-static inline void __cpus_complement(cpumask_t *dstp,
-					const cpumask_t *srcp, int nbits)
+#define cpus_complement(dst, src) __cpus_complement((dst), (src), nr_cpu_ids)
+static inline void __cpus_complement(cpumask_t dstp,
+					const cpumask_t srcp, int nbits)
 {
 	bitmap_complement(dstp->bits, srcp->bits, nbits);
 }
 
-#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_equal(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_equal(src1, src2) __cpus_equal((src1), (src2), nr_cpu_ids)
+static inline int __cpus_equal(const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	return bitmap_equal(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_intersects(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_intersects(src1, src2) __cpus_intersects((src1), (src2), nr_cpu_ids)
+static inline int __cpus_intersects(const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	return bitmap_intersects(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_subset(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_subset(src1, src2) __cpus_subset((src1), (src2), nr_cpu_ids)
+static inline int __cpus_subset(const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	return bitmap_subset(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_empty(src) __cpus_empty(&(src), NR_CPUS)
-static inline int __cpus_empty(const cpumask_t *srcp, int nbits)
+#define cpus_empty(src) __cpus_empty((src), nr_cpu_ids)
+static inline int __cpus_empty(const cpumask_t srcp, int nbits)
 {
 	return bitmap_empty(srcp->bits, nbits);
 }
 
-#define cpus_full(cpumask) __cpus_full(&(cpumask), NR_CPUS)
-static inline int __cpus_full(const cpumask_t *srcp, int nbits)
+#define cpus_full(cpumask) __cpus_full((cpumask), nr_cpu_ids)
+static inline int __cpus_full(const cpumask_t srcp, int nbits)
 {
 	return bitmap_full(srcp->bits, nbits);
 }
 
-#define cpus_weight(cpumask) __cpus_weight(&(cpumask), NR_CPUS)
-static inline int __cpus_weight(const cpumask_t *srcp, int nbits)
+#define cpus_weight(cpumask) __cpus_weight((cpumask), nr_cpu_ids)
+static inline int __cpus_weight(const cpumask_t srcp, int nbits)
 {
 	return bitmap_weight(srcp->bits, nbits);
 }
 
 #define cpus_shift_right(dst, src, n) \
-			__cpus_shift_right(&(dst), &(src), (n), NR_CPUS)
-static inline void __cpus_shift_right(cpumask_t *dstp,
-					const cpumask_t *srcp, int n, int nbits)
+			__cpus_shift_right((dst), (src), (n), nr_cpu_ids)
+static inline void __cpus_shift_right(cpumask_t dstp,
+					const cpumask_t srcp, int n, int nbits)
 {
 	bitmap_shift_right(dstp->bits, srcp->bits, n, nbits);
 }
 
 #define cpus_shift_left(dst, src, n) \
-			__cpus_shift_left(&(dst), &(src), (n), NR_CPUS)
-static inline void __cpus_shift_left(cpumask_t *dstp,
-					const cpumask_t *srcp, int n, int nbits)
+			__cpus_shift_left((dst), (src), (n), nr_cpu_ids)
+static inline void __cpus_shift_left(cpumask_t dstp,
+					const cpumask_t srcp, int n, int nbits)
 {
 	bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
 }
@@ -244,11 +313,11 @@ static inline void __cpus_shift_left(cpu
 extern const unsigned long
 	cpu_bit_bitmap[BITS_PER_LONG+1][BITS_TO_LONGS(NR_CPUS)];
 
-static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
+static inline const cpumask_t get_cpu_mask(unsigned int cpu)
 {
 	const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
 	p -= cpu / BITS_PER_LONG;
-	return (const cpumask_t *)p;
+	return (const cpumask_t)p;
 }
 
 /*
@@ -256,7 +325,7 @@ static inline const cpumask_t *get_cpu_m
  * gcc optimizes it out (it's a constant) and there's no huge stack
  * variable created:
  */
-#define cpumask_of_cpu(cpu) (*get_cpu_mask(cpu))
+#define cpumask_of_cpu(cpu) (get_cpu_mask(cpu))
 
 
 #define CPU_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(NR_CPUS)
@@ -264,143 +333,100 @@ static inline const cpumask_t *get_cpu_m
 #if NR_CPUS <= BITS_PER_LONG
 
 #define CPU_MASK_ALL							\
-(cpumask_t) { {								\
+(cpumask_map_t) { {							\
 	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
 } }
 
-#define CPU_MASK_ALL_PTR	(&CPU_MASK_ALL)
+#define CPU_MASK_ALL_PTR	((cpumask_t)CPU_MASK_ALL)
 
 #else
 
 #define CPU_MASK_ALL							\
-(cpumask_t) { {								\
+(cpumask_map_t) { {							\
 	[0 ... BITS_TO_LONGS(NR_CPUS)-2] = ~0UL,			\
 	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
 } }
 
 /* cpu_mask_all is in init/main.c */
-extern cpumask_t cpu_mask_all;
-#define CPU_MASK_ALL_PTR	(&cpu_mask_all)
+extern cpumask_map_t cpu_mask_all;
+#define CPU_MASK_ALL_PTR	(cpu_mask_all)
 
 #endif
 
 #define CPU_MASK_NONE							\
-(cpumask_t) { {								\
+(cpumask_map_t) { {							\
 	[0 ... BITS_TO_LONGS(NR_CPUS)-1] =  0UL				\
 } }
 
 #define CPU_MASK_CPU0							\
-(cpumask_t) { {								\
+(cpumask_map_t) { {							\
 	[0] =  1UL							\
 } }
 
 #define cpus_addr(src) ((src).bits)
 
 #define cpumask_scnprintf(buf, len, src) \
-			__cpumask_scnprintf((buf), (len), &(src), NR_CPUS)
+			__cpumask_scnprintf((buf), (len), (src), nr_cpu_ids)
 static inline int __cpumask_scnprintf(char *buf, int len,
-					const cpumask_t *srcp, int nbits)
+					const cpumask_t srcp, int nbits)
 {
 	return bitmap_scnprintf(buf, len, srcp->bits, nbits);
 }
 
 #define cpumask_parse_user(ubuf, ulen, dst) \
-			__cpumask_parse_user((ubuf), (ulen), &(dst), NR_CPUS)
+			__cpumask_parse_user((ubuf), (ulen), (dst), nr_cpu_ids)
 static inline int __cpumask_parse_user(const char __user *buf, int len,
-					cpumask_t *dstp, int nbits)
+					cpumask_t dstp, int nbits)
 {
 	return bitmap_parse_user(buf, len, dstp->bits, nbits);
 }
 
 #define cpulist_scnprintf(buf, len, src) \
-			__cpulist_scnprintf((buf), (len), &(src), NR_CPUS)
+			__cpulist_scnprintf((buf), (len), (src), nr_cpu_ids)
 static inline int __cpulist_scnprintf(char *buf, int len,
-					const cpumask_t *srcp, int nbits)
+					const cpumask_t srcp, int nbits)
 {
 	return bitmap_scnlistprintf(buf, len, srcp->bits, nbits);
 }
 
-#define cpulist_parse(buf, dst) __cpulist_parse((buf), &(dst), NR_CPUS)
-static inline int __cpulist_parse(const char *buf, cpumask_t *dstp, int nbits)
+#define cpulist_parse(buf, dst) __cpulist_parse((buf), (dst), nr_cpu_ids)
+static inline int __cpulist_parse(const char *buf, cpumask_t dstp, int nbits)
 {
 	return bitmap_parselist(buf, dstp->bits, nbits);
 }
 
 #define cpu_remap(oldbit, old, new) \
-		__cpu_remap((oldbit), &(old), &(new), NR_CPUS)
+		__cpu_remap((oldbit), (old), (new), nr_cpu_ids)
 static inline int __cpu_remap(int oldbit,
-		const cpumask_t *oldp, const cpumask_t *newp, int nbits)
+		const cpumask_t oldp, const cpumask_t newp, int nbits)
 {
 	return bitmap_bitremap(oldbit, oldp->bits, newp->bits, nbits);
 }
 
 #define cpus_remap(dst, src, old, new) \
-		__cpus_remap(&(dst), &(src), &(old), &(new), NR_CPUS)
-static inline void __cpus_remap(cpumask_t *dstp, const cpumask_t *srcp,
-		const cpumask_t *oldp, const cpumask_t *newp, int nbits)
+		__cpus_remap((dst), (src), (old), (new), nr_cpu_ids)
+static inline void __cpus_remap(cpumask_t dstp, const cpumask_t srcp,
+		const cpumask_t oldp, const cpumask_t newp, int nbits)
 {
 	bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
 }
 
 #define cpus_onto(dst, orig, relmap) \
-		__cpus_onto(&(dst), &(orig), &(relmap), NR_CPUS)
-static inline void __cpus_onto(cpumask_t *dstp, const cpumask_t *origp,
-		const cpumask_t *relmapp, int nbits)
+		__cpus_onto((dst), (orig), (relmap), nr_cpu_ids)
+static inline void __cpus_onto(cpumask_t dstp, const cpumask_t origp,
+		const cpumask_t relmapp, int nbits)
 {
 	bitmap_onto(dstp->bits, origp->bits, relmapp->bits, nbits);
 }
 
 #define cpus_fold(dst, orig, sz) \
-		__cpus_fold(&(dst), &(orig), sz, NR_CPUS)
-static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
+		__cpus_fold((dst), (orig), sz, nr_cpu_ids)
+static inline void __cpus_fold(cpumask_t dstp, const cpumask_t origp,
 		int sz, int nbits)
 {
 	bitmap_fold(dstp->bits, origp->bits, sz, nbits);
 }
 
-#if NR_CPUS == 1
-
-#define nr_cpu_ids		1
-#define first_cpu(src)		({ (void)(src); 0; })
-#define next_cpu(n, src)	({ (void)(src); 1; })
-#define any_online_cpu(mask)	0
-#define for_each_cpu_mask(cpu, mask)	\
-	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
-
-#else /* NR_CPUS > 1 */
-
-extern int nr_cpu_ids;
-int __first_cpu(const cpumask_t *srcp);
-int __next_cpu(int n, const cpumask_t *srcp);
-int __any_online_cpu(const cpumask_t *mask);
-
-#define first_cpu(src)		__first_cpu(&(src))
-#define next_cpu(n, src)	__next_cpu((n), &(src))
-#define any_online_cpu(mask) __any_online_cpu(&(mask))
-#define for_each_cpu_mask(cpu, mask)			\
-	for ((cpu) = -1;				\
-		(cpu) = next_cpu((cpu), (mask)),	\
-		(cpu) < NR_CPUS; )
-#endif
-
-#if NR_CPUS <= 64
-
-#define next_cpu_nr(n, src)		next_cpu(n, src)
-#define cpus_weight_nr(cpumask)		cpus_weight(cpumask)
-#define for_each_cpu_mask_nr(cpu, mask)	for_each_cpu_mask(cpu, mask)
-
-#else /* NR_CPUS > 64 */
-
-int __next_cpu_nr(int n, const cpumask_t *srcp);
-#define next_cpu_nr(n, src)	__next_cpu_nr((n), &(src))
-#define cpus_weight_nr(cpumask)	__cpus_weight(&(cpumask), nr_cpu_ids)
-#define for_each_cpu_mask_nr(cpu, mask)			\
-	for ((cpu) = -1;				\
-		(cpu) = next_cpu_nr((cpu), (mask)),	\
-		(cpu) < nr_cpu_ids; )
-
-#endif /* NR_CPUS > 64 */
-
 /*
  * The following particular system cpumasks and operations manage
  * possible, present, active and online cpus.  Each of them is a fixed size
@@ -458,33 +484,15 @@ int __next_cpu_nr(int n, const cpumask_t
  *        main(){ set1(3); set2(5); }
  */
 
-extern cpumask_t cpu_possible_map;
-extern cpumask_t cpu_online_map;
-extern cpumask_t cpu_present_map;
-extern cpumask_t cpu_active_map;
-
-#if NR_CPUS > 1
-#define num_online_cpus()	cpus_weight_nr(cpu_online_map)
-#define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
-#define num_present_cpus()	cpus_weight_nr(cpu_present_map)
-#define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
-#define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
-#define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
-#define cpu_active(cpu)		cpu_isset((cpu), cpu_active_map)
-#else
-#define num_online_cpus()	1
-#define num_possible_cpus()	1
-#define num_present_cpus()	1
-#define cpu_online(cpu)		((cpu) == 0)
-#define cpu_possible(cpu)	((cpu) == 0)
-#define cpu_present(cpu)	((cpu) == 0)
-#define cpu_active(cpu)		((cpu) == 0)
-#endif
+extern cpumask_map_t cpu_possible_map;
+extern cpumask_map_t cpu_online_map;
+extern cpumask_map_t cpu_present_map;
+extern cpumask_map_t cpu_active_map;
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
 
-#define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), cpu_possible_map)
-#define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), cpu_online_map)
-#define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), cpu_present_map)
+#define for_each_possible_cpu(cpu) for_each_cpu_mask((cpu), cpu_possible_map)
+#define for_each_online_cpu(cpu)   for_each_cpu_mask((cpu), cpu_online_map)
+#define for_each_present_cpu(cpu)  for_each_cpu_mask((cpu), cpu_present_map)
 
 #endif /* __LINUX_CPUMASK_H */

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                     ` <alpine.LFD.1.10.0809250836270.3265-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-09-25 20:59                                                                       ` Subject: [RFC 1/1] cpumask: Provide new cpumask API Mike Travis
@ 2008-09-26  5:25                                                                       ` Rusty Russell
       [not found]                                                                         ` <200809261525.30258.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Rusty Russell @ 2008-09-26  5:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Yinghai Lu, Ingo Molnar, David Miller, Alan.Brunelle-VXdhtT5mjnY,
	travis-sJ/iWh9BUns, tglx-hfZtesqFncYOwBW4kG4KsQ, rjw-KKrjLPT3xs0,
	Linux Kernel Mailing List, kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	Andrew Morton, arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

On Friday 26 September 2008 01:42:13 Linus Torvalds wrote:
> On Thu, 25 Sep 2008, Rusty Russell wrote:
> >     This turns out to be awful in practice, mainly due to const. 
> > Consider:
> >
> > 	#ifdef CONFIG_CPUMASK_OFFSTACK
> > 	typedef unsigned long *cpumask_t;
> > 	#else
> > 	typedef unsigned long cpumask_t[1];
> > 	#endif
> >
> > 	cpumask_t returns_cpumask(void);
>
> No. That's already broken. You cannot return a cpumask_t, regardless of
> interface. We must not do it regardless of how we pass those things
> around, since it generates _yet_ another temporary on the stack for the
> return slot for any kind of structure.

No, for large NR_CPUS, cpumask_t is a pointer as shown.  And we have numerous 
basic functions which return a cpumask_t.  Yes, this is part of the problem.
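
One plausible reading of the const complaint in the quoted snippet, sketched
in plain C.  The type and function names below are hypothetical and are not
taken from the thread; the point is only that "const cpumask_t" means two
different things depending on which typedef is active.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two typedefs in the quoted snippet. */
typedef unsigned long *mask_ptr_t;    /* like the CONFIG_CPUMASK_OFFSTACK case */
typedef unsigned long mask_arr_t[1];  /* like the on-stack case */

/*
 * With the pointer typedef, const qualifies the pointer itself.  The
 * parameter type is "unsigned long *const", so the bits it points at
 * can still be written.
 */
static void set_via_const_ptr(const mask_ptr_t m)
{
	m[0] |= 1UL;
}

/*
 * With the array typedef, const qualifies the elements.  The parameter
 * is adjusted to "const unsigned long *", so a write such as
 * "m[0] |= 1UL" would be rejected by the compiler here.
 */
static unsigned long read_via_const_arr(const mask_arr_t m)
{
	return m[0];
}
```

So the same source line is read-only under one configuration and writable
under the other, which makes const-correct prototypes hard to write once.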

> What _is_ relevant is how we allocate them when we need temporary CPU
> masks. And _that_ is where my suggestion comes in. For small NR_CPUS, we
> really do want to allocate them on the stack, because calling kmalloc for
> a 4- or 8-byte allocation is just _stupid_.

Right, but cpumask_t is used for far more than stack decls, thus the problems.

I can make a separate "cpumask_stack_t" and use your method tho.  I think that 
might even reduce churn and allow us to do this in parts.

> which has to be converted some way. And I think it needs to be converted
> in a way that does *not* force us to call kmalloc() for idiotically small
> values.

Yeah, got that.  But your suggestion to change cpumask_t turned out horribly 
ugly.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                         ` <200809261525.30258.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
@ 2008-09-26  5:53                                                                           ` Mike Travis
       [not found]                                                                             ` <48DC78F2.8060400-sJ/iWh9BUns@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-09-26  5:53 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Yinghai Lu, Ingo Molnar, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

Rusty Russell wrote:
> On Friday 26 September 2008 01:42:13 Linus Torvalds wrote:
>> On Thu, 25 Sep 2008, Rusty Russell wrote:
>>>     This turns out to be awful in practice, mainly due to const. 
>>> Consider:
>>>
>>> 	#ifdef CONFIG_CPUMASK_OFFSTACK
>>> 	typedef unsigned long *cpumask_t;
>>> 	#else
>>> 	typedef unsigned long cpumask_t[1];
>>> 	#endif
>>>
>>> 	cpumask_t returns_cpumask(void);
>> No. That's already broken. You cannot return a cpumask_t, regardless of
>> interface. We must not do it regardless of how we pass those things
>> around, since it generates _yet_ another temporary on the stack for the
>> return slot for any kind of structure.
> 
> No, for large NR_CPUS, cpumask_t is a pointer as shown.  And we have numerous 
> basic functions which return a cpumask_t.  Yes, this is part of the problem.
> 
>> What _is_ relevant is how we allocate them when we need temporary CPU
>> masks. And _that_ is where my suggestion comes in. For small NR_CPUS, we
>> really do want to allocate them on the stack, because calling kmalloc for
>> a 4- or 8-byte allocation is just _stupid_.
> 
> Right, but cpumask_t is used for far more than stack decls, thus the problems.
> 
> I can make a separate "cpumask_stack_t" and use your method tho.  I think that 
> might even reduce churn and allow us to do this in parts.
> 
>> which has to be converted some way. And I think it needs to be converted
>> in a way that does *not* force us to call kmalloc() for idiotically small
>> values.
> 
> Yeah, got that.  But your suggestion to change cpumask_t turned out horribly 
> ugly.
> 
> Cheers,
> Rusty.

Hi Rusty,

I've gotten some good traction on the changes in the following patch.  About 30%
of the kernel is compiling right now and I'm picking up errors and warnings as
I'm going along.  I think it's doing most of what we need.  Attempting to hide
the cpumask struct definition caused all kinds of problems with the inline
functions and with statically declaring cpumasks.

(The following patch is a combination of all the changes to cpumask.h with the
header from the first patch.  I'll send you a complete copy in separate email.)

Thanks,
Mike
--

Subject: [RFC 1/1] cpumask: Provide new cpumask API

Provide a new cpumask API.  The main change is that cpumask_t becomes an
opaque object.  I believe this results in the minimum amount of editing
while still allowing the inline cpumask functions and the ability to
declare static cpumask objects.


    /* raw declaration */
    struct __cpumask_data_s { DECLARE_BITMAP(bits, NR_CPUS); };

    /* cpumask_map_t used for declaring static cpumask maps */
    typedef struct __cpumask_data_s cpumask_map_t[1];

    /* cpumask_t used for function args and return pointers */
    typedef struct __cpumask_data_s *cpumask_t;

    /* cpumask_var_t used for local variable, definition follows */
    typedef struct __cpumask_data_s	cpumask_var_t[1]; /* SMALL NR_CPUS */
    typedef struct __cpumask_data_s	*cpumask_var_t;	  /* LARGE NR_CPUS */

    /* replaces cpumask_t dst = (cpumask_t)src */
    void cpus_copy(cpumask_t dst, const cpumask_t src);

Remove the '*' indirection in all references to cpumask_t objects.  You can
change the reference to the cpumask object, but you cannot change the cpumask
object itself without using the functions that operate on cpumask objects
(e.g. the cpu_* operators).  Functions can return a cpumask_t (which is a
pointer to the cpumask object) and can only be passed a cpumask_t.
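
A minimal userspace sketch of how the array-of-one and pointer typedefs fit
together.  Every demo_* name and DEMO_NR_CPUS are assumptions made for the
illustration; the real definitions are in the patch below.  It shows a
statically declared map decaying to the pointer type when passed or returned,
and a copy going through a helper instead of assignment.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-ins for NR_CPUS, BITS_PER_LONG and BITS_TO_LONGS. */
#define DEMO_NR_CPUS		128
#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n)	\
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* raw declaration */
struct demo_cpumask_data {
	unsigned long bits[DEMO_BITS_TO_LONGS(DEMO_NR_CPUS)];
};

/* array of one: reserves storage, decays to a pointer when used */
typedef struct demo_cpumask_data demo_cpumask_map_t[1];

/* pointer: used for function arguments and return values */
typedef struct demo_cpumask_data *demo_cpumask_t;

/* a static map, zero-initialised like any static object */
static demo_cpumask_map_t demo_online_map;

static void demo_cpu_set(int cpu, demo_cpumask_t dst)
{
	dst->bits[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

static int demo_cpu_isset(int cpu, const demo_cpumask_t src)
{
	return (src->bits[cpu / DEMO_BITS_PER_LONG] >>
		(cpu % DEMO_BITS_PER_LONG)) & 1UL;
}

/* copies the bits; plain assignment would only copy the pointer */
static void demo_cpus_copy(demo_cpumask_t dst, const demo_cpumask_t src)
{
	memcpy(dst->bits, src->bits, sizeof(dst->bits));
}

/* returning the map hands back a pointer, not a struct copy */
static demo_cpumask_t demo_get_online_map(void)
{
	return demo_online_map;
}
```

Callers never write '&' or '*': demo_cpu_set(3, demo_online_map) and
demo_cpu_set(3, demo_get_online_map()) both modify the same static object.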

All uses of cpumask_t on the stack are changed to be cpumask_var_t except
for pointers to static cpumask objects.  Allocation of local (temp) cpumask
objects will follow...

All cpumask operators now operate using nr_cpu_ids instead of NR_CPUS.  All
variants of the cpumask operators which used nr_cpu_ids instead of NR_CPUS
are deleted.
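
To illustrate what bounding the operators by nr_cpu_ids means, here is a
small userspace sketch.  The demo_* names, DEMO_NR_CPUS and the bit-by-bit
scan are assumptions for the example, not the kernel implementation.  The
iteration stops at a runtime limit instead of the compile-time maximum, and
the limit itself is the "no more cpus" return value.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for NR_CPUS, BITS_PER_LONG and BITS_TO_LONGS. */
#define DEMO_NR_CPUS		4096
#define DEMO_BITS_PER_LONG	((int)(sizeof(unsigned long) * CHAR_BIT))
#define DEMO_BITS_TO_LONGS(n)	\
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

struct demo_mask {
	unsigned long bits[DEMO_BITS_TO_LONGS(DEMO_NR_CPUS)];
};

/* starts at the compile-time maximum and is lowered once cpus are known */
static int demo_nr_cpu_ids = DEMO_NR_CPUS;

/* next set bit after 'n', or demo_nr_cpu_ids if there is none */
static int demo_next_cpu(int n, const struct demo_mask *m)
{
	for (n++; n < demo_nr_cpu_ids; n++)
		if (m->bits[n / DEMO_BITS_PER_LONG] &
		    (1UL << (n % DEMO_BITS_PER_LONG)))
			return n;
	return demo_nr_cpu_ids;
}

/* same loop shape as for_each_cpu_mask() in the patch */
#define demo_for_each_cpu(cpu, m)				\
	for ((cpu) = -1;					\
		(cpu) = demo_next_cpu((cpu), (m)),		\
		(cpu) < demo_nr_cpu_ids; )

/* number of set bits below demo_nr_cpu_ids */
static int demo_weight(const struct demo_mask *m)
{
	int cpu, count = 0;

	demo_for_each_cpu(cpu, m)
		count++;
	return count;
}
```

Bits at or above the runtime limit are simply never visited, so a mask set up
while the limit was still at its maximum can report a smaller weight later.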

All variants of functions which use the (old cpumask_t *) pointer are deleted
(e.g. set_cpus_allowed_ptr()).

Based on code from Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org> (THANKS!!)

Signed-off-by: Mike Travis <travis-sJ/iWh9BUns@public.gmane.org>

--- struct-cpumasks.orig/include/linux/cpumask.h	2008-09-25 20:40:59.303546951 -0700
+++ struct-cpumasks/include/linux/cpumask.h	2008-09-25 22:41:00.764472541 -0700
@@ -3,7 +3,8 @@
 
 /*
  * Cpumasks provide a bitmap suitable for representing the
- * set of CPU's in a system, one bit position per CPU number.
+ * set of CPU's in a system, one bit position per CPU number up to
+ * nr_cpu_ids (<= NR_CPUS).
  *
  * See detailed comments in the file linux/bitmap.h describing the
  * data type on which these cpumasks are based.
@@ -18,18 +19,6 @@
  * For details of cpus_fold(), see bitmap_fold in lib/bitmap.c.
  *
  * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- * Note: The alternate operations with the suffix "_nr" are used
- *       to limit the range of the loop to nr_cpu_ids instead of
- *       NR_CPUS when NR_CPUS > 64 for performance reasons.
- *       If NR_CPUS is <= 64 then most assembler bitmask
- *       operators execute faster with a constant range, so
- *       the operator will continue to use NR_CPUS.
- *
- *       Another consideration is that nr_cpu_ids is initialized
- *       to NR_CPUS and isn't lowered until the possible cpus are
- *       discovered (including any disabled cpus).  So early uses
- *       will span the entire range of NR_CPUS.
- * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  *
  * The available cpumask operations are:
  *
@@ -37,6 +26,7 @@
  * void cpu_clear(cpu, mask)		turn off bit 'cpu' in mask
  * void cpus_setall(mask)		set all bits
  * void cpus_clear(mask)		clear all bits
+ * void cpus_copy(dst, src)		copies cpumask bits from src to dst
  * int cpu_isset(cpu, mask)		true iff bit 'cpu' set in mask
  * int cpu_test_and_set(cpu, mask)	test and set bit 'cpu' in mask
  *
@@ -52,52 +42,22 @@
  * int cpus_empty(mask)			Is mask empty (no bits sets)?
  * int cpus_full(mask)			Is mask full (all bits sets)?
  * int cpus_weight(mask)		Hamming weigh - number of set bits
- * int cpus_weight_nr(mask)		Same using nr_cpu_ids instead of NR_CPUS
  *
  * void cpus_shift_right(dst, src, n)	Shift right
  * void cpus_shift_left(dst, src, n)	Shift left
  *
- * int first_cpu(mask)			Number lowest set bit, or NR_CPUS
- * int next_cpu(cpu, mask)		Next cpu past 'cpu', or NR_CPUS
- * int next_cpu_nr(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
+ * int cpus_first(mask)			Number lowest set bit, or nr_cpu_ids
+ * int cpus_next(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
+ * int cpus_next_in(cpu, mask, andmask)	Next cpu in mask & andmask or nr_cpu_ids
+ *
+ * cpumask_t cpumask_of_cpu(cpu)	Return pointer to cpumask with bit
+ *					'cpu' set
  *
- * cpumask_t cpumask_of_cpu(cpu)	Return cpumask with bit 'cpu' set
- *					(can be used as an lvalue)
+ * cpu_mask_all				cpumask_map_t of all bits set
  * CPU_MASK_ALL				Initializer - all bits set
  * CPU_MASK_NONE			Initializer - no bits set
  * unsigned long *cpus_addr(mask)	Array of unsigned long's in mask
  *
- * CPUMASK_ALLOC kmalloc's a structure that is a composite of many cpumask_t
- * variables, and CPUMASK_PTR provides pointers to each field.
- *
- * The structure should be defined something like this:
- * struct my_cpumasks {
- *	cpumask_t mask1;
- *	cpumask_t mask2;
- * };
- *
- * Usage is then:
- *	CPUMASK_ALLOC(my_cpumasks);
- *	CPUMASK_PTR(mask1, my_cpumasks);
- *	CPUMASK_PTR(mask2, my_cpumasks);
- *
- *	--- DO NOT reference cpumask_t pointers until this check ---
- *	if (my_cpumasks == NULL)
- *		"kmalloc failed"...
- *
- * References are now pointers to the cpumask_t variables (*mask1, ...)
- *
- *if NR_CPUS > BITS_PER_LONG
- *   CPUMASK_ALLOC(m)			Declares and allocates struct m *m =
- *						kmalloc(sizeof(*m), GFP_KERNEL)
- *   CPUMASK_FREE(m)			Macro for kfree(m)
- *else
- *   CPUMASK_ALLOC(m)			Declares struct m _m, *m = &_m
- *   CPUMASK_FREE(m)			Nop
- *endif
- *   CPUMASK_PTR(v, m)			Declares cpumask_t *v = &(m->v)
- * ------------------------------------------------------------------------
- *
  * int cpumask_scnprintf(buf, len, mask) Format cpumask for printing
  * int cpumask_parse_user(ubuf, ulen, mask)	Parse ascii string as cpumask
  * int cpulist_scnprintf(buf, len, mask) Format cpumask as list for printing
@@ -107,8 +67,8 @@
  * void cpus_onto(dst, orig, relmap)	*dst = orig relative to relmap
  * void cpus_fold(dst, orig, sz)	dst bits = orig bits mod sz
  *
- * for_each_cpu_mask(cpu, mask)		for-loop cpu over mask using NR_CPUS
- * for_each_cpu_mask_nr(cpu, mask)	for-loop cpu over mask using nr_cpu_ids
+ * for_each_cpu(cpu, mask)		for-loop cpu over mask
+ * for_each_cpu_in(cpu, mask, andmask)	for-loop cpu over mask & andmask
  *
  * int num_online_cpus()		Number of online CPUs
  * int num_possible_cpus()		Number of all possible CPUs
@@ -118,6 +78,7 @@
  * int cpu_possible(cpu)		Is some cpu possible?
  * int cpu_present(cpu)			Is some cpu present (can schedule)?
  *
+ * int any_cpu_in(mask, andmask)	First cpu in mask & andmask
  * int any_online_cpu(mask)		First online cpu in mask
  *
  * for_each_possible_cpu(cpu)		for-loop cpu over cpu_possible_map
@@ -138,129 +99,229 @@
 #include <linux/threads.h>
 #include <linux/bitmap.h>
 
-typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
-extern cpumask_t _unused_cpumask_arg_;
+/* raw declaration */
+struct __cpumask_data_s { DECLARE_BITMAP(bits, NR_CPUS); };
+
+/* cpumask_map_t used for declaring static cpumask maps */
+typedef struct __cpumask_data_s cpumask_map_t[1];
+
+/* cpumask_t used for function args and return pointers */
+typedef struct __cpumask_data_s *cpumask_t;
+
+/* cpumask_var_t used for local variable, definition follows */
+
+#if NR_CPUS == 1
+
+/* cpumask_var_t used for local variable */
+typedef struct __cpumask_data_s	cpumask_var_t[1];
+
+#define nr_cpu_ids			1
+#define cpus_first(src)			({ (void)(src); 0; })
+#define cpus_next(n, src)		({ (void)(src); 1; })
+#define cpus_next_in(n, src, andsrc)	({ (void)(src); 1; })
+#define any_online_cpu(mask)		0
+#define for_each_cpu(cpu, mask)	\
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
+#define for_each_cpu_in(cpu, mask, andmask) \
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)andmask)
+
+#define num_online_cpus()		1
+#define num_possible_cpus()		1
+#define num_present_cpus()		1
+#define cpu_online(cpu)			((cpu) == 0)
+#define cpu_possible(cpu)		((cpu) == 0)
+#define cpu_present(cpu)		((cpu) == 0)
+#define cpu_active(cpu)			((cpu) == 0)
+
+#else /* ... NR_CPUS > 1 */
+
+#ifdef CONFIG_CPUMASKS_ONSTACK
+
+/* Constant is usually more efficient than a variable for small NR_CPUS */
+#define nr_cpu_ids		NR_CPUS
+
+/* cpumask_var_t used for local variable */
+typedef struct __cpumask_data_s	cpumask_var_t[1];
+static inline int cpumask_size(void)
+{
+	return sizeof(struct __cpumask_data_s);
+}
+
+#else
+
+/* Starts at NR_CPUS until acpi code discovers actual number. */
+extern int nr_cpu_ids;
+
+/* cpumask_var_t used for local variable */
+typedef struct __cpumask_data_s	*cpumask_var_t;
+static inline int cpumask_size(void)
+{
+	return BITS_TO_LONGS(nr_cpu_ids) * sizeof(long);
+}
+
+#endif /* CONFIG_CPUMASKS_ONSTACK */
+
+/* Deprecated: use for_each_cpu() */
+#define for_each_cpu_mask(cpu, mask)	for_each_cpu(cpu, mask)
+
+/* Deprecated: use cpus_first()/cpus_next() */
+#define first_cpu(src)		cpus_first((src))
+#define next_cpu(n, src)	cpus_next((n), (src))
+
+extern int cpus_first(const cpumask_t srcp);
+extern int cpus_next(int n, const cpumask_t srcp);
+extern int cpus_next_in(int n, const cpumask_t srcp, const cpumask_t andsrc);
+extern int any_cpu_in(const cpumask_t mask, const cpumask_t andmask);
+
+#define any_online_cpu(mask)	any_cpu_in((const cpumask_t)(mask), \
+					   (const cpumask_t)cpu_online_map)
+
+#define for_each_cpu(cpu, mask)				\
+	for ((cpu) = -1;				\
+		(cpu) = cpus_next((cpu), (mask)),	\
+		(cpu) < nr_cpu_ids; )
+
+#define for_each_cpu_in(cpu, mask, andmask)			\
+	for ((cpu) = -1;					\
+		(cpu) = cpus_next_in((cpu), (mask), (andmask)),	\
+		(cpu) < nr_cpu_ids; )
+
 
-#define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
-static inline void __cpu_set(int cpu, volatile cpumask_t *dstp)
+#define num_online_cpus()	cpus_weight(cpu_online_map)
+#define num_possible_cpus()	cpus_weight(cpu_possible_map)
+#define num_present_cpus()	cpus_weight(cpu_present_map)
+#define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
+#define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
+#define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
+#define cpu_active(cpu)		cpu_isset((cpu), cpu_active_map)
+#endif /* NR_CPUS > 1 */
+
+#define cpu_set(cpu, dst) __cpu_set((cpu), (dst))
+static inline void __cpu_set(int cpu, volatile cpumask_t dstp)
 {
 	set_bit(cpu, dstp->bits);
 }
 
-#define cpu_clear(cpu, dst) __cpu_clear((cpu), &(dst))
-static inline void __cpu_clear(int cpu, volatile cpumask_t *dstp)
+#define cpu_clear(cpu, dst) __cpu_clear((cpu), (dst))
+static inline void __cpu_clear(int cpu, volatile cpumask_t dstp)
 {
 	clear_bit(cpu, dstp->bits);
 }
 
-#define cpus_setall(dst) __cpus_setall(&(dst), NR_CPUS)
-static inline void __cpus_setall(cpumask_t *dstp, int nbits)
+#define cpus_setall(dst) __cpus_setall((dst), nr_cpu_ids)
+static inline void __cpus_setall(cpumask_t dstp, int nbits)
 {
 	bitmap_fill(dstp->bits, nbits);
 }
 
-#define cpus_clear(dst) __cpus_clear(&(dst), NR_CPUS)
-static inline void __cpus_clear(cpumask_t *dstp, int nbits)
+#define cpus_clear(dst) __cpus_clear((dst), nr_cpu_ids)
+static inline void __cpus_clear(cpumask_t dstp, int nbits)
 {
 	bitmap_zero(dstp->bits, nbits);
 }
 
+#define cpus_copy(dst, src) __cpus_copy((dst), (src), nr_cpu_ids)
+static inline void __cpus_copy(cpumask_t dstp, const cpumask_t srcp, int nbits)
+{
+	bitmap_copy(dstp->bits, srcp->bits, nbits);
+}
+
 /* No static inline type checking - see Subtlety (1) above. */
-#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask).bits)
+#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask)->bits)
 
-#define cpu_test_and_set(cpu, cpumask) __cpu_test_and_set((cpu), &(cpumask))
-static inline int __cpu_test_and_set(int cpu, cpumask_t *addr)
+#define cpu_test_and_set(cpu, cpumask) __cpu_test_and_set((cpu), (cpumask))
+static inline int __cpu_test_and_set(int cpu, cpumask_t addr)
 {
 	return test_and_set_bit(cpu, addr->bits);
 }
 
-#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_and(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_and(dst, src1, src2) __cpus_and((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_and(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_or(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_or(dst, src1, src2) __cpus_or((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_or(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_xor(dst, src1, src2) __cpus_xor((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_xor(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_xor(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
 #define cpus_andnot(dst, src1, src2) \
-				__cpus_andnot(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_andnot(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+				__cpus_andnot((dst), (src1), (src2), nr_cpu_ids)
+static inline void __cpus_andnot(cpumask_t dstp, const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_complement(dst, src) __cpus_complement(&(dst), &(src), NR_CPUS)
-static inline void __cpus_complement(cpumask_t *dstp,
-					const cpumask_t *srcp, int nbits)
+#define cpus_complement(dst, src) __cpus_complement((dst), (src), nr_cpu_ids)
+static inline void __cpus_complement(cpumask_t dstp,
+					const cpumask_t srcp, int nbits)
 {
 	bitmap_complement(dstp->bits, srcp->bits, nbits);
 }
 
-#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_equal(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_equal(src1, src2) __cpus_equal((src1), (src2), nr_cpu_ids)
+static inline int __cpus_equal(const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	return bitmap_equal(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_intersects(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_intersects(src1, src2) __cpus_intersects((src1), (src2), nr_cpu_ids)
+static inline int __cpus_intersects(const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	return bitmap_intersects(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_subset(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+#define cpus_subset(src1, src2) __cpus_subset((src1), (src2), nr_cpu_ids)
+static inline int __cpus_subset(const cpumask_t src1p,
+					const cpumask_t src2p, int nbits)
 {
 	return bitmap_subset(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_empty(src) __cpus_empty(&(src), NR_CPUS)
-static inline int __cpus_empty(const cpumask_t *srcp, int nbits)
+#define cpus_empty(src) __cpus_empty((src), nr_cpu_ids)
+static inline int __cpus_empty(const cpumask_t srcp, int nbits)
 {
 	return bitmap_empty(srcp->bits, nbits);
 }
 
-#define cpus_full(cpumask) __cpus_full(&(cpumask), NR_CPUS)
-static inline int __cpus_full(const cpumask_t *srcp, int nbits)
+#define cpus_full(cpumask) __cpus_full((cpumask), nr_cpu_ids)
+static inline int __cpus_full(const cpumask_t srcp, int nbits)
 {
 	return bitmap_full(srcp->bits, nbits);
 }
 
-#define cpus_weight(cpumask) __cpus_weight(&(cpumask), NR_CPUS)
-static inline int __cpus_weight(const cpumask_t *srcp, int nbits)
+#define cpus_weight(cpumask) __cpus_weight((cpumask), nr_cpu_ids)
+static inline int __cpus_weight(const cpumask_t srcp, int nbits)
 {
 	return bitmap_weight(srcp->bits, nbits);
 }
 
 #define cpus_shift_right(dst, src, n) \
-			__cpus_shift_right(&(dst), &(src), (n), NR_CPUS)
-static inline void __cpus_shift_right(cpumask_t *dstp,
-					const cpumask_t *srcp, int n, int nbits)
+			__cpus_shift_right((dst), (src), (n), nr_cpu_ids)
+static inline void __cpus_shift_right(cpumask_t dstp,
+					const cpumask_t srcp, int n, int nbits)
 {
 	bitmap_shift_right(dstp->bits, srcp->bits, n, nbits);
 }
 
 #define cpus_shift_left(dst, src, n) \
-			__cpus_shift_left(&(dst), &(src), (n), NR_CPUS)
-static inline void __cpus_shift_left(cpumask_t *dstp,
-					const cpumask_t *srcp, int n, int nbits)
+			__cpus_shift_left((dst), (src), (n), nr_cpu_ids)
+static inline void __cpus_shift_left(cpumask_t dstp,
+					const cpumask_t srcp, int n, int nbits)
 {
 	bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
 }
@@ -275,11 +336,12 @@ static inline void __cpus_shift_left(cpu
 extern const unsigned long
 	cpu_bit_bitmap[BITS_PER_LONG+1][BITS_TO_LONGS(NR_CPUS)];
 
-static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
+/* XXX - "const" causes: "warning: type qualifiers ignored on function return type" */
+static inline /*const*/ cpumask_t get_cpu_mask(unsigned int cpu)
 {
 	const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
 	p -= cpu / BITS_PER_LONG;
-	return (const cpumask_t *)p;
+	return (const cpumask_t)p;
 }
 
 /*
@@ -287,7 +349,7 @@ static inline const cpumask_t *get_cpu_m
  * gcc optimizes it out (it's a constant) and there's no huge stack
  * variable created:
  */
-#define cpumask_of_cpu(cpu) (*get_cpu_mask(cpu))
+#define cpumask_of_cpu(cpu) (get_cpu_mask(cpu))
 
 
 #define CPU_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(NR_CPUS)
@@ -295,152 +357,94 @@ static inline const cpumask_t *get_cpu_m
 #if NR_CPUS <= BITS_PER_LONG
 
 #define CPU_MASK_ALL							\
-(cpumask_t) { {								\
+(cpumask_map_t) { [0] = { {						\
 	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
-} }
-
-#define CPU_MASK_ALL_PTR	(&CPU_MASK_ALL)
+} } }
 
 #else
 
 #define CPU_MASK_ALL							\
-(cpumask_t) { {								\
+(cpumask_map_t) { [0] = { {						\
 	[0 ... BITS_TO_LONGS(NR_CPUS)-2] = ~0UL,			\
 	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
-} }
-
-/* cpu_mask_all is in init/main.c */
-extern cpumask_t cpu_mask_all;
-#define CPU_MASK_ALL_PTR	(&cpu_mask_all)
+} } }
 
 #endif
 
 #define CPU_MASK_NONE							\
-(cpumask_t) { {								\
+(cpumask_map_t) { [0] = { {						\
 	[0 ... BITS_TO_LONGS(NR_CPUS)-1] =  0UL				\
-} }
+} } }
 
 #define CPU_MASK_CPU0							\
-(cpumask_t) { {								\
+(cpumask_map_t) { [0] = { {						\
 	[0] =  1UL							\
-} }
+} } }
 
-#define cpus_addr(src) ((src).bits)
-
-#if NR_CPUS > BITS_PER_LONG
-#define	CPUMASK_ALLOC(m)	struct m *m = kmalloc(sizeof(*m), GFP_KERNEL)
-#define	CPUMASK_FREE(m)		kfree(m)
-#else
-#define	CPUMASK_ALLOC(m)	struct m _m, *m = &_m
-#define	CPUMASK_FREE(m)
-#endif
-#define	CPUMASK_PTR(v, m) 	cpumask_t *v = &(m->v)
+#define cpus_addr(src) ((src)->bits)
 
 #define cpumask_scnprintf(buf, len, src) \
-			__cpumask_scnprintf((buf), (len), &(src), NR_CPUS)
+			__cpumask_scnprintf((buf), (len), (src), nr_cpu_ids)
 static inline int __cpumask_scnprintf(char *buf, int len,
-					const cpumask_t *srcp, int nbits)
+					const cpumask_t srcp, int nbits)
 {
 	return bitmap_scnprintf(buf, len, srcp->bits, nbits);
 }
 
 #define cpumask_parse_user(ubuf, ulen, dst) \
-			__cpumask_parse_user((ubuf), (ulen), &(dst), NR_CPUS)
+			__cpumask_parse_user((ubuf), (ulen), (dst), nr_cpu_ids)
 static inline int __cpumask_parse_user(const char __user *buf, int len,
-					cpumask_t *dstp, int nbits)
+					cpumask_t dstp, int nbits)
 {
 	return bitmap_parse_user(buf, len, dstp->bits, nbits);
 }
 
 #define cpulist_scnprintf(buf, len, src) \
-			__cpulist_scnprintf((buf), (len), &(src), NR_CPUS)
+			__cpulist_scnprintf((buf), (len), (src), nr_cpu_ids)
 static inline int __cpulist_scnprintf(char *buf, int len,
-					const cpumask_t *srcp, int nbits)
+					const cpumask_t srcp, int nbits)
 {
 	return bitmap_scnlistprintf(buf, len, srcp->bits, nbits);
 }
 
-#define cpulist_parse(buf, dst) __cpulist_parse((buf), &(dst), NR_CPUS)
-static inline int __cpulist_parse(const char *buf, cpumask_t *dstp, int nbits)
+#define cpulist_parse(buf, dst) __cpulist_parse((buf), (dst), nr_cpu_ids)
+static inline int __cpulist_parse(const char *buf, cpumask_t dstp, int nbits)
 {
 	return bitmap_parselist(buf, dstp->bits, nbits);
 }
 
 #define cpu_remap(oldbit, old, new) \
-		__cpu_remap((oldbit), &(old), &(new), NR_CPUS)
+		__cpu_remap((oldbit), (old), (new), nr_cpu_ids)
 static inline int __cpu_remap(int oldbit,
-		const cpumask_t *oldp, const cpumask_t *newp, int nbits)
+		const cpumask_t oldp, const cpumask_t newp, int nbits)
 {
 	return bitmap_bitremap(oldbit, oldp->bits, newp->bits, nbits);
 }
 
 #define cpus_remap(dst, src, old, new) \
-		__cpus_remap(&(dst), &(src), &(old), &(new), NR_CPUS)
-static inline void __cpus_remap(cpumask_t *dstp, const cpumask_t *srcp,
-		const cpumask_t *oldp, const cpumask_t *newp, int nbits)
+		__cpus_remap((dst), (src), (old), (new), nr_cpu_ids)
+static inline void __cpus_remap(cpumask_t dstp, const cpumask_t srcp,
+		const cpumask_t oldp, const cpumask_t newp, int nbits)
 {
 	bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
 }
 
 #define cpus_onto(dst, orig, relmap) \
-		__cpus_onto(&(dst), &(orig), &(relmap), NR_CPUS)
-static inline void __cpus_onto(cpumask_t *dstp, const cpumask_t *origp,
-		const cpumask_t *relmapp, int nbits)
+		__cpus_onto((dst), (orig), (relmap), nr_cpu_ids)
+static inline void __cpus_onto(cpumask_t dstp, const cpumask_t origp,
+		const cpumask_t relmapp, int nbits)
 {
 	bitmap_onto(dstp->bits, origp->bits, relmapp->bits, nbits);
 }
 
 #define cpus_fold(dst, orig, sz) \
-		__cpus_fold(&(dst), &(orig), sz, NR_CPUS)
-static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
+		__cpus_fold((dst), (orig), sz, nr_cpu_ids)
+static inline void __cpus_fold(cpumask_t dstp, const cpumask_t origp,
 		int sz, int nbits)
 {
 	bitmap_fold(dstp->bits, origp->bits, sz, nbits);
 }
 
-#if NR_CPUS == 1
-
-#define nr_cpu_ids		1
-#define first_cpu(src)		({ (void)(src); 0; })
-#define next_cpu(n, src)	({ (void)(src); 1; })
-#define any_online_cpu(mask)	0
-#define for_each_cpu_mask(cpu, mask)	\
-	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
-
-#else /* NR_CPUS > 1 */
-
-extern int nr_cpu_ids;
-int __first_cpu(const cpumask_t *srcp);
-int __next_cpu(int n, const cpumask_t *srcp);
-int __any_online_cpu(const cpumask_t *mask);
-
-#define first_cpu(src)		__first_cpu(&(src))
-#define next_cpu(n, src)	__next_cpu((n), &(src))
-#define any_online_cpu(mask) __any_online_cpu(&(mask))
-#define for_each_cpu_mask(cpu, mask)			\
-	for ((cpu) = -1;				\
-		(cpu) = next_cpu((cpu), (mask)),	\
-		(cpu) < NR_CPUS; )
-#endif
-
-#if NR_CPUS <= 64
-
-#define next_cpu_nr(n, src)		next_cpu(n, src)
-#define cpus_weight_nr(cpumask)		cpus_weight(cpumask)
-#define for_each_cpu_mask_nr(cpu, mask)	for_each_cpu_mask(cpu, mask)
-
-#else /* NR_CPUS > 64 */
-
-int __next_cpu_nr(int n, const cpumask_t *srcp);
-#define next_cpu_nr(n, src)	__next_cpu_nr((n), &(src))
-#define cpus_weight_nr(cpumask)	__cpus_weight(&(cpumask), nr_cpu_ids)
-#define for_each_cpu_mask_nr(cpu, mask)			\
-	for ((cpu) = -1;				\
-		(cpu) = next_cpu_nr((cpu), (mask)),	\
-		(cpu) < nr_cpu_ids; )
-
-#endif /* NR_CPUS > 64 */
-
 /*
  * The following particular system cpumasks and operations manage
  * possible, present, active and online cpus.  Each of them is a fixed size
@@ -498,33 +502,16 @@ int __next_cpu_nr(int n, const cpumask_t
  *        main(){ set1(3); set2(5); }
  */
 
-extern cpumask_t cpu_possible_map;
-extern cpumask_t cpu_online_map;
-extern cpumask_t cpu_present_map;
-extern cpumask_t cpu_active_map;
-
-#if NR_CPUS > 1
-#define num_online_cpus()	cpus_weight_nr(cpu_online_map)
-#define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
-#define num_present_cpus()	cpus_weight_nr(cpu_present_map)
-#define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
-#define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
-#define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
-#define cpu_active(cpu)		cpu_isset((cpu), cpu_active_map)
-#else
-#define num_online_cpus()	1
-#define num_possible_cpus()	1
-#define num_present_cpus()	1
-#define cpu_online(cpu)		((cpu) == 0)
-#define cpu_possible(cpu)	((cpu) == 0)
-#define cpu_present(cpu)	((cpu) == 0)
-#define cpu_active(cpu)		((cpu) == 0)
-#endif
+extern cpumask_map_t cpu_possible_map;
+extern cpumask_map_t cpu_online_map;
+extern cpumask_map_t cpu_present_map;
+extern cpumask_map_t cpu_active_map;
+extern cpumask_map_t cpu_mask_all;
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
 
-#define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), cpu_possible_map)
-#define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), cpu_online_map)
-#define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), cpu_present_map)
+#define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_map)
+#define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_map)
+#define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_map)
 
 #endif /* __LINUX_CPUMASK_H */
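
The central idea in the header above is that a mask is stored as an
array of one struct (cpumask_map_t) and passed around as a pointer
(cpumask_t), so the array name decays to the pointer and callers no
longer write '&'.  A minimal user-space sketch of that idea follows.
It is not the kernel code: NR_CPUS is fixed at 8, struct cpumask_data,
the function forms of cpu_set() and cpus_next(), and
demo_sum_of_set_cpus() are all stand-ins written for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical user-space stand-ins; the kernel definitions differ. */
#define NR_CPUS			8
#define BITS_PER_LONG		((int)(sizeof(unsigned long) * CHAR_BIT))
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask_data { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; };

typedef struct cpumask_data cpumask_map_t[1];	/* storage: array of one */
typedef struct cpumask_data *cpumask_t;		/* handle: plain pointer */

static int nr_cpu_ids = NR_CPUS;

static void cpu_set(int cpu, cpumask_t dst)
{
	dst->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* Return the next set bit after n, or nr_cpu_ids if there is none. */
static int cpus_next(int n, const struct cpumask_data *src)
{
	for (n++; n < nr_cpu_ids; n++)
		if (src->bits[n / BITS_PER_LONG] & (1UL << (n % BITS_PER_LONG)))
			return n;
	return nr_cpu_ids;
}

/* Same loop shape as the for_each_cpu() macro in the patch. */
#define for_each_cpu(cpu, mask)				\
	for ((cpu) = -1;				\
		(cpu) = cpus_next((cpu), (mask)),	\
		(cpu) < nr_cpu_ids; )

/* Set CPUs 1 and 5 in a local map and return the sum of the ids visited. */
int demo_sum_of_set_cpus(void)
{
	cpumask_map_t map = { { { 0 } } };
	int cpu, sum = 0;

	cpu_set(1, map);	/* array name decays to a pointer: no '&' */
	cpu_set(5, map);
	assert(cpus_next(-1, map) == 1);

	for_each_cpu(cpu, map)
		sum += cpu;
	return sum;
}
```

Calling demo_sum_of_set_cpus() visits CPUs 1 and 5 only.  Note that the
loop bound is nr_cpu_ids rather than NR_CPUS, which is what lets the
patch drop the separate *_nr variants.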

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                             ` <48DC78F2.8060400-sJ/iWh9BUns@public.gmane.org>
@ 2008-09-27 19:16                                                                               ` Ingo Molnar
       [not found]                                                                                 ` <20080927191653.GB18619-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-09-27 19:16 UTC (permalink / raw)
  To: Mike Travis
  Cc: Rusty Russell, Linus Torvalds, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner


* Mike Travis <travis-sJ/iWh9BUns@public.gmane.org> wrote:

> Hi Rusty,
> 
> I've gotten some good traction on the changes in the following patch.  
> About 30% of the kernel is compiling right now and I'm picking up 
> errors and warnings as I'm going along.  I think it's doing most of 
> what we need.  Attempting to hide the cpumask struct definition caused 
> all kinds of problems with the inline functions and statically 
> declaring cpumask's.
> 
> (The following patch is a combination of all the changes to cpumask.h 
> with the header from the first patch.  I'll send you a complete copy 
> in separate email.)

could you please send whatever .c changes you have already, so that we 
can have a look at what the end result will look like? Doesn't have to 
build, i'm just curious about what it looks like in practice, 
semantically.

	Ingo

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                                 ` <20080927191653.GB18619-X9Un+BFzKDI@public.gmane.org>
@ 2008-09-29 14:33                                                                                   ` Mike Travis
       [not found]                                                                                     ` <48E0E73A.40803-sJ/iWh9BUns@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-09-29 14:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Linus Torvalds, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

Ingo Molnar wrote:
> * Mike Travis <travis-sJ/iWh9BUns@public.gmane.org> wrote:
> 
>> Hi Rusty,
>>
>> I've gotten some good traction on the changes in the following patch.  
>> About 30% of the kernel is compiling right now and I'm picking up 
>> errors and warnings as I'm going along.  I think it's doing most of 
>> what we need.  Attempting to hide the cpumask struct definition caused 
>> all kinds of problems with the inline functions and statically 
>> declaring cpumask's.
>>
>> (The following patch is a combination of all the changes to cpumask.h 
>> with the header from the first patch.  I'll send you a complete copy 
>> in separate email.)
> 
> could you please send whatever .c changes you have already, so that we 
> can have a look at how the end result will look like? Doesnt have to 
> build, i'm just curious about how it looks like in practice, 
> semantically.
> 
> 	Ingo


I will, and the full "allyesconfig" does compile.  And it's basically a
benign change in that the functionality is still the same.  I'm currently
reordering it a bit to clean it up.

Thanks,
Mike

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                                     ` <48E0E73A.40803-sJ/iWh9BUns@public.gmane.org>
@ 2008-09-30 11:04                                                                                       ` Ingo Molnar
  2008-09-30 16:14                                                                                         ` Mike Travis
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-09-30 11:04 UTC (permalink / raw)
  To: Mike Travis
  Cc: Rusty Russell, Linus Torvalds, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner


* Mike Travis <travis-sJ/iWh9BUns@public.gmane.org> wrote:

> > could you please send whatever .c changes you have already, so that 
> > we can have a look at how the end result will look like? Doesnt have 
> > to build, i'm just curious about how it looks like in practice, 
> > semantically.
> 
> 
> I will, and the full "allyesconfig" does compile.  And it's basically 
> a benign change in that the functionality is still the same.  I'm 
> currently reordering it a bit to clean it up.

btw., are the resulting instructions also expected to be the same? If 
yes then you might want to verify it all by making sure the md5's of the 
.o's do not change.

(If that's not possible (gcc decides to compile it a bit differently) 
then no big deal, just wanted to mention the possibility.)

	Ingo

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
  2008-09-30 11:04                                                                                       ` Ingo Molnar
@ 2008-09-30 16:14                                                                                         ` Mike Travis
       [not found]                                                                                           ` <48E2506C.7000406-sJ/iWh9BUns@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-09-30 16:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Linus Torvalds, Yinghai Lu, David Miller,
	Alan.Brunelle, tglx, rjw, Linux Kernel Mailing List,
	kernel-testers, Andrew Morton, arjan, Jack Steiner

Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
> 
>>> could you please send whatever .c changes you have already, so that 
>>> we can have a look at how the end result will look like? Doesnt have 
>>> to build, i'm just curious about how it looks like in practice, 
>>> semantically.
>>
>> I will, and the full "allyesconfig" does compile.  And it's basically 
>> a benign change in that the functionality is still the same.  I'm 
>> currently reordering it a bit to clean it up.
> 
> btw., are the resulting instructions also expected to be the same? If 
> yes then you might want to verify it all by making sure the md5's of the 
> .o's do not change.
> 
> (If that's not possible (gcc decides to compile it a bit differently) 
> then no big deal, just wanted to mention the possibility.)
> 
> 	Ingo

Well, not exactly... ;-)  It does institute the new API change that specifies
only pointers to cpumask's can be passed to functions and returned from
functions.  I really wanted the default cpumask_t to be a constant so those
instances where the passed in cpumask is used as a read/write temp variable
would be caught.  But it started getting messy.

One pain is:

	typedef struct __cpumask_s *cpumask_t;
	const cpumask_t xxx;

is not the same as:

	typedef const struct __cpumask_s *const_cpumask_t;
	const_cpumask_t xxx;

and I'm not exactly sure why.  It came up when I tried to declare
functions that returned a constant cpumask_t pointer (node_to_cpumask,
cpumask_of_cpu, etc.)

The other major change I'm contemplating is to remove "cpumask_t" completely
(maybe cpumask_ptr_t?).  This would force every instance of cpumask_t to be
examined.  (I found quite a few I had missed in my original edits when I
added the task struct temp cpumask's.)

Oh yeah, one question ... is "current" always valid?

Thanks,
Mike

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                                           ` <48E2506C.7000406-sJ/iWh9BUns@public.gmane.org>
@ 2008-09-30 16:46                                                                                             ` Linus Torvalds
       [not found]                                                                                               ` <alpine.LFD.2.00.0809300939450.3389-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-09-30 16:46 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Rusty Russell, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

On Tue, 30 Sep 2008, Mike Travis wrote:
> 
> One pain is:
> 
> 	typedef struct __cpumask_s *cpumask_t;
> 	const cpumask_t xxx;
> 
> is not the same as:
> 
> 	typedef const struct __cpumask_s *const_cpumask_t;
> 	const_cpumask_t xxx;
> 
> and I'm not exactly sure why.

Umm. The const has different 

One is

	typedef const struct __cpumask_s *const_cpumask_t;

which becomes

	(const struct __cpumask_s) *

while the other is

	const cpumask_t xxx

which is

	const (struct __cpumask_s *)

and if you look a bit more closely, you'll see that they are _obviously_ 
not the same thing at all.
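
A small sketch of the two bindings, for illustration only.  The
typedef names are the ones from the quoted mail; the one-word struct
body and the functions set_bit0(), read_word0() and
demo_const_binding() are made up here.

```c
#include <assert.h>

struct __cpumask_s { unsigned long bits[1]; };

typedef struct __cpumask_s *cpumask_t;
typedef const struct __cpumask_s *const_cpumask_t;

/*
 * 'const cpumask_t' is 'struct __cpumask_s *const': the pointer is
 * const, the mask it points to is not, so this write compiles.
 */
static void set_bit0(const cpumask_t m)
{
	m->bits[0] |= 1UL;
	/* m = 0;  would not compile: m itself is read-only */
}

/*
 * 'const_cpumask_t' is 'const struct __cpumask_s *': the mask is
 * read-only, the pointer is not.
 */
static unsigned long read_word0(const_cpumask_t m)
{
	/* m->bits[0] = 0;  would not compile: *m is read-only */
	return m->bits[0];
}

/* Modify a mask through the "const" handle, then read it back. */
unsigned long demo_const_binding(void)
{
	struct __cpumask_s mask = { { 0 } };

	set_bit0(&mask);
	assert(mask.bits[0] == 1UL);
	return read_word0(&mask);
}
```

So a function taking 'const cpumask_t' gives the caller no protection
at all against the mask being scribbled on; only the second typedef
does that.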

Quite frankly, I personally do hate typedefs that end up being pointers, 
and used as pointers, without showing that in the source code.

When you do

	type_t a;

	fn(a);

I expect the code to essentially do a pass-by-value. But when the type_t 
is a pointer, that doesn't really work.
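
To make that concrete, here is a sketch with two typedefs that look
the same at the call site.  All names in it (struct mask_data,
value_mask_t, handle_mask_t and the helper functions) are invented
for the example.

```c
#include <assert.h>

struct mask_data { unsigned long bits[1]; };

typedef struct mask_data  value_mask_t;		/* a plain struct */
typedef struct mask_data *handle_mask_t;	/* a pointer in disguise */

/* Receives a copy: the caller's variable is untouched. */
static void clear_value(value_mask_t m)
{
	m.bits[0] = 0;
}

/* Receives a pointer: the caller's data is cleared. */
static void clear_handle(handle_mask_t m)
{
	m->bits[0] = 0;
}

/* Returns 1 if the two calls behaved as described above. */
int demo_pass_by_value(void)
{
	value_mask_t a = { { 0xff } };
	struct mask_data storage = { { 0xff } };
	handle_mask_t b = &storage;

	clear_value(a);		/* reads like pass-by-value, and is */
	clear_handle(b);	/* reads the same, but is not */

	assert(a.bits[0] == 0xff);
	return a.bits[0] == 0xff && storage.bits[0] == 0;
}
```

Both calls have the shape fn(a), and nothing at the call site says
which one can change the caller's mask.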

Your issue with 'const' is just another version of the same. You don't 
want the _pointer_ to be const, you want what it points _to_ to be const. 
But because you hid the pointerness inside the typedef, you simply cannot 
do that.

The problem with cpumask's, of course, is that for the "small mask" case, 
we really don't want it to be a pointer. So now it's sometimes a pointer 
and sometimes not. The typedef hides that, and I understand why it's a 
good idea, but I'm surprised you didn't understand what the implications 
were for 'const', and I'm now a bit more leery about this whole thing just 
because the typedef ends up hiding so much - it doesn't just hide the 
basic type, it hides a very basic *code* issue.

			Linus

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                                               ` <alpine.LFD.2.00.0809300939450.3389-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-09-30 18:02                                                                                                 ` Mike Travis
       [not found]                                                                                                   ` <48E269B6.1080904-sJ/iWh9BUns@public.gmane.org>
  2008-10-01  0:44                                                                                                 ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rusty Russell
  1 sibling, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-09-30 18:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rusty Russell, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

Linus Torvalds wrote:
> 
> On Tue, 30 Sep 2008, Mike Travis wrote:
>> One pain is:
>>
>> 	typedef struct __cpumask_s *cpumask_t;
>> 	const cpumask_t xxx;
>>
>> is not the same as:
>>
>> 	typedef const struct __cpumask_s *const_cpumask_t;
>> 	const_cpumask_t xxx;
>>
>> and I'm not exactly sure why.
> 
> Umm. The const has different 
> 
> One is
> 
> 	typedef const struct __cpumask_s *const_cpumask_t;
> 
> which becomes
> 
> 	(const struct __cpumask_s) *
> 
> while the other is
> 
> 	const cpumask_t xxx
> 
> which is
> 
> 	const (struct __cpumask_s *)
> 
> and if you look a bit more closely, you'll see that they are _obviously_ 
> not the same thing at all.

Thanks, yes that explains what I should have figured out.  (Nice way to
explain it btw... ;-)

> 
> Quite frankly, I personally do hate typedefs that end up being pointers, 
> and used as pointers, without showing that in the source code.
> 
> When you do
> 
> 	type_t a;
> 
> 	fn(a);
> 
> I expect the code to essentially do a pass-by-value. But when the type_t 
> is a pointer, that doesn't really work.

I agree, and as I mentioned, Rusty was working towards an alternative
method of declaring cpumask's which does not hide this.

My goal was to create an (apparently) opaque handle for cpumask_t and modify
the code so all changes to the contents of the cpumask are via functions.

> 
> Your issue with 'const' is just another version of the same. You don't 
> want the _pointer_ to be const, you want what it points _to_ to be const. 
> But because you hid the pointerness inside the typedef, you simply cannot 
> do that.
> 
> The problem with cpumask's, of course, is that for the "small mask" case, 
> we really don't want it to be a pointer. So now it's sometimes a pointer 
> and sometimes not. The typedef hides that, and I understand why it's a 
> good idea, but I'm surprised you didn't understand what the implications 
> were for 'const', and I'm now a bit more leery about this whole thing just 
> because the typedef ends up hiding so much - it doesn't just hide the 
> basic type, it hides a very basic *code* issue.

Perhaps then defining the cpumask as a list of unsigned longs (remove the
outer struct) would play more "naturally".  Lists by definition are always
referenced by pointers.  I have another version of the patchset that shows
this and I'll post just the cpumask.h and doc files.

> 
> 			Linus

Thanks!
Mike

* [RFC 1/1] cpumask: New cpumask API - take 2 - use unsigned longs
       [not found]                                                                                                   ` <48E269B6.1080904-sJ/iWh9BUns@public.gmane.org>
@ 2008-09-30 22:22                                                                                                     ` Mike Travis
       [not found]                                                                                                       ` <48E2A691.7060407-sJ/iWh9BUns@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Mike Travis @ 2008-09-30 22:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rusty Russell, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

Mike Travis wrote:
> Linus Torvalds wrote:
...
>> Your issue with 'const' is just another version of the same. You don't 
>> want the _pointer_ to be const, you want what it points _to_ to be const. 
>> But because you hid the pointerness inside the typedef, you simply cannot 
>> do that.
>>
>> The problem with cpumask's, of course, is that for the "small mask" case, 
>> we really don't want it to be a pointer. So now it's sometimes a pointer 
>> and sometimes not. The typedef hides that, and I understand why it's a 
>> good idea, but I'm surprised you didn't understand what the implications 
>> were for 'const', and I'm now a bit more leery about this whole thing just 
>> because the typedef ends up hiding so much - it doesn't just hide the 
>> basic type, it hides a very basic *code* issue.
> 
> Perhaps then defining the cpumask as a list of unsigned longs (remove the
> outer struct) would play more "naturally".  Lists by definition are always
> referenced by pointers.  I have another version of the patchset that shows
> this and I'll post just the cpumask.h and doc files.
...

Here's an alternate proposal for the new cpumask API.  I have not yet begun
the tedious work of making it compile so there may still be some syntax
errors.  (Init/main.c compiles.)  But I believe it more closely adheres to
proper 'C' coding styles.

Thanks,
Mike
--
Subject: cpumask: Provide new cpumask API

[Copied from Documentation/cpumask.txt]

Introduction

Cpumask variables represent the set of CPUs in a system, using one bit
per CPU.  With the rapid increase in the number of CPUs in a single
system image (SSI), managing this many CPUs has become an ordeal.  When
the limit on CPUs in a system was small, the cpumask could fit within an
integer.  Even with the increase to 128, the bit pattern was still a
manageable size.  Systems are now available with 2048 cores, which with
hyper-threading gives 4096 CPU threads, and the number of CPUs in an SSI
keeps growing: 16,384 is right around the corner.  Even on desktop
systems with only 2 or 4 sockets, a new generation of Intel processors
that have 128 cores per socket will increase the count of CPU threads
tremendously.

A cpumask that covers this 4096-CPU limit therefore needs 512 bytes,
which puts pressure on the amount of stack space needed to
accommodate temporary cpumask variables.
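
The 512-byte figure follows from one bit per CPU, rounded up to whole
unsigned longs.  A minimal userspace sketch of that arithmetic is shown
below; BITS_PER_LONG and BITS_TO_LONGS are redefined locally to mirror the
kernel macros of the same names, and cpumask_bytes() is a hypothetical
helper used only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local userspace equivalents of the kernel macros of the same names. */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nbits)	(((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes needed for a bitmap with one bit per CPU, rounded up to whole longs. */
size_t cpumask_bytes(size_t nr_cpus)
{
	assert(nr_cpus > 0);
	return BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}
```

With this helper, 4096 CPUs come to 512 bytes and 16,384 CPUs to 2048
bytes per mask, so a function holding a few temporary masks on the stack
quickly uses several kilobytes.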

The primary goal of the cpumask API is to accommodate this large
number of CPUs without affecting the compactness of Linux on small
systems.


The Changes

Provide a new cpumask interface API.  The main change is that cpumask_t
becomes a pointer to a constant list of unsigned longs, and two new
typedefs, cpumask_var_t and cpumask_map_t, are used for declaring
cpumask variables.  This should keep the number of modifications to a
minimum while still allowing the inline cpumask functions and the
ability to declare static cpumask objects.


    /* cpumask_t is used when pointing to a constant cpumask */
    typedef const unsigned long *cpumask_t;

    /* cpumask_map_t used for declaring static cpumask maps */
    typedef unsigned long cpumask_map_t[BITS_TO_LONGS(nr_cpu_ids)];

    /* cpumask_var_t used for read/write cpumask variables */
    typedef unsigned long cpumask_var_t[BITS_TO_LONGS(nr_cpu_ids)];
    							/* SMALL NR_CPUS */
    typedef unsigned long *cpumask_var_t;		/* LARGE NR_CPUS */

    /* replaces cpumask_t dst = (cpumask_t)src */
    void cpus_copy(cpumask_var_t dst, cpumask_t src);

You can change the reference to the cpumask object but to change the
contents of the cpumask object itself you must use the functions that
operate on cpumask objects (f.e. the cpu_* operators).  Functions can
return a cpumask_t (cpumask pointer) and can only be passed a cpumask_t.
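
The split between a read-only cpumask_t and a writable cpumask_var_t can
be tried out in userspace.  The sketch below assumes the small, on-stack
configuration with a fixed NR_CPUS of 128 (a value chosen for the
example), and the function bodies are simplified stand-ins rather than
the kernel's bitmap routines.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS		128	/* assumed build-time limit for this sketch */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS	((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef const unsigned long *cpumask_t;			/* read-only view */
typedef unsigned long cpumask_var_t[MASK_LONGS];	/* writable storage */

/* Writers take cpumask_var_t; as a parameter it decays to unsigned long *. */
void cpu_set(int cpu, cpumask_var_t dst)
{
	size_t bit = (size_t)cpu;

	assert(cpu >= 0 && cpu < NR_CPUS);
	dst[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Readers take cpumask_t and cannot modify the bits they are shown. */
int cpu_isset(int cpu, cpumask_t src)
{
	size_t bit = (size_t)cpu;

	assert(cpu >= 0 && cpu < NR_CPUS);
	return (int)((src[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL);
}

/* Replaces structure assignment: copies the bits, not the pointer. */
void cpus_copy(cpumask_var_t dst, cpumask_t src)
{
	memcpy(dst, src, MASK_LONGS * sizeof(unsigned long));
}
```

A cpumask_var_t can be passed wherever a cpumask_t is expected, because
converting unsigned long * to const unsigned long * is implicit in C; the
reverse direction needs a cast, which is what keeps readers from writing.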

All uses of a cpumask_t variable on the stack are changed to be
cpumask_var_t except for pointers to static (read only) cpumask objects.
Allocation of local (temp) cpumask objects uses the functions available
in cpumask_alloc.h.  (descriptions to be supplied.)

All cpumask operators now operate using nr_cpu_ids instead of NR_CPUS.
All variants of the cpumask operators which used nr_cpu_ids instead of
NR_CPUS have been deleted.
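
The effect of bounding every operator by nr_cpu_ids can be seen in a
userspace sketch of the iterator.  The value 70 for nr_cpu_ids is an
arbitrary assumption, cpus_next() is a simplified bit-by-bit stand-in for
the find_next_bit() based version in the patch, and count_cpus() is a
hypothetical helper for the example.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	((int)(sizeof(unsigned long) * CHAR_BIT))

typedef const unsigned long *cpumask_t;

/* Assumed: set at boot to the number of possible CPUs (<= NR_CPUS). */
int nr_cpu_ids = 70;

/* Simplified stand-in for cpus_next(): next set bit after n, or nr_cpu_ids. */
int cpus_next(int n, cpumask_t src)
{
	int cpu;

	assert(n >= -1);
	for (cpu = n + 1; cpu < nr_cpu_ids; cpu++)
		if (src[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG)))
			return cpu;
	return nr_cpu_ids;
}

/* Same shape as the macro in the patch. */
#define for_each_cpu(cpu, mask)				\
	for ((cpu) = -1;				\
		(cpu) = cpus_next((cpu), (mask)),	\
		(cpu) < nr_cpu_ids; )

/* Counts set bits below nr_cpu_ids; bits at or above it are never visited. */
int count_cpus(cpumask_t mask)
{
	int cpu, n = 0;

	for_each_cpu(cpu, mask)
		n++;
	return n;
}
```

A bit set at position 71 in the underlying array is simply never reached,
which is why the separate "_nr" variants are no longer needed.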

All variants of functions which use the (old cpumask_t *) pointer have been
deleted (f.e. set_cpus_allowed_ptr()).

Based on code from Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org> (THANKS!!)

Signed-off-by: Mike Travis <travis-sJ/iWh9BUns@public.gmane.org>

---
 include/linux/cpumask.h       |  432 +++++++++++++++++++-----------------------
 include/linux/cpumask_alloc.h |   58 +++--
 lib/cpumask.c                 |   36 ++-
 3 files changed, 258 insertions(+), 268 deletions(-)

--- longs-cpumasks.orig/include/linux/cpumask.h
+++ longs-cpumasks/include/linux/cpumask.h
@@ -3,7 +3,8 @@
 
 /*
  * Cpumasks provide a bitmap suitable for representing the
- * set of CPU's in a system, one bit position per CPU number.
+ * set of CPU's in a system, one bit position per CPU number up to
+ * nr_cpu_ids (<= NR_CPUS).
  *
  * See detailed comments in the file linux/bitmap.h describing the
  * data type on which these cpumasks are based.
@@ -18,18 +19,6 @@
  * For details of cpus_fold(), see bitmap_fold in lib/bitmap.c.
  *
  * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- * Note: The alternate operations with the suffix "_nr" are used
- *       to limit the range of the loop to nr_cpu_ids instead of
- *       NR_CPUS when NR_CPUS > 64 for performance reasons.
- *       If NR_CPUS is <= 64 then most assembler bitmask
- *       operators execute faster with a constant range, so
- *       the operator will continue to use NR_CPUS.
- *
- *       Another consideration is that nr_cpu_ids is initialized
- *       to NR_CPUS and isn't lowered until the possible cpus are
- *       discovered (including any disabled cpus).  So early uses
- *       will span the entire range of NR_CPUS.
- * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  *
  * The available cpumask operations are:
  *
@@ -37,6 +26,7 @@
  * void cpu_clear(cpu, mask)		turn off bit 'cpu' in mask
  * void cpus_setall(mask)		set all bits
  * void cpus_clear(mask)		clear all bits
+ * void cpus_copy(dst, src)		copies cpumask bits from src to dst
  * int cpu_isset(cpu, mask)		true iff bit 'cpu' set in mask
  * int cpu_test_and_set(cpu, mask)	test and set bit 'cpu' in mask
  *
@@ -52,19 +42,21 @@
  * int cpus_empty(mask)			Is mask empty (no bits sets)?
  * int cpus_full(mask)			Is mask full (all bits sets)?
  * int cpus_weight(mask)		Hamming weigh - number of set bits
- * int cpus_weight_nr(mask)		Same using nr_cpu_ids instead of NR_CPUS
  *
  * void cpus_shift_right(dst, src, n)	Shift right
  * void cpus_shift_left(dst, src, n)	Shift left
  *
- * int first_cpu(mask)			Number lowest set bit, or NR_CPUS
- * int next_cpu(cpu, mask)		Next cpu past 'cpu', or NR_CPUS
- * int next_cpu_nr(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
- *
- * cpumask_t cpumask_of_cpu(cpu)	Return cpumask with bit 'cpu' set
- *					(can be used as an lvalue)
- * CPU_MASK_ALL				Initializer - all bits set
- * CPU_MASK_NONE			Initializer - no bits set
+ * int cpus_first(mask)			Number lowest set bit, or nr_cpu_ids
+ * int cpus_next(cpu, mask)		Next cpu past 'cpu', or nr_cpu_ids
+ * int cpus_next_in(cpu, mask, andmask)	Next cpu in mask & andmask or nr_cpu_ids
+ *
+ * const cpumask_t cpumask_of_cpu(cpu)	Return pointer to cpumask with bit
+ *					'cpu' set
+ *
+ * cpu_mask_all				cpumask_map_t of all bits set
+ * CPU_MASK_ALL				Initializer only - all bits set
+ * CPU_MASK_NONE			Initializer only - no bits set
+ * CPU_MASK_CPU0			Initializer only - cpu 0 bit set
  * unsigned long *cpus_addr(mask)	Array of unsigned long's in mask
  *
  * int cpumask_scnprintf(buf, len, mask) Format cpumask for printing
@@ -76,8 +68,8 @@
  * void cpus_onto(dst, orig, relmap)	*dst = orig relative to relmap
  * void cpus_fold(dst, orig, sz)	dst bits = orig bits mod sz
  *
- * for_each_cpu_mask(cpu, mask)		for-loop cpu over mask using NR_CPUS
- * for_each_cpu_mask_nr(cpu, mask)	for-loop cpu over mask using nr_cpu_ids
+ * for_each_cpu(cpu, mask)		for-loop cpu over mask
+ * for_each_cpu_in(cpu, mask, andmask)	for-loop cpu over mask & andmask
  *
  * int num_online_cpus()		Number of online CPUs
  * int num_possible_cpus()		Number of all possible CPUs
@@ -87,6 +79,7 @@
  * int cpu_possible(cpu)		Is some cpu possible?
  * int cpu_present(cpu)			Is some cpu present (can schedule)?
  *
+ * int any_cpu_in(mask, andmask)	First cpu in mask & andmask
  * int any_online_cpu(mask)		First online cpu in mask
  *
  * for_each_possible_cpu(cpu)		for-loop cpu over cpu_possible_map
@@ -107,131 +100,200 @@
 #include <linux/threads.h>
 #include <linux/bitmap.h>
 
-typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
-extern cpumask_t _unused_cpumask_arg_;
+#if NR_CPUS == 1
+
+typedef const unsigned long *cpumask_t;
+typedef unsigned long cpumask_map_t[1];
+typedef unsigned long cpumask_var_t[1];
+
+#define nr_cpu_ids			1
+#define cpus_first(src)			({ (void)(src); 0; })
+#define cpus_next(n, src)		({ (void)(src); 1; })
+#define cpus_next_in(n, src, andsrc)	({ (void)(src); 1; })
+#define any_online_cpu(mask)		0
+#define for_each_cpu(cpu, mask)	\
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
+#define for_each_cpu_in(cpu, mask, andmask) \
+	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)andmask)
+
+#define num_online_cpus()		1
+#define num_possible_cpus()		1
+#define num_present_cpus()		1
+#define cpu_online(cpu)			((cpu) == 0)
+#define cpu_possible(cpu)		((cpu) == 0)
+#define cpu_present(cpu)		((cpu) == 0)
+#define cpu_active(cpu)			((cpu) == 0)
+
+#else /* ... NR_CPUS > 1 */
+
+#ifdef CONFIG_CPUMASKS_OFFSTACK
+
+/* Starts at NR_CPUS until acpi code discovers actual number. */
+extern int nr_cpu_ids;
+typedef unsigned long *cpumask_var_t;
+
+#else /* CPUMASKS "ONSTACK" */
+
+/* Constant is usually more efficient than a variable for small NR_CPUS */
+#define nr_cpu_ids		NR_CPUS
+typedef unsigned long cpumask_var_t[BITS_TO_LONGS(nr_cpu_ids)];
+
+#endif
+
+/* cpumask_t defaults to pointer to constant bit map */
+typedef const unsigned long *cpumask_t;
+typedef unsigned long cpumask_map_t[BITS_TO_LONGS(nr_cpu_ids)];
+
+extern int cpus_first(cpumask_t src);
+extern int cpus_next(int n, cpumask_t src);
+extern int cpus_next_in(int n, cpumask_t src, cpumask_t andsrc);
+extern int any_cpu_in(cpumask_t mask, cpumask_t andmask);
+
+#define any_online_cpu(mask)	any_cpu_in((cpumask_t)(mask), \
+					   (cpumask_t)cpu_online_map)
+
+#define for_each_cpu(cpu, mask)				\
+	for ((cpu) = -1;				\
+		(cpu) = cpus_next((cpu), (mask)),	\
+		(cpu) < nr_cpu_ids; )
+
+#define for_each_cpu_in(cpu, mask, andmask)			\
+	for ((cpu) = -1;					\
+		(cpu) = cpus_next_in((cpu), (mask), (andmask)),	\
+		(cpu) < nr_cpu_ids; )
+
+#define num_online_cpus()	cpus_weight(cpu_online_map)
+#define num_possible_cpus()	cpus_weight(cpu_possible_map)
+#define num_present_cpus()	cpus_weight(cpu_present_map)
+#define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
+#define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
+#define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
+#define cpu_active(cpu)		cpu_isset((cpu), cpu_active_map)
+
+#endif /* NR_CPUS > 1 */
+
+/* Deprecated: use for_each_cpu() */
+#define for_each_cpu_mask(cpu, mask)	\
+	for_each_cpu(cpu, mask)
+/*
+ * XXX - how to add this to the above macro?
+ * #warning "for_each_cpu_mask is deprecated, use for_each_cpu"
+ */
+
+/* Deprecated: use cpus_first() */
+static inline int __deprecated first_cpu(cpumask_t src)
+{
+	return cpus_first(src);
+}
+
+/* Deprecated: use cpus_next() */
+static inline int __deprecated next_cpu(int n, cpumask_t src)
+{
+	return cpus_next(n, src);
+}
+
+static inline int cpumask_size(void)
+{
+	return BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long);
+}
 
-#define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
-static inline void __cpu_set(int cpu, volatile cpumask_t *dstp)
+static inline void cpu_set(int cpu, volatile cpumask_var_t dst)
 {
-	set_bit(cpu, dstp->bits);
+	set_bit(cpu, dst);
 }
 
-#define cpu_clear(cpu, dst) __cpu_clear((cpu), &(dst))
-static inline void __cpu_clear(int cpu, volatile cpumask_t *dstp)
+static inline void cpu_clear(int cpu, volatile cpumask_var_t dst)
 {
-	clear_bit(cpu, dstp->bits);
+	clear_bit(cpu, dst);
 }
 
-#define cpus_setall(dst) __cpus_setall(&(dst), NR_CPUS)
-static inline void __cpus_setall(cpumask_t *dstp, int nbits)
+static inline void cpus_setall(cpumask_var_t dst)
 {
-	bitmap_fill(dstp->bits, nbits);
+	bitmap_fill(dst, nr_cpu_ids);
 }
 
-#define cpus_clear(dst) __cpus_clear(&(dst), NR_CPUS)
-static inline void __cpus_clear(cpumask_t *dstp, int nbits)
+static inline void cpus_clear(cpumask_var_t dst)
 {
-	bitmap_zero(dstp->bits, nbits);
+	bitmap_zero(dst, nr_cpu_ids);
+}
+
+static inline void cpus_copy(cpumask_var_t dst, cpumask_t src)
+{
+	bitmap_copy(dst, src, nr_cpu_ids);
 }
 
 /* No static inline type checking - see Subtlety (1) above. */
-#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask).bits)
+#define cpu_isset(cpu, cpumask) test_bit((cpu), (cpumask))
 
-#define cpu_test_and_set(cpu, cpumask) __cpu_test_and_set((cpu), &(cpumask))
-static inline int __cpu_test_and_set(int cpu, cpumask_t *addr)
+static inline int cpu_test_and_set(int cpu, cpumask_var_t addr)
 {
-	return test_and_set_bit(cpu, addr->bits);
+	return test_and_set_bit(cpu, addr);
 }
 
-#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_and(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline void cpus_and(cpumask_var_t dst, cpumask_t src1, cpumask_t src2)
 {
-	bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits);
+	bitmap_and(dst, src1, src2, nr_cpu_ids);
 }
 
-#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_or(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline void cpus_or(cpumask_var_t dst, cpumask_t src1, cpumask_t src2)
 {
-	bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits);
+	bitmap_or(dst, src1, src2, nr_cpu_ids);
 }
 
-#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline void cpus_xor(cpumask_var_t dst, cpumask_t src1, cpumask_t src2)
 {
-	bitmap_xor(dstp->bits, src1p->bits, src2p->bits, nbits);
+	bitmap_xor(dst, src1, src2, nr_cpu_ids);
 }
 
-#define cpus_andnot(dst, src1, src2) \
-				__cpus_andnot(&(dst), &(src1), &(src2), NR_CPUS)
-static inline void __cpus_andnot(cpumask_t *dstp, const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline void cpus_andnot(cpumask_var_t dst, cpumask_t src1,
+						cpumask_t src2)
 {
-	bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
+	bitmap_andnot(dst, src1, src2, nr_cpu_ids);
 }
 
-#define cpus_complement(dst, src) __cpus_complement(&(dst), &(src), NR_CPUS)
-static inline void __cpus_complement(cpumask_t *dstp,
-					const cpumask_t *srcp, int nbits)
+static inline void cpus_complement(cpumask_var_t dst, cpumask_t src)
 {
-	bitmap_complement(dstp->bits, srcp->bits, nbits);
+	bitmap_complement(dst, src, nr_cpu_ids);
 }
 
-#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_equal(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline int cpus_equal(cpumask_t src1, cpumask_t src2)
 {
-	return bitmap_equal(src1p->bits, src2p->bits, nbits);
+	return bitmap_equal(src1, src2, nr_cpu_ids);
 }
 
-#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_intersects(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline int cpus_intersects(cpumask_t src1, cpumask_t src2)
 {
-	return bitmap_intersects(src1p->bits, src2p->bits, nbits);
+	return bitmap_intersects(src1, src2, nr_cpu_ids);
 }
 
-#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), NR_CPUS)
-static inline int __cpus_subset(const cpumask_t *src1p,
-					const cpumask_t *src2p, int nbits)
+static inline int cpus_subset(cpumask_t src1, cpumask_t src2)
 {
-	return bitmap_subset(src1p->bits, src2p->bits, nbits);
+	return bitmap_subset(src1, src2, nr_cpu_ids);
 }
 
-#define cpus_empty(src) __cpus_empty(&(src), NR_CPUS)
-static inline int __cpus_empty(const cpumask_t *srcp, int nbits)
+static inline int cpus_empty(cpumask_t src)
 {
-	return bitmap_empty(srcp->bits, nbits);
+	return bitmap_empty(src, nr_cpu_ids);
 }
 
-#define cpus_full(cpumask) __cpus_full(&(cpumask), NR_CPUS)
-static inline int __cpus_full(const cpumask_t *srcp, int nbits)
+static inline int cpus_full(cpumask_t src)
 {
-	return bitmap_full(srcp->bits, nbits);
+	return bitmap_full(src, nr_cpu_ids);
 }
 
-#define cpus_weight(cpumask) __cpus_weight(&(cpumask), NR_CPUS)
-static inline int __cpus_weight(const cpumask_t *srcp, int nbits)
+static inline int cpus_weight(cpumask_t src)
 {
-	return bitmap_weight(srcp->bits, nbits);
+	return bitmap_weight(src, nr_cpu_ids);
 }
 
-#define cpus_shift_right(dst, src, n) \
-			__cpus_shift_right(&(dst), &(src), (n), NR_CPUS)
-static inline void __cpus_shift_right(cpumask_t *dstp,
-					const cpumask_t *srcp, int n, int nbits)
+static inline void cpus_shift_right(cpumask_var_t dst, cpumask_t src, int n)
 {
-	bitmap_shift_right(dstp->bits, srcp->bits, n, nbits);
+	bitmap_shift_right(dst, src, n, nr_cpu_ids);
 }
 
-#define cpus_shift_left(dst, src, n) \
-			__cpus_shift_left(&(dst), &(src), (n), NR_CPUS)
-static inline void __cpus_shift_left(cpumask_t *dstp,
-					const cpumask_t *srcp, int n, int nbits)
+static inline void cpus_shift_left(cpumask_var_t dst, cpumask_t src, int n)
 {
-	bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
+	bitmap_shift_left(dst, src, n, nr_cpu_ids);
 }
 
 /*
@@ -244,11 +306,11 @@ static inline void __cpus_shift_left(cpu
 extern const unsigned long
 	cpu_bit_bitmap[BITS_PER_LONG+1][BITS_TO_LONGS(NR_CPUS)];
 
-static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
+static inline cpumask_t get_cpu_mask(unsigned int cpu)
 {
 	const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
 	p -= cpu / BITS_PER_LONG;
-	return (const cpumask_t *)p;
+	return (cpumask_t)p;
 }
 
 /*
@@ -256,150 +318,79 @@ static inline const cpumask_t *get_cpu_m
  * gcc optimizes it out (it's a constant) and there's no huge stack
  * variable created:
  */
-#define cpumask_of_cpu(cpu) (*get_cpu_mask(cpu))
-
+#define cpumask_of_cpu(cpu) ((cpumask_t)get_cpu_mask(cpu))
 
 #define CPU_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(NR_CPUS)
+#define	CPU_MASK_INIT(value) {					\
+	[0 ... BITS_TO_LONGS(NR_CPUS)-1] =  value		\
+}
 
 #if NR_CPUS <= BITS_PER_LONG
 
-#define CPU_MASK_ALL							\
-(cpumask_t) { {								\
-	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
-} }
-
-#define CPU_MASK_ALL_PTR	(&CPU_MASK_ALL)
+/* Initializer only, use cpu_mask_all in function arguments */
+#define CPU_MASK_ALL CPU_MASK_INIT(CPU_MASK_LAST_WORD)
 
 #else
 
-#define CPU_MASK_ALL							\
-(cpumask_t) { {								\
-	[0 ... BITS_TO_LONGS(NR_CPUS)-2] = ~0UL,			\
-	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD			\
-} }
-
-/* cpu_mask_all is in init/main.c */
-extern cpumask_t cpu_mask_all;
-#define CPU_MASK_ALL_PTR	(&cpu_mask_all)
+/* Initializer only, use cpu_mask_all in function arguments */
+#define CPU_MASK_ALL {						\
+	[0 ... BITS_TO_LONGS(NR_CPUS)-2] = ~0UL,		\
+	[BITS_TO_LONGS(NR_CPUS)-1] = CPU_MASK_LAST_WORD		\
+}
 
 #endif
 
-#define CPU_MASK_NONE							\
-(cpumask_t) { {								\
-	[0 ... BITS_TO_LONGS(NR_CPUS)-1] =  0UL				\
-} }
-
-#define CPU_MASK_CPU0							\
-(cpumask_t) { {								\
-	[0] =  1UL							\
-} }
+/* Initializers only */
+#define CPU_MASK_NONE CPU_MASK_INIT(0UL)
+#define CPU_MASK_CPU0 { [0] = 1UL }
 
-#define cpus_addr(src) ((src).bits)
-
-#define cpumask_scnprintf(buf, len, src) \
-			__cpumask_scnprintf((buf), (len), &(src), NR_CPUS)
-static inline int __cpumask_scnprintf(char *buf, int len,
-					const cpumask_t *srcp, int nbits)
+static inline unsigned long *cpus_addr(cpumask_t src)
 {
-	return bitmap_scnprintf(buf, len, srcp->bits, nbits);
+	return (unsigned long *)src;
 }
 
-#define cpumask_parse_user(ubuf, ulen, dst) \
-			__cpumask_parse_user((ubuf), (ulen), &(dst), NR_CPUS)
-static inline int __cpumask_parse_user(const char __user *buf, int len,
-					cpumask_t *dstp, int nbits)
+static inline int cpumask_scnprintf(char *buf, int len, cpumask_t src)
 {
-	return bitmap_parse_user(buf, len, dstp->bits, nbits);
+	return bitmap_scnprintf(buf, len, src, nr_cpu_ids);
 }
 
-#define cpulist_scnprintf(buf, len, src) \
-			__cpulist_scnprintf((buf), (len), &(src), NR_CPUS)
-static inline int __cpulist_scnprintf(char *buf, int len,
-					const cpumask_t *srcp, int nbits)
+static inline int cpumask_parse_user(const char __user *buf, int len,
+							cpumask_var_t dst)
 {
-	return bitmap_scnlistprintf(buf, len, srcp->bits, nbits);
+	return bitmap_parse_user(buf, len, dst, nr_cpu_ids);
 }
 
-#define cpulist_parse(buf, dst) __cpulist_parse((buf), &(dst), NR_CPUS)
-static inline int __cpulist_parse(const char *buf, cpumask_t *dstp, int nbits)
+static inline int cpulist_scnprintf(char *buf, int len, cpumask_t src)
 {
-	return bitmap_parselist(buf, dstp->bits, nbits);
+	return bitmap_scnlistprintf(buf, len, src, nr_cpu_ids);
 }
 
-#define cpu_remap(oldbit, old, new) \
-		__cpu_remap((oldbit), &(old), &(new), NR_CPUS)
-static inline int __cpu_remap(int oldbit,
-		const cpumask_t *oldp, const cpumask_t *newp, int nbits)
+static inline int cpulist_parse(const char *buf, cpumask_var_t dst)
 {
-	return bitmap_bitremap(oldbit, oldp->bits, newp->bits, nbits);
+	return bitmap_parselist(buf, dst, nr_cpu_ids);
 }
 
-#define cpus_remap(dst, src, old, new) \
-		__cpus_remap(&(dst), &(src), &(old), &(new), NR_CPUS)
-static inline void __cpus_remap(cpumask_t *dstp, const cpumask_t *srcp,
-		const cpumask_t *oldp, const cpumask_t *newp, int nbits)
+static inline int cpu_remap(int oldbit, cpumask_t old, cpumask_t new)
 {
-	bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
+	return bitmap_bitremap(oldbit, old, new, nr_cpu_ids);
 }
 
-#define cpus_onto(dst, orig, relmap) \
-		__cpus_onto(&(dst), &(orig), &(relmap), NR_CPUS)
-static inline void __cpus_onto(cpumask_t *dstp, const cpumask_t *origp,
-		const cpumask_t *relmapp, int nbits)
+static inline void cpus_remap(cpumask_var_t dst, cpumask_t src, cpumask_t old,
+								cpumask_t new)
 {
-	bitmap_onto(dstp->bits, origp->bits, relmapp->bits, nbits);
+	bitmap_remap(dst, src, old, new, nr_cpu_ids);
 }
 
-#define cpus_fold(dst, orig, sz) \
-		__cpus_fold(&(dst), &(orig), sz, NR_CPUS)
-static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
-		int sz, int nbits)
+static inline void cpus_onto(cpumask_var_t dst, cpumask_t orig,
+						cpumask_t relmap)
 {
-	bitmap_fold(dstp->bits, origp->bits, sz, nbits);
+	bitmap_onto(dst, orig, relmap, nr_cpu_ids);
 }
 
-#if NR_CPUS == 1
-
-#define nr_cpu_ids		1
-#define first_cpu(src)		({ (void)(src); 0; })
-#define next_cpu(n, src)	({ (void)(src); 1; })
-#define any_online_cpu(mask)	0
-#define for_each_cpu_mask(cpu, mask)	\
-	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
-
-#else /* NR_CPUS > 1 */
-
-extern int nr_cpu_ids;
-int __first_cpu(const cpumask_t *srcp);
-int __next_cpu(int n, const cpumask_t *srcp);
-int __any_online_cpu(const cpumask_t *mask);
-
-#define first_cpu(src)		__first_cpu(&(src))
-#define next_cpu(n, src)	__next_cpu((n), &(src))
-#define any_online_cpu(mask) __any_online_cpu(&(mask))
-#define for_each_cpu_mask(cpu, mask)			\
-	for ((cpu) = -1;				\
-		(cpu) = next_cpu((cpu), (mask)),	\
-		(cpu) < NR_CPUS; )
-#endif
-
-#if NR_CPUS <= 64
-
-#define next_cpu_nr(n, src)		next_cpu(n, src)
-#define cpus_weight_nr(cpumask)		cpus_weight(cpumask)
-#define for_each_cpu_mask_nr(cpu, mask)	for_each_cpu_mask(cpu, mask)
-
-#else /* NR_CPUS > 64 */
-
-int __next_cpu_nr(int n, const cpumask_t *srcp);
-#define next_cpu_nr(n, src)	__next_cpu_nr((n), &(src))
-#define cpus_weight_nr(cpumask)	__cpus_weight(&(cpumask), nr_cpu_ids)
-#define for_each_cpu_mask_nr(cpu, mask)			\
-	for ((cpu) = -1;				\
-		(cpu) = next_cpu_nr((cpu), (mask)),	\
-		(cpu) < nr_cpu_ids; )
-
-#endif /* NR_CPUS > 64 */
+static inline void cpus_fold(cpumask_var_t dst, cpumask_t orig, int sz)
+{
+	bitmap_fold(dst, orig, sz, nr_cpu_ids);
+}
 
 /*
  * The following particular system cpumasks and operations manage
@@ -458,33 +449,16 @@ int __next_cpu_nr(int n, const cpumask_t
  *        main(){ set1(3); set2(5); }
  */
 
-extern cpumask_t cpu_possible_map;
-extern cpumask_t cpu_online_map;
-extern cpumask_t cpu_present_map;
-extern cpumask_t cpu_active_map;
-
-#if NR_CPUS > 1
-#define num_online_cpus()	cpus_weight_nr(cpu_online_map)
-#define num_possible_cpus()	cpus_weight_nr(cpu_possible_map)
-#define num_present_cpus()	cpus_weight_nr(cpu_present_map)
-#define cpu_online(cpu)		cpu_isset((cpu), cpu_online_map)
-#define cpu_possible(cpu)	cpu_isset((cpu), cpu_possible_map)
-#define cpu_present(cpu)	cpu_isset((cpu), cpu_present_map)
-#define cpu_active(cpu)		cpu_isset((cpu), cpu_active_map)
-#else
-#define num_online_cpus()	1
-#define num_possible_cpus()	1
-#define num_present_cpus()	1
-#define cpu_online(cpu)		((cpu) == 0)
-#define cpu_possible(cpu)	((cpu) == 0)
-#define cpu_present(cpu)	((cpu) == 0)
-#define cpu_active(cpu)		((cpu) == 0)
-#endif
+extern cpumask_map_t cpu_possible_map;
+extern cpumask_map_t cpu_online_map;
+extern cpumask_map_t cpu_present_map;
+extern cpumask_map_t cpu_active_map;
+extern cpumask_map_t cpu_mask_all;
 
 #define cpu_is_offline(cpu)	unlikely(!cpu_online(cpu))
 
-#define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), cpu_possible_map)
-#define for_each_online_cpu(cpu)   for_each_cpu_mask_nr((cpu), cpu_online_map)
-#define for_each_present_cpu(cpu)  for_each_cpu_mask_nr((cpu), cpu_present_map)
+#define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_map)
+#define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_map)
+#define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_map)
 
 #endif /* __LINUX_CPUMASK_H */
--- longs-cpumasks.orig/lib/cpumask.c
+++ longs-cpumasks/lib/cpumask.c
@@ -3,34 +3,40 @@
 #include <linux/cpumask.h>
 #include <linux/module.h>
 
-int __first_cpu(const cpumask_t *srcp)
+int cpus_first(cpumask_t srcp)
 {
-	return find_first_bit(srcp->bits, NR_CPUS);
+	return find_first_bit(srcp, nr_cpu_ids);
 }
-EXPORT_SYMBOL(__first_cpu);
+EXPORT_SYMBOL(cpus_first);
 
-int __next_cpu(int n, const cpumask_t *srcp)
+int cpus_next(int n, cpumask_t srcp)
 {
-	return find_next_bit(srcp->bits, NR_CPUS, n+1);
+	return find_next_bit(srcp, nr_cpu_ids, n+1);
 }
-EXPORT_SYMBOL(__next_cpu);
+EXPORT_SYMBOL(cpus_next);
 
-#if NR_CPUS > 64
-int __next_cpu_nr(int n, const cpumask_t *srcp)
+int cpus_next_in(int n, cpumask_t srcp, cpumask_t andp)
 {
-	return find_next_bit(srcp->bits, nr_cpu_ids, n+1);
+	int cpu;
+
+	for (cpu = n + 1; cpu < nr_cpu_ids; cpu++) {
+		cpu = find_next_bit(srcp, nr_cpu_ids, cpu);
+
+		if (cpu < nr_cpu_ids && cpu_isset(cpu, andp))
+			return cpu;
+	}
+	return nr_cpu_ids;
 }
-EXPORT_SYMBOL(__next_cpu_nr);
-#endif
+EXPORT_SYMBOL(cpus_next_in);
 
-int __any_online_cpu(const cpumask_t *mask)
+int any_cpu_in(cpumask_t mask, cpumask_t andmask)
 {
 	int cpu;
 
-	for_each_cpu_mask(cpu, *mask) {
-		if (cpu_online(cpu))
+	for_each_cpu(cpu, mask) {
+		if (cpu_isset(cpu, andmask))
 			break;
 	}
 	return cpu;
 }
-EXPORT_SYMBOL(__any_online_cpu);
+EXPORT_SYMBOL(any_cpu_in);
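
The intersection walk added in lib/cpumask.c can be exercised outside the
kernel.  In the sketch below, bit_is_set() and next_set_bit() are
hypothetical, simplified stand-ins for test_bit() and find_next_bit(),
NR_CPU_IDS is an assumed fixed count of 96, and any_cpu_in() is written in
terms of cpus_next_in(), which should give the same result as the loop in
the patch.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG	((int)(sizeof(unsigned long) * CHAR_BIT))
#define NR_CPU_IDS	96	/* assumed CPU count for this sketch */

typedef const unsigned long *cpumask_t;

/* Hypothetical stand-in for test_bit(). */
int bit_is_set(int nr, const unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Hypothetical stand-in for find_next_bit(): first set bit at or after start. */
int next_set_bit(const unsigned long *addr, int size, int start)
{
	int bit;

	for (bit = start; bit < size; bit++)
		if (bit_is_set(bit, addr))
			return bit;
	return size;
}

/* Next CPU after n that is set in both masks, or NR_CPU_IDS if none. */
int cpus_next_in(int n, cpumask_t src, cpumask_t andmask)
{
	int cpu;

	assert(n >= -1);
	for (cpu = n + 1; cpu < NR_CPU_IDS; cpu++) {
		/* Jump to the next candidate in src, then filter by andmask. */
		cpu = next_set_bit(src, NR_CPU_IDS, cpu);
		if (cpu < NR_CPU_IDS && bit_is_set(cpu, andmask))
			return cpu;
	}
	return NR_CPU_IDS;
}

/* First CPU in mask & andmask, or NR_CPU_IDS if the intersection is empty. */
int any_cpu_in(cpumask_t mask, cpumask_t andmask)
{
	return cpus_next_in(-1, mask, andmask);
}
```

The walk never builds a temporary mask holding the intersection, so no
extra cpumask has to live on the stack.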

* Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected
       [not found]                                                                                               ` <alpine.LFD.2.00.0809300939450.3389-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-09-30 18:02                                                                                                 ` Mike Travis
@ 2008-10-01  0:44                                                                                                 ` Rusty Russell
  1 sibling, 0 replies; 318+ messages in thread
From: Rusty Russell @ 2008-10-01  0:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Travis, Ingo Molnar, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

On Wednesday 01 October 2008 02:46:59 Linus Torvalds wrote:
> Quite frankly, I personally do hate typedefs that end up being pointers,
> and used as pointers, without showing that in the source code.
...
> I'm now a bit more leery about this whole thing just
> because the typedef ends up hiding so much - it doesn't just hide the
> basic type, it hides a very basic *code* issue.

Yes, this is why my version of the rework moved away from typedefs, except for 
the special case of "cpumask_var_t" for stack vars where this trick is really 
desired.

Everywhere else, the code becomes nice and clear: struct cpumask *.

Cheers,
Rusty.
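
What an explicit "struct cpumask *" interface might look like is sketched
below.  The struct layout, the NR_CPUS value of 64 and both function names
are assumptions made for this example and are not taken from the rework
mentioned above; the aim is only to show the pointer and the const
appearing in the signatures instead of inside a typedef.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS			64	/* assumed limit for this sketch */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; };

/* The signature itself shows that this function writes to *dst. */
void mask_set_cpu(unsigned int cpu, struct cpumask *dst)
{
	assert(dst != NULL && cpu < NR_CPUS);
	dst->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* Here the const is visible: *src is only read. */
int mask_test_cpu(unsigned int cpu, const struct cpumask *src)
{
	assert(src != NULL && cpu < NR_CPUS);
	return (int)((src->bits[cpu / BITS_PER_LONG] >>
		      (cpu % BITS_PER_LONG)) & 1UL);
}
```

Callers then write mask_set_cpu(5, &m), so both the address-of operator at
the call site and the const in the prototype are in plain view.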

* Re: [RFC 1/1] cpumask: New cpumask API - take 2 - use unsigned longs
       [not found]                                                                                                       ` <48E2A691.7060407-sJ/iWh9BUns@public.gmane.org>
@ 2008-10-01  0:45                                                                                                         ` Rusty Russell
  0 siblings, 0 replies; 318+ messages in thread
From: Rusty Russell @ 2008-10-01  0:45 UTC (permalink / raw)
  To: Mike Travis
  Cc: Linus Torvalds, Ingo Molnar, Yinghai Lu, David Miller,
	Alan.Brunelle-VXdhtT5mjnY, tglx-hfZtesqFncYOwBW4kG4KsQ,
	rjw-KKrjLPT3xs0, Linux Kernel Mailing List,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	arjan-VuQAYsv1563Yd54FQh9/CA, Jack Steiner

On Wednesday 01 October 2008 08:22:09 Mike Travis wrote:
> Here's an alternate proposal for the new cpumask API.  I have not yet begun
...
> +/* cpumask_t defaults to pointer to constant bit map */
> +typedef const unsigned long *cpumask_t;

Hiding a const here is not a good idea either, I think :(

Rusty.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-10-04 17:28 2.6.27-rc8-git7: Reported regressions from 2.6.26 Rafael J. Wysocki
@ 2008-10-04 17:32 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-10-04 17:32 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of recent regressions.

The following bug entry is on the current list of known regressions
from 2.6.26.  Please verify if it still should be listed and let me know
(either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from 2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (55 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-10-25 21:04 2.6.28-rc1-git1: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
@ 2008-10-25 21:07 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-10-25 21:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.26 and 2.6.27.

The following bug entry is on the current list of known regressions
introduced between 2.6.26 and 2.6.27.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (76 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-11-02 16:47 2.6.28-rc2-git7: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
@ 2008-11-02 16:49 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-11-02 16:49 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.26 and 2.6.27.

The following bug entry is on the current list of known regressions
introduced between 2.6.26 and 2.6.27.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (84 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-11-09 19:40 2.6.28-rc3-git6: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
@ 2008-11-09 19:43 ` Rafael J. Wysocki
  0 siblings, 0 replies; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-11-09 19:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.26 and 2.6.27.

The following bug entry is on the current list of known regressions
introduced between 2.6.26 and 2.6.27.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (91 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* [Bug #11308] tbench regression on each kernel release from  2.6.22 -> 2.6.28
  2008-11-16 17:38 2.6.28-rc5: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
@ 2008-11-16 17:40 ` Rafael J. Wysocki
  2008-11-17  9:06   ` Ingo Molnar
  0 siblings, 1 reply; 318+ messages in thread
From: Rafael J. Wysocki @ 2008-11-16 17:40 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christoph Lameter

This message has been generated automatically as a part of a report
of regressions introduced between 2.6.26 and 2.6.27.

The following bug entry is on the current list of known regressions
introduced between 2.6.26 and 2.6.27.  Please verify if it still should
be listed and let me know (either way).


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date		: 2008-08-11 18:36 (98 days old)
References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-11-16 17:40 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
@ 2008-11-17  9:06   ` Ingo Molnar
       [not found]     ` <20081117090648.GG28786-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17  9:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Kernel Testers List, Christoph Lameter,
	Mike Galbraith, Peter Zijlstra


* Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> wrote:

> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.26 and 2.6.27.
> 
> The following bug entry is on the current list of known regressions
> introduced between 2.6.26 and 2.6.27.  Please verify if it still should
> be listed and let me know (either way).
> 
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
> Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
> Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Date		: 2008-08-11 18:36 (98 days old)
> References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
> 		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4

Christoph, as per the recent analysis of Mike:

 http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html

all scheduler components of this regression have been eliminated.

In fact his numbers show that scheduler speedups since 2.6.22 have 
offset and hidden most other sources of tbench regression. (i.e. the 
scheduler portion got 5% faster, hence it was able to offset a 
slowdown of 5% in other areas of the kernel that tbench triggers)

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]     ` <20081117090648.GG28786-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17  9:14       ` David Miller
       [not found]         ` <20081117.011403.06989342.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-11-19 19:43       ` Christoph Lameter
  1 sibling, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17  9:14 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Mon, 17 Nov 2008 10:06:48 +0100

> 
> * Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.26 and 2.6.27.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.26 and 2.6.27.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
> > Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
> > Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > Date		: 2008-08-11 18:36 (98 days old)
> > References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
> > 		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4
> 
> Christoph, as per the recent analysis of Mike:
> 
>  http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html
> 
> all scheduler components of this regression have been eliminated.
> 
> In fact his numbers show that scheduler speedups since 2.6.22 have 
> offset and hidden most other sources of tbench regression. (i.e. the 
> scheduler portion got 5% faster, hence it was able to offset a 
> slowdown of 5% in other areas of the kernel that tbench triggers)

Although I respect the improvements, wake_up() is still several orders
of magnitude slower than it was in 2.6.22 and wake_up() is at the top
of the profiles in tbench runs.

It really is premature to close this regression at this time.

I am working with every spare moment I have to try and nail this
stuff, but unless someone else helps me, people need to be patient.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]         ` <20081117.011403.06989342.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-17 11:01           ` Ingo Molnar
  2008-11-17 11:20             ` Eric Dumazet
       [not found]             ` <20081117110119.GL28786-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 2 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 11:01 UTC (permalink / raw)
  To: David Miller
  Cc: rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 13847 bytes --]


* David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Date: Mon, 17 Nov 2008 10:06:48 +0100
> 
> > 
> > * Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org> wrote:
> > 
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.26 and 2.6.27.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.26 and 2.6.27.  Please verify if it still should
> > > be listed and let me know (either way).
> > > 
> > > 
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
> > > Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
> > > Submitter	: Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > > Date		: 2008-08-11 18:36 (98 days old)
> > > References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
> > > 		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4
> > 
> > Christoph, as per the recent analysis of Mike:
> > 
> >  http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html
> > 
> > all scheduler components of this regression have been eliminated.
> > 
> > In fact his numbers show that scheduler speedups since 2.6.22 have 
> > offset and hidden most other sources of tbench regression. (i.e. the 
> > scheduler portion got 5% faster, hence it was able to offset a 
> > slowdown of 5% in other areas of the kernel that tbench triggers)
> 
> Although I respect the improvements, wake_up() is still several 
> orders of magnitude slower than it was in 2.6.22 and wake_up() is at 
> the top of the profiles in tbench runs.

hm, several orders of magnitude slower? That contradicts Mike's 
numbers and my own numbers and profiles as well: see below.

The scheduler's overhead barely even registers on a 16-way x86 system 
I'm running tbench on. Here's the NMI profile of a 64-thread tbench run 
on a 16-way x86 box with a v2.6.28-rc5 kernel [config attached]:

  Throughput 3437.65 MB/sec 64 procs
  ==================================
  21570252  total 
  ........
   1494803  copy_user_generic_string 
    998232  sock_rfree 
    491471  tcp_ack 
    482405  ip_dont_fragment 
    470685  ip_local_deliver 
    436325  constant_test_bit         [ called by napi_disable_pending() ]
    375469  avc_has_perm_noaudit 
    347663  tcp_sendmsg 
    310383  tcp_recvmsg 
    300412  __inet_lookup_established 
    294377  system_call 
    286603  tcp_transmit_skb 
    251782  selinux_ip_postroute 
    236028  tcp_current_mss 
    235631  schedule 
    234013  netif_rx 
    229854  _local_bh_enable_ip 
    219501  tcp_v4_rcv 

    [ etc. - see full profile attached further below ]

Note that the scheduler does not even show up in the profile up to 
entry #15!

I've also summarized NMI profiler output by major subsystems:

           NET       overhead (12603450/21570252): 58.43%
           security  overhead ( 1903598/21570252):  8.83%
           usercopy  overhead ( 1753617/21570252):  8.13%
           sched     overhead ( 1599406/21570252):  7.41%
           syscall   overhead (  560487/21570252):  2.60%
           IRQ       overhead (  555439/21570252):  2.58%
           slab      overhead (  492421/21570252):  2.28%
           timer     overhead (  226573/21570252):  1.05%
           pagealloc overhead (  192681/21570252):  0.89%
           PID       overhead (  115123/21570252):  0.53%
           VFS       overhead (  107926/21570252):  0.50%
           pagecache overhead (   62552/21570252):  0.29%
           gtod      overhead (   38651/21570252):  0.18%
           IDLE      overhead (       0/21570252):  0.00%
---------------------------------------------------------
                         left ( 1349494/21570252):  6.26%
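
The percentages in the summary above follow directly from the sample 
counts. A minimal Python sketch that recomputes them, assuming only the 
counts quoted in this message (the SAMPLES table and the overhead_pct 
helper are illustrative names, not part of any profiling tool):

```python
# Sample counts copied from the per-subsystem NMI profile summary above.
TOTAL = 21570252
SAMPLES = {
    "NET": 12603450,
    "security": 1903598,
    "usercopy": 1753617,
    "sched": 1599406,
    "syscall": 560487,
    "IRQ": 555439,
    "slab": 492421,
    "timer": 226573,
    "pagealloc": 192681,
    "PID": 115123,
    "VFS": 107926,
    "pagecache": 62552,
    "gtod": 38651,
    "IDLE": 0,
}


def overhead_pct(count, total=TOTAL):
    """Return a subsystem's share of all profile samples, in percent."""
    return 100.0 * count / total


# Print one line per subsystem in roughly the same layout as the summary.
for name, count in SAMPLES.items():
    print(f"{name:>9} overhead ({count:>8}/{TOTAL}): {overhead_pct(count):5.2f}%")
```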

The scheduler's functions are absolutely flat, and consistent with an 
extreme context-switching rate of 1.35 million per second. The 
scheduler can go up to about 20 million context switches per second on 
this system:

 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 32  0      0 32229696  29308 649880    0    0     0     0 164135 20026853 24 76  0  0  0
 32  0      0 32229752  29308 649880    0    0     0     0 164203 20032770 24 76  0  0  0
 32  0      0 32229752  29308 649880    0    0     0     0 164201 20036492 25 75  0  0  0

... and 7% scheduling overhead is roughly consistent with 1.35/20.0.
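
That estimate can be checked with a few lines of arithmetic. The Python 
sketch below assumes the three "cs" samples from the vmstat output above 
represent the scheduler's peak rate on this box, and uses the roughly 
1.35 million switches per second quoted for the tbench run. The variable 
names are illustrative, and the linear-scaling step is an assumption of 
the sketch, not a measured result.

```python
# Context switches per second from the three vmstat samples above ("cs" column).
peak_cs_samples = [20026853, 20032770, 20036492]
peak_cs = sum(peak_cs_samples) / len(peak_cs_samples)

# Context-switch rate quoted for the 64-thread tbench run.
tbench_cs = 1.35e6

# Assumption: scheduling cost scales linearly with the switch rate, so the
# tbench rate as a fraction of the peak rate approximates the CPU share
# the scheduler needs in this workload.
fraction = tbench_cs / peak_cs
print(f"tbench runs at {100 * fraction:.2f}% of the peak switch rate")
```

The result is just under 7%, in the same range as the 7.41% share of 
profile samples attributed to the scheduler in the summary.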

Wake-up affinities and data flow caching are just fine in this workload 
- we've got scheduler statistics for that and they look good too.

It all looks like pure old-fashioned straight overhead in the 
networking layer to me. Do we still touch the same global cacheline 
for every localhost packet we process? Anything like that would show 
up big time.

Anyway, in terms of scheduling there's absolutely nothing anomalous I 
can see about this workload. Scheduling looks healthy throughout - and 
the few things we noticed causing unnecessary overhead are now fixed 
in -rc5. (but it's all in the <5% range of impact of total scheduling 
overhead - i.e. in the 0.4% absolute range in this workload)

And the thing is, the scheduler's task in this workload is by far the 
most difficult one conceptually: it has to manage and optimize 
concurrency of _future_ processing, with an event frequency that is 
_WAY_ out of the normal patterns: more than 1.3 million context 
switches per second (!). It also switches to/from completely 
independent contexts of computing, with all the implications that 
this brings.

Networking and VFS "just" have to shuffle around bits in memory along a 
very specific plan given to it by user-space. That plan is 
well-specified and goes along the lines of: "copy this (already 
cached) file content to that socket" and back.

By the raw throughput figures the system is pushing a couple of 
million data packets per second.

Still we spend 7 times more CPU time in the networking code than in 
the scheduler or in the user-copy code. Why?
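
The "7 times" figure can be read off the per-subsystem sample counts 
quoted earlier in this message. A small Python sketch of the two ratios, 
with illustrative variable names:

```python
# Sample counts from the per-subsystem summary earlier in this message.
net_samples = 12603450
sched_samples = 1599406
usercopy_samples = 1753617

# Ratio of networking samples to scheduler and to user-copy samples.
net_vs_sched = net_samples / sched_samples
net_vs_usercopy = net_samples / usercopy_samples

print(f"NET / sched    = {net_vs_sched:.2f}x")
print(f"NET / usercopy = {net_vs_usercopy:.2f}x")
```

Both ratios fall between 7 and 8, which is what the rounded "7 times" 
refers to.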

	Ingo

------------------------->
  21570252 total 
  ........
  1494803 copy_user_generic_string 
  998232 sock_rfree 
  491471 tcp_ack 
  482405 ip_dont_fragment 
  470685 ip_local_deliver 
  436325 constant_test_bit 
  375469 avc_has_perm_noaudit 
  347663 tcp_sendmsg 
  310383 tcp_recvmsg 
  300412 __inet_lookup_established 
  294377 system_call 
  286603 tcp_transmit_skb 
  251782 selinux_ip_postroute 
  236028 tcp_current_mss 
  235631 schedule 
  234013 netif_rx 
  229854 _local_bh_enable_ip 
  219501 tcp_v4_rcv 
  210046 netlbl_enabled 
  205022 constant_test_bit 
  199598 skb_release_head_state 
  187952 ip_queue_xmit 
  178779 tcp_established_options 
  175955 dev_queue_xmit 
  169904 netif_receive_skb 
  166629 ip_finish_output2 
  162291 sysret_check 
  151262 __switch_to 
  143355 audit_syscall_entry 
  142694 load_cr3 
  136571 memset_c 
  136115 nf_hook_slow 
  130825 ip_local_deliver_finish 
  128795 ip_rcv 
  125995 selinux_socket_sock_rcv_skb 
  123944 net_rx_action 
  123100 __copy_skb_header 
  122052 __inet_lookup 
  121744 constant_test_bit 
  119444 get_page_from_freelist 
  116486 avc_has_perm 
  115643 audit_syscall_exit 
  115123 find_pid_ns 
  114483 tcp_cleanup_rbuf 
  111350 tcp_rcv_established 
  109853 __mod_timer 
  107891 lock_sock_nested 
  107316 napi_disable_pending 
  106581 release_sock 
  104402 skb_copy_datagram_iovec 
  101591 __tcp_push_pending_frames 
  101206 tcp_event_data_recv 
   98046 kmem_cache_alloc_node
   97982 tcp_v4_do_rcv
   92714 sys_recvfrom
   91551 rb_erase
   89730 kfree
   87979 ip_rcv_finish
   87166 compare_ether_addr
   86982 selinux_parse_skb
   86731 nf_iterate
   79690 selinux_ipv4_output
   79347 __cache_free
   78992 audit_free_names
   78127 skb_release_data
   77501 mod_timer
   77241 __sock_recvmsg
   77228 sock_recvmsg
   77211 ____cache_alloc
   76495 tcp_rcv_space_adjust
   75283 sk_wait_data
   71772 sys_sendto
   71594 sched_clock
   70880 eth_type_trans
   70238 memcpy_toiovec
   69193 do_softirq
   68341 __update_sched_clock
   67597 tcp_v4_md5_lookup
   67424 try_to_wake_up
   64465 sock_common_recvmsg
   64116 put_prev_task_fair
   63964 process_backlog
   62216 __do_softirq
   62093 tcp_cwnd_validate
   61128 __alloc_skb
   60588 put_page
   59536 dput
   58411 __ip_local_out
   56349 avc_audit
   55626 __napi_schedule
   55525 selinux_ipv4_postroute
   54499 __enqueue_entity
   53599 local_bh_disable
   53418 unroll_tree_refs
   53162 __unlazy_fpu
   53084 cfs_rq_of
   52475 set_next_entity
   51108 thread_return
   50458 ip_output
   50268 sched_clock_cpu
   49974 tcp_send_delayed_ack
   49736 ip_finish_output
   49670 finish_task_switch
   49070 ___swab16
   48499 audit_get_context
   48347 raw_local_deliver
   47824 tcp_rtt_estimator
   46707 tcp_push
   46405 constant_test_bit
   45859 select_task_rq_fair
   45188 math_state_restore
   44889 check_preempt_wakeup
   44449 task_rq_lock
   43704 sel_netif_sid
   43377 sock_sendmsg
   42612 sk_reset_timer
   42606 __skb_clone
   42223 __find_general_cachep
   41950 selinux_socket_sendmsg
   41716 constant_test_bit
   41097 skb_push
   40723 lock_sock
   40715 system_call_after_swapgs
   40399 selinux_netlbl_inode_permission
   40179 rb_insert_color
   40021 __kfree_skb
   40015 sockfd_lookup_light
   39216 internal_add_timer
   39024 skb_can_coalesce
   38838 __tcp_select_window
   38651 current_kernel_time
   38533 tcp_v4_md5_do_lookup
   38372 __sock_sendmsg
   38162 selinux_socket_recvmsg
   37812 sel_netport_sid
   37727 account_group_exec_runtime
   37695 switch_mm
   36247 nf_hook_thresh
   36057 auditsys
   35266 pick_next_task_fair
   35064 __tcp_ack_snd_check
   35052 sock_def_readable
   34826 sysret_careful
   34578 _local_bh_enable
   34498 free_hot_cold_page
   34338 kmap
   34028 loopback_xmit
   33320 sk_stream_alloc_skb
   33269 test_ti_thread_flag
   33219 skb_fill_page_desc
   33049 tcp_is_cwnd_limited
   33012 update_min_vruntime
   32431 native_read_tsc
   32398 dst_release
   31661 get_pageblock_flags_group
   31652 path_put
   31516 tcp_push_pending_frames
   31265 netif_needs_gso
   31175 constant_test_bit
   31077 __cycles_2_ns
   30971 socket_has_perm
   30893 __phys_addr
   30867 lock_timer_base
   30585 __wake_up
   30456 ret_from_sys_call
   30147 skb_release_all
   29356 local_bh_enable
   29334 __skb_insert
   28681 tcp_cwnd_test
   28652 __skb_dequeue
   28612 prepare_to_wait
   28268 kmem_cache_free
   28193 set_bit
   28149 dequeue_task_fair
   27906 skb_header_pointer
   27861 sys_kill
   27803 selinux_task_kill
   27627 audit_free_aux
   27600 selinux_netlbl_sock_rcv_skb
   26794 update_curr
   26777 __alloc_pages_internal
   26469 skb_entail
   26458 pskb_may_pull
   26216 inet_ehashfn
   26075 call_softirq
   26033 copy_from_user
   25933 __local_bh_disable
   25666 fget_light
   25270 inet_csk_reset_xmit_timer
   25071 signal_pending_state
   24117 tcp_init_tso_segs
   24109 TCP_ECN_check_ce
   23702 nf_hook_thresh
   23558 copy_to_user
   23426 sysret_audit
   23267 sk_wake_async
   22627 tcp_options_write
   22174 netif_tx_queue_stopped
   21795 tcp_prequeue_process
   21757 tcp_set_skb_tso_segs
   21579 avc_hash
   21565 ___swab16
   21560 ip_local_out
   21445 sk_wmem_schedule
   21234 get_page
   21200 __wake_up_common
   21042 sel_netnode_find
   20772 sock_put
   20625 schedule_timeout
   20613 __napi_complete
   20563 fput_light
   20532 tcp_bound_to_half_wnd
   19912 cap_task_kill
   19773 sysret_signal
   19374 compound_head
   19121 get_seconds
   19048 PageLRU
   18893 zone_watermark_ok
   18635 tcp_snd_wnd_test
   18634 enqueue_task_fair
   18603 rb_next
   18598 next_zones_zonelist
   18534 resched_task
   17820 hash_64
   17801 autoremove_wake_function
   17451 __skb_queue_before
   17283 native_load_tls
   17227 __skb_dequeue
   17149 xfrm4_policy_check
   16942 zone_statistics
   16886 skb_reset_network_header
   16824 ___swab16
   16725 pskb_may_pull
   16645 dev_hard_start_xmit
   16580 sk_filter
   16523 tcp_ca_event
   16479 tcp_win_from_space
   16408 tcp_parse_aligned_timestamp
   16204 finish_wait
   16124 virt_to_slab
   15965 tcp_v4_send_check
   15920 skb_reset_transport_header
   15867 tcp_data_snd_check
   15819 security_sock_rcv_skb
   15665 tcp_ack_saw_tstamp
   15621 skb_network_offset
   15568 virt_to_head_page
   15553 dst_confirm
   15320 skb_pull
   15277 clear_bit
   15179 alloc_pages_current
   14991 bictcp_acked
   14743 tcp_store_ts_recent
   14660 sel_netnode_sid
   14650 __xchg
   14573 task_has_perm
   14561 tcp_v4_check
   14492 net_invalid_timestamp
   14485 security_socket_recvmsg
   14363 __dequeue_entity
   14318 pid_nr_ns
   14311 device_not_available
   14212 local_bh_enable_ip
   14092 virt_to_cache
   13804 netpoll_rx
   13781 fcheck_files
   13724 tcp_adjust_fackets_out
   13717 net_timestamp
   13638 ___swab16
   13576 sel_netport_find
   13563 __kmalloc_node
   13530 __inc_zone_state
   13215 pid_vnr
   13208 free_pages_check
   13008 security_socket_sendmsg
   12971 ip_skb_dst_mtu
   12827 __cpu_set
   12782 bictcp_cong_avoid
   12779 test_tsk_thread_flag
   12734 wakeup_preempt_entity
   12651 sel_netif_find
   12545 skb_set_owner_r
   12534 skb_headroom
   12348 tcp_event_new_data_sent
   12251 place_entity
   12047 set_bit
   11805 update_rq_clock
   11788 detach_timer
   11659 policy_zonelist
   11423 skb_clone
   11380 __skb_queue_tail
   11249 dequeue_task
   10823 init_rootdomain
   10690 __cpu_clear
   10558 default_wake_function
   10556 tcp_rcv_rtt_measure_ts
   10451 PageSlab
   10427 sock_wfree
   10277 calc_delta_fair
   10237 tcp_validate_incoming
   10218 task_rq_unlock
   10023 page_get_cache

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 72924 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.28-rc5
# Mon Nov 17 11:59:36 2008
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=20
# CONFIG_CGROUPS is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
# CONFIG_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
# CONFIG_MARKERS is not set
CONFIG_OPROFILE=m
CONFIG_OPROFILE_IBS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_CLASSIC_RCU=y
CONFIG_FREEZER=y

#
# Processor type and features
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_VSMP is not set
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR_64=y
CONFIG_X86_DS=y
CONFIG_X86_PTRACE_BTS=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT=y
# CONFIG_AMD_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_NR_CPUS=255
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_I8K is not set
CONFIG_MICROCODE=m
CONFIG_MICROCODE_INTEL=y
# CONFIG_MICROCODE_AMD is not set
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
# CONFIG_NUMA_EMU is not set
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_MMU_NOTIFIER=y
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
CONFIG_X86_RESERVE_LOW_64K=y
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
# CONFIG_X86_PAT is not set
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_SCHED_HRTICK is not set
CONFIG_KEXEC=y
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_WMI is not set
CONFIG_ACPI_ASUS=m
CONFIG_ACPI_TOSHIBA=m
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_SBS=m

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_POWERNOW_K8_ACPI=y
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set
# CONFIG_X86_SPEEDSTEP_LIB is not set
# CONFIG_CPU_IDLE is not set

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
CONFIG_HT_IRQ=y
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
CONFIG_PCCARD=y
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_PCMCIA_IOCTL=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
CONFIG_PD6729=m
CONFIG_I82092=m
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=m

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_SUB_POLICY is not set
CONFIG_XFRM_MIGRATE=y
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
CONFIG_INET_LRO=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=m
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_BIC=y
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="bic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
# CONFIG_IPV6_MIP6 is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
CONFIG_NETFILTER_DEBUG=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CT_ACCT=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CONNTRACK_SECMARK=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CT_PROTO_DCCP=m
CONFIG_NF_CT_PROTO_GRE=m
CONFIG_NF_CT_PROTO_SCTP=m
# CONFIG_NF_CT_PROTO_UDPLITE is not set
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
# CONFIG_NF_CT_NETLINK is not set
# CONFIG_NETFILTER_TPROXY is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
# CONFIG_NETFILTER_XT_TARGET_TRACE is not set
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
# CONFIG_NETFILTER_XT_MATCH_CONNLIMIT is not set
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
# CONFIG_NETFILTER_XT_MATCH_IPRANGE is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
CONFIG_NETFILTER_XT_MATCH_REALM=m
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m

#
# IP: Netfilter Configuration
#
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_CONNTRACK_IPV4=m
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_NF_NAT=m
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PROTO_DCCP=m
CONFIG_NF_NAT_PROTO_GRE=m
CONFIG_NF_NAT_PROTO_SCTP=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_NF_NAT_SIP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
# CONFIG_IP_NF_SECURITY is not set
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
CONFIG_NF_CONNTRACK_IPV6=m
CONFIG_IP6_NF_QUEUE=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_TARGET_LOG=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_RAW=m
# CONFIG_IP6_NF_SECURITY is not set

#
# DECnet: Netfilter Configuration
#
# CONFIG_DECNET_NF_GRABULATOR is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
# CONFIG_BRIDGE_EBT_IP6 is not set
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_ULOG=m
# CONFIG_BRIDGE_EBT_NFLOG is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m
CONFIG_IP_DCCP_ACKVEC=y

#
# DCCP CCIDs Configuration (EXPERIMENTAL)
#
CONFIG_IP_DCCP_CCID2=m
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=m
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_CCID3_RTO=100
CONFIG_IP_DCCP_TFRC_LIB=m

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
# CONFIG_NET_DCCPPROBE is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_MSG is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_HMAC_NONE is not set
# CONFIG_SCTP_HMAC_SHA1 is not set
CONFIG_SCTP_HMAC_MD5=y
CONFIG_TIPC=m
# CONFIG_TIPC_ADVANCED is not set
# CONFIG_TIPC_DEBUG is not set
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_STP=m
CONFIG_BRIDGE=m
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
# CONFIG_VLAN_8021Q_GVRP is not set
CONFIG_DECNET=m
CONFIG_DECNET_ROUTER=y
CONFIG_LLC=y
# CONFIG_LLC2 is not set
CONFIG_IPX=m
# CONFIG_IPX_INTERN is not set
CONFIG_ATALK=m
CONFIG_DEV_APPLETALK=m
CONFIG_IPDDP=m
CONFIG_IPDDP_ENCAP=y
CONFIG_IPDDP_DECAP=y
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
CONFIG_WAN_ROUTER=m
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
# CONFIG_NET_SCH_MULTIQ is not set
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_INGRESS=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
# CONFIG_NET_ACT_NAT is not set
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
# CONFIG_NET_ACT_SKBEDIT is not set
CONFIG_NET_CLS_IND=y
CONFIG_NET_SCH_FIFO=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_TCPPROBE is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
CONFIG_IRDA=m

#
# IrDA protocols
#
CONFIG_IRLAN=m
CONFIG_IRNET=m
CONFIG_IRCOMM=m
# CONFIG_IRDA_ULTRA is not set

#
# IrDA options
#
CONFIG_IRDA_CACHE_LAST_LSAP=y
CONFIG_IRDA_FAST_RR=y
# CONFIG_IRDA_DEBUG is not set

#
# Infrared-port device drivers
#

#
# SIR device drivers
#
CONFIG_IRTTY_SIR=m

#
# Dongle support
#
CONFIG_DONGLE=y
CONFIG_ESI_DONGLE=m
CONFIG_ACTISYS_DONGLE=m
CONFIG_TEKRAM_DONGLE=m
CONFIG_TOIM3232_DONGLE=m
CONFIG_LITELINK_DONGLE=m
CONFIG_MA600_DONGLE=m
CONFIG_GIRBIL_DONGLE=m
CONFIG_MCP2120_DONGLE=m
CONFIG_OLD_BELKIN_DONGLE=m
CONFIG_ACT200L_DONGLE=m
# CONFIG_KINGSUN_DONGLE is not set
# CONFIG_KSDAZZLE_DONGLE is not set
# CONFIG_KS959_DONGLE is not set

#
# FIR device drivers
#
CONFIG_USB_IRDA=m
CONFIG_SIGMATEL_FIR=m
CONFIG_NSC_FIR=m
CONFIG_WINBOND_FIR=m
CONFIG_SMC_IRCC_FIR=m
CONFIG_ALI_FIR=m
CONFIG_VLSI_FIR=m
CONFIG_VIA_FIR=m
CONFIG_MCS_FIR=m
CONFIG_BT=m
CONFIG_BT_L2CAP=m
CONFIG_BT_SCO=m
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m

#
# Bluetooth device drivers
#
CONFIG_BT_HCIUSB=m
CONFIG_BT_HCIUSB_SCO=y
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
# CONFIG_BT_HCIUART_LL is not set
CONFIG_BT_HCIBCM203X=m
CONFIG_BT_HCIBPA10X=m
CONFIG_BT_HCIBFUSB=m
CONFIG_BT_HCIDTL1=m
CONFIG_BT_HCIBT3C=m
CONFIG_BT_HCIBLUECARD=m
CONFIG_BT_HCIBTUART=m
CONFIG_BT_HCIVHCI=m
# CONFIG_AF_RXRPC is not set
# CONFIG_PHONET is not set
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
CONFIG_WIRELESS_OLD_REGULATORY=y
CONFIG_WIRELESS_EXT=y
CONFIG_WIRELESS_EXT_SYSFS=y
# CONFIG_MAC80211 is not set
CONFIG_IEEE80211=m
# CONFIG_IEEE80211_DEBUG is not set
CONFIG_IEEE80211_CRYPT_WEP=m
CONFIG_IEEE80211_CRYPT_CCMP=m
CONFIG_IEEE80211_CRYPT_TKIP=m
CONFIG_RFKILL=m
# CONFIG_RFKILL_INPUT is not set
CONFIG_RFKILL_LEDS=y
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=m
# CONFIG_PARIDE is not set
CONFIG_BLK_CPQ_DA=y
CONFIG_BLK_CPQ_CISS_DA=m
CONFIG_CISS_SCSI_TAPE=y
CONFIG_BLK_DEV_DAC960=m
CONFIG_BLK_DEV_UMEM=m
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_SX8=m
CONFIG_BLK_DEV_UB=m
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
CONFIG_ATA_OVER_ETH=m
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_EEPROM_93CX6 is not set
CONFIG_SGI_IOC4=m
CONFIG_TIFM_CORE=m
CONFIG_TIFM_7XX1=m
# CONFIG_ACER_WMI is not set
CONFIG_ASUS_LAPTOP=m
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_ICS932S401 is not set
CONFIG_MSI_LAPTOP=m
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_COMPAL_LAPTOP is not set
CONFIG_SONY_LAPTOP=m
# CONFIG_SONYPI_COMPAT is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_SGI_XP is not set
# CONFIG_HP_ILO is not set
# CONFIG_SGI_GRU is not set
# CONFIG_C2PORT is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=m
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=m
# CONFIG_SCSI_FC_TGT_ATTRS is not set
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
# CONFIG_SCSI_SAS_LIBSAS is not set
CONFIG_SCSI_SRP_ATTRS=m
# CONFIG_SCSI_SRP_TGT_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
CONFIG_BLK_DEV_3W_XXXX_RAID=m
CONFIG_SCSI_3W_9XXX=m
CONFIG_SCSI_ACARD=m
CONFIG_SCSI_AACRAID=m
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=4
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC7XXX_OLD=m
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=4
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
CONFIG_SCSI_ARCMSR=m
# CONFIG_SCSI_ARCMSR_AER is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
CONFIG_MEGARAID_LEGACY=m
CONFIG_MEGARAID_SAS=m
CONFIG_SCSI_HPTIOP=m
CONFIG_SCSI_BUSLOGIC=m
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
CONFIG_SCSI_IPS=m
CONFIG_SCSI_INITIO=m
CONFIG_SCSI_INIA100=m
CONFIG_SCSI_PPA=m
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
# CONFIG_SCSI_IZIP_SLOW_CTR is not set
# CONFIG_SCSI_MVSAS is not set
CONFIG_SCSI_STEX=m
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=m
CONFIG_SCSI_QLA_FC=m
CONFIG_SCSI_QLA_ISCSI=m
CONFIG_SCSI_LPFC=m
CONFIG_SCSI_DC395x=m
CONFIG_SCSI_DC390T=m
# CONFIG_SCSI_DEBUG is not set
CONFIG_SCSI_SRP=m
# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
# CONFIG_SCSI_DH is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=y
CONFIG_SATA_SIL24=m
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=m
CONFIG_ATA_PIIX=y
CONFIG_SATA_MV=m
CONFIG_SATA_NV=y
CONFIG_PDC_ADMA=m
CONFIG_SATA_QSTOR=m
CONFIG_SATA_PROMISE=m
CONFIG_SATA_SX4=m
CONFIG_SATA_SIL=m
CONFIG_SATA_SIS=m
CONFIG_SATA_ULI=m
CONFIG_SATA_VIA=m
CONFIG_SATA_VITESSE=m
CONFIG_SATA_INIC162X=m
# CONFIG_PATA_ACPI is not set
CONFIG_PATA_ALI=m
CONFIG_PATA_AMD=y
CONFIG_PATA_ARTOP=m
CONFIG_PATA_ATIIXP=m
# CONFIG_PATA_CMD640_PCI is not set
CONFIG_PATA_CMD64X=m
CONFIG_PATA_CS5520=m
CONFIG_PATA_CS5530=m
CONFIG_PATA_CYPRESS=m
CONFIG_PATA_EFAR=m
CONFIG_ATA_GENERIC=m
CONFIG_PATA_HPT366=m
CONFIG_PATA_HPT37X=m
CONFIG_PATA_HPT3X2N=m
CONFIG_PATA_HPT3X3=m
# CONFIG_PATA_HPT3X3_DMA is not set
CONFIG_PATA_IT821X=m
CONFIG_PATA_IT8213=m
CONFIG_PATA_JMICRON=m
CONFIG_PATA_TRIFLEX=m
CONFIG_PATA_MARVELL=m
CONFIG_PATA_MPIIX=m
CONFIG_PATA_OLDPIIX=y
CONFIG_PATA_NETCELL=m
# CONFIG_PATA_NINJA32 is not set
CONFIG_PATA_NS87410=m
# CONFIG_PATA_NS87415 is not set
CONFIG_PATA_OPTI=m
CONFIG_PATA_OPTIDMA=m
CONFIG_PATA_PCMCIA=m
CONFIG_PATA_PDC_OLD=m
CONFIG_PATA_RADISYS=m
CONFIG_PATA_RZ1000=m
CONFIG_PATA_SC1200=m
CONFIG_PATA_SERVERWORKS=m
CONFIG_PATA_PDC2027X=m
CONFIG_PATA_SIL680=m
CONFIG_PATA_SIS=m
CONFIG_PATA_VIA=m
CONFIG_PATA_WINBOND=m
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_RAID5_RESHAPE=y
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=40
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LAN=m
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
# CONFIG_FIREWIRE is not set
CONFIG_IEEE1394=m
CONFIG_IEEE1394_OHCI1394=m
CONFIG_IEEE1394_PCILYNX=m
CONFIG_IEEE1394_SBP2=m
# CONFIG_IEEE1394_SBP2_PHYS_DMA is not set
CONFIG_IEEE1394_ETH1394_ROM_ENTRY=y
CONFIG_IEEE1394_ETH1394=m
CONFIG_IEEE1394_RAWIO=m
CONFIG_IEEE1394_VIDEO1394=m
CONFIG_IEEE1394_DV1394=m
# CONFIG_IEEE1394_VERBOSEDEBUG is not set
CONFIG_I2O=m
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_EXT_ADAPTEC_DMA64=y
# CONFIG_I2O_CONFIG is not set
CONFIG_I2O_BUS=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_IFB=m
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
CONFIG_EQUALIZER=m
CONFIG_TUN=m
# CONFIG_VETH is not set
CONFIG_NET_SB1000=m
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
CONFIG_MARVELL_PHY=m
CONFIG_DAVICOM_PHY=m
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=m
CONFIG_CICADA_PHY=m
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=m
CONFIG_BROADCOM_PHY=m
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_HAPPYMEAL=m
CONFIG_SUNGEM=m
CONFIG_CASSINI=m
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_TYPHOON=m
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=m
CONFIG_AMD8111_ETH=m
CONFIG_ADAPTEC_STARFIRE=m
CONFIG_B44=m
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=y
CONFIG_FORCEDETH_NAPI=y
# CONFIG_EEPRO100 is not set
CONFIG_E100=y
CONFIG_FEALNX=m
CONFIG_NATSEMI=m
CONFIG_NE2K_PCI=m
CONFIG_8139CP=m
CONFIG_8139TOO=y
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=m
CONFIG_EPIC100=m
CONFIG_SUNDANCE=m
# CONFIG_SUNDANCE_MMIO is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
CONFIG_SC92031=m
CONFIG_NET_POCKET=y
CONFIG_ATP=m
CONFIG_DE600=m
CONFIG_DE620=m
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=m
# CONFIG_ACENIC_OMIT_TIGON_I is not set
CONFIG_DL2K=m
CONFIG_E1000=y
CONFIG_E1000E=y
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
CONFIG_NS83820=m
CONFIG_HAMACHI=m
CONFIG_YELLOWFIN=m
CONFIG_R8169=m
CONFIG_R8169_VLAN=y
# CONFIG_SIS190 is not set
CONFIG_SKGE=m
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=m
CONFIG_TIGON3=y
CONFIG_BNX2=m
CONFIG_QLA3XXX=m
CONFIG_ATL1=m
# CONFIG_ATL1E is not set
# CONFIG_JME is not set
CONFIG_NETDEV_10000=y
CONFIG_CHELSIO_T1=m
CONFIG_CHELSIO_T1_1G=y
CONFIG_CHELSIO_T3=m
# CONFIG_ENIC is not set
# CONFIG_IXGBE is not set
CONFIG_IXGB=m
CONFIG_S2IO=m
CONFIG_MYRI10GE=m
CONFIG_NETXEN_NIC=m
# CONFIG_NIU is not set
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_QLGE is not set
# CONFIG_SFC is not set
CONFIG_TR=y
CONFIG_IBMOL=m
CONFIG_3C359=m
# CONFIG_TMS380TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set
# CONFIG_IWLWIFI_LEDS is not set

#
# USB Network Adapters
#
CONFIG_USB_CATC=m
CONFIG_USB_KAWETH=m
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=m
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
CONFIG_USB_NET_DM9601=m
# CONFIG_USB_NET_SMSC95XX is not set
CONFIG_USB_NET_GL620A=m
CONFIG_USB_NET_NET1080=m
CONFIG_USB_NET_PLUSB=m
CONFIG_USB_NET_MCS7830=m
CONFIG_USB_NET_RNDIS_HOST=m
CONFIG_USB_NET_CDC_SUBSET=m
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
CONFIG_USB_KC2190=y
CONFIG_USB_NET_ZAURUS=m
# CONFIG_USB_HSO is not set
CONFIG_NET_PCMCIA=y
CONFIG_PCMCIA_3C589=m
CONFIG_PCMCIA_3C574=m
CONFIG_PCMCIA_FMVJ18X=m
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
CONFIG_PCMCIA_XIRC2PS=m
CONFIG_PCMCIA_AXNET=m
# CONFIG_PCMCIA_IBMTR is not set
# CONFIG_WAN is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
CONFIG_ATM_TCP=m
CONFIG_ATM_LANAI=m
CONFIG_ATM_ENI=m
# CONFIG_ATM_ENI_DEBUG is not set
# CONFIG_ATM_ENI_TUNE_BURST is not set
CONFIG_ATM_FIRESTREAM=m
# CONFIG_ATM_ZATM is not set
CONFIG_ATM_IDT77252=m
# CONFIG_ATM_IDT77252_DEBUG is not set
# CONFIG_ATM_IDT77252_RCV_ALL is not set
CONFIG_ATM_IDT77252_USE_SUNI=y
CONFIG_ATM_AMBASSADOR=m
# CONFIG_ATM_AMBASSADOR_DEBUG is not set
CONFIG_ATM_HORIZON=m
# CONFIG_ATM_HORIZON_DEBUG is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
CONFIG_ATM_HE=m
# CONFIG_ATM_HE_USE_SUNI is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
CONFIG_SKFP=m
# CONFIG_HIPPI is not set
CONFIG_PLIP=m
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_MPPE=m
CONFIG_PPPOE=m
CONFIG_PPPOATM=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
# CONFIG_SLIP_MODE_SLIP6 is not set
CONFIG_NET_FC=y
CONFIG_NETCONSOLE=y
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_KEYBOARD_STOWAWAY=m
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_VSXXXAA=m
CONFIG_INPUT_JOYSTICK=y
CONFIG_JOYSTICK_ANALOG=m
CONFIG_JOYSTICK_A3D=m
CONFIG_JOYSTICK_ADI=m
CONFIG_JOYSTICK_COBRA=m
CONFIG_JOYSTICK_GF2K=m
CONFIG_JOYSTICK_GRIP=m
CONFIG_JOYSTICK_GRIP_MP=m
CONFIG_JOYSTICK_GUILLEMOT=m
CONFIG_JOYSTICK_INTERACT=m
CONFIG_JOYSTICK_SIDEWINDER=m
CONFIG_JOYSTICK_TMDC=m
CONFIG_JOYSTICK_IFORCE=m
CONFIG_JOYSTICK_IFORCE_USB=y
CONFIG_JOYSTICK_IFORCE_232=y
CONFIG_JOYSTICK_WARRIOR=m
CONFIG_JOYSTICK_MAGELLAN=m
CONFIG_JOYSTICK_SPACEORB=m
CONFIG_JOYSTICK_SPACEBALL=m
CONFIG_JOYSTICK_STINGER=m
CONFIG_JOYSTICK_TWIDJOY=m
# CONFIG_JOYSTICK_ZHENHUA is not set
CONFIG_JOYSTICK_DB9=m
CONFIG_JOYSTICK_GAMECON=m
CONFIG_JOYSTICK_TURBOGRAFX=m
CONFIG_JOYSTICK_JOYDUMP=m
# CONFIG_JOYSTICK_XPAD is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_FUJITSU is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_ELO=m
CONFIG_TOUCHSCREEN_MTOUCH=m
# CONFIG_TOUCHSCREEN_INEXIO is not set
CONFIG_TOUCHSCREEN_MK712=m
CONFIG_TOUCHSCREEN_PENMOUNT=m
CONFIG_TOUCHSCREEN_TOUCHRIGHT=m
CONFIG_TOUCHSCREEN_TOUCHWIN=m
# CONFIG_TOUCHSCREEN_WM97XX is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_APANEL is not set
CONFIG_INPUT_ATLAS_BTNS=m
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=m
CONFIG_GAMEPORT_EMU10K1=m
CONFIG_GAMEPORT_FM801=m

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
CONFIG_SYNCLINK=m
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
CONFIG_N_HDLC=m
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_RIO is not set
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_NVRAM=y
CONFIG_R3964=m
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_IPWIRELESS is not set
CONFIG_MWAVE=m
CONFIG_PC8736x_GPIO=m
CONFIG_NSC_GPIO=m
# CONFIG_RAW_DRIVER is not set
# CONFIG_HPET is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=m
CONFIG_TCG_TIS=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
# CONFIG_I2C_AMD756_S4882 is not set
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=m
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=y
CONFIG_I2C_NFORCE2=y
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
CONFIG_I2C_VOODOO3=m

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_PCA_PLATFORM is not set
CONFIG_I2C_STUB=m

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
# CONFIG_AT24 is not set
CONFIG_SENSORS_EEPROM=m
CONFIG_SENSORS_PCF8574=m
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_SENSORS_MAX6875=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
CONFIG_W1=m
CONFIG_W1_CON=y

#
# 1-wire Bus Masters
#
CONFIG_W1_MASTER_MATROX=m
CONFIG_W1_MASTER_DS2490=m
CONFIG_W1_MASTER_DS2482=m

#
# 1-wire Slaves
#
CONFIG_W1_SLAVE_THERM=m
CONFIG_W1_SLAVE_SMEM=m
CONFIG_W1_SLAVE_DS2433=m
CONFIG_W1_SLAVE_DS2433_CRC=y
# CONFIG_W1_SLAVE_DS2760 is not set
# CONFIG_W1_SLAVE_BQ27000 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_BQ27x00 is not set
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
CONFIG_SENSORS_ABITUGURU=m
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
CONFIG_SENSORS_ADM1029=m
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM9240=m
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_ASB100=m
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
# CONFIG_SENSORS_I5K_AMB is not set
CONFIG_SENSORS_F71805F=m
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
CONFIG_SENSORS_FSCPOS=m
# CONFIG_SENSORS_FSCHMD is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
CONFIG_SENSORS_MAX1619=m
# CONFIG_SENSORS_MAX6650 is not set
CONFIG_SENSORS_PC87360=m
CONFIG_SENSORS_PC87427=m
CONFIG_SENSORS_SIS5595=m
# CONFIG_SENSORS_DME1737 is not set
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83793=m
CONFIG_SENSORS_W83L785TS=m
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
CONFIG_SENSORS_HDAPS=m
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
# CONFIG_THERMAL_HWMON is not set
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_SC520_WDT is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=m
CONFIG_ITCO_WDT=m
CONFIG_ITCO_VENDOR_SUPPORT=y
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
CONFIG_PC87413_WDT=m
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
CONFIG_W83627HF_WDT=m
CONFIG_W83697HF_WDT=m
# CONFIG_W83697UG_WDT is not set
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m
CONFIG_WDT_501_PCI=y

#
# USB-based Watchdog Cards
#
CONFIG_USBPCWATCHDOG=m
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=m
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
# CONFIG_SSB_PCMCIAHOST is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
CONFIG_MFD_SM501=m
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
# CONFIG_VIDEO_DEV is not set
CONFIG_DVB_CORE=m
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
# CONFIG_MEDIA_ATTACH is not set
CONFIG_MEDIA_TUNER=m
# CONFIG_MEDIA_TUNER_CUSTOMIZE is not set
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_DVB_CAPTURE_DRIVERS=y

#
# Supported SAA7146 based PCI Adapters
#
# CONFIG_TTPCI_EEPROM is not set
# CONFIG_DVB_BUDGET_CORE is not set

#
# Supported USB Adapters
#
# CONFIG_DVB_USB is not set
CONFIG_DVB_TTUSB_BUDGET=m
CONFIG_DVB_TTUSB_DEC=m
# CONFIG_DVB_SIANO_SMS1XXX is not set

#
# Supported FlexCopII (B2C2) Adapters
#
CONFIG_DVB_B2C2_FLEXCOP=m
CONFIG_DVB_B2C2_FLEXCOP_PCI=m
CONFIG_DVB_B2C2_FLEXCOP_USB=m
# CONFIG_DVB_B2C2_FLEXCOP_DEBUG is not set

#
# Supported BT878 Adapters
#

#
# Supported Pluto2 Adapters
#
CONFIG_DVB_PLUTO2=m

#
# Supported SDMC DM1105 Adapters
#
# CONFIG_DVB_DM1105 is not set

#
# Supported DVB Frontends
#

#
# Customise DVB Frontends
#
# CONFIG_DVB_FE_CUSTOMISE is not set

#
# DVB-S (satellite) frontends
#
CONFIG_DVB_CX24110=m
CONFIG_DVB_CX24123=m
CONFIG_DVB_MT312=m
CONFIG_DVB_S5H1420=m
# CONFIG_DVB_STV0288 is not set
# CONFIG_DVB_STB6000 is not set
CONFIG_DVB_STV0299=m
CONFIG_DVB_TDA8083=m
CONFIG_DVB_TDA10086=m
CONFIG_DVB_VES1X93=m
CONFIG_DVB_TUNER_ITD1000=m
CONFIG_DVB_TDA826X=m
CONFIG_DVB_TUA6100=m
# CONFIG_DVB_CX24116 is not set
# CONFIG_DVB_SI21XX is not set

#
# DVB-T (terrestrial) frontends
#
CONFIG_DVB_SP8870=m
CONFIG_DVB_SP887X=m
CONFIG_DVB_CX22700=m
CONFIG_DVB_CX22702=m
# CONFIG_DVB_DRX397XD is not set
CONFIG_DVB_L64781=m
CONFIG_DVB_TDA1004X=m
CONFIG_DVB_NXT6000=m
CONFIG_DVB_MT352=m
CONFIG_DVB_ZL10353=m
CONFIG_DVB_DIB3000MB=m
CONFIG_DVB_DIB3000MC=m
CONFIG_DVB_DIB7000M=m
CONFIG_DVB_DIB7000P=m
# CONFIG_DVB_TDA10048 is not set

#
# DVB-C (cable) frontends
#
CONFIG_DVB_VES1820=m
CONFIG_DVB_TDA10021=m
# CONFIG_DVB_TDA10023 is not set
CONFIG_DVB_STV0297=m

#
# ATSC (North American/Korean Terrestrial/Cable DTV) frontends
#
CONFIG_DVB_NXT200X=m
CONFIG_DVB_OR51211=m
CONFIG_DVB_OR51132=m
CONFIG_DVB_BCM3510=m
CONFIG_DVB_LGDT330X=m
# CONFIG_DVB_S5H1409 is not set
# CONFIG_DVB_AU8522 is not set
# CONFIG_DVB_S5H1411 is not set

#
# Digital terrestrial only tuners/PLL
#
CONFIG_DVB_PLL=m
CONFIG_DVB_TUNER_DIB0070=m

#
# SEC control devices for DVB-S
#
CONFIG_DVB_LNBP21=m
# CONFIG_DVB_ISL6405 is not set
CONFIG_DVB_ISL6421=m
# CONFIG_DVB_LGS8GL5 is not set

#
# Tools to develop new frontends
#
# CONFIG_DVB_DUMMY_FE is not set
# CONFIG_DVB_AF9013 is not set
# CONFIG_DAB is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_SIS=y
CONFIG_AGP_VIA=y
CONFIG_DRM=m
CONFIG_DRM_TDFX=m
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_VGASTATE=m
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=m
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
CONFIG_FB_SVGALIB=m
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_VESA is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RIVA=m
# CONFIG_FB_RIVA_I2C is not set
# CONFIG_FB_RIVA_DEBUG is not set
CONFIG_FB_RIVA_BACKLIGHT=y
# CONFIG_FB_LE80578 is not set
CONFIG_FB_INTEL=m
# CONFIG_FB_INTEL_DEBUG is not set
CONFIG_FB_INTEL_I2C=y
CONFIG_FB_MATROX=m
CONFIG_FB_MATROX_MILLENIUM=y
CONFIG_FB_MATROX_MYSTIQUE=y
CONFIG_FB_MATROX_G=y
CONFIG_FB_MATROX_I2C=m
CONFIG_FB_MATROX_MAVEN=m
CONFIG_FB_MATROX_MULTIHEAD=y
# CONFIG_FB_RADEON is not set
CONFIG_FB_ATY128=m
CONFIG_FB_ATY128_BACKLIGHT=y
CONFIG_FB_ATY=m
CONFIG_FB_ATY_CT=y
CONFIG_FB_ATY_GENERIC_LCD=y
CONFIG_FB_ATY_GX=y
CONFIG_FB_ATY_BACKLIGHT=y
CONFIG_FB_S3=m
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
CONFIG_FB_NEOMAGIC=m
CONFIG_FB_KYRO=m
CONFIG_FB_3DFX=m
CONFIG_FB_3DFX_ACCEL=y
CONFIG_FB_VOODOO1=m
# CONFIG_FB_VT8623 is not set
CONFIG_FB_TRIDENT=m
CONFIG_FB_TRIDENT_ACCEL=y
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
CONFIG_FB_SM501=m
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_CORGI is not set
CONFIG_BACKLIGHT_PROGEAR=m
# CONFIG_BACKLIGHT_MBP_NVIDIA is not set
# CONFIG_BACKLIGHT_SAHARA is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=m
CONFIG_SOUND_OSS_CORE=y
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_DYNAMIC_MINORS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_MPU401_UART=m
CONFIG_SND_OPL3_LIB=m
CONFIG_SND_VX_LIB=m
CONFIG_SND_AC97_CODEC=m
CONFIG_SND_DRIVERS=y
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
# CONFIG_SND_MTPAV is not set
CONFIG_SND_MTS64=m
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=m
CONFIG_SND_PORTMAN2X4=m
CONFIG_SND_AC97_POWER_SAVE=y
CONFIG_SND_AC97_POWER_SAVE_DEFAULT=0
CONFIG_SND_SB_COMMON=m
CONFIG_SND_PCI=y
CONFIG_SND_AD1889=m
CONFIG_SND_ALS300=m
CONFIG_SND_ALS4000=m
CONFIG_SND_ALI5451=m
CONFIG_SND_ATIIXP=m
CONFIG_SND_ATIIXP_MODEM=m
CONFIG_SND_AU8810=m
CONFIG_SND_AU8820=m
CONFIG_SND_AU8830=m
# CONFIG_SND_AW2 is not set
CONFIG_SND_AZT3328=m
CONFIG_SND_BT87X=m
# CONFIG_SND_BT87X_OVERCLOCK is not set
CONFIG_SND_CA0106=m
CONFIG_SND_CMIPCI=m
# CONFIG_SND_OXYGEN is not set
CONFIG_SND_CS4281=m
CONFIG_SND_CS46XX=m
CONFIG_SND_CS46XX_NEW_DSP=y
# CONFIG_SND_CS5530 is not set
CONFIG_SND_DARLA20=m
CONFIG_SND_GINA20=m
CONFIG_SND_LAYLA20=m
CONFIG_SND_DARLA24=m
CONFIG_SND_GINA24=m
CONFIG_SND_LAYLA24=m
CONFIG_SND_MONA=m
CONFIG_SND_MIA=m
CONFIG_SND_ECHO3G=m
CONFIG_SND_INDIGO=m
CONFIG_SND_INDIGOIO=m
CONFIG_SND_INDIGODJ=m
CONFIG_SND_EMU10K1=m
CONFIG_SND_EMU10K1X=m
CONFIG_SND_ENS1370=m
CONFIG_SND_ENS1371=m
CONFIG_SND_ES1938=m
CONFIG_SND_ES1968=m
CONFIG_SND_FM801=m
CONFIG_SND_HDA_INTEL=m
# CONFIG_SND_HDA_HWDEP is not set
# CONFIG_SND_HDA_INPUT_BEEP is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_ATIHDMI=y
CONFIG_SND_HDA_CODEC_NVHDMI=y
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
# CONFIG_SND_HDA_POWER_SAVE is not set
CONFIG_SND_HDSP=m
CONFIG_SND_HDSPM=m
# CONFIG_SND_HIFIER is not set
CONFIG_SND_ICE1712=m
CONFIG_SND_ICE1724=m
CONFIG_SND_INTEL8X0=m
CONFIG_SND_INTEL8X0M=m
CONFIG_SND_KORG1212=m
CONFIG_SND_MAESTRO3=m
CONFIG_SND_MIXART=m
CONFIG_SND_NM256=m
CONFIG_SND_PCXHR=m
CONFIG_SND_RIPTIDE=m
CONFIG_SND_RME32=m
CONFIG_SND_RME96=m
CONFIG_SND_RME9652=m
CONFIG_SND_SONICVIBES=m
CONFIG_SND_TRIDENT=m
CONFIG_SND_VIA82XX=m
CONFIG_SND_VIA82XX_MODEM=m
# CONFIG_SND_VIRTUOSO is not set
CONFIG_SND_VX222=m
CONFIG_SND_YMFPCI=m
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_USB_USX2Y=m
# CONFIG_SND_USB_CAIAQ is not set
# CONFIG_SND_USB_US122L is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_PDAUDIOCF is not set
CONFIG_SND_SOC=m
# CONFIG_SND_SOC_ALL_CODECS is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_AC97_BUS=m
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
# CONFIG_HID_DEBUG is not set
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
CONFIG_HID_PID=y
CONFIG_USB_HIDDEV=y

#
# Special HID drivers
#
CONFIG_HID_COMPAT=y
CONFIG_HID_A4TECH=y
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
CONFIG_HID_BRIGHT=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
CONFIG_HID_DELL=y
CONFIG_HID_EZKEY=y
CONFIG_HID_GYRATION=y
CONFIG_HID_LOGITECH=y
CONFIG_LOGITECH_FF=y
# CONFIG_LOGIRUMBLEPAD2_FF is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_HID_PANTHERLORD=y
CONFIG_PANTHERLORD_FF=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
CONFIG_THRUSTMASTER_FF=y
CONFIG_ZEROPLUS_FF=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
CONFIG_USB_MON=y
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_ISP116X_HCD=m
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_U132_HCD=m
CONFIG_USB_SL811_HCD=m
CONFIG_USB_SL811_CS=m
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may also be needed;
#

#
# see USB_STORAGE Help for more information
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
# CONFIG_USB_STORAGE_ONETOUCH is not set
CONFIG_USB_STORAGE_KARMA=y
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
CONFIG_USB_LIBUSUAL=y

#
# USB Imaging devices
#
CONFIG_USB_MDC800=m
CONFIG_USB_MICROTEK=m

#
# USB port drivers
#
CONFIG_USB_USS720=m
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
CONFIG_USB_SERIAL_AIRCABLE=m
CONFIG_USB_SERIAL_ARK3116=m
CONFIG_USB_SERIAL_BELKIN=m
# CONFIG_USB_SERIAL_CH341 is not set
CONFIG_USB_SERIAL_WHITEHEAT=m
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
CONFIG_USB_SERIAL_CP2101=m
CONFIG_USB_SERIAL_CYPRESS_M8=m
CONFIG_USB_SERIAL_EMPEG=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_FUNSOFT=m
CONFIG_USB_SERIAL_VISOR=m
CONFIG_USB_SERIAL_IPAQ=m
CONFIG_USB_SERIAL_IR=m
CONFIG_USB_SERIAL_EDGEPORT=m
CONFIG_USB_SERIAL_EDGEPORT_TI=m
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
# CONFIG_USB_SERIAL_IUU is not set
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
CONFIG_USB_SERIAL_KEYSPAN=m
CONFIG_USB_SERIAL_KEYSPAN_MPR=y
CONFIG_USB_SERIAL_KEYSPAN_USA28=y
CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
CONFIG_USB_SERIAL_KEYSPAN_USA19=y
CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
CONFIG_USB_SERIAL_MOS7720=m
CONFIG_USB_SERIAL_MOS7840=m
# CONFIG_USB_SERIAL_MOTOROLA is not set
CONFIG_USB_SERIAL_NAVMAN=m
CONFIG_USB_SERIAL_PL2303=m
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
CONFIG_USB_SERIAL_HP4X=m
CONFIG_USB_SERIAL_SAFE=m
CONFIG_USB_SERIAL_SAFE_PADDED=y
CONFIG_USB_SERIAL_SIERRAWIRELESS=m
CONFIG_USB_SERIAL_TI=m
CONFIG_USB_SERIAL_CYBERJACK=m
CONFIG_USB_SERIAL_XIRCOM=m
CONFIG_USB_SERIAL_OPTION=m
CONFIG_USB_SERIAL_OMNINET=m
CONFIG_USB_SERIAL_DEBUG=m

#
# USB Miscellaneous drivers
#
CONFIG_USB_EMI62=m
CONFIG_USB_EMI26=m
CONFIG_USB_ADUTUX=m
# CONFIG_USB_SEVSEG is not set
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
CONFIG_USB_LCD=m
CONFIG_USB_BERRY_CHARGE=m
CONFIG_USB_LED=m
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
CONFIG_USB_PHIDGET=m
CONFIG_USB_PHIDGETKIT=m
CONFIG_USB_PHIDGETMOTORCONTROL=m
CONFIG_USB_PHIDGETSERVO=m
CONFIG_USB_IDMOUSE=m
CONFIG_USB_FTDI_ELAN=m
CONFIG_USB_APPLEDISPLAY=m
CONFIG_USB_SISUSBVGA=m
CONFIG_USB_SISUSBVGA_CON=y
CONFIG_USB_LD=m
CONFIG_USB_TRANCEVIBRATOR=m
CONFIG_USB_IOWARRIOR=m
CONFIG_USB_TEST=m
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
CONFIG_USB_ATM=m
CONFIG_USB_SPEEDTOUCH=m
CONFIG_USB_CXACRU=m
CONFIG_USB_UEAGLEATM=m
CONFIG_USB_XUSBATM=m
# CONFIG_USB_GADGET is not set
# CONFIG_UWB is not set
CONFIG_MMC=m
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_UNSAFE_RESUME is not set

#
# MMC/SD/SDIO Card Drivers
#
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_BOUNCE=y
# CONFIG_SDIO_UART is not set
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
CONFIG_MMC_SDHCI=m
# CONFIG_MMC_SDHCI_PCI is not set
CONFIG_MMC_WBSD=m
CONFIG_MMC_TIFM_SD=m
# CONFIG_MMC_SDRICOH_CS is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_HP_DISK is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPATH=m
# CONFIG_INFINIBAND_AMSO1100 is not set
CONFIG_INFINIBAND_CXGB3=m
# CONFIG_INFINIBAND_CXGB3_DEBUG is not set
# CONFIG_MLX4_INFINIBAND is not set
# CONFIG_INFINIBAND_NES is not set
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_IPOIB_CM=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y
CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_ISER=m
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5000 is not set
# CONFIG_EDAC_I5100 is not set
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1374 is not set
CONFIG_RTC_DRV_DS1672=m
# CONFIG_RTC_DRV_MAX6900 is not set
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8563=m
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=m
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=m
CONFIG_RTC_DRV_DS1742=m
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_STAGING is not set
CONFIG_STAGING_EXCLUDE_BUILD=y

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_FS_XIP=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=m
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=m
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_REISERFS_FS_SECURITY=y
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
# CONFIG_XFS_RT is not set
# CONFIG_XFS_DEBUG is not set
CONFIG_GFS2_FS=m
CONFIG_GFS2_FS_LOCKING_DLM=m
CONFIG_OCFS2_FS=m
CONFIG_OCFS2_FS_O2CB=m
CONFIG_OCFS2_FS_USERSPACE_CLUSTER=m
CONFIG_OCFS2_FS_STATS=y
# CONFIG_OCFS2_DEBUG_MASKLOG is not set
# CONFIG_OCFS2_DEBUG_FS is not set
# CONFIG_OCFS2_COMPAT_JBD is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
CONFIG_GENERIC_ACL=y

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
CONFIG_AFFS_FS=m
# CONFIG_ECRYPT_FS is not set
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_BEFS_FS=m
# CONFIG_BEFS_DEBUG is not set
CONFIG_BFS_FS=m
CONFIG_EFS_FS=m
CONFIG_CRAMFS=m
CONFIG_VXFS_FS=m
CONFIG_MINIX_FS=m
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
CONFIG_QNX4FS_FS=m
CONFIG_ROMFS_FS=m
CONFIG_SYSV_FS=m
CONFIG_UFS_FS=m
# CONFIG_UFS_FS_WRITE is not set
# CONFIG_UFS_DEBUG is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_XPRT_RDMA=m
# CONFIG_SUNRPC_REGISTER_V4 is not set
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m
# CONFIG_SMB_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
# CONFIG_CIFS_UPCALL is not set
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_EXPERIMENTAL is not set
CONFIG_NCP_FS=m
CONFIG_NCPFS_PACKET_SIGNING=y
CONFIG_NCPFS_IOCTL_LOCKING=y
CONFIG_NCPFS_STRONG=y
CONFIG_NCPFS_NFS_NS=y
CONFIG_NCPFS_OS2_NS=y
CONFIG_NCPFS_SMALLDOS=y
CONFIG_NCPFS_NLS=y
CONFIG_NCPFS_EXTRAS=y
CONFIG_CODA_FS=m
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
# CONFIG_DETECT_SOFTLOCKUP is not set
# CONFIG_SCHED_DEBUG is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
CONFIG_RT_MUTEX_TESTER=y
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y

#
# Tracers
#
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SYSPROF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_CONTEXT_SWITCH_TRACER is not set
# CONFIG_BOOT_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_PRINTK_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
CONFIG_DIRECT_GBPAGES=y
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_MMIOTRACE is not set
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
CONFIG_OPTIMIZE_INLINING=y

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
# CONFIG_SECURITY_NETWORK_XFRM is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=0
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
# CONFIG_SECURITY_SELINUX_ENABLE_SECMARK_DEFAULT is not set
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_RNG=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_LRW=m
CONFIG_CRYPTO_PCBC=m
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_X86_64=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_X86_64=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_BALLOON is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-11-17 11:01           ` Ingo Molnar
@ 2008-11-17 11:20             ` Eric Dumazet
       [not found]               ` <4921539B.2000002-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
       [not found]             ` <20081117110119.GL28786-X9Un+BFzKDI@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 11:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, rjw, linux-kernel, kernel-testers, cl, efault,
	a.p.zijlstra, Linus Torvalds

Ingo Molnar wrote:
> * David Miller <davem@davemloft.net> wrote:
> 
>> From: Ingo Molnar <mingo@elte.hu>
>> Date: Mon, 17 Nov 2008 10:06:48 +0100
>>
>>> * Rafael J. Wysocki <rjw@sisk.pl> wrote:
>>>
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.26 and 2.6.27.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.26 and 2.6.27.  Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11308
>>>> Subject		: tbench regression on each kernel release from  2.6.22 -> 2.6.28
>>>> Submitter	: Christoph Lameter <cl@linux-foundation.org>
>>>> Date		: 2008-08-11 18:36 (98 days old)
>>>> References	: http://marc.info/?l=linux-kernel&m=121847986119495&w=4
>>>> 		  http://marc.info/?l=linux-kernel&m=122125737421332&w=4
>>> Christoph, as per the recent analysis of Mike:
>>>
>>>  http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html
>>>
>>> all scheduler components of this regression have been eliminated.
>>>
>>> In fact his numbers show that scheduler speedups since 2.6.22 have 
>>> offset and hidden most other sources of tbench regression. (i.e. the 
>>> scheduler portion got 5% faster, hence it was able to offset a 
>>> slowdown of 5% in other areas of the kernel that tbench triggers)
>> Although I respect the improvements, wake_up() is still several 
>> orders of magnitude slower than it was in 2.6.22 and wake_up() is at 
>> the top of the profiles in tbench runs.
> 
> hm, several orders of magnitude slower? That contradicts Mike's 
> numbers and my own numbers and profiles as well: see below.
> 
> The scheduler's overhead barely even registers on a 16-way x86 system 
> i'm running tbench on. Here's the NMI profile during 64 threads tbench 
> on a 16-way x86 box with an v2.6.28-rc5 kernel [config attached]:
> 
>   Throughput 3437.65 MB/sec 64 procs
>   ==================================
>   21570252  total 
>   ........
>    1494803  copy_user_generic_string 
>     998232  sock_rfree 
>     491471  tcp_ack 
>     482405  ip_dont_fragment 
>     470685  ip_local_deliver 
>     436325  constant_test_bit         [ called by napi_disable_pending() ]
>     375469  avc_has_perm_noaudit 
>     347663  tcp_sendmsg 
>     310383  tcp_recvmsg 
>     300412  __inet_lookup_established 
>     294377  system_call 
>     286603  tcp_transmit_skb 
>     251782  selinux_ip_postroute 
>     236028  tcp_current_mss 
>     235631  schedule 
>     234013  netif_rx 
>     229854  _local_bh_enable_ip 
>     219501  tcp_v4_rcv 
> 
>     [ etc. - see full profile attached further below ]
> 
> Note that the scheduler does not even show up in the profile up to 
> entry #15!
> 
> I've also summarized NMI profiler output by major subsystems:
> 
>            NET       overhead (12603450/21570252): 58.43%
>            security  overhead ( 1903598/21570252):  8.83%
>            usercopy  overhead ( 1753617/21570252):  8.13%
>            sched     overhead ( 1599406/21570252):  7.41%
>            syscall   overhead (  560487/21570252):  2.60%
>            IRQ       overhead (  555439/21570252):  2.58%
>            slab      overhead (  492421/21570252):  2.28%
>            timer     overhead (  226573/21570252):  1.05%
>            pagealloc overhead (  192681/21570252):  0.89%
>            PID       overhead (  115123/21570252):  0.53%
>            VFS       overhead (  107926/21570252):  0.50%
>            pagecache overhead (   62552/21570252):  0.29%
>            gtod      overhead (   38651/21570252):  0.18%
>            IDLE      overhead (       0/21570252):  0.00%
> ---------------------------------------------------------
>                          left ( 1349494/21570252):  6.26%
> 
> The scheduler's functions are absolutely flat, and consistent with an 
> extreme context-switching rate of 1.35 million per second. The 
> scheduler can go up to about 20 million context switches per second on 
> this system:
> 
>  procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  32  0      0 32229696  29308 649880    0    0     0     0 164135 20026853 24 76  0  0  0
>  32  0      0 32229752  29308 649880    0    0     0     0 164203 20032770 24 76  0  0  0
>  32  0      0 32229752  29308 649880    0    0     0     0 164201 20036492 25 75  0  0  0
> 
> ... and 7% scheduling overhead is roughly consistent with 1.35/20.0.
> 
> Wake up affinities and data flow caching is just fine in this workload 
> - we've got scheduler statistics for that and they look good too.
> 
> It all looks like pure old-fashioned straight overhead in the 
> networking layer to me. Do we still touch the same global cacheline 
> for every localhost packet we process? Anything like that would show 
> up big time.

Yes, we do. I find it strange that we don't see dst_release() in your NMI profile.

I posted a patch (commit 5635c10d976716ef47ae441998aeae144c7e7387,
"net: make sure struct dst_entry refcount is aligned on 64 bytes",
in the net-next-2.6 tree)
to properly align the struct dst_entry refcounter, and got a 4% speedup on tbench on my machine.

There are small speedups too with commit ef711cf1d156428d4c2911b8c86c6ce90519dc45
("net: speedup dst_release()").

Also in net-next-2.6, there are patches that avoid dirtying last_rx on netdevices (loopback, for example);
that helps tbench a lot too.


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]               ` <4921539B.2000002-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-11-17 16:11                 ` Ingo Molnar
       [not found]                   ` <20081117161135.GE12081-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 16:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Linus Torvalds


* Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:

>> It all looks like pure old-fashioned straight overhead in the 
>> networking layer to me. Do we still touch the same global cacheline 
>> for every localhost packet we process? Anything like that would 
>> show up big time.
>
> Yes we do, I find strange we dont see dst_release() in your NMI 
> profile
>
> I posted a patch ( commit 5635c10d976716ef47ae441998aeae144c7e7387 
> net: make sure struct dst_entry refcount is aligned on 64 bytes) (in 
> net-next-2.6 tree) to properly align struct dst_entry refcounter and 
> got 4% speedup on tbench on my machine.

Ouch, +4% from a oneliner networking change? That's a _huge_ speedup 
compared to the things we were after in scheduler land. A lot of 
scheduler folks worked hard to squeeze the last 1-2% out of the 
scheduler fastpath (which was not trivial at all). The _full_ 
scheduler accounts for only about 7% of the total system overhead here 
on a 16-way box...

So why should we be handling this as anything but a plain networking
performance regression/weakness? The localhost scalability bottleneck 
has been reported a _long_ time ago.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                   ` <20081117161135.GE12081-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 16:35                     ` Eric Dumazet
       [not found]                       ` <49219D36.5020801-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  2008-11-17 19:31                     ` David Miller
  1 sibling, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 16:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Linus Torvalds,
	Stephen Hemminger

Ingo Molnar wrote:
> * Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:
> 
>>> It all looks like pure old-fashioned straight overhead in the 
>>> networking layer to me. Do we still touch the same global cacheline 
>>> for every localhost packet we process? Anything like that would 
>>> show up big time.
>> Yes we do, I find strange we dont see dst_release() in your NMI 
>> profile
>>
>> I posted a patch ( commit 5635c10d976716ef47ae441998aeae144c7e7387 
>> net: make sure struct dst_entry refcount is aligned on 64 bytes) (in 
>> net-next-2.6 tree) to properly align struct dst_entry refcounter and 
>> got 4% speedup on tbench on my machine.
> 
> Ouch, +4% from a oneliner networking change? That's a _huge_ speedup 
> compared to the things we were after in scheduler land. A lot of 
> scheduler folks worked hard to squeeze the last 1-2% out of the 
> scheduler fastpath (which was not trivial at all). The _full_ 
> scheduler accounts for only about 7% of the total system overhead here 
> on a 16-way box...

4% on my machine, but apparently my machine is sooooo special (see oprofile thread),
so maybe its cpus have a hard time playing with a contended cache line.

It definitely needs more testing on other machines.

Maybe you'll discover the patch is bad on your machines; this is why it's in
net-next-2.6.

> 
> So why should we be handling this anything but a plain networking 
> performance regression/weakness? The localhost scalability bottleneck 
> has been reported a _long_ time ago.
> 

The struct dst_entry problem was already discovered a _long_ time ago
and probably solved at that time.

(commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
Thu, 13 Mar 2008 05:52:37 +0000 (22:52 -0700)
[NET]: Fix tbench regression in 2.6.25-rc1)

Then, a gremlin came and broke the thing.

There are many contended cache lines in the system; we can do our
best to try to make them disappear. That's not always possible.

Another contended cache line is the rwlock in iptables.
I remember Stephen had a patch to make the thing use RCU.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                       ` <49219D36.5020801-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-11-17 17:08                         ` Ingo Molnar
       [not found]                           ` <20081117170844.GJ12081-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 17:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Linus Torvalds,
	Stephen Hemminger


* Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:

> Ingo Molnar wrote:
>> * Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:
>>
>>>> It all looks like pure old-fashioned straight overhead in the  
>>>> networking layer to me. Do we still touch the same global cacheline 
>>>> for every localhost packet we process? Anything like that would  
>>>> show up big time.
>>> Yes we do, I find strange we dont see dst_release() in your NMI  
>>> profile
>>>
>>> I posted a patch ( commit 5635c10d976716ef47ae441998aeae144c7e7387  
>>> net: make sure struct dst_entry refcount is aligned on 64 bytes) (in  
>>> net-next-2.6 tree) to properly align struct dst_entry refcounter and  
>>> got 4% speedup on tbench on my machine.
>>
>> Ouch, +4% from a oneliner networking change? That's a _huge_ speedup  
>> compared to the things we were after in scheduler land. A lot of  
>> scheduler folks worked hard to squeeze the last 1-2% out of the  
>> scheduler fastpath (which was not trivial at all). The _full_  
>> scheduler accounts for only about 7% of the total system overhead here  
>> on a 16-way box...
>
> 4% on my machine, but apparently my machine is sooooo special (see 
> oprofile thread), so maybe its cpus have a hard time playing with a 
> contended cache line.
>
> It definitely needs more testing on other machines.
>
> Maybe you'll discover patch is bad on your machines, this is why 
> it's in net-next-2.6

ok, i'll try it on my testbox too, to check whether it has any effect 
- find below the port to -git.

tbench _is_ very sensitive to seemingly small details - it seems to be
hovering at around some sort of CPU cache boundary and penalizing
random alignment changes, as we drop in and out of the sweet spot.

Mike Galbraith has been spending months trying to pin down all the 
issues.

	Ingo

------------->
From 8fbd307d402647b07c3c2662fdac589494d16e5e Mon Sep 17 00:00:00 2001
From: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
Date: Sun, 16 Nov 2008 19:46:36 -0800
Subject: [PATCH] net: make sure struct dst_entry refcount is aligned on 64 bytes

As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.

We cannot use __attribute__((aligned)), so manually pad the structure
for 32 and 64 bit arches.

for 32bit : offsetof(struct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(struct dst_entry, __refcnt) is 0xc0

As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)

Add a BUILD_BUG_ON to ensure future updates to "struct dst_entry" don't
break this alignment.

"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)

Signed-off-by: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
---
 include/net/dst.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 8a8b71e..1b4de18 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -59,7 +59,11 @@ struct dst_entry
 
 	struct neighbour	*neighbour;
 	struct hh_cache		*hh;
+#ifdef CONFIG_XFRM
 	struct xfrm_state	*xfrm;
+#else
+	void			*__pad1;
+#endif
 
 	int			(*input)(struct sk_buff*);
 	int			(*output)(struct sk_buff*);
@@ -70,8 +74,20 @@ struct dst_entry
 
 #ifdef CONFIG_NET_CLS_ROUTE
 	__u32			tclassid;
+#else
+	__u32			__pad2;
 #endif
 
+
+	/*
+	 * Align __refcnt to a 64 bytes alignment
+	 * (L1_CACHE_SIZE would be too much)
+	 */
+#ifdef CONFIG_64BIT
+	long			__pad_to_align_refcnt[2];
+#else
+	long			__pad_to_align_refcnt[1];
+#endif
 	/*
 	 * __refcnt wants to be on a different cache line from
 	 * input/output/ops or performance tanks badly
@@ -157,6 +173,11 @@ dst_metric_locked(struct dst_entry *dst, int metric)
 
 static inline void dst_hold(struct dst_entry * dst)
 {
+	/*
+	 * If your kernel compilation stops here, please check
+	 * __pad_to_align_refcnt declaration in struct dst_entry
+	 */
+	BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
 	atomic_inc(&dst->__refcnt);
 }
 

^ permalink raw reply related	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                           ` <20081117170844.GJ12081-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 17:25                             ` Ingo Molnar
       [not found]                               ` <20081117172549.GA27974-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 19:36                             ` David Miller
  1 sibling, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 17:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Linus Torvalds,
	Stephen Hemminger


* Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:

> > 4% on my machine, but apparently my machine is sooooo special (see 
> > oprofile thread), so maybe its cpus have a hard time playing with 
> > a contended cache line.
> >
> > It definitely needs more testing on other machines.
> >
> > Maybe you'll discover patch is bad on your machines, this is why 
> > it's in net-next-2.6
> 
> ok, i'll try it on my testbox too, to check whether it has any effect 
> - find below the port to -git.

it gives a small speedup of ~1% on my box:

   before:      Throughput 3437.65 MB/sec 64 procs
   after:       Throughput 3473.99 MB/sec 64 procs

... although that's still a bit close to the natural tbench noise 
range so it's not conclusive and not like a smoking gun IMO.

But i think this change might just be papering over the real 
scalability problem that this workload has in my opinion: that there's 
a single localhost route/dst/device that millions of packets are 
squeezed through every second:

 phoenix:~> ifconfig lo
 lo        Link encap:Local Loopback  
           inet addr:127.0.0.1  Mask:255.0.0.0
           UP LOOPBACK RUNNING  MTU:16436  Metric:1
           RX packets:258001524 errors:0 dropped:0 overruns:0 frame:0
           TX packets:258001524 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0 
           RX bytes:679809512144 (633.1 GiB)  TX bytes:679809512144 (633.1 GiB)

There does not seem to be any per CPU ness in localhost networking - 
it has a globally single-threaded rx/tx queue AFAICS even if both the 
client and server task is on the same CPU - how is that supposed to 
perform well? (but i might be missing something)

What kind of test-system do you have - one with P4 style Xeon CPUs 
perhaps where dirty-cacheline cachemisses to DRAM were particularly 
expensive?

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                               ` <20081117172549.GA27974-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 17:33                                 ` Eric Dumazet
       [not found]                                   ` <4921AAD6.3010603-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 17:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Linus Torvalds,
	Stephen Hemminger

Ingo Molnar wrote:
> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> 
>>> 4% on my machine, but apparently my machine is sooooo special (see 
>>> oprofile thread), so maybe its cpus have a hard time playing with 
>>> a contended cache line.
>>>
>>> It definitely needs more testing on other machines.
>>>
>>> Maybe you'll discover patch is bad on your machines, this is why 
>>> it's in net-next-2.6
>> ok, i'll try it on my testbox too, to check whether it has any effect 
>> - find below the port to -git.
> 
> it gives a small speedup of ~1% on my box:
> 
>    before:      Throughput 3437.65 MB/sec 64 procs
>    after:       Throughput 3473.99 MB/sec 64 procs

Strange, I get 2350 MB/sec on my 8 cpus box. "tbench 8"

> 
> ... although that's still a bit close to the natural tbench noise 
> range so it's not conclusive and not like a smoking gun IMO.
> 
> But i think this change might just be papering over the real 
> scalability problem that this workload has in my opinion: that there's 
> a single localhost route/dst/device that millions of packets are 
> squeezed through every second:

Yes, this point was mentioned on netdev a while back.

> 
>  phoenix:~> ifconfig lo
>  lo        Link encap:Local Loopback  
>            inet addr:127.0.0.1  Mask:255.0.0.0
>            UP LOOPBACK RUNNING  MTU:16436  Metric:1
>            RX packets:258001524 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:258001524 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:0 
>            RX bytes:679809512144 (633.1 GiB)  TX bytes:679809512144 (633.1 GiB)
> 
> There does not seem to be any per CPU ness in localhost networking - 
> it has a globally single-threaded rx/tx queue AFAICS even if both the 
> client and server task is on the same CPU - how is that supposed to 
> perform well? (but i might be missing something)

Stephen had a patch for this one too, but we got tbench noise too with this patch

http://kerneltrap.org/mailarchive/linux-netdev/2008/11/5/3926034


> 
> What kind of test-system do you have - one with P4 style Xeon CPUs 
> perhaps where dirty-cacheline cachemisses to DRAM were particularly 
> expensive?

It's an HP BL460c G1

Dual quad-core cpus Intel E5450  @3.00GHz

So 8 logical cpus. My bench was "tbench 8"


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                   ` <4921AAD6.3010603-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-11-17 17:38                                     ` Linus Torvalds
       [not found]                                       ` <alpine.LFD.2.00.0811170937540.3468-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 17:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger



On Mon, 17 Nov 2008, Eric Dumazet wrote:

> Ingo Molnar wrote:

> > it gives a small speedup of ~1% on my box:
> > 
> >    before:      Throughput 3437.65 MB/sec 64 procs
> >    after:       Throughput 3473.99 MB/sec 64 procs
> 
> Strange, I get 2350 MB/sec on my 8 cpus box. "tbench 8"

I think Ingo may have a Nehalem. Let's just say that those things rock, 
and have rather good memory throughput.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                       ` <alpine.LFD.2.00.0811170937540.3468-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-17 17:42                                         ` Eric Dumazet
  2008-11-17 18:23                                         ` Ingo Molnar
  1 sibling, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 17:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger

Linus Torvalds wrote:
> 
> On Mon, 17 Nov 2008, Eric Dumazet wrote:
> 
>> Ingo Molnar wrote:
> 
>>> it gives a small speedup of ~1% on my box:
>>>
>>>    before:      Throughput 3437.65 MB/sec 64 procs
>>>    after:       Throughput 3473.99 MB/sec 64 procs
>> Strange, I get 2350 MB/sec on my 8 cpus box. "tbench 8"
> 
> I think Ingo may have a Nehalem. Let's just say that those things rock, 
> and have rather good memory throughput.
> 

I want one :)

Or even two of them :)


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                       ` <alpine.LFD.2.00.0811170937540.3468-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-11-17 17:42                                         ` Eric Dumazet
@ 2008-11-17 18:23                                         ` Ingo Molnar
       [not found]                                           ` <20081117182320.GA26844-X9Un+BFzKDI@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 18:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger

* Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> On Mon, 17 Nov 2008, Eric Dumazet wrote:
> 
> > Ingo Molnar wrote:
> 
> > > it gives a small speedup of ~1% on my box:
> > > 
> > >    before:      Throughput 3437.65 MB/sec 64 procs
> > >    after:       Throughput 3473.99 MB/sec 64 procs
> > 
> > Strange, I get 2350 MB/sec on my 8 cpus box. "tbench 8"
> 
> I think Ingo may have a Nehalem. Let's just say that those things 
> rock, and have rather good memory throughput.

hm, i'm not sure whether i can post benchmarks from the Nehalem box - 
but i can confirm it in general terms that it's rather nice ;-)

This was run on another testbox (4x4 Barcelona) that rocks similarly 
well in terms of memory subsystem latencies: which seems to be 
tbench's main current critical path.

For the tbench bragging rights i'd probably turn off CONFIG_SECURITY 
and a few other options. Plus i'd run with 16 threads only - in this 
test i ran with 4x overload (64 tbench threads, not 16) to stress the 
scheduler harder.

Although we degrade very gently with overload so the numbers aren't all
that much different:

   16 threads: Throughput 3463.14 MB/sec 16 procs
   64 threads: Throughput 3473.99 MB/sec 64 procs
  256 threads: Throughput 3457.67 MB/sec 256 procs
 1024 threads: Throughput 3448.85 MB/sec 1024 procs

 [ so it's the same within noise range. ]

1024 threads is already a massive 64x overload so beyond any 
reasonable limit of workload sanity.

Which suggests that the main limiting factor is cacheline ping-pong
that is already in full effect at 16 threads.

Which is supported by the "most expensive instructions" top-10 sorted 
list:

            RIP     #hits
..........................                           

                           [ usercopy ]
ffffffff80350fcd:  1373300 	f3 48 a5             	rep movsq %ds:(%rsi),%es:(%rdi)

ffffffff804a2f33:          <sock_rfree>:
ffffffff804a2f34:   985253 	48 89 e5             	mov    %rsp,%rbp

ffffffff804d2eb7:          <ip_local_deliver>:
ffffffff804d2eb8:   432659 	48 89 e5             	mov    %rsp,%rbp

ffffffff804aa23c:          <constant_test_bit>: [ => napi_disable_pending() ]
ffffffff804aa24c:   374052 	89 d1                	mov    %edx,%ecx

ffffffff804d5076:          <ip_dont_fragment>:
ffffffff804d5076:   310051 	8a 97 56 02 00 00    	mov    0x256(%rdi),%dl

ffffffff804d9b17:          <__inet_lookup_established>:
ffffffff804d9bdf:   247224 	eb ba                	jmp    ffffffff804d9b9b <__inet_lookup_established+0x84>

ffffffff80321529:          <selinux_ip_postroute>:
ffffffff8032152a:   183700 	48 89 e5             	mov    %rsp,%rbp

ffffffff8020c020:          <system_call>:
ffffffff8020c020:   183600 	0f 01 f8             	swapgs 

ffffffff8051884a:          <netlbl_enabled>:
ffffffff8051884a:   179538 	55                   	push   %rbp

The usual profiling caveat applies: it's not _these_ instructions that 
matter, but the surrounding code that calls them. Profiling overhead 
is delayed by a couple of instructions - the more out-of-order a CPU 
is, the larger this delay can be. But even a quick look at the list
above shows that all of the heavy cachemisses are generated by 
networking.

Beyond the usual suspects of syscall entry and memcpy, it's only 
networking. We don't even have the mov %cr3 TLB flush overhead in this
list, load_cr3() is a distant #30:

ffffffff8023049f:        0      0f 22 d8                mov    %rax,%cr3
ffffffff802304a2:   126303      c9                      leaveq

The place for the sock_rfree() hit looks a bit weird, and i'll 
investigate it now a bit more to place the real overhead point 
properly. (i already mapped the test-bit overhead: that comes from 
napi_disable_pending())

The first entry is 10x the cost of the last entry in the list so 
clearly we've got 1-2 brutal cacheline ping-pongs that dominate the 
overhead of this workload.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                           ` <20081117182320.GA26844-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 18:33                                             ` Linus Torvalds
  2008-11-17 18:49                                             ` Ingo Molnar
  1 sibling, 0 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 18:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger



On Mon, 17 Nov 2008, Ingo Molnar wrote:
> 
> hm, i'm not sure whether i can post benchmarks from the Nehalem box - 
> but i can confirm it in general terms that it's rather nice ;-)

Intel released various web sites from the NDA a week or two ago, and Intel 
is now selling it in the US (I think today was in fact the official 
launch), so I think benchmarks are safe - you can buy the dang things on 
the street.

I don't know what availability is, of course. But I doubt that Intel would 
mind Nehalem benchmarks even if it were a paper launch - at least from my 
personal experience, I've not seen any bad behavior (and plenty of good).

> This was run on another testbox (4x4 Barcelona) that rocks similarly 
> well in terms of memory subsystem latencies: which seems to be 
> tbench's main current critical path.

Ahh, ok.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                           ` <20081117182320.GA26844-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 18:33                                             ` Linus Torvalds
@ 2008-11-17 18:49                                             ` Ingo Molnar
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 22:08                                               ` Ingo Molnar
  1 sibling, 2 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 18:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger


* Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:

> The place for the sock_rfree() hit looks a bit weird, and i'll 
> investigate it now a bit more to place the real overhead point 
> properly. (i already mapped the test-bit overhead: that comes from 
> napi_disable_pending())

ok, here's a new set of profiles. (again for tbench 64-thread on a 
16-way box, with v2.6.28-rc5-19-ge14c8bf and with the kernel config i 
posted before.)

Here are the per major subsystem percentages:

           NET       overhead ( 5786945/10096751): 57.31%
           security  overhead (  925933/10096751):  9.17%
           usercopy  overhead (  837887/10096751):  8.30%
           sched     overhead (  753662/10096751):  7.46%
           syscall   overhead (  268809/10096751):  2.66%
           IRQ       overhead (  266500/10096751):  2.64%
           slab      overhead (  180258/10096751):  1.79%
           timer     overhead (   92986/10096751):  0.92%
           pagealloc overhead (   87381/10096751):  0.87%
           VFS       overhead (   53295/10096751):  0.53%
           PID       overhead (   44469/10096751):  0.44%
           pagecache overhead (   33452/10096751):  0.33%
           gtod      overhead (   11064/10096751):  0.11%
           IDLE      overhead (       0/10096751):  0.00%
---------------------------------------------------------
                         left (  753878/10096751):  7.47%

The breakdown is very similar to what i sent before, within noise.
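
[ For anyone who wants to recheck the percentages: the sketch below
  recomputes each share as hits divided by the total number of profiler
  hits. The hit counts are copied from the table above; the names
  TOTAL_HITS, HITS and share() are made up for this illustration and are
  not part of any profiling tool. ]

```python
# Recompute the per-subsystem shares from the raw hit counts above.
# TOTAL_HITS, HITS and share() are illustrative names, not tool output.
TOTAL_HITS = 10096751

HITS = {
    "NET": 5786945,
    "security": 925933,
    "usercopy": 837887,
    "sched": 753662,
    "syscall": 268809,
    "IRQ": 266500,
    "slab": 180258,
    "timer": 92986,
    "pagealloc": 87381,
    "VFS": 53295,
    "PID": 44469,
    "pagecache": 33452,
    "gtod": 11064,
    "IDLE": 0,
    "left": 753878,
}


def share(count, total=TOTAL_HITS):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for name, count in HITS.items():
    print(f"{name:>10} overhead ({count:>8}/{TOTAL_HITS}): {share(count):5.2f}%")
```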

[ 'left' is random overhead from all around the place - i categorized 
  the 500 most expensive functions in the profile per subsystem.
  I stopped short of doing it for all 1300+ functions: it's rather
  laborious manual work even with hefty use of regex patterns.
  It's also less meaningful in practice: the trend in the first 500
  functions is present in the remaining 800 functions as well. I 
  watched the breakdown evolve as i increased the coverage - in 
  practice it is the first 100 functions that matter - it just doesn't 
  change after that. ]
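
[ A rough sketch of what such a regex-based categorization could look
  like. The patterns below are assumptions chosen for illustration and
  are not the patterns actually used for the table above; classify()
  and breakdown() are hypothetical helper names. Anything that matches
  no pattern falls into 'left'. The sample rows are taken from the
  readprofile listing in this mail. ]

```python
import re

# Hypothetical prefix patterns; the real categorization used a larger,
# hand-tuned set that is not shown in this thread.
SUBSYSTEM_PATTERNS = [
    ("NET", re.compile(r"^(tcp_|ip_|skb_|__skb_|sock_|netif_|nf_)")),
    ("security", re.compile(r"^(avc_|selinux_|sel_)")),
    ("usercopy", re.compile(r"^(copy_user_|copy_to_user|copy_from_user)")),
    ("sched", re.compile(r"^(schedule|__switch_to|try_to_wake_up|pick_next_task)")),
]


def classify(symbol):
    """Return the first subsystem whose pattern matches, else 'left'."""
    for subsystem, pattern in SUBSYSTEM_PATTERNS:
        if pattern.search(symbol):
            return subsystem
    return "left"


def breakdown(profile):
    """Sum the per-function percentages per subsystem."""
    totals = {}
    for percent, symbol in profile:
        key = classify(symbol)
        totals[key] = totals.get(key, 0.0) + percent
    return totals


# A few (percent, symbol) rows from the listing in this mail.
sample = [
    (7.253355, "copy_user_generic_string"),
    (3.934833, "avc_has_perm_noaudit"),
    (3.356152, "ip_queue_xmit"),
    (1.203205, "schedule"),
    (0.747184, "load_cr3"),
]

for subsystem, percent in sorted(breakdown(sample).items(), key=lambda kv: -kv[1]):
    print(f"{subsystem:>10} {percent:9.6f}")
```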

The readprofile output below seems structured in a more useful way now 
- i tweaked compiler options to have the profiler hits spread out in a 
more meaningful way. I collected 10 million NMI profiler hits, and 
normalized the readprofile output up to 100%.
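
[ The normalization step can be sketched as follows, assuming each
  readprofile line starts with a hit count followed by a symbol name
  (further columns are ignored) and that a trailing "total" line may be
  present. normalize() is a hypothetical helper, and the input counts
  in the usage lines are invented purely to exercise it. ]

```python
def normalize(lines):
    """Turn '<hits> <symbol> ...' lines into (percent, symbol) rows.

    Rows are sorted by descending share. A 'total' line, if present,
    is skipped so that it does not count towards the sum.
    """
    counts = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        hits, symbol = int(fields[0]), fields[1]
        if symbol == "total":
            continue
        counts.append((hits, symbol))

    total = sum(hits for hits, _ in counts)
    if total == 0:
        return []

    rows = [(100.0 * hits / total, symbol) for hits, symbol in counts]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Invented counts, only to show the shape of the output.
example = [
    "    30 tcp_ack                    0.0100",
    "    50 copy_user_generic_string   0.2000",
    "    20 schedule                   0.0300",
    "   100 total                      0.0000",
]

for percent, symbol in normalize(example):
    print(f"{percent:10.6f} {symbol}")
```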

[ I'll post per function analysis as i complete them, as a reply to
  this mail. ]

	Ingo

100.000000 total
................
  7.253355 copy_user_generic_string
  3.934833 avc_has_perm_noaudit
  3.356152 ip_queue_xmit
  3.038025 skb_release_data
  2.118525 skb_release_head_state
  1.997533 tcp_ack
  1.833688 tcp_recvmsg
  1.717771 eth_type_trans
  1.673249 __inet_lookup_established
  1.508888 system_call
  1.469183 tcp_current_mss
  1.431553 tcp_transmit_skb
  1.385125 tcp_sendmsg
  1.327643 tcp_v4_rcv
  1.292328 nf_hook_thresh
  1.203205 schedule
  1.059501 nf_hook_slow
  1.027373 constant_test_bit
  0.945183 sock_rfree
  0.922748 __switch_to
  0.911605 netif_rx
  0.876270 register_gifconf
  0.788200 ip_local_deliver_finish
  0.781467 dev_queue_xmit
  0.766530 constant_test_bit
  0.758208 _local_bh_enable_ip
  0.747184 load_cr3
  0.704341 memset_c
  0.671260 sysret_check
  0.651845 ip_finish_output2
  0.620204 audit_free_names
  0.617781 audit_syscall_exit
  0.615149 skb_copy_datagram_iovec
  0.613848 selinux_socket_sock_rcv_skb
  0.606995 constant_test_bit
  0.593936 __tcp_push_pending_frames
  0.592198 tcp_cleanup_rbuf
  0.574093 ip_rcv
  0.567886 netif_receive_skb
  0.563377 get_page_from_freelist
  0.557657 tcp_event_data_recv
  0.539274 ip_local_deliver
  0.534130 sys_recvfrom
  0.512321 __tcp_select_window
  0.498427 tcp_rcv_established
  0.494862 sys_sendto
  0.487473 audit_syscall_entry
  0.478495 sched_clock_cpu
  0.474861 kfree
  0.466310 tcp_established_options
  0.461384 net_rx_action
  0.447162 __mod_timer
  0.442078 ip_rcv_finish
  0.441631 find_pid_ns
  0.441124 sk_wait_data
  0.423943 __sock_recvmsg
  0.422126 selinux_parse_skb
  0.417975 __napi_schedule
  0.414082 __do_softirq
  0.403604 task_rq_lock
  0.380792 nf_iterate
  0.377614 select_task_rq_fair
  0.374973 sock_sendmsg
  0.374635 kmem_cache_alloc_node
  0.368775 avc_has_perm
  0.368706 local_bh_disable
  0.361834 release_sock
  0.346400 sock_common_recvmsg
  0.342825 skb_clone
  0.338704 __alloc_skb
  0.326488 do_softirq
  0.323410 lock_sock_nested
  0.322129 __copy_skb_header
  0.316835 put_page
  0.310966 selinux_ip_postroute
  0.306229 sel_netport_sid
  0.299863 try_to_wake_up
  0.296288 process_backlog
  0.294818 __inet_lookup
  0.294778 thread_return
  0.293219 cfs_rq_of
  0.292315 internal_add_timer
  0.292305 tcp_rcv_space_adjust
  0.281053 constant_test_bit
  0.278779 local_bh_enable
  0.272910 *unknown*
  0.269593 schedule_timeout
  0.261846 tcp_v4_md5_lookup
  0.260992 __ip_local_out
  0.255868 __enqueue_entity
  0.253931 avc_audit
  0.252004 finish_task_switch
  0.249263 audit_get_context
  0.248290 sockfd_lookup_light
  0.247416 virt_to_head_page
  0.244149 tcp_options_write
  0.243603 memcpy_toiovec
  0.243434 sock_recvmsg
  0.242599 call_softirq
  0.242391 __unlazy_fpu
  0.236412 fput_light
  0.235628 ret_from_sys_call
  0.234933 sk_reset_timer
  0.228358 math_state_restore
  0.227117 socket_has_perm
  0.223492 virt_to_cache
  0.219063 __cache_free
  0.216401 update_curr
  0.216232 tcp_v4_send_check
  0.213978 audit_free_aux
  0.213223 tcp_v4_do_rcv
  0.212975 __kfree_skb
  0.211137 dev_hard_start_xmit
  0.209052 tcp_rtt_estimator
  0.207999 netif_needs_gso
  0.207662 __update_sched_clock
  0.207284 rb_erase
  0.204861 enqueue_task_fair
  0.203490 skb_release_all
  0.203252 tcp_send_delayed_ack
  0.203232 inet_ehashfn
  0.199846 sel_netport_find
  0.195396 system_call_after_swapgs
  0.186756 lock_timer_base
  0.186687 pick_next_task_fair
  0.183986 mod_timer
  0.182982 loopback_xmit
  0.182605 native_read_tsc
  0.181195 skb_set_owner_r
  0.179248 switch_mm
  0.175584 set_next_entity
  0.173329 raw_local_deliver
  0.171641 sys_kill
  0.164510 dequeue_task_fair
  0.161938 clear_bit
  0.160528 sock_def_readable
  0.157628 __tcp_ack_snd_check
  0.156893 skb_can_coalesce
  0.156556 tcp_snd_wnd_test
  0.155662 ip_output
  0.150627 sk_stream_alloc_skb
  0.150219 cpu_sdc
  0.149425 sysret_careful
  0.148760 tcp_data_snd_check
  0.147816 auditsys
  0.147419 pskb_may_pull
  0.147151 fget_light
  0.143774 tcp_cwnd_test
  0.143029 rb_insert_color
  0.142265 __wake_up
  0.141808 tcp_bound_to_half_wnd
  0.138600 __sk_dst_check
  0.138431 free_hot_cold_page
  0.137954 unroll_tree_refs
  0.137080 __skb_unlink
  0.135124 __sock_sendmsg
  0.135064 get_pageblock_flags_group
  0.132701 kmem_cache_free
  0.128152 bictcp_cong_avoid
  0.127874 __napi_complete
  0.127527 ____cache_alloc
  0.127368 tcp_is_cwnd_limited
  0.127278 find_vpid
  0.126941 constant_test_bit
  0.126504 sk_mem_charge
  0.126255 __alloc_pages_internal
  0.125977 dst_release
  0.125521 hash_64
  0.124895 put_prev_task_fair
  0.123802 netlbl_enabled
  0.122829 sched_clock
  0.122640 skb_push
  0.122035 __phys_addr
  0.121161 dput
  0.120515 tcp_prequeue_process
  0.118916 __skb_dequeue
  0.117715 selinux_socket_sendmsg
  0.117536 __inc_zone_state
  0.115907 sk_wake_async
  0.113504 selinux_ipv4_output
  0.113017 sel_netif_sid
  0.112431 skb_reset_network_header
  0.111170 check_preempt_wakeup
  0.111061 bictcp_acked
  0.110882 sel_netnode_find
  0.109978 update_min_vruntime
  0.109889 resched_task
  0.109879 current_kernel_time
  0.109432 tcp_checksum_complete_user
  0.107476 ip_dont_fragment
  0.107386 sysret_audit
  0.106979 inet_csk_reset_xmit_timer
  0.106006 skb_entail
  0.105777 sysret_signal
  0.105420 avc_hash
  0.105251 __skb_clone
  0.105211 tcp_init_tso_segs
  0.103523 __dequeue_entity
  0.101715 PageLRU
  0.101378 tcp_parse_aligned_timestamp
  0.101219 __xchg
  0.100544 constant_test_bit
  0.097991 __kmalloc
  0.097584 test_tsk_thread_flag
  0.097475 autoremove_wake_function
  0.095747 selinux_task_kill
  0.094416 get_page
  0.093353 dequeue_task
  0.092728 __local_bh_disable
  0.091943 selinux_netlbl_sock_rcv_skb
  0.091655 path_put
  0.090970 skb_headroom
  0.090950 PageTail
  0.090642 dst_destroy
  0.090523 netpoll_rx
  0.089589 skb_header_pointer
  0.085935 security_socket_recvmsg
  0.084008 alloc_pages_current
  0.083184 compare_ether_addr
  0.082479 rb_next
  0.082439 sk_wmem_schedule
  0.081635 next_zones_zonelist
  0.080135 tcp_cwnd_validate
  0.079877 tcp_event_new_data_sent
  0.079817 fcheck_files
  0.079082 ip_skb_dst_mtu
  0.078804 ip_finish_output
  0.078278 wakeup_preempt_entity
  0.077026 sel_netif_find
  0.076788 __skb_queue_tail
  0.076570 sock_flag
  0.076520 tcp_win_from_space
  0.076510 zone_watermark_ok
  0.076282 sel_netnode_sid
  0.076162 policy_zonelist
  0.074732 __wake_up_common
  0.074613 compound_head
  0.074593 task_has_perm
  0.073243 __find_general_cachep
  0.073064 tcp_push
  0.072925 skb_cloned
  0.072309 pskb_may_pull
  0.071852 TCP_ECN_check_ce
  0.071495 cap_task_to_inode
  0.070770 default_wake_function
  0.069429 xfrm4_policy_check
  0.069091 tcp_parse_md5sig_option
  0.068287 tcp_v4_md5_do_lookup
  0.068059 tcp_v4_tw_remember_stamp
  0.067344 tcp_ca_event
  0.067125 tcp_ca_event
  0.065457 place_entity
  0.065318 write_seqlock
  0.065089 device_not_available
  0.065069 test_ti_thread_flag
  0.063878 tcp_set_skb_tso_segs
  0.063550 selinux_netlbl_inode_permission
  0.063391 sock_wfree
  0.063311 prepare_to_wait
  0.058872 pid_vnr
  0.058803 __cycles_2_ns
  0.057631 ip_local_out
  0.057333 tcp_ack_saw_tstamp
  0.056896 copy_to_user
  0.056628 set_bit
  0.055913 free_pages_check
  0.054969 tcp_rcv_rtt_measure_ts
  0.053797 init_rootdomain
  0.053708 selinux_socket_recvmsg
  0.053698 pid_nr_ns
  0.053629 sk_eat_skb
  0.052814 _local_bh_enable
  0.052645 nf_hook_thresh
  0.052516 sched_info_queued
  0.052457 enqueue_task
  0.052228 sk_filter
  0.052159 __cpu_clear
  0.051980 local_bh_enable_ip
  0.050292 update_rq_clock
  0.048981 task_tgid_vnr
  0.048881 copy_from_user
  0.048782 tcp_parse_options
  0.048484 lock_sock
  0.047779 net_timestamp
  0.047044 open_softirq
  0.046955 tcp_win_from_space
  0.045981 __skb_dequeue
  0.043846 getboottime
  0.043777 account_group_exec_runtime
  0.043519 can_checksum_protocol
  0.043469 set_user_nice
  0.042784 skb_fill_page_desc
  0.042247 security_socket_sendmsg
  0.041989 read_profile
  0.041930 tcp_validate_incoming
  0.041612 check_preempt_curr
  0.041413 skb_pull
  0.041026 generic_smp_call_function_interrupt
  0.041016 calc_delta_fair
  0.040936 clear_buddies
  0.040768 tcp_data_queue
  0.040698 page_count
  0.039695 lock_sock
  0.039099 skb_headroom
  0.038851 system_call_fastpath
  0.038622 zone_statistics
  0.037500 tcp_sack_extend
  0.037381 __kmalloc_node
  0.036587 first_zones_zonelist
  0.036497 mntput
  0.036179 pick_next_task
  0.035991 kmap
  0.035911 sock_put
  0.035613 deactivate_task
  0.035027 __nr_to_section
  0.033985 page_zone
  0.033190 native_load_tls
  0.032882 netif_tx_queue_stopped
  0.032713 __skb_insert
  0.032187 sock_flag
  0.031988 check_kill_permission
  0.031790 policy_nodemask
  0.031621 detach_timer
  0.030558 inet_csk_clear_xmit_timer
  0.030469 task_rq_unlock
  0.029883 tcp_nagle_test
  0.029744 tracesys
  0.028383 virt_to_slab
  0.028115 tcp_v4_check
  0.028046 __cpu_set
  0.027658 page_get_cache
  0.027063 tcp_store_ts_recent
  0.027053 __skb_pull
  0.026953 gfp_zone
  0.026586 sock_rcvlowat
  0.026576 csum_partial
  0.026397 init_waitqueue_head
  0.026109 finish_wait
  0.026040 kill_pid_info
  0.025404 tcp_full_space
  0.024888 __skb_queue_before
  0.024550 dst_confirm
  0.022603 inet_ehash_bucket
  0.021888 activate_task
  0.021650 tcp_rto_min
  0.021283 d_callback
  0.020965 signal_pending
  0.020925 avc_node_free
  0.020915 empty_bucket
  0.020746 group_send_sig_info
  0.020657 skb_reset_transport_header
  0.020061 sock_put
  0.019992 signal_pending_state
  0.019684 tcp_sync_mss
  0.019346 skb_network_offset
  0.019276 skb_split
  0.018988 tcp_adjust_fackets_out
  0.018204 tcp_fast_path_check
  0.017727 __skb_unlink
  0.017687 napi_disable_pending
  0.017678 sg_set_page
  0.017022 get_pageblock_bitmap
  0.016972 tcp_cong_avoid
  0.016962 pid_task
  0.016754 skb_set_tail_pointer
  0.016039 selinux_ipv4_postroute
  0.015930 idle_cpu
  0.015632 skb_reset_network_header
  0.015552 __count_vm_events
  0.015483 source_load
  0.014867 __skb_unlink
  0.014738 skb_reset_transport_header
  0.014599 set_bit
  0.014241 audit_zero_context
  0.014231 zone_page_state
  0.014152 clear_bit
  0.013874 PageSlab
  0.013546 __memset
  0.013238 get_pageblock_migratetype
  0.012623 __rb_rotate_right
  0.012543 kmem_find_general_cachep
  0.012414 __kprobes_text_start
  0.012344 security_sock_rcv_skb
  0.012344 node_zonelist
  0.012335 dnotify_parent
  0.012096 skb_headroom
  0.011778 tcp_push_one
  0.011540 mnt_want_write
  0.011143 kmalloc
  0.011073 retint_swapgs
  0.010954 __rb_rotate_left
  0.010805 check_pgd_range
  0.010785 tcp_mss_split_point
  0.010755 migrate_timer_list
  0.010338 __send_IPI_dest_field
  0.010229 reschedule_interrupt
  0.010179 sock_flag
  0.009882 smp_call_function_mask
  0.009673 test_tsk_need_resched
  0.009564 tcp_urg
  0.009504 generic_file_aio_read
  0.009176 PageReserved
  0.009147 net_invalid_timestamp
  0.009087 __node_set
  0.008749 do_tcp_setsockopt
  0.008730 set_tsk_thread_flag
  0.008720 tcp_enter_loss
  0.008422 sock_error
  0.008362 target_load
  0.008302 crypto_hash_update
  0.008104 PageReadahead
  0.008044 tcp_poll
  0.007915 tcp_checksum_complete
  0.007329 tcp_snd_test
  0.007309 selinux_file_permission
  0.007290 sel_netif_destroy
  0.007220 put_pages_list
  0.006992 dst_output
  0.006743 prepare_to_copy
  0.006694 tcp_init_cwnd
  0.006555 clear_bit
  0.006535 set_bit
  0.006425 normal_prio
  0.006366 msleep
  0.006346 error_sti
  0.006336 tcp_rcv_rtt_update
  0.006167 tcp_send_ack
  0.005989 tcp_init_nondata_skb
  0.005720 kfree_skb
  0.005502 call_function_interrupt
  0.005413 __count_vm_event
  0.005403 __skb_checksum_complete_head
  0.005363 page_cache_get_speculative
  0.005323 dev_kfree_skb_irq
  0.005174 skb_store_bits
  0.004956 cpu_avg_load_per_task
  0.004916 dev_cpu_callback
  0.004807 __kmem_cache_destroy
  0.004777 tcp_init_metrics
  0.004777 io_schedule
  0.004777 find_get_page
  0.004707 eth_header_parse
  0.004688 cap_task_kill
  0.004678 error_exit
  0.004668 rb_prev
  0.004658 tso_fragment
  0.004648 mmdrop
  0.004628 skb_reset_tail_pointer
  0.004598 apic_timer_interrupt
  0.004588 clear_bit
  0.004519 tcp_simple_retransmit
  0.004449 get_max_files
  0.004370 sk_stop_timer
  0.004340 tcp_reset
  0.004251 netlbl_cache_add
  0.004201 tcp_add_reno_sack
  0.004151 __pskb_trim_head
  0.004102 __profile_flip_buffers
  0.004092 sk_common_release
  0.004052 audit_copy_inode
  0.003953 eth_change_mtu
  0.003943 vfs_read
  0.003923 run_timer_softirq
  0.003843 mnt_drop_write
  0.003814 clear_page_c
  0.003804 do_sync_read
  0.003744 unset_migratetype_isolate
  0.003714 sk_stream_moderate_sndbuf
  0.003545 tcp_try_rmem_schedule
  0.003476 native_apic_mem_write
  0.003466 sys_read
  0.003446 skb_checksum
  0.003436 timer_set_base
  0.003426 security_task_kill
  0.003416 __flow_cache_shrink
  0.003406 __skb_checksum_complete
  0.003277 alloc_skb
  0.003267 physflat_send_IPI_mask
  0.003218 skb_gso_ok
  0.003178 constant_test_bit
  0.003168 find_next_bit
  0.003158 selinux_netlbl_skbuff_getsid
  0.003118 constant_test_bit
  0.003099 pull_task
  0.003079 hrtimer_run_queues
  0.003049 free_hot_page
  0.003009 scheduler_tick
  0.002900 set_32bit_tls
  0.002890 tcp_acceptable_seq
  0.002811 rw_verify_area
  0.002751 radix_tree_lookup_slot
  0.002731 zero_user_segment
  0.002731 sock_common_setsockopt
  0.002612 __load_balance_iterator
  0.002473 run_posix_cpu_timers
  0.002264 task_utime
  0.002254 switched_to_fair
  0.002185 fsnotify_access
  0.002145 __rmqueue_smallest
  0.002125 __schedule_bug
  0.002095 __task_rq_lock
  0.002086 tcp_may_update_window
  0.002076 restore_args
  0.002066 hrtimer_run_pending
  0.002056 generic_segment_checks
  0.002026 getnstimeofday
  0.002006 idle_task
  0.001976 touch_atime
  0.001956 __wake_up_locked
  0.001927 sk_mem_charge
  0.001877 smp_apic_timer_interrupt
  0.001827 native_smp_send_reschedule
  0.001798 __tcp_fast_path_on
  0.001788 file_read_actor
  0.001768 _cond_resched
  0.001738 avc_policy_seqno
  0.001718 tcp_ack_snd_check
  0.001629 ip_send_check
  0.001619 account_system_time
  0.001579 __xapic_wait_icr_idle
  0.001579 get_stats
  0.001539 tcp_set_state
  0.001539 bictcp_state
  0.001529 tcp_fast_path_on
  0.001519 file_accessed
  0.001480 get_seconds
  0.001450 kernel_math_error
  0.001410 ktime_set
  0.001331 kmap_atomic
  0.001281 printk_tick
  0.001281 __next_cpu_nr
  0.001271 account_group_system_time
  0.001261 __mod_zone_page_state
  0.001222 weighted_cpuload
  0.001192 security_file_permission
  0.001162 ack_APIC_irq
  0.001152 __free_one_page
  0.001142 rcu_pending
  0.001142 drain_array
  0.001122 sched_clock_tick
  0.001122 csum_fold
  0.001102 ret_from_intr
  0.001083 retint_careful
  0.001073 need_resched
  0.001073 calc_delta_mine
  0.001043 tcp_v4_md5_do_del
  0.001043 PageActive
  0.001033 mark_page_accessed
  0.001033 ktime_get_ts
  0.001023 tcp_insert_write_queue_after
  0.001013 tcp_delack_timer
  0.001013 task_tick_fair
  0.000973 delay_tsc
  0.000963 nv_nic_irq_optimized
  0.000904 tick_periodic
  0.000894 skb_reserve
  0.000884 cache_reap
  0.000874 timespec_trunc
  0.000864 skb_header_release
  0.000854 zone_page_state_add
  0.000844 update_process_times
  0.000834 sk_rmem_schedule
  0.000824 find_busiest_group
  0.000804 current_fs_time
  0.000785 tick_handle_periodic
  0.000785 __sk_mem_schedule
  0.000785 irq_enter
  0.000755 use_cpu_writer_for_mount
  0.000755 tcp_ratehalving_spur_to_response
  0.000745 update_wall_time
  0.000745 tcp_sendpage
  0.000745 __alloc_pages_nodemask
  0.000725 ktime_get
  0.000725 irq_exit
  0.000705 inotify_inode_queue_event
  0.000665 set_pageblock_flags_group
  0.000646 inotify_dentry_parent_queue_event
  0.000626 ack_APIC_irq
  0.000606 write_profile
  0.000566 set_normalized_timespec
  0.000566 raise_softirq
  0.000526 task_cputime_zero
  0.000516 smp_reschedule_interrupt
  0.000516 __skb_insert
  0.000497 page_fault
  0.000497 __copy_user_nocache
  0.000487 run_local_timers
  0.000487 read_tsc
  0.000487 nf_unregister_hook
  0.000477 __rcu_pending
  0.000477 jiffies_to_usecs
  0.000457 timespec_to_ktime
  0.000437 __skb_trim
  0.000427 __call_rcu
  0.000417 free_pages_bulk
  0.000407 smp_call_function_interrupt
  0.000397 set_irq_regs
  0.000397 radix_tree_deref_slot
  0.000397 expand
  0.000387 handle_mm_fault
  0.000387 handle_IRQ_event
  0.000387 fput_light
  0.000377 refresh_cpu_vm_stats
  0.000377 n_tty_write
  0.000367 get_page
  0.000358 run_rebalance_domains
  0.000358 get_cpu_mask
  0.000348 task_hot
  0.000348 __skb_queue_after
  0.000348 retint_check
  0.000348 do_select
  0.000338 PageUptodate
  0.000338 copy_page_c
  0.000328 cond_resched
  0.000318 unmap_vmas
  0.000318 sk_mem_reclaim
  0.000318 rmqueue_bulk
  0.000318 reciprocal_value
  0.000318 irq_return
  0.000308 rb_first
  0.000308 alloc_skb
  0.000308 account_process_tick
  0.000298 net_enable_timestamp
  0.000298 clocksource_read
  0.000298 account_system_time_scaled
  0.000288 sched_slice
  0.000278 ip_compute_csum
  0.000278 constant_test_bit
  0.000278 constant_test_bit
  0.000268 set_curr_task_fair
  0.000268 note_interrupt
  0.000268 exit_idle
  0.000258 native_apic_mem_write
  0.000258 exit_intr
  0.000248 PageReferenced
  0.000238 usb_hcd_irq
  0.000238 __mnt_is_readonly
  0.000238 constant_test_bit
  0.000218 IRQ0xba_interrupt
  0.000218 handle_fasteoi_irq
  0.000209 raise_softirq_irqoff
  0.000209 __find_get_block
  0.000199 tcp_current_ssthresh
  0.000199 n_tty_receive_buf
  0.000189 wake_up_page
  0.000189 vgacon_save_screen
  0.000189 free_block
  0.000189 constant_test_bit
  0.000179 pagefault_disable
  0.000169 clocksource_get_next
  0.000169 __bitmap_weight
  0.000159 tty_ldisc_deref
  0.000159 tcp_write_timer
  0.000159 kmem_cache_alloc
  0.000159 free_alien_cache
  0.000159 ext3_mark_iloc_dirty
  0.000159 constant_test_bit
  0.000159 __bitmap_equal
  0.000149 transfer_objects
  0.000149 __rcu_process_callbacks
  0.000149 page_waitqueue
  0.000149 constant_test_bit
  0.000139 __rmqueue
  0.000139 release_pages
  0.000139 constant_test_bit
  0.000129 __tcp_checksum_complete
  0.000129 run_workqueue
  0.000129 poll_freewait
  0.000129 n_tty_read
  0.000129 iommu_area_free
  0.000129 generic_file_llseek
  0.000129 __cpus_setall
  0.000129 cond_resched_softirq
  0.000129 avc_node_populate
  0.000129 add_to_page_cache_lru
  0.000129 account_user_time
  0.000119 wait_consider_task
  0.000119 sys_select
  0.000119 round_jiffies_common
  0.000119 nv_start_xmit_optimized
  0.000119 core_sys_select
  0.000109 tcp_tso_segment
  0.000109 sigprocmask
  0.000109 proc_reg_read
  0.000109 path_to_nameidata
  0.000109 PageBuddy
  0.000109 ohci_irq
  0.000109 nv_tx_done_optimized
  0.000109 nv_msi_workaround
  0.000109 IRQ0xc2_interrupt
  0.000109 __ext3_get_inode_loc
  0.000109 account_group_user_time
  0.000099 __wake_up_sync
  0.000099 __up_read
  0.000099 update_vsyscall
  0.000099 memmove
  0.000099 kmalloc
  0.000099 ext3_get_blocks_handle
  0.000099 do_device_not_available
  0.000099 constant_test_bit
  0.000089 tcp_incr_quickack
  0.000089 smp_send_reschedule
  0.000089 remove_from_page_cache
  0.000089 rcu_process_callbacks
  0.000089 prepare_to_wait_exclusive
  0.000089 pde_users_dec
  0.000089 find_first_bit
  0.000089 constant_test_bit
  0.000089 common_interrupt
  0.000089 add_wait_queue
  0.000079 task_gtime
  0.000079 sys_lseek
  0.000079 start_this_handle
  0.000079 schedule_hrtimeout_range
  0.000079 __sched_fork
  0.000079 journal_put_journal_head
  0.000079 find_first_zero_bit
  0.000079 do_syslog
  0.000079 do_sync_write
  0.000079 constant_test_bit
  0.000079 ack_apic_level
  0.000070 write_seqlock
  0.000070 slab_get_obj
  0.000070 remove_wait_queue
  0.000070 pty_chars_in_buffer
  0.000070 ____pagevec_lru_add
  0.000070 lock_hrtimer_base
  0.000070 kstat_incr_irqs_this_cpu
  0.000070 journal_dirty_data
  0.000070 journal_add_journal_head
  0.000070 find_lock_page
  0.000070 copy_from_read_buf
  0.000070 bit_waitqueue
  0.000070 alloc_page_vma
  0.000060 vfs_write
  0.000060 tty_write
  0.000060 __strnlen_user
  0.000060 sk_mem_uncharge
  0.000060 rt_worker_func
  0.000060 radix_tree_preload
  0.000060 poll_select_copy_remaining
  0.000060 pagefault_enable
  0.000060 __mark_inode_dirty
  0.000060 lru_add_drain_all
  0.000060 lock_page
  0.000060 list_replace_init
  0.000060 journal_stop
  0.000060 iowrite8
  0.000060 hrtimer_forward
  0.000060 gart_unmap_single
  0.000060 find_vma
  0.000060 __down_read_trylock
  0.000060 do_page_fault
  0.000060 do_IRQ
  0.000060 create_empty_buffers
  0.000060 constant_test_bit
  0.000060 constant_test_bit
  0.000060 alloc_iommu
  0.000060 add_to_page_cache_locked
  0.000050 zero_fd_set
  0.000050 vsnprintf
  0.000050 unlock_page
  0.000050 tty_read
  0.000050 tty_poll
  0.000050 sock_poll
  0.000050 sock_def_error_report
  0.000050 set_wq_data
  0.000050 rcu_check_callbacks
  0.000050 radix_tree_node_rcu_free
  0.000050 pipe_poll
  0.000050 opost
  0.000050 n_tty_chars_in_buffer
  0.000050 __next_cpu
  0.000050 mutex_trylock
  0.000050 msecs_to_jiffies
  0.000050 mempool_alloc_slab
  0.000050 load_elf_binary
  0.000050 __link_path_walk
  0.000050 __journal_remove_journal_head
  0.000050 journal_commit_transaction
  0.000050 journal_cancel_revoke
  0.000050 irq_complete_move
  0.000050 irq_cfg
  0.000050 fsnotify_modify
  0.000050 __first_cpu
  0.000050 file_update_time
  0.000050 filemap_fault
  0.000050 ext3_new_blocks
  0.000050 ext3_mark_inode_dirty
  0.000050 do_wp_page
  0.000050 __do_fault
  0.000050 buffer_dirty
  0.000050 anon_vma_prepare
  0.000040 yield
  0.000040 wq_per_cpu
  0.000040 walk_page_buffers
  0.000040 __wake_up_bit
  0.000040 vma_adjust
  0.000040 tty_put_char
  0.000040 tty_paranoia_check
  0.000040 tcp_current_ssthresh
  0.000040 sys_write
  0.000040 sys_rt_sigprocmask
  0.000040 sock_no_bind
  0.000040 show_stat
  0.000040 SetPageSwapBacked
  0.000040 set_irq_regs
  0.000040 set_buffer_write_io_error
  0.000040 recalc_sigpending
  0.000040 radix_tree_delete
  0.000040 queue_delayed_work_on
  0.000040 pty_write
  0.000040 __pollwait
  0.000040 physflat_send_IPI_allbutself
  0.000040 page_zone
  0.000040 page_remove_rmap
  0.000040 page_is_file_cache
  0.000040 page_evictable
  0.000040 nv_get_empty_tx_slots
  0.000040 n_tty_poll
  0.000040 next_zone
  0.000040 next_online_pgdat
  0.000040 need_resched
  0.000040 mutex_unlock
  0.000040 mpol_needs_cond_ref
  0.000040 __lookup
  0.000040 journal_invalidatepage
  0.000040 journal_dirty_metadata
  0.000040 ioread8
  0.000040 input_available_p
  0.000040 inet_csk_reset_xmit_timer
  0.000040 get_fd_set
  0.000040 generic_write_checks
  0.000040 free_poll_entry
  0.000040 fput
  0.000040 __ext3_journal_stop
  0.000040 ext3_get_group_desc
  0.000040 ext3_get_block
  0.000040 do_mpage_readpage
  0.000040 __d_lookup
  0.000040 del_page_from_lru
  0.000040 __dec_zone_state
  0.000040 copy_user_generic
  0.000040 __bitmap_and
  0.000040 add_page_to_lru_list
  0.000040 account_user_time_scaled
  0.000040 account_steal_time
  0.000030 worker_thread
  0.000030 wake_up_bit
  0.000030 vmstat_update
  0.000030 vm_normal_page
  0.000030 tty_write_unlock
  0.000030 tty_write_lock
  0.000030 tty_wakeup
  0.000030 tty_ldisc_try
  0.000030 tty_ioctl
  0.000030 tag_get
  0.000030 sys_pread64
  0.000030 submit_bh
  0.000030 stop_this_cpu
  0.000030 sock_aio_write
  0.000030 sk_mem_reclaim
  0.000030 sk_backlog_rcv
  0.000030 show_interrupts
  0.000030 sg_next
  0.000030 seq_printf
  0.000030 send_remote_softirq
  0.000030 remove_vma
  0.000030 reg_delay
  0.000030 radix_tree_lookup
  0.000030 radix_tree_insert
  0.000030 proc_lookup_de
  0.000030 pipe_write
  0.000030 __percpu_counter_add
  0.000030 pci_map_single
  0.000030 nv_napi_poll
  0.000030 __next_node
  0.000030 native_send_call_func_ipi
  0.000030 mpage_readpages
  0.000030 mix_pool_bytes_extract
  0.000030 mii_rw
  0.000030 mempool_alloc
  0.000030 __make_request
  0.000030 jbd_lock_bh_state
  0.000030 iov_iter_copy_from_user_atomic
  0.000030 insert_work
  0.000030 hrtimer_try_to_cancel
  0.000030 get_dma_ops
  0.000030 __generic_file_aio_write_nolock
  0.000030 gart_map_sg
  0.000030 __fput
  0.000030 fixup_irqs
  0.000030 __find_get_block_slow
  0.000030 filp_close
  0.000030 ext3_get_branch
  0.000030 ext3_dirty_inode
  0.000030 ext3_block_to_path
  0.000030 do_get_write_access
  0.000030 delayed_work_timer_fn
  0.000030 csum_block_add
  0.000030 copy_process
  0.000030 copy_page_range
  0.000030 constant_test_bit
  0.000030 constant_test_bit
  0.000030 check_irqs_on
  0.000030 call_rcu
  0.000030 __brelse
  0.000030 _atomic_dec_and_lock
  0.000020 __xchg
  0.000020 vm_stat_account
  0.000020 vma_prio_tree_remove
  0.000020 tty_mode_ioctl
  0.000020 tty_audit_add_data
  0.000020 try_to_free_buffers
  0.000020 truncate_inode_pages_range
  0.000020 tcp_slow_start
  0.000020 task_curr
  0.000020 sys_setpgid
  0.000020 sys_rt_sigreturn
  0.000020 sys_getppid
  0.000020 strncpy_from_user
  0.000020 sock_put
  0.000020 smp_call_function
  0.000020 __sk_mem_reclaim
  0.000020 signal_wake_up
  0.000020 signal_pending
  0.000020 set_termios
  0.000020 SetPageUptodate
  0.000020 SetPageLRU
  0.000020 set_fd_set
  0.000020 set_bit
  0.000020 __send_IPI_shortcut
  0.000020 security_inode_need_killpriv
  0.000020 scsi_request_fn
  0.000020 sb_bread
  0.000020 restore_i387_xstate
  0.000020 __qdisc_run
  0.000020 pud_alloc
  0.000020 pmd_alloc
  0.000020 pfn_pte
  0.000020 pfifo_fast_enqueue
  0.000020 pfifo_fast_dequeue
  0.000020 pci_map_page
  0.000020 path_get
  0.000020 __pagevec_free
  0.000020 pagevec_add
  0.000020 PageUnevictable
  0.000020 page_mapping
  0.000020 nv_get_hw_stats
  0.000020 number
  0.000020 normalize_rt_tasks
  0.000020 __netif_tx_lock
  0.000020 mk_pid
  0.000020 memscan
  0.000020 memcpy_c
  0.000020 __lru_cache_add
  0.000020 __lookup_mnt
  0.000020 load_balance_rt
  0.000020 kthread_should_stop
  0.000020 journal_start
  0.000020 journal_remove_journal_head
  0.000020 __journal_file_buffer
  0.000020 jbd_unlock_bh_journal_head
  0.000020 itimer_get_remtime
  0.000020 irq_to_desc
  0.000020 iowrite32
  0.000020 inotify_remove_watch_locked
  0.000020 inode_permission
  0.000020 inode_has_perm
  0.000020 init_timer
  0.000020 goal_in_my_reservation
  0.000020 get_vma_policy
  0.000020 __get_free_pages
  0.000020 generic_sync_sb_inodes
  0.000020 gart_map_single
  0.000020 freezing
  0.000020 free_pgtables
  0.000020 free_pages_and_swap_cache
  0.000020 free_buffer_head
  0.000020 __follow_mount
  0.000020 flush_tlb_page
  0.000020 find_busiest_queue
  0.000020 file_has_perm
  0.000020 ext3_try_to_allocate
  0.000020 ext3_journal_start
  0.000020 __ext3_journal_dirty_metadata
  0.000020 ext3_file_write
  0.000020 enqueue_hrtimer
  0.000020 dup_mm
  0.000020 do_wait
  0.000020 do_vfs_ioctl
  0.000020 do_path_lookup
  0.000020 do_munmap
  0.000020 do_machine_check
  0.000020 do_lookup
  0.000020 do_follow_link
  0.000020 dma_unmap_single
  0.000020 __dec_zone_page_state
  0.000020 count_vm_event
  0.000020 constant_test_bit
  0.000020 constant_test_bit
  0.000020 compound_head
  0.000020 clear_buffer_jbddirty
  0.000020 clear_buffer_delay
  0.000020 claim_block
  0.000020 cascade
  0.000020 cancel_dirty_page
  0.000020 cache_grow
  0.000020 brelse
  0.000020 __block_prepare_write
  0.000020 __blocking_notifier_call_chain
  0.000020 blk_rq_map_sg
  0.000020 __bitmap_empty
  0.000020 __bitmap_andnot
  0.000020 anon_vma_unlink
  0.000010 zone_page_state
  0.000010 zero_user_segments
  0.000010 __xchg
  0.000010 __vma_link_rb
  0.000010 vma_link
  0.000010 vfs_llseek
  0.000010 __up_write
  0.000010 update_xtime_cache
  0.000010 unmap_underlying_metadata
  0.000010 unmap_region
  0.000010 unix_poll
  0.000010 tty_write_room
  0.000010 tty_unthrottle
  0.000010 tty_ldisc_ref_wait
  0.000010 tty_ldisc_ref
  0.000010 tty_fasync
  0.000010 tty_check_change
  0.000010 tty_chars_in_buffer
  0.000010 tty_audit_fork
  0.000010 truncate_complete_page
  0.000010 test_tsk_thread_flag
  0.000010 taskstats_exit
  0.000010 sys_writev
  0.000010 sys_readahead
  0.000010 sys_poll
  0.000010 sys_newstat
  0.000010 sys_nanosleep
  0.000010 sys_ioctl
  0.000010 syscall_trace_leave
  0.000010 sync_supers
  0.000010 stub_execve
  0.000010 split_page
  0.000010 sock_kfree_s
  0.000010 __sleep_on_page_lock
  0.000010 skip_atoi
  0.000010 signal_pending
  0.000010 signal_pending
  0.000010 sg_init_table
  0.000010 set_task_cpu
  0.000010 __set_page_dirty
  0.000010 SetPageActive
  0.000010 set_bit
  0.000010 seq_puts
  0.000010 selinux_task_setpgid
  0.000010 selinux_secctx_to_secid
  0.000010 selinux_sb_show_options
  0.000010 selinux_inode_permission
  0.000010 selinux_inode_need_killpriv
  0.000010 selinux_inode_free_security
  0.000010 selinux_inode_alloc_security
  0.000010 selinux_d_instantiate
  0.000010 security_vm_enough_memory
  0.000010 second_overflow
  0.000010 scsi_run_queue
  0.000010 __scsi_put_command
  0.000010 scsi_init_sgtable
  0.000010 scsi_end_request
  0.000010 schedule_tail
  0.000010 schedule_delayed_work
  0.000010 sb_any_quota_enabled
  0.000010 rt_hash
  0.000010 round_jiffies_relative
  0.000010 remove_hrtimer
  0.000010 __remove_hrtimer
  0.000010 __remove_from_page_cache
  0.000010 rcu_bh_qsctr_inc
  0.000010 radix_tree_tag_clear
  0.000010 radix_tree_gang_lookup_tag_slot
  0.000010 radix_tree_gang_lookup_slot
  0.000010 queue_delayed_work
  0.000010 qdisc_run
  0.000010 put_tty_queue_nolock
  0.000010 put_io_context
  0.000010 pty_write_room
  0.000010 pty_open
  0.000010 ptep_set_access_flags
  0.000010 profile_munmap
  0.000010 proc_pident_lookup
  0.000010 proc_get_inode
  0.000010 prio_tree_replace
  0.000010 prio_tree_remove
  0.000010 prio_tree_insert
  0.000010 pmd_none_or_clear_bad
  0.000010 pipe_release
  0.000010 pipe_read
  0.000010 pid_revalidate
  0.000010 pgd_alloc
  0.000010 pci_unmap_single
  0.000010 pci_read_config_dword
  0.000010 pci_conf1_write
  0.000010 pci_bus_read_config_dword
  0.000010 path_walk
  0.000010 page_zone
  0.000010 PageSwapCache
  0.000010 PageSwapCache
  0.000010 PageSwapCache
  0.000010 __page_set_anon_rmap
  0.000010 PagePrivate
  0.000010 PagePrivate
  0.000010 PagePrivate
  0.000010 page_add_file_rmap
  0.000010 on_each_cpu
  0.000010 nv_do_interrupt
  0.000010 net_tx_action
  0.000010 netif_start_queue
  0.000010 netif_carrier_ok
  0.000010 need_resched
  0.000010 need_iommu
  0.000010 native_pte_clear
  0.000010 native_io_delay
  0.000010 mutex_lock
  0.000010 mprotect_fixup
  0.000010 mod_zone_page_state
  0.000010 mntput_no_expire
  0.000010 mm_init
  0.000010 mmap_region
  0.000010 mempool_free
  0.000010 memcmp
  0.000010 mcheck_check_cpu
  0.000010 may_open
  0.000010 __lookup_tag
  0.000010 locks_remove_posix
  0.000010 locks_remove_flock
  0.000010 lock_buffer
  0.000010 load_elf_binary
  0.000010 load_balance_fair
  0.000010 ll_back_merge_fn
  0.000010 kzalloc
  0.000010 ktime_add_safe
  0.000010 kill_fasync
  0.000010 __journal_temp_unlink_buffer
  0.000010 journal_switch_revoke_table
  0.000010 __journal_remove_checkpoint
  0.000010 journal_get_write_access
  0.000010 journal_get_undo_access
  0.000010 journal_get_descriptor_buffer
  0.000010 journal_bmap
  0.000010 jbd_unlock_bh_state
  0.000010 jbd_unlock_bh_state
  0.000010 IRQ0xd2_interrupt
  0.000010 ip_append_data
  0.000010 iov_iter_advance
  0.000010 iov_fault_in_pages_read
  0.000010 iommu_area_alloc
  0.000010 inode_sub_bytes
  0.000010 inode_doinit_with_dentry
  0.000010 inode_add_bytes
  0.000010 __inc_zone_page_state
  0.000010 inc_zone_page_state
  0.000010 hweight_long
  0.000010 hweight64
  0.000010 hrtimer_wakeup
  0.000010 hrtimer_init
  0.000010 hash_64
  0.000010 half_md4_transform
  0.000010 __grab_cache_page
  0.000010 get_user_pages
  0.000010 get_signal_to_deliver
  0.000010 get_random_int
  0.000010 getname
  0.000010 get_empty_filp
  0.000010 __getblk
  0.000010 generic_permission
  0.000010 generic_make_request
  0.000010 generic_fillattr
  0.000010 generic_file_open
  0.000010 generic_file_llseek_unlocked
  0.000010 generic_file_buffered_write
  0.000010 generic_file_aio_write
  0.000010 generic_cont_expand_simple
  0.000010 generic_block_bmap
  0.000010 freezing
  0.000010 free_swap_cache
  0.000010 free_pid
  0.000010 free_pgd_range
  0.000010 free_pages
  0.000010 flush_old_exec
  0.000010 first_online_pgdat
  0.000010 find_vma_prepare
  0.000010 find_task_by_pid_type_ns
  0.000010 find_next_zero_bit
  0.000010 find_inode_fast
  0.000010 file_remove_suid
  0.000010 file_mask_to_av
  0.000010 file_free_rcu
  0.000010 __FD_CLR
  0.000010 ext3_write_begin
  0.000010 ext3_try_to_allocate_with_rsv
  0.000010 ext3_ordered_write_end
  0.000010 ext3_journalled_set_page_dirty
  0.000010 ext3_invalidatepage
  0.000010 ext3_iget_acl
  0.000010 ext3_get_inode_flags
  0.000010 ext3_free_data
  0.000010 ext3_discard_reservation
  0.000010 exit_thread
  0.000010 exit_task_namespaces
  0.000010 exit_sem
  0.000010 end_that_request_last
  0.000010 end_buffer_write_sync
  0.000010 end_buffer_async_write
  0.000010 elv_rb_del
  0.000010 elv_queue_empty
  0.000010 elv_merged_request
  0.000010 elv_completed_request
  0.000010 elf_map
  0.000010 echo_char
  0.000010 e1000_watchdog
  0.000010 e1000_read_phy_reg
  0.000010 __drain_alien_cache
  0.000010 __d_path
  0.000010 __down_write_nested
  0.000010 __down_write
  0.000010 double_rq_lock
  0.000010 do_timer
  0.000010 do_sys_open
  0.000010 do_sigaltstack
  0.000010 do_sigaction
  0.000010 do_setitimer
  0.000010 do_pipe_flags
  0.000010 __do_page_cache_readahead
  0.000010 do_notify_parent
  0.000010 do_filp_open
  0.000010 do_exit
  0.000010 dnotify_flush
  0.000010 d_kill
  0.000010 destroy_inode
  0.000010 dequeue_signal
  0.000010 de_put
  0.000010 delayacct_end
  0.000010 create_write_pipe
  0.000010 create_workqueue_thread
  0.000010 __cpus_equal
  0.000010 cpu_quiet
  0.000010 __cpu_clear
  0.000010 __cpu_clear
  0.000010 count
  0.000010 copy_thread
  0.000010 copy_namespaces
  0.000010 constant_test_bit
  0.000010 constant_test_bit
  0.000010 constant_test_bit
  0.000010 constant_test_bit
  0.000010 constant_test_bit
  0.000010 __cond_resched
  0.000010 clocksource_forward_now
  0.000010 __clear_user
  0.000010 clear_inode
  0.000010 clear_buffer_new
  0.000010 clear_bit
  0.000010 clear_bit
  0.000010 check_for_bios_corruption
  0.000010 __cfq_slice_expired
  0.000010 cfq_set_request
  0.000010 cfq_dispatch_requests
  0.000010 cfq_completed_request
  0.000010 cap_set_effective
  0.000010 can_share_swap_page
  0.000010 bvec_alloc_bs
  0.000010 buffer_uptodate
  0.000010 buffer_mapped
  0.000010 buffer_locked
  0.000010 buffer_jbd
  0.000010 buffer_jbd
  0.000010 brelse
  0.000010 __bread
  0.000010 blk_invoke_request_fn
  0.000010 __blk_complete_request
  0.000010 blk_add_trace_generic
  0.000010 blk_add_trace_bio
  0.000010 bit_spin_lock
  0.000010 bio_put
  0.000010 bio_alloc_bioset
  0.000010 bdi_read_congested
  0.000010 balance_runtime
  0.000010 balance_dirty_pages_ratelimited_nr
  0.000010 audit_log_task_context
  0.000010 ata_sff_qc_prep
  0.000010 ata_scsi_queuecmd
  0.000010 ata_link_max_devices
  0.000010 ata_get_xlat_func
  0.000010 arp_process
  0.000010 arch_pick_mmap_layout
  0.000010 arch_irq_stat_cpu
  0.000010 arch_dup_task_struct
  0.000010 alloc_pid
  0.000010 alloc_fdtable
  0.000010 alloc_fd
  0.000010 add_mm_rss
  0.000010 acct_collect

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]             ` <20081117110119.GL28786-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 19:21               ` David Miller
       [not found]                 ` <20081117.112157.146825192.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17 19:21 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Mon, 17 Nov 2008 12:01:19 +0100

> The scheduler's overhead barely even registers on a 16-way x86 system 
> i'm running tbench on. Here's the NMI profile during 64 threads tbench 
> on a 16-way x86 box with an v2.6.28-rc5 kernel [config attached]:

Try a non-NMI profile.

It's the whole of the try_to_wake_up() path that's the problem.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 19:30                                                 ` Eric Dumazet
  2008-11-17 19:39                                                 ` David Miller
                                                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 19:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger

Ingo Molnar wrote:
> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> 
>> The place for the sock_rfree() hit looks a bit weird, and i'll
>> investigate it now a bit more to place the real overhead point 
>> properly. (i already mapped the test-bit overhead: that comes from 
>> napi_disable_pending())
> 
> ok, here's a new set of profiles. (again for tbench 64-thread on a 
> 16-way box, with v2.6.28-rc5-19-ge14c8bf and with the kernel config i 
> posted before.)
> 
> Here are the per major subsystem percentages:
> 
>            NET       overhead ( 5786945/10096751): 57.31%
>            security  overhead (  925933/10096751):  9.17%
>            usercopy  overhead (  837887/10096751):  8.30%
>            sched     overhead (  753662/10096751):  7.46%
>            syscall   overhead (  268809/10096751):  2.66%
>            IRQ       overhead (  266500/10096751):  2.64%
>            slab      overhead (  180258/10096751):  1.79%
>            timer     overhead (   92986/10096751):  0.92%
>            pagealloc overhead (   87381/10096751):  0.87%
>            VFS       overhead (   53295/10096751):  0.53%
>            PID       overhead (   44469/10096751):  0.44%
>            pagecache overhead (   33452/10096751):  0.33%
>            gtod      overhead (   11064/10096751):  0.11%
>            IDLE      overhead (       0/10096751):  0.00%
> ---------------------------------------------------------
>                          left (  753878/10096751):  7.47%
> 
> The breakdown is very similar to what i sent before, within noise.
> 
> [ 'left' is random overhead from all around the place - i categorized 
>   the 500 most expensive functions in the profile per subsystem.
>   I stopped short of doing it for all 1300+ functions: it's rather
>   laborious manual work even with hefty use of regex patterns.
>   It's also less meaningful in practice: the trend in the first 500
>   functions is present in the remaining 800 functions as well. I 
>   watched the breakdown evolve as i increased the coverage - in 
>   practice it is the first 100 functions that matter - it just doesn't
>   change after that. ]
> 
> The readprofile output below seems structured in a more useful way now 
> - i tweaked compiler options to have the profiler hits spread out in a 
> more meaningful way. I collected 10 million NMI profiler hits, and 
> normalized the readprofile output up to 100%.
> 
> [ I'll post per function analysis as i complete them, as a reply to
>   this mail. ]
> 
> 	Ingo
> 
> 100.000000 total
> ................
>   7.253355 copy_user_generic_string
>   3.934833 avc_has_perm_noaudit

>   3.356152 ip_queue_xmit

>   3.038025 skb_release_data
>   2.118525 skb_release_head_state
>   1.997533 tcp_ack
>   1.833688 tcp_recvmsg

>   1.717771 eth_type_trans
Strange, in my profile eth_type_trans is not in the top 20.
Maybe an alignment problem?
Oh, I understand: you hit the netdevice->last_rx update problem, which is already corrected in net-next-2.6.

>   1.673249 __inet_lookup_established
The TCP established/timewait table is now RCUified (for linux-2.6.29), so this one
should go down in profiles.

>   1.508888 system_call

>   1.469183 tcp_current_mss
Yes, there is a divide that might be expensive. There is a discussion about it on netdev.

>   1.431553 tcp_transmit_skb
>   1.385125 tcp_sendmsg
>   1.327643 tcp_v4_rcv
>   1.292328 nf_hook_thresh
>   1.203205 schedule
>   1.059501 nf_hook_slow
>   1.027373 constant_test_bit
>   0.945183 sock_rfree
>   0.922748 __switch_to
>   0.911605 netif_rx
>   0.876270 register_gifconf
>   0.788200 ip_local_deliver_finish
>   0.781467 dev_queue_xmit
>   0.766530 constant_test_bit
>   0.758208 _local_bh_enable_ip
>   0.747184 load_cr3
>   0.704341 memset_c
>   0.671260 sysret_check
>   0.651845 ip_finish_output2
>   0.620204 audit_free_names


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                   ` <20081117161135.GE12081-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 16:35                     ` Eric Dumazet
@ 2008-11-17 19:31                     ` David Miller
       [not found]                       ` <20081117.113158.200497613.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17 19:31 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Mon, 17 Nov 2008 17:11:35 +0100

> Ouch, +4% from a oneliner networking change? That's a _huge_ speedup 
> compared to the things we were after in scheduler land.

The scheduler has accounted for at least 10% of the tbench
regressions at this point. What are you talking about?

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                           ` <20081117170844.GJ12081-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 17:25                             ` Ingo Molnar
@ 2008-11-17 19:36                             ` David Miller
  1 sibling, 0 replies; 318+ messages in thread
From: David Miller @ 2008-11-17 19:36 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Mon, 17 Nov 2008 18:08:44 +0100

> Mike Galbraith has been spending months trying to pin down all the 
> issues.

Yes Mike has been doing tireless good work.

Another thing I noticed is that because all of the scheduler
core operations are now function pointer callbacks, the
call chain is deeper for core operations like wake_up().

Much of it used to be completely inlined into try_to_wake_up().

With the addition of the RB tree stuff, that adds yet another
unavoidable depth of function call.

wake_up() is usually at the deepest part of the call chain,
so this is a big deal.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 19:30                                                 ` Eric Dumazet
@ 2008-11-17 19:39                                                 ` David Miller
       [not found]                                                   ` <20081117.113936.81699150.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-11-17 19:57                                                 ` Ingo Molnar
                                                                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17 19:39 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Mon, 17 Nov 2008 19:49:51 +0100

> 
> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> 
> > The place for the sock_rfree() hit looks a bit weird, and i'll
> > investigate it now a bit more to place the real overhead point 
> > properly. (i already mapped the test-bit overhead: that comes from 
> > napi_disable_pending())
> 
> ok, here's a new set of profiles. (again for tbench 64-thread on a 
> 16-way box, with v2.6.28-rc5-19-ge14c8bf and with the kernel config i 
> posted before.)

Again, do a non-NMI profile and the top (at least for me)
looks like this:

samples  %        app name                 symbol name
473       6.3928  vmlinux                  finish_task_switch
349       4.7169  vmlinux                  tcp_v4_rcv
327       4.4195  vmlinux                  U3copy_from_user
322       4.3519  vmlinux                  tl0_linux32
178       2.4057  vmlinux                  tcp_ack
170       2.2976  vmlinux                  tcp_sendmsg
167       2.2571  vmlinux                  U3copy_to_user

That tcp_v4_rcv() hit is 98% on the wake_up() call it does.
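
The oprofile listing above gives both a sample count and a percentage per
symbol, so the total number of samples and the share implied by the
wake_up() remark can be worked out. A small Python sketch follows; the
variable and function names are made up for illustration, and the 98% figure
is simply taken from the sentence above rather than measured here.

```python
# (samples, percent) pairs copied from the oprofile listing in this mail.
PROFILE = {
    "finish_task_switch": (473, 6.3928),
    "tcp_v4_rcv": (349, 4.7169),
    "U3copy_from_user": (327, 4.4195),
    "tl0_linux32": (322, 4.3519),
    "tcp_ack": (178, 2.4057),
    "tcp_sendmsg": (170, 2.2976),
    "U3copy_to_user": (167, 2.2571),
}


def implied_total(samples, percent):
    """Total sample count implied by one row of the listing."""
    return samples / (percent / 100.0)


# Each row should imply approximately the same overall sample count.
totals = {sym: round(implied_total(s, p)) for sym, (s, p) in PROFILE.items()}

# Share of the whole profile attributed to wake_up() inside tcp_v4_rcv(),
# assuming the 98% figure stated in the mail.
WAKE_UP_FRACTION = 0.98
tcp_v4_rcv_samples, tcp_v4_rcv_percent = PROFILE["tcp_v4_rcv"]
wake_up_samples = WAKE_UP_FRACTION * tcp_v4_rcv_samples
wake_up_percent = WAKE_UP_FRACTION * tcp_v4_rcv_percent

print(totals)
print(f"wake_up() via tcp_v4_rcv(): ~{wake_up_samples:.0f} samples, "
      f"~{wake_up_percent:.2f}% of the profile")
```

Every row implies roughly the same total, on the order of a few thousand
samples, which is a much smaller sample size than the 10 million NMI hits in
the x86 profile being compared against.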

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                   ` <20081117.113936.81699150.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-17 19:43                                                     ` Eric Dumazet
  2008-11-17 19:55                                                     ` Linus Torvalds
  2008-11-18 12:29                                                     ` Mike Galbraith
  2 siblings, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 19:43 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

David Miller wrote:
> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Date: Mon, 17 Nov 2008 19:49:51 +0100
> 
>> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
>>
>>> The place for the sock_rfree() hit looks a bit weird, and i'll
>>> investigate it now a bit more to place the real overhead point 
>>> properly. (i already mapped the test-bit overhead: that comes from 
>>> napi_disable_pending())
>> ok, here's a new set of profiles. (again for tbench 64-thread on a 
>> 16-way box, with v2.6.28-rc5-19-ge14c8bf and with the kernel config i 
>> posted before.)
> 
> Again, do a non-NMI profile and the top (at least for me)
> looks like this:
> 
> samples  %        app name                 symbol name
> 473       6.3928  vmlinux                  finish_task_switch
> 349       4.7169  vmlinux                  tcp_v4_rcv
> 327       4.4195  vmlinux                  U3copy_from_user
> 322       4.3519  vmlinux                  tl0_linux32
> 178       2.4057  vmlinux                  tcp_ack
> 170       2.2976  vmlinux                  tcp_sendmsg
> 167       2.2571  vmlinux                  U3copy_to_user
> 
> That tcp_v4_rcv() hit is 98% on the wake_up() call it does.
> 
> 

Another profile from my tree (net-next-2.6 + some patches), on my machine


CPU: Core 2, speed 3000.22 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
223265    9.2711  __copy_user_zeroing_intel
87525     3.6345  __copy_user_intel
73203     3.0398  tcp_sendmsg
53229     2.2103  netif_rx
53041     2.2025  tcp_recvmsg
47241     1.9617  sysenter_past_esp
42888     1.7809  __copy_from_user_ll
40858     1.6966  tcp_transmit_skb
39390     1.6357  __switch_to
37363     1.5515  dst_release
36823     1.5291  __sk_dst_check_get
36050     1.4970  tcp_v4_rcv
35829     1.4878  __do_softirq
32333     1.3426  tcp_rcv_established
30451     1.2645  tcp_clean_rtx_queue
29758     1.2357  ip_queue_xmit
28497     1.1833  __copy_to_user_ll
28119     1.1676  release_sock
25218     1.0472  lock_sock_nested
23701     0.9842  __inet_lookup_established
23463     0.9743  tcp_ack
22989     0.9546  netif_receive_skb
21880     0.9086  sched_clock_cpu
20730     0.8608  tcp_write_xmit
20372     0.8460  ip_rcv
20336     0.8445  local_bh_enable
19153     0.7953  __update_sched_clock
18603     0.7725  skb_release_data
17020     0.7068  local_bh_enable_ip
16932     0.7031  process_backlog
16299     0.6768  ip_finish_output
16279     0.6760  dev_queue_xmit
15858     0.6585  sock_recvmsg
15641     0.6495  native_read_tsc
15454     0.6417  sock_wfree
15366     0.6381  update_curr
14585     0.6056  sys_socketcall
14564     0.6048  __alloc_skb
14519     0.6029  __tcp_select_window
14417     0.5987  tcp_current_mss
14391     0.5976  nf_iterate
14221     0.5905  page_address
14122     0.5864  local_bh_disable



^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                       ` <20081117.113158.200497613.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-17 19:47                         ` Linus Torvalds
       [not found]                           ` <alpine.LFD.2.00.0811171134480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-11-17 22:47                         ` Ingo Molnar
  1 sibling, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 19:47 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw



On Mon, 17 Nov 2008, David Miller wrote:
> 
> The scheduler has accounted for at least 10% of the tbench
> regressions at this point, what are you talking about?

I'm wondering if you're not looking at totally different issues.

For example, if I recall correctly, David had a big hit on the hrtimers. 
And I wonder if perhaps Ingo's numbers are without hrtimers or something? 

The other possibility is that it's just a sparc suckiness issue, that 
simply doesn't show up on x86. 

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                 ` <20081117.112157.146825192.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-17 19:48                   ` Linus Torvalds
       [not found]                     ` <alpine.LFD.2.00.0811171147380.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 19:48 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw



On Mon, 17 Nov 2008, David Miller wrote:

> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Date: Mon, 17 Nov 2008 12:01:19 +0100
> 
> > The scheduler's overhead barely even registers on a 16-way x86 system 
> > i'm running tbench on. Here's the NMI profile during 64 threads tbench 
> > on a 16-way x86 box with an v2.6.28-rc5 kernel [config attached]:
> 
> Try a non-NMI profile.
> 
> It's the whole of the try_to_wake_up() path that's the problem.

David, that makes no sense. An NMI profile is going to be a _lot_ more
accurate than a non-NMI one. Asking somebody to do a clearly inferior 
profile to get "better numbers" is insane.

We've asked _you_ to do NMI profiling, it shouldn't be the other way 
around.

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                           ` <alpine.LFD.2.00.0811171134480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-17 19:51                             ` David Miller
  2008-11-17 19:53                             ` Ingo Molnar
  1 sibling, 0 replies; 318+ messages in thread
From: David Miller @ 2008-11-17 19:51 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Mon, 17 Nov 2008 11:47:24 -0800 (PST)

> For example, if I recall correctly, David had a big hit on the hrtimers. 

That got fixed, the HRTIMER bits are now disabled.

> The other possibility is that it's just a sparc suckiness issue, that 
> simply doesn't show up on x86. 

Could be and I intend to measure that to find out.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                     ` <alpine.LFD.2.00.0811171147380.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-17 19:52                       ` David Miller
       [not found]                         ` <20081117.115258.227376348.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17 19:52 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mingo-X9Un+BFzKDI, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Mon, 17 Nov 2008 11:48:33 -0800 (PST)

> We've asked _you_ to do NMI profiling, it shouldn't be the other way 
> around.

I wasn't able to on these systems, so instead I did cycle level
evaluation of the parts that have to run with interrupts disabled.

And as a result I found that wake_up() is now 4 times slower than it
was in 2.6.22; I even analyzed this for every single kernel release
till now.

It could be a sparc specific issue, because the call chain is deeper
and we eat a lot more register window spills onto the stack.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                           ` <alpine.LFD.2.00.0811171134480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-11-17 19:51                             ` David Miller
@ 2008-11-17 19:53                             ` Ingo Molnar
  1 sibling, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 19:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

* Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> On Mon, 17 Nov 2008, David Miller wrote:
> > 
> > The scheduler has accounted for at least 10% of the tbench
> > regressions at this point, what are you talking about?
> 
> I'm wondering if you're not looking at totally different issues.
> 
> For example, if I recall correctly, David had a big hit on the 
> hrtimers. And I wonder if perhaps Ingo's numbers are without 
> hrtimers or something?

hrtimers should not be an issue anymore since this commit:

| commit 0c4b83da58ec2e96ce9c44c211d6eac5f9dae478
| Author: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
| Date:   Mon Oct 20 14:27:43 2008 +0200
|
|     sched: disable the hrtick for now
|    
|     David Miller reported that hrtick update overhead has tripled the
|     wakeup overhead on Sparc64.
|    
|     That is too much - disable the HRTICK feature for now by default,
|     until a faster implementation is found.
|    
|     Reported-by: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
|     Acked-by: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
|     Signed-off-by: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>

Which was included in v2.6.28-rc1 already.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                   ` <20081117.113936.81699150.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-11-17 19:43                                                     ` Eric Dumazet
@ 2008-11-17 19:55                                                     ` Linus Torvalds
       [not found]                                                       ` <alpine.LFD.2.00.0811171149100.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-11-18 12:29                                                     ` Mike Galbraith
  2 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 19:55 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

On Mon, 17 Nov 2008, David Miller wrote:
> 
> Again, do a non-NMI profile and the top (at least for me)
> looks like this:

Can _you_ please do an NMI profile and see what your real problem is?

I can't imagine that Niagara (or whatever) is so weak that it can't do 
NMIs.

The fact is, David, that Ingo just posted a profile that was _better_ than 
anything you have ever posted, and it doesn't show what you complain 
about. So he's not seeing it. Asking him to do a _stupid_ profile is just 
that: stupid.

So try to figure out why his (better) profile doesn't match your 
(inferior) one, instead of asking him to do stupid things. It's some 
difference in architectures, likely: maybe the sparc timekeeping is crap, 
maybe it's a cache issue and sparc caches are crap, maybe it's something 
where Niagara (is it niagara) has some oddness that shows up because it 
has that odd four-threads+four-cores or whatever.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                         ` <20081117.115258.227376348.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-17 19:57                           ` Linus Torvalds
       [not found]                             ` <alpine.LFD.2.00.0811171156080.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 19:57 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw



On Mon, 17 Nov 2008, David Miller wrote:
> 
> And as a result I found that wake_up() is now 4 times slower than it
> was in 2.6.22, I even analyzed this for every single kernel release
> till now.

..and that's the one where you then pointed to hrtimers, and now you claim 
that was fixed?

At least I haven't seen any new analysis since then.

> It could be a sparc specific issue, because the call chain is deeper
> and we eat a lot more register window spills onto the stack.

Oh, easily. In-order machines tend to have serious problems with indirect 
function calls in particular. x86, in contrast, tends to not even notice, 
especially if the indirect function is fairly static per call-site, and 
predicts well.

There is a reason my machine is 15-20 times faster than yours.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 19:30                                                 ` Eric Dumazet
  2008-11-17 19:39                                                 ` David Miller
@ 2008-11-17 19:57                                                 ` Ingo Molnar
  2008-11-17 20:47                                                 ` Ingo Molnar
  2008-11-17 22:19                                                 ` Ingo Molnar
  4 siblings, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 19:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger


> [ I'll post per function analysis as i complete them, as a reply to
>   this mail. ]

[ i'll do a separate mail for every function analyzed, the discussion 
  spreads better that way. ]

> 100.000000 total
> ................
>   7.253355 copy_user_generic_string

This is the well-known pattern of user-copy overhead, which centers 
around this single REP MOVS instruction:

                nr-of-hits
                 .........
ffffffff80341eea:       42 	83 e2 07    		and    $0x7,%edx
ffffffff80341eed:   677398 	f3 48 a5         	rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff80341ef0:     3642 	89 d1                	mov    %edx,%ecx
ffffffff80341ef2:    16260 	f3 a4                	rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff80341ef4:     6554 	31 c0                	xor    %eax,%eax
ffffffff80341ef6:     1958 	c3                   	retq   
ffffffff80341ef7:        0 	90                   	nop    
ffffffff80341ef8:        0 	90                   	nop    

That's to be expected - tbench shuffles 3.5 GB of effective data 
to/from sockets. That's 7 GB due to double-copy. So for every 64 
bytes of data transferred we spend 1.4 CPU cycles in this specific 
function - that is OK-ish.
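
To put a number on how strongly the samples concentrate on the REP MOVSQ,
the hit counts from the listing above can simply be summed. This is only a
sketch over the instructions shown (the tail of the function), not over all
of copy_user_generic_string, and the dictionary keys are just shorthand
labels for the listed instructions:

```python
# Hit counts copied from the disassembly listing above.
listed_hits = {
    "and $0x7,%edx": 42,
    "rep movsq": 677398,
    "mov %edx,%ecx": 3642,
    "rep movsb": 16260,
    "xor %eax,%eax": 6554,
    "retq": 1958,
    "nop (first)": 0,
    "nop (second)": 0,
}

listed_total = sum(listed_hits.values())
movsq_share = listed_hits["rep movsq"] / listed_total

print(f"hits in the listed instructions: {listed_total}")
print(f"share attributed to rep movsq:   {movsq_share:.1%}")
```

Within the listed instructions, the quadword copy accounts for well over
nine tenths of the hits.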

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                       ` <alpine.LFD.2.00.0811171149100.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-17 20:16                                                         ` David Miller
       [not found]                                                           ` <20081117.121641.167690467.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17 20:16 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Mon, 17 Nov 2008 11:55:35 -0800 (PST)

> So try to figure out why his (better) profile doesn't match your 
> (inferior) one, instead of asking him to do stupid things. It's some 
> difference in architectures, likely: maybe the sparc timekeeping is crap, 
> maybe it's a cache issue and sparc caches are crap, maybe it's something 
> where Niagara (is it niagara) has some oddness that shows up because it 
> has that odd four-threads+four-cores or whatever.

It's on my workstation which is a much simpler 2 processor
UltraSPARC-IIIi (1.5Ghz) system.

And yes I will investigate, it's all I've been doing in my
spare time these past few weeks.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                             ` <alpine.LFD.2.00.0811171156080.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-17 20:18                               ` David Miller
  0 siblings, 0 replies; 318+ messages in thread
From: David Miller @ 2008-11-17 20:18 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mingo-X9Un+BFzKDI, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Mon, 17 Nov 2008 11:57:55 -0800 (PST)

> On Mon, 17 Nov 2008, David Miller wrote:
> > And as a result I found that wake_up() is now 4 times slower than it
> > was in 2.6.22, I even analyzed this for every single kernel release
> > till now.
> 
> ..and that's the one where you then pointed to hrtimers, and now you claim 
> that was fixed?

That was a huge increase going from 2.6.26 to 2.6.27, and has
been fixed.

The rest of the gradual release-to-release cost increase, however,
remains.

> At least I haven't seen any new analysis since then.

I will find time to make it after I get back from Portland.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                           ` <20081117.121641.167690467.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-17 20:30                                                             ` Linus Torvalds
       [not found]                                                               ` <alpine.LFD.2.00.0811171218470.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Linus Torvalds @ 2008-11-17 20:30 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

On Mon, 17 Nov 2008, David Miller wrote:
> 
> It's on my workstation which is a much simpler 2 processor
> UltraSPARC-IIIi (1.5Ghz) system.

Ok. It could easily be something like a cache footprint issue. And while I 
don't know my sparc cpu's very well, I think the Ultrasparc-IIIi is super- 
scalar but does no out-of-order and speculation, no? So I could easily see 
that the indirect branches in the scheduler hurt much more, and might 
explain why the x86 profile looks so different.

One thing that non-NMI profiles also tend to show is "clumping", which in 
turn tends to rather excessively pinpoint code sequences that release the 
irq flag - just because those points show up in profiles, rather than 
being a spread-out-mush. So it's possible that Ingo's profile did show the 
scheduler more, but it was in the form of much more spread out "noise" 
rather than the single spike you saw. 
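
A toy model of that clumping effect, under heavily simplified assumptions:
a straight-line run of 100 "instructions", a window of them executed with
interrupts disabled, and a sampling interrupt that fires at a uniformly
random instruction. The function name, the window and the sample count are
made up for illustration; nothing here models a real CPU or profiler.

```python
import random
from collections import Counter

def sample_profile(n_instr, irq_off, n_samples, nmi, seed=0):
    """Return a Counter mapping instruction index -> number of samples.

    irq_off is a half-open range (start, end) that runs with interrupts
    disabled; index `end` is the first instruction after they are enabled.
    """
    rng = random.Random(seed)
    start, end = irq_off
    hits = Counter()
    for _ in range(n_samples):
        pc = rng.randrange(n_instr)      # where the sampling event fired
        if not nmi and start <= pc < end:
            pc = end                     # maskable IRQ is held until re-enable
        hits[pc] += 1
    return hits

nmi_hits = sample_profile(100, (20, 60), 10_000, nmi=True)
irq_hits = sample_profile(100, (20, 60), 10_000, nmi=False)

print("NMI profile, busiest instruction:    ", nmi_hits.most_common(1)[0])
print("non-NMI profile, busiest instruction:", irq_hits.most_common(1)[0])
```

In this model the NMI profile stays spread out, while the non-NMI profile
shows no samples inside the window and one tall spike on the instruction
right after interrupts are re-enabled.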

		Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
                                                                   ` (2 preceding siblings ...)
  2008-11-17 19:57                                                 ` Ingo Molnar
@ 2008-11-17 20:47                                                 ` Ingo Molnar
       [not found]                                                   ` <20081117204743.GD12020-X9Un+BFzKDI@public.gmane.org>
  2008-11-17 22:19                                                 ` Ingo Molnar
  4 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 20:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger


* Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:

> 100.000000 total
> ................
>   3.038025 skb_release_data

                      hits (303802 total)
                 .........
ffffffff80488c7e:      780 <skb_release_data>:
ffffffff80488c7e:      780 	55                   	push   %rbp
ffffffff80488c7f:   267141 	53                   	push   %rbx
ffffffff80488c80:        0 	48 89 fb             	mov    %rdi,%rbx
ffffffff80488c83:     3552 	48 83 ec 08          	sub    $0x8,%rsp
ffffffff80488c87:      604 	8a 47 7c             	mov    0x7c(%rdi),%al
ffffffff80488c8a:     2644 	a8 02                	test   $0x2,%al
ffffffff80488c8c:       49 	74 2a                	je     ffffffff80488cb8 <skb_release_data+0x3a>
ffffffff80488c8e:        0 	83 e0 10             	and    $0x10,%eax
ffffffff80488c91:     2079 	8b 97 c8 00 00 00    	mov    0xc8(%rdi),%edx
ffffffff80488c97:       53 	3c 01                	cmp    $0x1,%al
ffffffff80488c99:        0 	19 c0                	sbb    %eax,%eax
ffffffff80488c9b:      870 	48 03 97 d0 00 00 00 	add    0xd0(%rdi),%rdx
ffffffff80488ca2:       65 	66 31 c0             	xor    %ax,%ax
ffffffff80488ca5:        0 	05 01 00 01 00       	add    $0x10001,%eax
ffffffff80488caa:      888 	f7 d8                	neg    %eax
ffffffff80488cac:       49 	89 c1                	mov    %eax,%ecx
ffffffff80488cae:        0 	f0 0f c1 0a          	lock xadd %ecx,(%rdx)
ffffffff80488cb2:     1909 	01 c8                	add    %ecx,%eax
ffffffff80488cb4:     1040 	85 c0                	test   %eax,%eax
ffffffff80488cb6:        0 	75 6d                	jne    ffffffff80488d25 <skb_release_data+0xa7>
ffffffff80488cb8:        0 	8b 93 c8 00 00 00    	mov    0xc8(%rbx),%edx
ffffffff80488cbe:     4199 	48 8b 83 d0 00 00 00 	mov    0xd0(%rbx),%rax
ffffffff80488cc5:     4995 	31 ed                	xor    %ebp,%ebp
ffffffff80488cc7:        0 	66 83 7c 10 04 00    	cmpw   $0x0,0x4(%rax,%rdx,1)
ffffffff80488ccd:      983 	75 15                	jne    ffffffff80488ce4 <skb_release_data+0x66>
ffffffff80488ccf:       15 	eb 28                	jmp    ffffffff80488cf9 <skb_release_data+0x7b>
ffffffff80488cd1:      665 	48 63 c5             	movslq %ebp,%rax
ffffffff80488cd4:      546 	ff c5                	inc    %ebp
ffffffff80488cd6:      328 	48 c1 e0 04          	shl    $0x4,%rax
ffffffff80488cda:      356 	48 8b 7c 02 20       	mov    0x20(%rdx,%rax,1),%rdi
ffffffff80488cdf:       95 	e8 be 87 de ff       	callq  ffffffff802714a2 <put_page>
ffffffff80488ce4:       66 	8b 93 c8 00 00 00    	mov    0xc8(%rbx),%edx
ffffffff80488cea:     1321 	48 03 93 d0 00 00 00 	add    0xd0(%rbx),%rdx
ffffffff80488cf1:      439 	0f b7 42 04          	movzwl 0x4(%rdx),%eax
ffffffff80488cf5:        0 	39 c5                	cmp    %eax,%ebp
ffffffff80488cf7:     1887 	7c d8                	jl     ffffffff80488cd1 <skb_release_data+0x53>
ffffffff80488cf9:     2187 	8b 93 c8 00 00 00    	mov    0xc8(%rbx),%edx
ffffffff80488cff:     1784 	48 8b 83 d0 00 00 00 	mov    0xd0(%rbx),%rax
ffffffff80488d06:      422 	48 83 7c 10 18 00    	cmpq   $0x0,0x18(%rax,%rdx,1)
ffffffff80488d0c:      110 	74 08                	je     ffffffff80488d16 <skb_release_data+0x98>
ffffffff80488d0e:        0 	48 89 df             	mov    %rbx,%rdi
ffffffff80488d11:        0 	e8 52 ff ff ff       	callq  ffffffff80488c68 <skb_drop_fraglist>
ffffffff80488d16:       14 	48 8b bb d0 00 00 00 	mov    0xd0(%rbx),%rdi
ffffffff80488d1d:      715 	5e                   	pop    %rsi
ffffffff80488d1e:      109 	5b                   	pop    %rbx
ffffffff80488d1f:       20 	5d                   	pop    %rbp
ffffffff80488d20:      980 	e9 b7 66 e0 ff       	jmpq   ffffffff8028f3dc <kfree>
ffffffff80488d25:        0 	59                   	pop    %rcx
ffffffff80488d26:     1948 	5b                   	pop    %rbx
ffffffff80488d27:        0 	5d                   	pop    %rbp
ffffffff80488d28:        0 	c3                   	retq   

this is a short function, and 90% of the overhead is false leaked-in 
overhead from callsites:

ffffffff80488c7f:   267141 	53                   	push   %rbx
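
The 90% figure is a round number. A quick check against the counts quoted
above, assuming the 303802 total and the 3.038025% share both refer to the
same profile run:

```python
total_hits = 303802       # "hits (303802 total)" for skb_release_data
push_rbx_hits = 267141    # hits on the `push %rbx` at ffffffff80488c7f
share_of_profile = 3.038025 / 100.0   # skb_release_data's share of all samples

skid_share = push_rbx_hits / total_hits
implied_samples = total_hits / share_of_profile

print(f"share of the function's hits on push %rbx: {skid_share:.1%}")
print(f"implied total samples in the profile:      {implied_samples:,.0f}")
```

The share comes out a little under 90%, and the implied profile size is
close to ten million samples.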

unfortunately i have a hard time mapping its callsites. 
pskb_expand_head() is the only static callsite, but it's not active in 
the profile.

The _usual_ callsite is normally skb_release_all(), which does have 
overhead:

ffffffff80489449:      925 <skb_release_all>:
ffffffff80489449:      925 	53                   	push   %rbx
ffffffff8048944a:     5249 	48 89 fb             	mov    %rdi,%rbx
ffffffff8048944d:        4 	e8 3c ff ff ff       	callq  ffffffff8048938e <skb_release_head_state>
ffffffff80489452:     1149 	48 89 df             	mov    %rbx,%rdi
ffffffff80489455:    13163 	5b                   	pop    %rbx
ffffffff80489456:        0 	e9 23 f8 ff ff       	jmpq   ffffffff80488c7e <skb_release_data>

it is also tail-optimized, which explains why i found few 
callsites. The main callsite of skb_release_all() is:

ffffffff80488b86:       26 	e8 be 08 00 00       	callq  ffffffff80489449 <skb_release_all>

which is __kfree_skb(). That is a frequently referenced function, and 
in my profile there's a single callsite active:

ffffffff804c1027:      432 	e8 56 7b fc ff       	callq  ffffffff80488b82 <__kfree_skb>

which is tcp_ack() - subject of a later email. The wider context is:

ffffffff804c0ffc:      433 	41 2b 85 e0 00 00 00 	sub    0xe0(%r13),%eax
ffffffff804c1003:     4843 	89 85 f0 00 00 00    	mov    %eax,0xf0(%rbp)
ffffffff804c1009:     1730 	48 8b 45 30          	mov    0x30(%rbp),%rax
ffffffff804c100d:      311 	41 8b 95 e0 00 00 00 	mov    0xe0(%r13),%edx
ffffffff804c1014:        0 	48 83 b8 b0 00 00 00 	cmpq   $0x0,0xb0(%rax)
ffffffff804c101b:        0 	00 
ffffffff804c101c:      418 	74 06                	je     ffffffff804c1024 <tcp_ack+0x50d>
ffffffff804c101e:       37 	01 95 f4 00 00 00    	add    %edx,0xf4(%rbp)
ffffffff804c1024:        2 	4c 89 ef             	mov    %r13,%rdi
ffffffff804c1027:      432 	e8 56 7b fc ff       	callq  ffffffff80488b82 <__kfree_skb>

this is a good, top-of-the-line x86 CPU with a really good BTB 
implementation that seems to be able to fall through calls and tail 
optimizations as if they weren't there.

some guesses are:

(gdb) list *0xffffffff804c1003
0xffffffff804c1003 is in tcp_ack (include/net/sock.h:789).
784	
785	static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
786	{
787		skb_truesize_check(skb);
788		sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
789		sk->sk_wmem_queued -= skb->truesize;
790		sk_mem_uncharge(sk, skb->truesize);
791		__kfree_skb(skb);
792	}
793	

both sk and skb should be cache-hot here so this seems unlikely.

(gdb) list *0xffffffff804c1009
0xffffffff804c1009 is in tcp_ack (include/net/sock.h:736).
731	}
732	
733	static inline int sk_has_account(struct sock *sk)
734	{
735		/* return true if protocol supports memory accounting */
736		return !!sk->sk_prot->memory_allocated;
737	}
738	
739	static inline int sk_wmem_schedule(struct sock *sk, int size)
740	{

this cannot be it - unless sk_prot somehow ends up being dirtied or 
false-shared?

Still, my guess would be on ffffffff804c1009 and a 
sk_prot->memory_allocated cachemiss: look at how this instruction uses 
%ebp, and the one that shows the many hits in skb_release_data() 
pushes %ebp to the stack - that's where the CPU's OOO trick ends: it 
has to compute the result and serialize on the cachemiss.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                   ` <20081117204743.GD12020-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 20:56                                                     ` Eric Dumazet
  0 siblings, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 20:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger

Ingo Molnar a écrit :
> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> 
>> 100.000000 total
>> ................
>>   3.038025 skb_release_data
> 
>                       hits (303802 total)
>                  .........
> ffffffff80488c7e:      780 <skb_release_data>:
> ffffffff80488c7e:      780 	55                   	push   %rbp
> ffffffff80488c7f:   267141 	53                   	push   %rbx
> ffffffff80488c80:        0 	48 89 fb             	mov    %rdi,%rbx
> ffffffff80488c83:     3552 	48 83 ec 08          	sub    $0x8,%rsp
> ffffffff80488c87:      604 	8a 47 7c             	mov    0x7c(%rdi),%al
> ffffffff80488c8a:     2644 	a8 02                	test   $0x2,%al
> ffffffff80488c8c:       49 	74 2a                	je     ffffffff80488cb8 <skb_release_data+0x3a>
> ffffffff80488c8e:        0 	83 e0 10             	and    $0x10,%eax
> ffffffff80488c91:     2079 	8b 97 c8 00 00 00    	mov    0xc8(%rdi),%edx
> ffffffff80488c97:       53 	3c 01                	cmp    $0x1,%al
> ffffffff80488c99:        0 	19 c0                	sbb    %eax,%eax
> ffffffff80488c9b:      870 	48 03 97 d0 00 00 00 	add    0xd0(%rdi),%rdx
> ffffffff80488ca2:       65 	66 31 c0             	xor    %ax,%ax
> ffffffff80488ca5:        0 	05 01 00 01 00       	add    $0x10001,%eax
> ffffffff80488caa:      888 	f7 d8                	neg    %eax
> ffffffff80488cac:       49 	89 c1                	mov    %eax,%ecx
> ffffffff80488cae:        0 	f0 0f c1 0a          	lock xadd %ecx,(%rdx)
> ffffffff80488cb2:     1909 	01 c8                	add    %ecx,%eax
> ffffffff80488cb4:     1040 	85 c0                	test   %eax,%eax
> ffffffff80488cb6:        0 	75 6d                	jne    ffffffff80488d25 <skb_release_data+0xa7>
> ffffffff80488cb8:        0 	8b 93 c8 00 00 00    	mov    0xc8(%rbx),%edx
> ffffffff80488cbe:     4199 	48 8b 83 d0 00 00 00 	mov    0xd0(%rbx),%rax
> ffffffff80488cc5:     4995 	31 ed                	xor    %ebp,%ebp
> ffffffff80488cc7:        0 	66 83 7c 10 04 00    	cmpw   $0x0,0x4(%rax,%rdx,1)
> ffffffff80488ccd:      983 	75 15                	jne    ffffffff80488ce4 <skb_release_data+0x66>
> ffffffff80488ccf:       15 	eb 28                	jmp    ffffffff80488cf9 <skb_release_data+0x7b>
> ffffffff80488cd1:      665 	48 63 c5             	movslq %ebp,%rax
> ffffffff80488cd4:      546 	ff c5                	inc    %ebp
> ffffffff80488cd6:      328 	48 c1 e0 04          	shl    $0x4,%rax
> ffffffff80488cda:      356 	48 8b 7c 02 20       	mov    0x20(%rdx,%rax,1),%rdi
> ffffffff80488cdf:       95 	e8 be 87 de ff       	callq  ffffffff802714a2 <put_page>
> ffffffff80488ce4:       66 	8b 93 c8 00 00 00    	mov    0xc8(%rbx),%edx
> ffffffff80488cea:     1321 	48 03 93 d0 00 00 00 	add    0xd0(%rbx),%rdx
> ffffffff80488cf1:      439 	0f b7 42 04          	movzwl 0x4(%rdx),%eax
> ffffffff80488cf5:        0 	39 c5                	cmp    %eax,%ebp
> ffffffff80488cf7:     1887 	7c d8                	jl     ffffffff80488cd1 <skb_release_data+0x53>
> ffffffff80488cf9:     2187 	8b 93 c8 00 00 00    	mov    0xc8(%rbx),%edx
> ffffffff80488cff:     1784 	48 8b 83 d0 00 00 00 	mov    0xd0(%rbx),%rax
> ffffffff80488d06:      422 	48 83 7c 10 18 00    	cmpq   $0x0,0x18(%rax,%rdx,1)
> ffffffff80488d0c:      110 	74 08                	je     ffffffff80488d16 <skb_release_data+0x98>
> ffffffff80488d0e:        0 	48 89 df             	mov    %rbx,%rdi
> ffffffff80488d11:        0 	e8 52 ff ff ff       	callq  ffffffff80488c68 <skb_drop_fraglist>
> ffffffff80488d16:       14 	48 8b bb d0 00 00 00 	mov    0xd0(%rbx),%rdi
> ffffffff80488d1d:      715 	5e                   	pop    %rsi
> ffffffff80488d1e:      109 	5b                   	pop    %rbx
> ffffffff80488d1f:       20 	5d                   	pop    %rbp
> ffffffff80488d20:      980 	e9 b7 66 e0 ff       	jmpq   ffffffff8028f3dc <kfree>
> ffffffff80488d25:        0 	59                   	pop    %rcx
> ffffffff80488d26:     1948 	5b                   	pop    %rbx
> ffffffff80488d27:        0 	5d                   	pop    %rbp
> ffffffff80488d28:        0 	c3                   	retq   
> 
> this is a short function, and 90% of the overhead is false leaked-in 
> overhead from callsites:
> 
> ffffffff80488c7f:   267141 	53                   	push   %rbx
> 
> unfortunately i have a hard time mapping its callsites. 
> pskb_expand_head() is the only static callsite, but it's not active in 
> the profile.
> 
> The _usual_ callsite is normally skb_release_all(), which does have 
> overhead:
> 
> ffffffff80489449:      925 <skb_release_all>:
> ffffffff80489449:      925 	53                   	push   %rbx
> ffffffff8048944a:     5249 	48 89 fb             	mov    %rdi,%rbx
> ffffffff8048944d:        4 	e8 3c ff ff ff       	callq  ffffffff8048938e <skb_release_head_state>
> ffffffff80489452:     1149 	48 89 df             	mov    %rbx,%rdi
> ffffffff80489455:    13163 	5b                   	pop    %rbx
> ffffffff80489456:        0 	e9 23 f8 ff ff       	jmpq   ffffffff80488c7e <skb_release_data>
> 
> it is also tail-optimized, which explains why i found little 
> callsites. The main callsite of skb_release_all() is:
> 
> ffffffff80488b86:       26 	e8 be 08 00 00       	callq  ffffffff80489449 <skb_release_all>
> 
> which is __kfree_skb(). That is a frequently referenced function, and 
> in my profile there's a single callsite active:
> 
> ffffffff804c1027:      432 	e8 56 7b fc ff       	callq  ffffffff80488b82 <__kfree_skb>
> 
> which is tcp_ack() - subject of a later email. The wider context is:
> 
> ffffffff804c0ffc:      433 	41 2b 85 e0 00 00 00 	sub    0xe0(%r13),%eax
> ffffffff804c1003:     4843 	89 85 f0 00 00 00    	mov    %eax,0xf0(%rbp)
> ffffffff804c1009:     1730 	48 8b 45 30          	mov    0x30(%rbp),%rax
> ffffffff804c100d:      311 	41 8b 95 e0 00 00 00 	mov    0xe0(%r13),%edx
> ffffffff804c1014:        0 	48 83 b8 b0 00 00 00 	cmpq   $0x0,0xb0(%rax)
> ffffffff804c101b:        0 	00 
> ffffffff804c101c:      418 	74 06                	je     ffffffff804c1024 <tcp_ack+0x50d>
> ffffffff804c101e:       37 	01 95 f4 00 00 00    	add    %edx,0xf4(%rbp)
> ffffffff804c1024:        2 	4c 89 ef             	mov    %r13,%rdi
> ffffffff804c1027:      432 	e8 56 7b fc ff       	callq  ffffffff80488b82 <__kfree_skb>
> 
> this is a good, top-of-the-line x86 CPU with a really good BTB 
> implementation that seems to be able to fall through calls and tail 
> optimizations as if they werent there.
> 
> some guesses are:
> 
> (gdb) list *0xffffffff804c1003
> 0xffffffff804c1003 is in tcp_ack (include/net/sock.h:789).
> 784	
> 785	static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
> 786	{
> 787		skb_truesize_check(skb);
> 788		sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
> 789		sk->sk_wmem_queued -= skb->truesize;
> 790		sk_mem_uncharge(sk, skb->truesize);
> 791		__kfree_skb(skb);
> 792	}
> 793	
> 
> both sk and skb should be cache-hot here so this seems unlikely.
> 
> (gdb) list *0xffffffff804c1009
> 0xffffffff804c1009 is in tcp_ack (include/net/sock.h:736).
> 731	}
> 732	
> 733	static inline int sk_has_account(struct sock *sk)
> 734	{
> 735		/* return true if protocol supports memory accounting */
> 736		return !!sk->sk_prot->memory_allocated;
> 737	}
> 738	
> 739	static inline int sk_wmem_schedule(struct sock *sk, int size)
> 740	{
> 
> this cannot be it - unless sk_prot somehow ends up being dirtied or 
> false-shared?
> 
> Still, my guess would be on ffffffff804c1009 and a 
> sk_prot->memory_allocated cachemiss: look at how this instruction uses 
> %ebp, and the one that shows the many hits in skb_release_data() 
> pushes %ebp to the stack - that's where the CPU's OOO trick ends: it 
> has to compute the result and serialize on the cachemiss.
> 

I did some investigation on this part (memory_allocated) and discovered that UDP had a problem,
not TCP (and therefore not tbench)

commit 270acefafeb74ce2fe93d35b75733870bf1e11e7

net: sk_free_datagram() should use sk_mem_reclaim_partial()

I noticed a contention on udp_memory_allocated on regular UDP applications.

While tcp_memory_allocated is seldom used, it appears each incoming UDP frame
is currently touching udp_memory_allocated when queued, and when received by
application.

One possible solution is to use sk_mem_reclaim_partial() instead of
sk_mem_reclaim(), so that we keep a small reserve (less than one page)
of memory for each UDP socket.

We did something very similar on TCP side in commit
9993e7d313e80bdc005d09c7def91903e0068f07
([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer())

A more complex solution would need to convert prot->memory_allocated to
use a percpu_counter with batches of 64 or 128 pages.

Signed-off-by: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
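
A rough way to see why keeping a small per-socket reserve takes pressure
off the shared counter is a toy receive loop in which every frame is
charged when queued and uncharged when read. Everything below is an
assumption made for illustration: the one-page quantum, the function name,
and in particular the exact reclaim thresholds are not taken from the
commit, and this is not kernel code.

```python
PAGE = 4096  # assumed accounting quantum of one page, in bytes

def shared_counter_writes(frames, frame_size, keep_reserve):
    """Count writes to the shared counter in a toy receive loop.

    Each frame is queued (charged) and consumed (uncharged) before the
    next one arrives.
    """
    forward_alloc = 0   # per-socket reserve, bytes
    shared_pages = 0    # stand-in for the protocol-wide counter, pages
    writes = 0
    for _ in range(frames):
        if forward_alloc < frame_size:
            pages = -(-frame_size // PAGE)   # round up to whole pages
            shared_pages += pages
            writes += 1
            forward_alloc += pages * PAGE
        forward_alloc -= frame_size          # frame sits in the receive queue
        forward_alloc += frame_size          # application has read the frame
        # Full reclaim hands back every whole page; the partial variant
        # (assumed threshold) only does so once more than a page is held.
        if forward_alloc > PAGE or (not keep_reserve and forward_alloc == PAGE):
            shared_pages -= forward_alloc // PAGE
            writes += 1
            forward_alloc %= PAGE
    return writes

print("full reclaim:   ", shared_counter_writes(1000, 1000, keep_reserve=False))
print("partial reclaim:", shared_counter_writes(1000, 1000, keep_reserve=True))
```

In this model, full reclaim writes the shared counter twice per frame,
while the partial variant writes it once for the first frame and then
serves later frames from the per-socket reserve.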


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                               ` <alpine.LFD.2.00.0811171218470.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-17 20:58                                                                 ` David Miller
       [not found]                                                                   ` <20081117.125826.193693115.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-17 20:58 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Mon, 17 Nov 2008 12:30:00 -0800 (PST)

> On Mon, 17 Nov 2008, David Miller wrote:
> > 
> > It's on my workstation which is a much simpler 2 processor
> > UltraSPARC-IIIi (1.5Ghz) system.
> 
> Ok. It could easily be something like a cache footprint issue. And while I 
> don't know my sparc cpu's very well, I think the Ultrasparc-IIIi is super- 
> scalar but does no out-of-order and speculation, no?

It does only very simple speculation, but your description is accurate.

> So I could easily see that the indirect branches in the scheduler
> hurt much more, and might explain why the x86 profile looks so
> different.

Right.

> One thing that non-NMI profiles also tend to show is "clumping", which in 
> turn tends to rather excessively pinpoint code sequences that release the 
> irq flag - just because those points show up in profiles, rather than 
> being a spread-out-mush. So it's possible that Ingo's profile did show the 
> scheduler more, but it was in the form of much more spread out "noise" 
> rather than the single spike you saw. 

Sure.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-11-17 18:49                                             ` Ingo Molnar
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 22:08                                               ` Ingo Molnar
       [not found]                                                 ` <20081117220828.GB6398-X9Un+BFzKDI@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 22:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, rjw, linux-kernel, kernel-testers, cl,
	efault, a.p.zijlstra, Stephen Hemminger


* Ingo Molnar <mingo@elte.hu> wrote:

> 100.000000 total
> ................
>   1.469183 tcp_current_mss

                      hits (total: 146918)
                 .........
ffffffff804c5237:      526 <tcp_current_mss>:
ffffffff804c5237:      526 	41 54                	push   %r12
ffffffff804c5239:     5929 	55                   	push   %rbp
ffffffff804c523a:       32 	53                   	push   %rbx
ffffffff804c523b:      294 	48 89 fb             	mov    %rdi,%rbx
ffffffff804c523e:      539 	48 83 ec 30          	sub    $0x30,%rsp
ffffffff804c5242:     2590 	85 f6                	test   %esi,%esi
ffffffff804c5244:      444 	48 8b 4f 78          	mov    0x78(%rdi),%rcx
ffffffff804c5248:      521 	8b af 4c 04 00 00    	mov    0x44c(%rdi),%ebp
ffffffff804c524e:      791 	74 2a                	je     ffffffff804c527a <tcp_current_mss+0x43>
ffffffff804c5250:      433 	8b 87 00 01 00 00    	mov    0x100(%rdi),%eax
ffffffff804c5256:      236 	c1 e0 10             	shl    $0x10,%eax
ffffffff804c5259:      191 	89 c2                	mov    %eax,%edx
ffffffff804c525b:      487 	23 97 fc 00 00 00    	and    0xfc(%rdi),%edx
ffffffff804c5261:      362 	39 c2                	cmp    %eax,%edx
ffffffff804c5263:      342 	75 15                	jne    ffffffff804c527a <tcp_current_mss+0x43>
ffffffff804c5265:      473 	45 31 e4             	xor    %r12d,%r12d
ffffffff804c5268:      221 	8b 87 00 04 00 00    	mov    0x400(%rdi),%eax
ffffffff804c526e:      194 	3b 87 80 04 00 00    	cmp    0x480(%rdi),%eax
ffffffff804c5274:      445 	41 0f 94 c4          	sete   %r12b
ffffffff804c5278:      261 	eb 03                	jmp    ffffffff804c527d <tcp_current_mss+0x46>
ffffffff804c527a:        0 	45 31 e4             	xor    %r12d,%r12d
ffffffff804c527d:      185 	48 85 c9             	test   %rcx,%rcx
ffffffff804c5280:      686 	74 15                	je     ffffffff804c5297 <tcp_current_mss+0x60>
ffffffff804c5282:     1806 	8b 71 7c             	mov    0x7c(%rcx),%esi
ffffffff804c5285:        1 	3b b3 5c 03 00 00    	cmp    0x35c(%rbx),%esi
ffffffff804c528b:       21 	74 0a                	je     ffffffff804c5297 <tcp_current_mss+0x60>
ffffffff804c528d:        0 	48 89 df             	mov    %rbx,%rdi
ffffffff804c5290:        0 	e8 8b fb ff ff       	callq  ffffffff804c4e20 <tcp_sync_mss>
ffffffff804c5295:        0 	89 c5                	mov    %eax,%ebp
ffffffff804c5297:      864 	48 8d 4c 24 28       	lea    0x28(%rsp),%rcx
ffffffff804c529c:      634 	48 8d 54 24 10       	lea    0x10(%rsp),%rdx
ffffffff804c52a1:      995 	31 f6                	xor    %esi,%esi
ffffffff804c52a3:        0 	48 89 df             	mov    %rbx,%rdi
ffffffff804c52a6:        2 	e8 f2 fe ff ff       	callq  ffffffff804c519d <tcp_established_options>
ffffffff804c52ab:      859 	8b 8b e8 03 00 00    	mov    0x3e8(%rbx),%ecx
ffffffff804c52b1:      936 	83 c0 14             	add    $0x14,%eax
ffffffff804c52b4:        6 	0f b7 d1             	movzwl %cx,%edx
ffffffff804c52b7:        0 	39 d0                	cmp    %edx,%eax
ffffffff804c52b9:      911 	74 04                	je     ffffffff804c52bf <tcp_current_mss+0x88>
ffffffff804c52bb:        0 	29 d0                	sub    %edx,%eax
ffffffff804c52bd:        0 	29 c5                	sub    %eax,%ebp
ffffffff804c52bf:        0 	45 85 e4             	test   %r12d,%r12d
ffffffff804c52c2:     6894 	89 e8                	mov    %ebp,%eax
ffffffff804c52c4:        0 	74 38                	je     ffffffff804c52fe <tcp_current_mss+0xc7>
ffffffff804c52c6:      990 	48 8b 83 68 03 00 00 	mov    0x368(%rbx),%rax
ffffffff804c52cd:      642 	8b b3 04 01 00 00    	mov    0x104(%rbx),%esi
ffffffff804c52d3:        3 	48 89 df             	mov    %rbx,%rdi
ffffffff804c52d6:      240 	66 2b 70 30          	sub    0x30(%rax),%si
ffffffff804c52da:      588 	66 2b b3 7e 03 00 00 	sub    0x37e(%rbx),%si
ffffffff804c52e1:        2 	66 29 ce             	sub    %cx,%si
ffffffff804c52e4:      284 	ff ce                	dec    %esi
ffffffff804c52e6:      664 	0f b7 f6             	movzwl %si,%esi
ffffffff804c52e9:        2 	e8 0a fb ff ff       	callq  ffffffff804c4df8 <tcp_bound_to_half_wnd>
ffffffff804c52ee:       68 	0f b7 d0             	movzwl %ax,%edx
ffffffff804c52f1:     1870 	89 c1                	mov    %eax,%ecx
ffffffff804c52f3:        0 	89 d0                	mov    %edx,%eax
ffffffff804c52f5:        0 	31 d2                	xor    %edx,%edx
ffffffff804c52f7:     2135 	f7 f5                	div    %ebp
ffffffff804c52f9:   107010 	89 c8                	mov    %ecx,%eax
ffffffff804c52fb:     1670 	66 29 d0             	sub    %dx,%ax
ffffffff804c52fe:        0 	66 89 83 ea 03 00 00 	mov    %ax,0x3ea(%rbx)
ffffffff804c5305:        4 	48 83 c4 30          	add    $0x30,%rsp
ffffffff804c5309:      855 	89 e8                	mov    %ebp,%eax
ffffffff804c530b:        0 	5b                   	pop    %rbx
ffffffff804c530c:      797 	5d                   	pop    %rbp
ffffffff804c530d:        0 	41 5c                	pop    %r12
ffffffff804c530f:        0 	c3                   	retq   

apparently this division causes 1.0% of tbench overhead:

ffffffff804c52f5:        0 	31 d2                	xor    %edx,%edx
ffffffff804c52f7:     2135 	f7 f5                	div    %ebp
ffffffff804c52f9:   107010 	89 c8                	mov    %ecx,%eax
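
As a rough cross-check of the 1.0% figure, the per-instruction hit count can be scaled by the function's share of the whole profile. The sketch below only redoes that arithmetic; the helper name insn_share_of_total is made up for illustration and is not part of any profiling tool. The hits are listed against the mov after the div, presumably because the sample is charged to the instruction that follows the slow one.

```c
/*
 * Share of the whole profile (in percent) that lands on one instruction.
 * Hypothetical helper, for illustration only.
 *
 *   func_share_pct: the function's share of all samples, in percent
 *   func_hits:      samples attributed to the whole function
 *   insn_hits:      samples attributed to the single instruction
 */
static double insn_share_of_total(double func_share_pct,
				  unsigned long func_hits,
				  unsigned long insn_hits)
{
	return func_share_pct * (double)insn_hits / (double)func_hits;
}
```

With the numbers above, insn_share_of_total(1.469183, 146918, 107010) comes out at roughly 1.07, i.e. about 73% of the samples in tcp_current_mss sit on that one spot.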

(gdb) list *0xffffffff804c52f7
0xffffffff804c52f7 is in tcp_current_mss (net/ipv4/tcp_output.c:1078).
1073					  inet_csk(sk)->icsk_af_ops->net_header_len -
1074					  inet_csk(sk)->icsk_ext_hdr_len -
1075					  tp->tcp_header_len);
1076	
1077			xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
1078			xmit_size_goal -= (xmit_size_goal % mss_now);
1079		}
1080		tp->xmit_size_goal = xmit_size_goal;
1081	
1082		return mss_now;
(gdb) 

it's this division:

        if (doing_tso) {
        [...]
			xmit_size_goal -= (xmit_size_goal % mss_now);

Has no-one hit this before? Perhaps this is why switching loopback 
networking to TSO had a performance impact for others?

It's still a bit weird ... how can a single division cause this much 
overhead? tcp_bound_to_half_wnd() [which is called straight before 
this sequence] seems low-overhead.
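
For reference, a minimal user-space sketch of what line 1078 computes, plus one conceivable way to skip the division when the inputs repeat between calls. This is not the kernel code and not a proposed patch: struct goal_cache and both function names are hypothetical, the types are guessed from the 16-bit arithmetic in the disassembly, and mss is assumed to be non-zero.

```c
#include <stdint.h>

/* Round goal down to a whole multiple of mss (mss must be non-zero). */
static uint16_t round_to_mss(uint16_t goal, uint32_t mss)
{
	return (uint16_t)(goal - (goal % mss));
}

/* Hypothetical cache of the last inputs and the last result. */
struct goal_cache {
	uint16_t goal_in;
	uint32_t mss;
	uint16_t goal_out;
};

/*
 * Same result as round_to_mss(), but the modulo is only executed
 * when goal or mss differ from the previous call.
 */
static uint16_t round_to_mss_cached(struct goal_cache *c,
				    uint16_t goal, uint32_t mss)
{
	if (c->mss != mss || c->goal_in != goal) {
		c->goal_in = goal;
		c->mss = mss;
		c->goal_out = round_to_mss(goal, mss);
	}
	return c->goal_out;
}
```

Whether the inputs actually repeat often enough on the tbench path for this to pay off is an assumption that would have to be measured.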

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                 ` <20081117220828.GB6398-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 22:15                                                   ` Eric Dumazet
       [not found]                                                     ` <4921ED16.9050307-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 22:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger

Ingo Molnar a écrit :
> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> 
>> 100.000000 total
>> ................
>>   1.469183 tcp_current_mss
> 
>                       hits (total: 146918)
>                  .........
> ffffffff804c5237:      526 <tcp_current_mss>:
> ffffffff804c5237:      526 	41 54                	push   %r12
> ffffffff804c5239:     5929 	55                   	push   %rbp
> ffffffff804c523a:       32 	53                   	push   %rbx
> ffffffff804c523b:      294 	48 89 fb             	mov    %rdi,%rbx
> ffffffff804c523e:      539 	48 83 ec 30          	sub    $0x30,%rsp
> ffffffff804c5242:     2590 	85 f6                	test   %esi,%esi
> ffffffff804c5244:      444 	48 8b 4f 78          	mov    0x78(%rdi),%rcx
> ffffffff804c5248:      521 	8b af 4c 04 00 00    	mov    0x44c(%rdi),%ebp
> ffffffff804c524e:      791 	74 2a                	je     ffffffff804c527a <tcp_current_mss+0x43>
> ffffffff804c5250:      433 	8b 87 00 01 00 00    	mov    0x100(%rdi),%eax
> ffffffff804c5256:      236 	c1 e0 10             	shl    $0x10,%eax
> ffffffff804c5259:      191 	89 c2                	mov    %eax,%edx
> ffffffff804c525b:      487 	23 97 fc 00 00 00    	and    0xfc(%rdi),%edx
> ffffffff804c5261:      362 	39 c2                	cmp    %eax,%edx
> ffffffff804c5263:      342 	75 15                	jne    ffffffff804c527a <tcp_current_mss+0x43>
> ffffffff804c5265:      473 	45 31 e4             	xor    %r12d,%r12d
> ffffffff804c5268:      221 	8b 87 00 04 00 00    	mov    0x400(%rdi),%eax
> ffffffff804c526e:      194 	3b 87 80 04 00 00    	cmp    0x480(%rdi),%eax
> ffffffff804c5274:      445 	41 0f 94 c4          	sete   %r12b
> ffffffff804c5278:      261 	eb 03                	jmp    ffffffff804c527d <tcp_current_mss+0x46>
> ffffffff804c527a:        0 	45 31 e4             	xor    %r12d,%r12d
> ffffffff804c527d:      185 	48 85 c9             	test   %rcx,%rcx
> ffffffff804c5280:      686 	74 15                	je     ffffffff804c5297 <tcp_current_mss+0x60>
> ffffffff804c5282:     1806 	8b 71 7c             	mov    0x7c(%rcx),%esi
> ffffffff804c5285:        1 	3b b3 5c 03 00 00    	cmp    0x35c(%rbx),%esi
> ffffffff804c528b:       21 	74 0a                	je     ffffffff804c5297 <tcp_current_mss+0x60>
> ffffffff804c528d:        0 	48 89 df             	mov    %rbx,%rdi
> ffffffff804c5290:        0 	e8 8b fb ff ff       	callq  ffffffff804c4e20 <tcp_sync_mss>
> ffffffff804c5295:        0 	89 c5                	mov    %eax,%ebp
> ffffffff804c5297:      864 	48 8d 4c 24 28       	lea    0x28(%rsp),%rcx
> ffffffff804c529c:      634 	48 8d 54 24 10       	lea    0x10(%rsp),%rdx
> ffffffff804c52a1:      995 	31 f6                	xor    %esi,%esi
> ffffffff804c52a3:        0 	48 89 df             	mov    %rbx,%rdi
> ffffffff804c52a6:        2 	e8 f2 fe ff ff       	callq  ffffffff804c519d <tcp_established_options>
> ffffffff804c52ab:      859 	8b 8b e8 03 00 00    	mov    0x3e8(%rbx),%ecx
> ffffffff804c52b1:      936 	83 c0 14             	add    $0x14,%eax
> ffffffff804c52b4:        6 	0f b7 d1             	movzwl %cx,%edx
> ffffffff804c52b7:        0 	39 d0                	cmp    %edx,%eax
> ffffffff804c52b9:      911 	74 04                	je     ffffffff804c52bf <tcp_current_mss+0x88>
> ffffffff804c52bb:        0 	29 d0                	sub    %edx,%eax
> ffffffff804c52bd:        0 	29 c5                	sub    %eax,%ebp
> ffffffff804c52bf:        0 	45 85 e4             	test   %r12d,%r12d
> ffffffff804c52c2:     6894 	89 e8                	mov    %ebp,%eax
> ffffffff804c52c4:        0 	74 38                	je     ffffffff804c52fe <tcp_current_mss+0xc7>
> ffffffff804c52c6:      990 	48 8b 83 68 03 00 00 	mov    0x368(%rbx),%rax
> ffffffff804c52cd:      642 	8b b3 04 01 00 00    	mov    0x104(%rbx),%esi
> ffffffff804c52d3:        3 	48 89 df             	mov    %rbx,%rdi
> ffffffff804c52d6:      240 	66 2b 70 30          	sub    0x30(%rax),%si
> ffffffff804c52da:      588 	66 2b b3 7e 03 00 00 	sub    0x37e(%rbx),%si
> ffffffff804c52e1:        2 	66 29 ce             	sub    %cx,%si
> ffffffff804c52e4:      284 	ff ce                	dec    %esi
> ffffffff804c52e6:      664 	0f b7 f6             	movzwl %si,%esi
> ffffffff804c52e9:        2 	e8 0a fb ff ff       	callq  ffffffff804c4df8 <tcp_bound_to_half_wnd>
> ffffffff804c52ee:       68 	0f b7 d0             	movzwl %ax,%edx
> ffffffff804c52f1:     1870 	89 c1                	mov    %eax,%ecx
> ffffffff804c52f3:        0 	89 d0                	mov    %edx,%eax
> ffffffff804c52f5:        0 	31 d2                	xor    %edx,%edx
> ffffffff804c52f7:     2135 	f7 f5                	div    %ebp
> ffffffff804c52f9:   107010 	89 c8                	mov    %ecx,%eax
> ffffffff804c52fb:     1670 	66 29 d0             	sub    %dx,%ax
> ffffffff804c52fe:        0 	66 89 83 ea 03 00 00 	mov    %ax,0x3ea(%rbx)
> ffffffff804c5305:        4 	48 83 c4 30          	add    $0x30,%rsp
> ffffffff804c5309:      855 	89 e8                	mov    %ebp,%eax
> ffffffff804c530b:        0 	5b                   	pop    %rbx
> ffffffff804c530c:      797 	5d                   	pop    %rbp
> ffffffff804c530d:        0 	41 5c                	pop    %r12
> ffffffff804c530f:        0 	c3                   	retq   
> 
> apparently this division causes 1.0% of tbench overhead:
> 
> ffffffff804c52f5:        0 	31 d2                	xor    %edx,%edx
> ffffffff804c52f7:     2135 	f7 f5                	div    %ebp
> ffffffff804c52f9:   107010 	89 c8                	mov    %ecx,%eax
> 
> (gdb) list *0xffffffff804c52f7
> 0xffffffff804c52f7 is in tcp_current_mss (net/ipv4/tcp_output.c:1078).
> 1073					  inet_csk(sk)->icsk_af_ops->net_header_len -
> 1074					  inet_csk(sk)->icsk_ext_hdr_len -
> 1075					  tp->tcp_header_len);
> 1076	
> 1077			xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
> 1078			xmit_size_goal -= (xmit_size_goal % mss_now);
> 1079		}
> 1080		tp->xmit_size_goal = xmit_size_goal;
> 1081	
> 1082		return mss_now;
> (gdb) 
> 
> it's this division:
> 
>         if (doing_tso) {
>         [...]
> 			xmit_size_goal -= (xmit_size_goal % mss_now);
> 
> Has no-one hit this before? Perhaps this is why switching loopback 
> networking to TSO had a performance impact for others?

Yes, I mentioned it later. But apparently you dont read my mails, so
I will just stop now.

> 
> It's still a bit weird ... how can a single division cause this much 
> overhead? tcp_bound_to_half_wnd() [which is called straight before 
> this sequence] seems low-overhead.
> 
> 	Ingo
> 
> 


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
                                                                   ` (3 preceding siblings ...)
  2008-11-17 20:47                                                 ` Ingo Molnar
@ 2008-11-17 22:19                                                 ` Ingo Molnar
  4 siblings, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 22:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger


* Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:

> 100.000000 total
> ................
>   1.385125 tcp_sendmsg

this too is spread out, no spikes i noticed.

The subsequent functions seem to be spread out pretty evenly as well, 
with no particular spikes visible.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                     ` <4921ED16.9050307-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-11-17 22:26                                                       ` Ingo Molnar
       [not found]                                                         ` <20081117222640.GA17880-X9Un+BFzKDI@public.gmane.org>
  2008-11-18  5:23                                                       ` David Miller
  1 sibling, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 22:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linus Torvalds, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger


* Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:

> Ingo Molnar a écrit :
>> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
>>
>>> 100.000000 total
>>> ................
>>>   1.469183 tcp_current_mss
>>
>>                       hits (total: 146918)
>>                  .........
>> ffffffff804c5237:      526 <tcp_current_mss>:
>> ffffffff804c5237:      526 	41 54                	push   %r12
>> ffffffff804c5239:     5929 	55                   	push   %rbp
>> ffffffff804c523a:       32 	53                   	push   %rbx
>> ffffffff804c523b:      294 	48 89 fb             	mov    %rdi,%rbx
>> ffffffff804c523e:      539 	48 83 ec 30          	sub    $0x30,%rsp
>> ffffffff804c5242:     2590 	85 f6                	test   %esi,%esi
>> ffffffff804c5244:      444 	48 8b 4f 78          	mov    0x78(%rdi),%rcx
>> ffffffff804c5248:      521 	8b af 4c 04 00 00    	mov    0x44c(%rdi),%ebp
>> ffffffff804c524e:      791 	74 2a                	je     ffffffff804c527a <tcp_current_mss+0x43>
>> ffffffff804c5250:      433 	8b 87 00 01 00 00    	mov    0x100(%rdi),%eax
>> ffffffff804c5256:      236 	c1 e0 10             	shl    $0x10,%eax
>> ffffffff804c5259:      191 	89 c2                	mov    %eax,%edx
>> ffffffff804c525b:      487 	23 97 fc 00 00 00    	and    0xfc(%rdi),%edx
>> ffffffff804c5261:      362 	39 c2                	cmp    %eax,%edx
>> ffffffff804c5263:      342 	75 15                	jne    ffffffff804c527a <tcp_current_mss+0x43>
>> ffffffff804c5265:      473 	45 31 e4             	xor    %r12d,%r12d
>> ffffffff804c5268:      221 	8b 87 00 04 00 00    	mov    0x400(%rdi),%eax
>> ffffffff804c526e:      194 	3b 87 80 04 00 00    	cmp    0x480(%rdi),%eax
>> ffffffff804c5274:      445 	41 0f 94 c4          	sete   %r12b
>> ffffffff804c5278:      261 	eb 03                	jmp    ffffffff804c527d <tcp_current_mss+0x46>
>> ffffffff804c527a:        0 	45 31 e4             	xor    %r12d,%r12d
>> ffffffff804c527d:      185 	48 85 c9             	test   %rcx,%rcx
>> ffffffff804c5280:      686 	74 15                	je     ffffffff804c5297 <tcp_current_mss+0x60>
>> ffffffff804c5282:     1806 	8b 71 7c             	mov    0x7c(%rcx),%esi
>> ffffffff804c5285:        1 	3b b3 5c 03 00 00    	cmp    0x35c(%rbx),%esi
>> ffffffff804c528b:       21 	74 0a                	je     ffffffff804c5297 <tcp_current_mss+0x60>
>> ffffffff804c528d:        0 	48 89 df             	mov    %rbx,%rdi
>> ffffffff804c5290:        0 	e8 8b fb ff ff       	callq  ffffffff804c4e20 <tcp_sync_mss>
>> ffffffff804c5295:        0 	89 c5                	mov    %eax,%ebp
>> ffffffff804c5297:      864 	48 8d 4c 24 28       	lea    0x28(%rsp),%rcx
>> ffffffff804c529c:      634 	48 8d 54 24 10       	lea    0x10(%rsp),%rdx
>> ffffffff804c52a1:      995 	31 f6                	xor    %esi,%esi
>> ffffffff804c52a3:        0 	48 89 df             	mov    %rbx,%rdi
>> ffffffff804c52a6:        2 	e8 f2 fe ff ff       	callq  ffffffff804c519d <tcp_established_options>
>> ffffffff804c52ab:      859 	8b 8b e8 03 00 00    	mov    0x3e8(%rbx),%ecx
>> ffffffff804c52b1:      936 	83 c0 14             	add    $0x14,%eax
>> ffffffff804c52b4:        6 	0f b7 d1             	movzwl %cx,%edx
>> ffffffff804c52b7:        0 	39 d0                	cmp    %edx,%eax
>> ffffffff804c52b9:      911 	74 04                	je     ffffffff804c52bf <tcp_current_mss+0x88>
>> ffffffff804c52bb:        0 	29 d0                	sub    %edx,%eax
>> ffffffff804c52bd:        0 	29 c5                	sub    %eax,%ebp
>> ffffffff804c52bf:        0 	45 85 e4             	test   %r12d,%r12d
>> ffffffff804c52c2:     6894 	89 e8                	mov    %ebp,%eax
>> ffffffff804c52c4:        0 	74 38                	je     ffffffff804c52fe <tcp_current_mss+0xc7>
>> ffffffff804c52c6:      990 	48 8b 83 68 03 00 00 	mov    0x368(%rbx),%rax
>> ffffffff804c52cd:      642 	8b b3 04 01 00 00    	mov    0x104(%rbx),%esi
>> ffffffff804c52d3:        3 	48 89 df             	mov    %rbx,%rdi
>> ffffffff804c52d6:      240 	66 2b 70 30          	sub    0x30(%rax),%si
>> ffffffff804c52da:      588 	66 2b b3 7e 03 00 00 	sub    0x37e(%rbx),%si
>> ffffffff804c52e1:        2 	66 29 ce             	sub    %cx,%si
>> ffffffff804c52e4:      284 	ff ce                	dec    %esi
>> ffffffff804c52e6:      664 	0f b7 f6             	movzwl %si,%esi
>> ffffffff804c52e9:        2 	e8 0a fb ff ff       	callq  ffffffff804c4df8 <tcp_bound_to_half_wnd>
>> ffffffff804c52ee:       68 	0f b7 d0             	movzwl %ax,%edx
>> ffffffff804c52f1:     1870 	89 c1                	mov    %eax,%ecx
>> ffffffff804c52f3:        0 	89 d0                	mov    %edx,%eax
>> ffffffff804c52f5:        0 	31 d2                	xor    %edx,%edx
>> ffffffff804c52f7:     2135 	f7 f5                	div    %ebp
>> ffffffff804c52f9:   107010 	89 c8                	mov    %ecx,%eax
>> ffffffff804c52fb:     1670 	66 29 d0             	sub    %dx,%ax
>> ffffffff804c52fe:        0 	66 89 83 ea 03 00 00 	mov    %ax,0x3ea(%rbx)
>> ffffffff804c5305:        4 	48 83 c4 30          	add    $0x30,%rsp
>> ffffffff804c5309:      855 	89 e8                	mov    %ebp,%eax
>> ffffffff804c530b:        0 	5b                   	pop    %rbx
>> ffffffff804c530c:      797 	5d                   	pop    %rbp
>> ffffffff804c530d:        0 	41 5c                	pop    %r12
>> ffffffff804c530f:        0 	c3                   	retq   
>>
>> apparently this division causes 1.0% of tbench overhead:
>>
>> ffffffff804c52f5:        0 	31 d2                	xor    %edx,%edx
>> ffffffff804c52f7:     2135 	f7 f5                	div    %ebp
>> ffffffff804c52f9:   107010 	89 c8                	mov    %ecx,%eax
>>
>> (gdb) list *0xffffffff804c52f7
>> 0xffffffff804c52f7 is in tcp_current_mss (net/ipv4/tcp_output.c:1078).
>> 1073					  inet_csk(sk)->icsk_af_ops->net_header_len -
>> 1074					  inet_csk(sk)->icsk_ext_hdr_len -
>> 1075					  tp->tcp_header_len);
>> 1076	
>> 1077			xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
>> 1078			xmit_size_goal -= (xmit_size_goal % mss_now);
>> 1079		}
>> 1080		tp->xmit_size_goal = xmit_size_goal;
>> 1081	
>> 1082		return mss_now;
>> (gdb) 
>>
>> it's this division:
>>
>>         if (doing_tso) {
>>         [...]
>> 			xmit_size_goal -= (xmit_size_goal % mss_now);
>>
>> Has no-one hit this before? Perhaps this is why switching loopback  
>> networking to TSO had a performance impact for others?
>
> Yes, I mentioned it later. [...]

i see - i just caught up with some of my inbox from today.

> [...] But apparently you dont read my mails, so I will just stop 
> now.

Sorry, i spent my time looking at the profile output.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                         ` <20081117222640.GA17880-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-17 22:39                                                           ` Eric Dumazet
  0 siblings, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-17 22:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, David Miller, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Stephen Hemminger

Ingo Molnar a écrit :
> * Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:
> 
>> Ingo Molnar a écrit :

>>> it's this division:
>>>
>>>         if (doing_tso) {
>>>         [...]
>>> 			xmit_size_goal -= (xmit_size_goal % mss_now);
>>>
>>> Has no-one hit this before? Perhaps this is why switching loopback  
>>> networking to TSO had a performance impact for others?
>> Yes, I mentioned it later. [...]
> 
> i see - i just caught up with some of my inbox from today.
> 
>> [...] But apparently you dont read my mails, so I will just stop 
>> now.
> 
> Sorry, i spent my time looking at the profile output.
> 

No problem Ingo, I am very glad you take so much time to profile the kernel ;)

I had too many problems with profilers on my dev machine lately :(


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                       ` <20081117.113158.200497613.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-11-17 19:47                         ` Linus Torvalds
@ 2008-11-17 22:47                         ` Ingo Molnar
  1 sibling, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-17 22:47 UTC (permalink / raw)
  To: David Miller
  Cc: dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

* David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Date: Mon, 17 Nov 2008 17:11:35 +0100
> 
> > Ouch, +4% from a oneliner networking change? That's a _huge_ speedup 
> > compared to the things we were after in scheduler land.
> 
> The scheduler has accounted for at least 10% of the tbench 
> regressions at this point, what are you talking about?

yeah, you are probably right when it comes to task migration policy 
impact - that can have effects in that range. (and that, you have to 
accept, is a fundamentally hard and fragile job to get right, as it 
involves observing the past and predicting the future out of it - at 
1.3 million events per second)

So above i was just talking about straight scheduling code overhead. 
(that cannot have been +10% of the total - as the whole scheduler only 
takes 7% total - TLB flush and FPU restore overhead included. Even the 
hrtimer bits were about 1% of the total.)

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                     ` <4921ED16.9050307-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  2008-11-17 22:26                                                       ` Ingo Molnar
@ 2008-11-18  5:23                                                       ` David Miller
       [not found]                                                         ` <20081117.212352.77940634.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-18  5:23 UTC (permalink / raw)
  To: dada1-fPLkHRcR87vqlBn2x/YWAg
  Cc: mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
Date: Mon, 17 Nov 2008 23:15:50 +0100

> Yes, I mentioned it later. But apparently you dont read my mails, so
> I will just stop now.

Yeah I was going to mention this too :-/

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                         ` <20081117.212352.77940634.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-18  8:45                                                           ` Ingo Molnar
  0 siblings, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-18  8:45 UTC (permalink / raw)
  To: David Miller
  Cc: dada1-fPLkHRcR87vqlBn2x/YWAg,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA


* David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
> Date: Mon, 17 Nov 2008 23:15:50 +0100
> 
> > Yes, I mentioned it later. But apparently you dont read my mails, 
> > so I will just stop now.
> 
> Yeah I was going to mention this too :-/

I spent hours profiling the networking code, and no, i didnt read all 
the incoming emails in parallel - i read them after that.

I have established it beyond reasonable doubt that the scheduler is 
doing the right thing with the config i've posted. Your "wakeup is two 
orders of magnitude more expensive" claim, which got me to measure and 
profile this stuff, is not reproducible here and this regression 
should not be listed as a scheduler regression.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                                   ` <20081117.125826.193693115.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-18  9:44                                                                     ` Nick Piggin
       [not found]                                                                       ` <200811182044.11055.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Nick Piggin @ 2008-11-18  9:44 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, mingo-X9Un+BFzKDI,
	dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

On Tuesday 18 November 2008 07:58, David Miller wrote:
> From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Date: Mon, 17 Nov 2008 12:30:00 -0800 (PST)
>
> > On Mon, 17 Nov 2008, David Miller wrote:
> > > It's on my workstation which is a much simpler 2 processor
> > > UltraSPARC-IIIi (1.5Ghz) system.
> >
> > Ok. It could easily be something like a cache footprint issue. And while
> > I don't know my sparc cpu's very well, I think the Ultrasparc-IIIi is
> > > superscalar but does no out-of-order and speculation, no?
>
> It does only very simple speculation, but your description is accurate.

Surely it would do branch prediction, but maybe not indirect branch?
I did wonder why those indirect function calls were added everywhere
in the scheduler...

They didn't show up in the newest generation of x86 CPUs, but simpler
implementations won't handle them as well.

I wouldn't expect that to cause such a big regression on its own, but
it would still be interesting to test changing them to direct calls.
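
To make the distinction concrete, here is a small sketch of the same call made through an ops table (indirect) and by name (direct). The struct and function names are invented for illustration and only loosely mimic a per-class ops table; they are not the scheduler's actual interfaces.

```c
/* Hypothetical, simplified stand-ins; not the real scheduler types. */
struct task {
	int prio;
};

struct sched_ops {
	int (*pick_weight)(const struct task *t);
};

static int fair_pick_weight(const struct task *t)
{
	return 120 - t->prio;
}

static const struct sched_ops fair_ops = {
	.pick_weight = fair_pick_weight,
};

/* Indirect call: the target is loaded from the ops table at run time,
 * so the CPU has to predict (or wait for) the branch target. */
static int weight_indirect(const struct sched_ops *ops, const struct task *t)
{
	return ops->pick_weight(t);
}

/* Direct call: the target is fixed at compile time and can be inlined. */
static int weight_direct(const struct task *t)
{
	return fair_pick_weight(t);
}
```

Both variants return the same value; the difference is only in how the call target is resolved, which is what a CPU without indirect branch prediction would be sensitive to.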

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                   ` <20081117.113936.81699150.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-11-17 19:43                                                     ` Eric Dumazet
  2008-11-17 19:55                                                     ` Linus Torvalds
@ 2008-11-18 12:29                                                     ` Mike Galbraith
  2 siblings, 0 replies; 318+ messages in thread
From: Mike Galbraith @ 2008-11-18 12:29 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

On Mon, 2008-11-17 at 11:39 -0800, David Miller wrote:
> From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
> Date: Mon, 17 Nov 2008 19:49:51 +0100
> 
> > 
> > * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote:
> > 
> > > The place for the sock_rfree() hit looks a bit weird, and i'll 
> > > investigate it now a bit more to place the real overhead point 
> > > properly. (i already mapped the test-bit overhead: that comes from 
> > > napi_disable_pending())
> > 
> > ok, here's a new set of profiles. (again for tbench 64-thread on a 
> > 16-way box, with v2.6.28-rc5-19-ge14c8bf and with the kernel config i 
> > posted before.)
> 
> Again, do a non-NMI profile and the top (at least for me)
> looks like this:
> 
> samples  %        app name                 symbol name
> 473       6.3928  vmlinux                  finish_task_switch
> 349       4.7169  vmlinux                  tcp_v4_rcv
> 327       4.4195  vmlinux                  U3copy_from_user
> 322       4.3519  vmlinux                  tl0_linux32
> 178       2.4057  vmlinux                  tcp_ack
> 170       2.2976  vmlinux                  tcp_sendmsg
> 167       2.2571  vmlinux                  U3copy_to_user
> 
> That tcp_v4_rcv() hit is 98% on the wake_up() call it does.

Easy enough, since i don't know how to do spiffy NMI profile.. yet ;-) 

I revived the 2.6.25 kernel where I tested back-ports of recent sched
fixes, and did a non-NMI profile of 2.6.22.19 and the back-port kernel.

The test kernel has all clock fixes 25->.git, the min_vruntime accuracy fix,
the native_read_tsc() fix, and back looking buddy.  No knobs turned, and
only testing one pair per CPU, so as not to take unfair advantage of back
looking buddy.  Netperf TCP_RR (hits sched harder) looks about the same.

Tbench 4 throughput was so close you would call these two twins.

2.6.22.19-smp
CPU: Core 2, speed 2400 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
vma      samples  %        symbol name
ffffffff802e6670 575909   13.7425  copy_user_generic_string
ffffffff80422ad8 175649    4.1914  schedule
ffffffff803a522d 133152    3.1773  tcp_sendmsg
ffffffff803a9387 128911    3.0761  tcp_ack
ffffffff803b65f7 116562    2.7814  tcp_v4_rcv
ffffffff803aeac8 116541    2.7809  tcp_transmit_skb
ffffffff8039eb95 112133    2.6757  ip_queue_xmit
ffffffff80209e20 110945    2.6474  system_call
ffffffff8037b720 108277    2.5837  __kfree_skb
ffffffff803a65cd 105493    2.5173  tcp_recvmsg
ffffffff80210f87 97947     2.3372  read_tsc
ffffffff802085b6 95255     2.2730  __switch_to
ffffffff803803f1 82069     1.9584  netif_rx
ffffffff8039f645 80937     1.9313  ip_output
ffffffff8027617d 74585     1.7798  __slab_alloc
ffffffff803824a0 70928     1.6925  process_backlog
ffffffff803ad9a5 69574     1.6602  tcp_rcv_established
ffffffff80399d40 55453     1.3232  ip_rcv
ffffffff803b07d1 53256     1.2708  __tcp_push_pending_frames
ffffffff8037b49c 52565     1.2543  skb_clone
ffffffff80276e97 49690     1.1857  __kmalloc_track_caller
ffffffff80379d05 45450     1.0845  sock_wfree
ffffffff80223d82 44851     1.0702  effective_prio
ffffffff803826b6 42417     1.0122  net_rx_action
ffffffff8027684c 42341     1.0104  kfree

2.6.25.20-test-smp
CPU: Core 2, speed 2400 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
vma      samples  %        symbol name
ffffffff80301450 576125   14.0874  copy_user_generic_string
ffffffff803cf8d9 127997    3.1298  tcp_transmit_skb
ffffffff803c9eac 125402    3.0663  tcp_ack
ffffffff80454da3 122337    2.9914  schedule
ffffffff803c673c 120401    2.9440  tcp_sendmsg
ffffffff8039aa9e 116554    2.8500  skb_release_all
ffffffff803c5abb 104840    2.5635  tcp_recvmsg
ffffffff8020a63d 92180     2.2540  __switch_to
ffffffff8020be20 79703     1.9489  system_call
ffffffff803bf460 79384     1.9411  ip_queue_xmit
ffffffff803a005c 78035     1.9081  netif_rx
ffffffff803ce56b 71223     1.7415  tcp_rcv_established
ffffffff8039ff70 66493     1.6259  process_backlog
ffffffff803d5a2d 61635     1.5071  tcp_v4_rcv
ffffffff803c1dae 60889     1.4889  __inet_lookup_established
ffffffff802126bc 54711     1.3378  native_read_tsc
ffffffff803d23bc 51843     1.2677  __tcp_push_pending_frames
ffffffff803bfb24 51821     1.2671  ip_finish_output
ffffffff8023700c 48248     1.1798  local_bh_enable
ffffffff803979bc 42221     1.0324  sock_wfree
ffffffff8039b12c 41279     1.0094  __alloc_skb


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                                       ` <200811182044.11055.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
@ 2008-11-18 15:58                                                                         ` Linus Torvalds
  2008-11-19  4:31                                                                           ` Nick Piggin
       [not found]                                                                           ` <alpine.LFD.2.00.0811180731480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  2008-11-20  9:06                                                                         ` David Miller
  1 sibling, 2 replies; 318+ messages in thread
From: Linus Torvalds @ 2008-11-18 15:58 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Miller, mingo-X9Un+BFzKDI, dada1-fPLkHRcR87vqlBn2x/YWAg,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

On Tue, 18 Nov 2008, Nick Piggin wrote:

> On Tuesday 18 November 2008 07:58, David Miller wrote:
> > From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > >
> > > Ok. It could easily be something like a cache footprint issue. And while
> > > I don't know my sparc cpu's very well, I think the Ultrasparc-IIIi is
> > > superscalar but does no out-of-order and speculation, no?
> >
> > It does only very simple speculation, but your description is accurate.
> 
> Surely it would do branch prediction, but maybe not indirect branch?

That would be "branch target prediction" (and a BTB - "Branch Target 
Buffer" to hold it), and no, I don't think Sparc does that. You can 
certainly do it for in-order machines too, but I think it's fairly rare.

It's sufficiently different from the regular "pick up the address from the 
static instruction stream, and also yank the kill-chain on mispredicted 
direction" to be real work to do. Unlike a compare or test instruction, 
it's not at all likely that you can resolve the final address in just a 
single pipeline stage, and without that, it's usually too late to yank the 
kill-chain.

(And perhaps equally importantly, indirect branches are relatively rare on 
old-style Unix benchmarks - ie SpecInt/FP - or in databases. So it's not 
something that Sparc would necessarily have spent the effort on.)

There is obviously one very special indirect jump: "ret". That's the one 
that is common, and that tends to have a special branch target buffer that 
is a pure stack. And for that, there is usually a special branch target 
register that needs to be set up 'x' cycles before the ret in order to 
avoid the stall (then the prediction is checking that register against the 
branch target stack, which is somewhat akin to a regular conditional 
branch comparison).

So I strongly suspect that an indirect (non-ret) branch flushes the 
pipeline on sparc. It is possible that there is a "prepare to jump" 
instruction that prepares the indirect branch stack (kind of a "push 
prediction information"). I suspect Java sees a lot more indirect 
branches than traditional Unix loads, so maybe Sun did do that.

			Linus

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
  2008-11-18 15:58                                                                         ` Linus Torvalds
@ 2008-11-19  4:31                                                                           ` Nick Piggin
       [not found]                                                                           ` <alpine.LFD.2.00.0811180731480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
  1 sibling, 0 replies; 318+ messages in thread
From: Nick Piggin @ 2008-11-19  4:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, mingo, dada1, rjw, linux-kernel, kernel-testers, cl,
	efault, a.p.zijlstra, shemminger

On Wednesday 19 November 2008 02:58, Linus Torvalds wrote:
> On Tue, 18 Nov 2008, Nick Piggin wrote:
> > On Tuesday 18 November 2008 07:58, David Miller wrote:
> > > From: Linus Torvalds <torvalds@linux-foundation.org>
> > >
> > > > Ok. It could easily be something like a cache footprint issue. And
> > > > while I don't know my sparc cpu's very well, I think the
> > > > Ultrasparc-IIIi is superscalar but does no out-of-order and
> > > > speculation, no?
> > >
> > > It does only very simple speculation, but your description is
> > > accurate.
> >
> > Surely it would do branch prediction, but maybe not indirect branch?
>
> That would be "branch target prediction" (and a BTB - "Branch Target
> Buffer" to hold it), and no, I don't think Sparc does that. You can
> certainly do it for in-order machines too, but I think it's fairly rare.
>
> It's sufficiently different from the regular "pick up the address from the
> static instruction stream, and also yank the kill-chain on mispredicted
> direction" to be real work to do. Unlike a compare or test instruction,
> it's not at all likely that you can resolve the final address in just a
> single pipeline stage, and without that, it's usually too late to yank the
> kill-chain.
>
> (And perhaps equally importantly, indirect branches are relatively rare on
> old-style Unix benchmarks - ie SpecInt/FP - or in databases. So it's not
> something that Sparc would necessarily have spent the effort on.)
>
> There is obviously one very special indirect jump: "ret". That's the one
> that is common, and that tends to have a special branch target buffer that
> is a pure stack. And for that, there is usually a special branch target
> register that needs to be set up 'x' cycles before the ret in order to
> avoid the stall (then the prediction is checking that register against the
> branch target stack, which is somewhat akin to a regular conditional
> branch comparison).
>
> So I strongly suspect that an indirect (non-ret) branch flushes the
> pipeline on sparc. It is possible that there is a "prepare to jump"
> instruction that prepares the indirect branch stack (kind of a "push
> prediction information"). I suspect Java sees a lot more indirect
> branches than traditional Unix loads, so maybe Sun did do that.

Probably true. OTOH, I've seen indirect branches get compiled to direct
branches, or the common case special-cased into a direct branch:

if (object->fn == default_object_fn)
  default_object_fn();
else
  object->fn();

That might be an easy way to test suspicions about CPU scheduler
slowdowns... (adding a likely() there, and using likely profiling, would
help ensure you got the default case right).
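
A minimal userspace sketch of the special-casing described above. It is
only an illustration: struct object, the two callbacks and call_object()
are hypothetical names, and likely() is re-created here with
__builtin_expect(), which assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's likely() hint (GCC/Clang builtin). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical object with an indirect callback. */
struct object {
	int (*fn)(int);
};

/* The common-case callback we expect almost every object to use. */
int default_object_fn(int x)
{
	return x + 1;
}

/* A rarer callback, reached only through the indirect path. */
int other_object_fn(int x)
{
	return x * 2;
}

/*
 * Compare the pointer against the expected target first, so the common
 * case becomes a direct (statically known) call; everything else falls
 * back to the indirect call.
 */
int call_object(const struct object *object, int arg)
{
	assert(object != NULL && object->fn != NULL);

	if (likely(object->fn == default_object_fn))
		return default_object_fn(arg);	/* direct call */

	return object->fn(arg);			/* indirect call */
}
```

Both paths return the same value the plain indirect call would; only the
kind of branch the CPU sees changes, which is what makes it usable as an
experiment.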

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]     ` <20081117090648.GG28786-X9Un+BFzKDI@public.gmane.org>
  2008-11-17  9:14       ` David Miller
@ 2008-11-19 19:43       ` Christoph Lameter
       [not found]         ` <Pine.LNX.4.64.0811191341570.23502-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-11-19 19:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Mike Galbraith, Peter Zijlstra

On Mon, 17 Nov 2008, Ingo Molnar wrote:

> Christoph, as per the recent analysis of Mike:
>
>  http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html
>
> all scheduler components of this regression have been eliminated.
>
> In fact his numbers show that scheduler speedups since 2.6.22 have
> offset and hidden most other sources of tbench regression. (i.e. the
> scheduler portion got 5% faster, hence it was able to offset a
> slowdown of 5% in other areas of the kernel that tbench triggers)

Ok will rerun the tests tomorrow. Just got back from SC08 need some time
to catch up.

Looks like a lot of work was done on this issue. Thanks!

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]         ` <Pine.LNX.4.64.0811191341570.23502-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
@ 2008-11-19 20:14           ` Ingo Molnar
  2008-11-20 23:52           ` Christoph Lameter
  1 sibling, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-19 20:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Mike Galbraith, Peter Zijlstra


* Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> On Mon, 17 Nov 2008, Ingo Molnar wrote:
> 
> > Christoph, as per the recent analysis of Mike:
> >
> >  http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html
> >
> > all scheduler components of this regression have been eliminated.
> >
> > In fact his numbers show that scheduler speedups since 2.6.22 have
> > offset and hidden most other sources of tbench regression. (i.e. the
> > scheduler portion got 5% faster, hence it was able to offset a
> > slowdown of 5% in other areas of the kernel that tbench triggers)
> 
> Ok will rerun the tests tomorrow. Just got back from SC08 need some 
> time to catch up.
> 
> Looks like a lot of work was done on this issue. Thanks!

You might also want to try net-next:

 [remote "net-next"]
        url = git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
        fetch = +refs/heads/*:refs/remotes/net-next/*

Some good stuff is in there too, impacting this workload.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                                       ` <200811182044.11055.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
  2008-11-18 15:58                                                                         ` Linus Torvalds
@ 2008-11-20  9:06                                                                         ` David Miller
  1 sibling, 0 replies; 318+ messages in thread
From: David Miller @ 2008-11-20  9:06 UTC (permalink / raw)
  To: nickpiggin-/E1597aS9LT0CCvOHzKKcA
  Cc: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, mingo-X9Un+BFzKDI,
	dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
Date: Tue, 18 Nov 2008 20:44:10 +1100

> On Tuesday 18 November 2008 07:58, David Miller wrote:
> > From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> > Date: Mon, 17 Nov 2008 12:30:00 -0800 (PST)
> >
> > > On Mon, 17 Nov 2008, David Miller wrote:
> > > > It's on my workstation which is a much simpler 2 processor
> > > > UltraSPARC-IIIi (1.5Ghz) system.
> > >
> > > Ok. It could easily be something like a cache footprint issue. And while
> > > I don't know my sparc cpu's very well, I think the Ultrasparc-IIIi is
> > > superscalar but does no out-of-order and speculation, no?
> >
> > It does only very simple speculation, but your description is accurate.
> 
> Surely it would do branch prediction, but maybe not indirect branch?

Right.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                                                                           ` <alpine.LFD.2.00.0811180731480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
@ 2008-11-20  9:14                                                                             ` David Miller
  0 siblings, 0 replies; 318+ messages in thread
From: David Miller @ 2008-11-20  9:14 UTC (permalink / raw)
  To: torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: nickpiggin-/E1597aS9LT0CCvOHzKKcA, mingo-X9Un+BFzKDI,
	dada1-fPLkHRcR87vqlBn2x/YWAg, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA,
	cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

From: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Date: Tue, 18 Nov 2008 07:58:49 -0800 (PST)

> There is obviously one very special indirect jump: "ret". That's the one 
> that is common, and that tends to have a special branch target buffer that 
> is a pure stack. And for that, there is usually a special branch target 
> register that needs to be set up 'x' cycles before the ret in order to 
> avoid the stall (then the prediction is checking that register against the 
> branch target stack, which is somewhat akin to a regular conditional 
> branch comparison).

Yes, UltraSPARC has a RAS or Return Address Stack.  I think it has
effectively zero latency (ie. you can call some function, immediately
"ret" and it hits the RAS).  This is probably because, due to delay slots,
there is always going to be one instruction in between anyways. :)

> So I strongly suspect that an indirect (non-ret) branch flushes the 
> pipeline on sparc. It is possible that there is a "prepare to jump" 
> instruction that prepares the indirect branch stack (kind of a "push 
> prediction information").

It doesn't flush the pipeline, it just stalls it waiting for the
address computation.

Branches are predicted and can execute in the same cycle as the
condition-code setting instruction they depend upon.

> I suspect Java sees a lot more indirect branches than traditional
> Unix loads, so maybe Sun did do that.

There really isn't anything special done here for indirect jumps,
other than pushing onto the RAS.  Indirects just suck :)
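
For readers unfamiliar with the RAS mentioned above, here is a toy model
of the idea. It is a generic sketch, not a description of the UltraSPARC
hardware: the depth of 8, the circular overwrite on overflow, and the
names ras_call()/ras_ret() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RAS_DEPTH 8	/* hypothetical depth, chosen for the example */

/* Toy return address stack: a small circular buffer of return targets. */
struct ras {
	uintptr_t slot[RAS_DEPTH];
	unsigned int top;	/* number of outstanding pushes */
};

/* On a call, remember where the matching return should land. */
void ras_call(struct ras *r, uintptr_t return_addr)
{
	r->slot[r->top % RAS_DEPTH] = return_addr;
	r->top++;
}

/*
 * On a return, pop the predicted target and compare it with the real one.
 * Returns 1 when the prediction was right, 0 on a misprediction.
 */
int ras_ret(struct ras *r, uintptr_t actual_target)
{
	assert(r->top > 0);
	r->top--;
	return r->slot[r->top % RAS_DEPTH] == actual_target;
}
```

Properly nested call/return pairs always predict correctly in this model.
A return to an unexpected address, or nesting deeper than RAS_DEPTH,
yields a miss. An ordinary indirect jump has no such structure to lean
on, which is the contrast drawn above.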

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]         ` <Pine.LNX.4.64.0811191341570.23502-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
  2008-11-19 20:14           ` Ingo Molnar
@ 2008-11-20 23:52           ` Christoph Lameter
       [not found]             ` <Pine.LNX.4.64.0811201727070.9089-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-11-20 23:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Mike Galbraith, Peter Zijlstra

hmmm... Well we are almost there.

2.6.22:

Throughput 2526.15 MB/sec 8 procs

2.6.28-rc5:

Throughput 2486.2 MB/sec 8 procs

8p Dell 1950 and the number of processors specified on the tbench command
line.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]             ` <Pine.LNX.4.64.0811201727070.9089-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
@ 2008-11-21  8:30               ` Ingo Molnar
       [not found]                 ` <20081121083044.GL16242-X9Un+BFzKDI@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Ingo Molnar @ 2008-11-21  8:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Mike Galbraith, Peter Zijlstra, David S. Miller


* Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> hmmm... Well we are almost there.
> 
> 2.6.22:
> 
> Throughput 2526.15 MB/sec 8 procs
> 
> 2.6.28-rc5:
> 
> Throughput 2486.2 MB/sec 8 procs
> 
> 8p Dell 1950 and the number of processors specified on the tbench 
> command line.

And with net-next we might even be able to get past that magic limit? 
net-next is linus-latest plus the latest and greatest networking bits:

 $ cat .git/config

 [remote "net-next"]
	url = git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
	fetch = +refs/heads/*:refs/remotes/net-next/*

... so might be worth a test. Just to satisfy our curiosity and to 
possibly close the entry :-)

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                 ` <20081121083044.GL16242-X9Un+BFzKDI@public.gmane.org>
@ 2008-11-21  8:51                   ` Eric Dumazet
       [not found]                     ` <49267694.1030506-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  2008-11-21  9:03                   ` David Miller
  2008-11-21 16:11                   ` Christoph Lameter
  2 siblings, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-11-21  8:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	David S. Miller

Ingo Molnar a écrit :
> * Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> 
>> hmmm... Well we are almost there.
>>
>> 2.6.22:
>>
>> Throughput 2526.15 MB/sec 8 procs
>>
>> 2.6.28-rc5:
>>
>> Throughput 2486.2 MB/sec 8 procs
>>
>> 8p Dell 1950 and the number of processors specified on the tbench 
>> command line.
> 
> And with net-next we might even be able to get past that magic limit? 
> net-next is linus-latest plus the latest and greatest networking bits:
> 
>  $ cat .git/config
> 
>  [remote "net-next"]
> 	url = git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
> 	fetch = +refs/heads/*:refs/remotes/net-next/*
> 
> ... so might be worth a test. Just to satisfy our curiosity and to 
> possibly close the entry :-)
> 

Well, bits in net-next are new stuff for 2.6.29, not really regression fixes,
but yes, they should give nice tbench speedups.


Now, I wish sockets and pipes did not go through the dcache. That is not a
tbench matter, of course, but it hurts real workloads...

Running 8 processes on an 8-way machine, each doing

for (;;)
	close(socket(AF_INET, SOCK_STREAM, 0));

is slow as hell; we hit so many contended cache lines...

Ticket spin locks are slower in this case (dcache_lock, for example,
is taken twice when we allocate a socket: once in d_alloc() and once
in d_instantiate()).


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                 ` <20081121083044.GL16242-X9Un+BFzKDI@public.gmane.org>
  2008-11-21  8:51                   ` Eric Dumazet
@ 2008-11-21  9:03                   ` David Miller
  2008-11-21 16:11                   ` Christoph Lameter
  2 siblings, 0 replies; 318+ messages in thread
From: David Miller @ 2008-11-21  9:03 UTC (permalink / raw)
  To: mingo-X9Un+BFzKDI
  Cc: cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, rjw-KKrjLPT3xs0,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

From: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Date: Fri, 21 Nov 2008 09:30:44 +0100

> 
> * Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> 
> > hmmm... Well we are almost there.
> > 
> > 2.6.22:
> > 
> > Throughput 2526.15 MB/sec 8 procs
> > 
> > 2.6.28-rc5:
> > 
> > Throughput 2486.2 MB/sec 8 procs
> > 
> > 8p Dell 1950 and the number of processors specified on the tbench 
> > command line.
> 
> And with net-next we might even be able to get past that magic limit? 
> net-next is linus-latest plus the latest and greatest networking bits:

In any event I'm happy to toss this from the regression list.

My sparc still shows the issues and I'll profile that independently.
I'm pretty sure it's the indirect calls and the deeper stack frames
(which == 128 bytes of extra stores at each level to save the register
window), but I need to prove that first.

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                     ` <49267694.1030506-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-11-21  9:05                       ` David Miller
       [not found]                         ` <20081121.010508.40225532.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2008-11-21  9:18                       ` Ingo Molnar
  1 sibling, 1 reply; 318+ messages in thread
From: David Miller @ 2008-11-21  9:05 UTC (permalink / raw)
  To: dada1-fPLkHRcR87vqlBn2x/YWAg
  Cc: mingo-X9Un+BFzKDI, cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

From: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
Date: Fri, 21 Nov 2008 09:51:32 +0100

> Now, I wish sockets and pipes not going through dcache, not tbench affair
> of course but real workloads...
> 
> running 8 processes on a 8 way machine doing a 
> 
> for (;;)
> 	close(socket(AF_INET, SOCK_STREAM, 0));
> 
> is slow as hell, we hit so many contended cache lines ...
> 
> ticket spin locks are slower in this case (dcache_lock for example
> is taken twice when we allocate a socket(), once in d_alloc(), another one
> in d_instantiate())

As you of course know, this used to be a ton worse.  At least now
these things are unhashed. :)

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                     ` <49267694.1030506-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  2008-11-21  9:05                       ` David Miller
@ 2008-11-21  9:18                       ` Ingo Molnar
  1 sibling, 0 replies; 318+ messages in thread
From: Ingo Molnar @ 2008-11-21  9:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	David S. Miller


* Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:

> Ingo Molnar a écrit :
>> * Christoph Lameter <cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
>>
>>> hmmm... Well we are almost there.
>>>
>>> 2.6.22:
>>>
>>> Throughput 2526.15 MB/sec 8 procs
>>>
>>> 2.6.28-rc5:
>>>
>>> Throughput 2486.2 MB/sec 8 procs
>>>
>>> 8p Dell 1950 and the number of processors specified on the tbench  
>>> command line.
>>
>> And with net-next we might even be able to get past that magic limit?  
>> net-next is linus-latest plus the latest and greatest networking bits:
>>
>>  $ cat .git/config
>>
>>  [remote "net-next"]
>> 	url = git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
>> 	fetch = +refs/heads/*:refs/remotes/net-next/*
>>
>> ... so might be worth a test. Just to satisfy our curiosity and to 
>> possibly close the entry :-)
>>
>
> Well, bits in net-next are new stuff for 2.6.29, not really 
> regression fixes, but yes, they should give nice tbench speedups.

yeah, i know - technically these are lots-of-kernel-releases effects 
so not bona fide latest-cycle regressions anyway. But it doesn't matter 
what we call them, we want improvement in these metrics.

> Now, I wish sockets and pipes not going through dcache, not tbench 
> affair of course but real workloads...
>
> running 8 processes on a 8 way machine doing a 
>
> for (;;)
> 	close(socket(AF_INET, SOCK_STREAM, 0));
>
> is slow as hell, we hit so many contended cache lines ...
>
> ticket spin locks are slower in this case (dcache_lock for example 
> is taken twice when we allocate a socket(), once in d_alloc(), 
> another one in d_instantiate())

hm, weird - since there's no real VFS namespace impact i fail to 
realize the fundamental need that causes us to hit the dcache_lock. 
(perhaps there's none and this is fixable)

The general concept of mapping sockets to fds is a fundamental and 
powerful abstraction. There are APIs that also connect them to the VFS 
namespace (such as unix domain sockets) - but those should be special 
cases, not impacting normal TCP sockets.

	Ingo

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                         ` <20081121.010508.40225532.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2008-11-21 12:51                           ` Eric Dumazet
  0 siblings, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-21 12:51 UTC (permalink / raw)
  To: David Miller
  Cc: mingo-X9Un+BFzKDI, cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	rjw-KKrjLPT3xs0, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	kernel-testers-u79uwXL29TY76Z2rM5mHXA, efault-Mmb7MZpHnFY,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

David Miller a écrit :
> From: Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
> Date: Fri, 21 Nov 2008 09:51:32 +0100
> 
>> Now, I wish sockets and pipes not going through dcache, not tbench affair
>> of course but real workloads...
>>
>> running 8 processes on a 8 way machine doing a 
>>
>> for (;;)
>> 	close(socket(AF_INET, SOCK_STREAM, 0));
>>
>> is slow as hell, we hit so many contended cache lines ...
>>
>> ticket spin locks are slower in this case (dcache_lock for example
>> is taken twice when we allocate a socket(), once in d_alloc(), another one
>> in d_instantiate())
> 
> As you of course know, this used to be a ton worse.  At least now
> these things are unhashed. :)

Well, this is dust compared to what we currently have.

To allocate a socket we :
0) Do the usual file manipulation (pretty scalable these days)
   (but recent drop_file_write_access() and co slow down a bit)
1) allocate an inode with new_inode()
    This function :
     - locks inode_lock,
     - dirties nr_inodes counter
     - dirties inode_in_use list  (for sockets, I doubt it is useful)
     - dirties superblock s_inodes.
     - dirties last_ino counter
 All these are in different cache lines of course.
2) allocate a dentry
   d_alloc() takes dcache_lock,
   insert dentry on its parent list (dirtying sock_mnt->mnt_sb->s_root)
   dirties nr_dentry
3) d_instantiate() dentry  (dcache_lock taken again)
4) init_file() -> atomic_inc on sock_mnt->refcount (in case we want to umount this vfs ...)



At close() time, we must undo all of this. It's even more expensive because
of the _atomic_dec_and_lock() calls, which are heavily contended, and because
two cache lines are touched when an element is deleted from a list.

for (i = 0; i < 1000*1000; i++)
	close(socket(AF_INET, SOCK_STREAM, 0));

Cost if run on one CPU :

real    0m1.561s
user    0m0.092s
sys     0m1.469s

If run on 8 CPUs :

real    0m27.496s
user    0m0.657s
sys     3m39.092s
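
The loop above can be wrapped into a small timing harness. This is only a
sketch under stated assumptions: socket_churn() and timed_socket_churn()
are made-up names, the harness measures a single process, and reproducing
the 8-CPU case would mean starting one copy per CPU, which is a guess at
how the numbers above were taken.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */

#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Open and immediately close a TCP socket `iterations` times.
 * Returns how many socket() calls succeeded.
 */
long socket_churn(long iterations)
{
	long ok = 0;

	assert(iterations >= 0);

	for (long i = 0; i < iterations; i++) {
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0)
			continue;	/* count only successful allocations */
		close(fd);
		ok++;
	}
	return ok;
}

/* Run socket_churn() once and return the elapsed wall-clock seconds. */
double timed_socket_churn(long iterations, long *ok)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	*ok = socket_churn(iterations);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For scale: 1.561 s over 10^6 iterations is about 1.56 us per
socket()/close() pair on one CPU. If each of the 8 processes also ran 10^6
iterations, 27.496 s works out to about 27.5 us per pair, roughly 17.6
times slower per operation.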


CPU: Core 2, speed 3000.11 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  cum. samples  %        cum. %     symbol name
164211   164211        10.9678  10.9678    init_file
155663   319874        10.3969  21.3647    d_alloc
147596   467470         9.8581  31.2228    _atomic_dec_and_lock
92993    560463         6.2111  37.4339    inet_create
73495    633958         4.9088  42.3427    kmem_cache_alloc
46353    680311         3.0960  45.4387    dentry_iput
46042    726353         3.0752  48.5139    tcp_close
42784    769137         2.8576  51.3715    kmem_cache_free
37074    806211         2.4762  53.8477    wake_up_inode
36375    842586         2.4295  56.2772    tcp_v4_init_sock
35212    877798         2.3518  58.6291    inotify_d_instantiate
33199    910997         2.2174  60.8465    sysenter_past_esp
31161    942158         2.0813  62.9277    d_instantiate
31000    973158         2.0705  64.9983    generic_forget_inode
28020    1001178        1.8715  66.8698    vfs_dq_drop
19007    1020185        1.2695  68.1393    __copy_from_user_ll
17513    1037698        1.1697  69.3090    new_inode
16957    1054655        1.1326  70.4415    __init_timer
16897    1071552        1.1286  71.5701    discard_slab
16115    1087667        1.0763  72.6464    d_kill
15542    1103209        1.0381  73.6845    __percpu_counter_add
13562    1116771        0.9058  74.5903    __slab_free
13276    1130047        0.8867  75.4771    __fput
12423    1142470        0.8297  76.3068    new_slab
11976    1154446        0.7999  77.1067    tcp_v4_destroy_sock
10889    1165335        0.7273  77.8340    inet_csk_destroy_sock
10516    1175851        0.7024  78.5364    alloc_inode
9979     1185830        0.6665  79.2029    sock_attach_fd
7980     1193810        0.5330  79.7359    drop_file_write_access
7609     1201419        0.5082  80.2441    alloc_fd
7584     1209003        0.5065  80.7506    sock_init_data
7164     1216167        0.4785  81.2291    add_partial
7107     1223274        0.4747  81.7038    sys_close
6997     1230271        0.4673  82.1711    mwait_idle

^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                 ` <20081121083044.GL16242-X9Un+BFzKDI@public.gmane.org>
  2008-11-21  8:51                   ` Eric Dumazet
  2008-11-21  9:03                   ` David Miller
@ 2008-11-21 16:11                   ` Christoph Lameter
       [not found]                     ` <Pine.LNX.4.64.0811210936580.25354-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-11-21 16:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Mike Galbraith, Peter Zijlstra, David S. Miller

On Fri, 21 Nov 2008, Ingo Molnar wrote:

> > 2.6.22:
> > Throughput 2526.15 MB/sec 8 procs
> > 2.6.28-rc5:
> > Throughput 2486.2 MB/sec 8 procs
> >
> > 8p Dell 1950 and the number of processors specified on the tbench
> > command line.
>
> ... so might be worth a test. Just to satisfy our curiosity and to
> possibly close the entry :-)

Ahh.. Wow.... net-next gets us:

Throughput 2685.17 MB/sec 8 procs


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                     ` <Pine.LNX.4.64.0811210936580.25354-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
@ 2008-11-21 18:06                       ` Christoph Lameter
       [not found]                         ` <Pine.LNX.4.64.0811211119550.27777-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Christoph Lameter @ 2008-11-21 18:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List,
	Mike Galbraith, Peter Zijlstra, David S. Miller

AIM9 results:
		TCP		UDP
2.6.22		104868.00	489970.03
2.6.28-rc5	110007.00	518640.00
net-next	108207.00	514790.00

net-next loses here for some reason against 2.6.28-rc5. But the numbers
are better than 2.6.22 in any case.




^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                         ` <Pine.LNX.4.64.0811211119550.27777-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
@ 2008-11-21 18:16                           ` Eric Dumazet
       [not found]                             ` <4926FB13.3080808-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  0 siblings, 1 reply; 318+ messages in thread
From: Eric Dumazet @ 2008-11-21 18:16 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	David S. Miller

Christoph Lameter wrote:
> AIM9 results:
> 		TCP		UDP
> 2.6.22		104868.00	489970.03
> 2.6.28-rc5	110007.00	518640.00
> net-next	108207.00	514790.00
> 
> net-next loses here for some reason against 2.6.28-rc5. But the numbers
> are better than 2.6.22 in any case.
> 

I found that on current net-next, running oprofile in the background can give better
benchmark results. That's really curious, no?


So the single loop on close(socket()), run on all 8 of my CPUs, is almost 10% faster
when oprofile is running (20 secs instead of 23 secs).


^ permalink raw reply	[flat|nested] 318+ messages in thread

* Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28
       [not found]                             ` <4926FB13.3080808-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2008-11-21 18:19                               ` Eric Dumazet
  0 siblings, 0 replies; 318+ messages in thread
From: Eric Dumazet @ 2008-11-21 18:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Ingo Molnar, Rafael J. Wysocki, Linux Kernel Mailing List,
	Kernel Testers List, Mike Galbraith, Peter Zijlstra,
	David S. Miller

Eric Dumazet wrote:
> Christoph Lameter wrote:
>> AIM9 results:
>>              TCP          UDP
>> 2.6.22       104868.00    489970.03
>> 2.6.28-rc5   110007.00    518640.00
>> net-next     108207.00    514790.00
>>
>> net-next loses here for some reason against 2.6.28-rc5. But the numbers
>> are better than 2.6.22 in any case.
>>
> 
> I found that on current net-next, running oprofile in the background can give better
> benchmark results. That's really curious, no?
> 
> 
> So the single loop on close(socket()), run on all 8 of my CPUs, is almost 10% faster
> when oprofile is running (20 secs instead of 23 secs).
> 

Oh well, that's normal: when a CPU is interrupted by an NMI and distracted
by oprofile code, it doesn't fight with the other CPUs over dcache_lock
and other contended cache lines...


^ permalink raw reply	[flat|nested] 318+ messages in thread

end of thread, other threads:[~2008-11-21 18:19 UTC | newest]

Thread overview: 318+ messages
2008-08-23 18:07 2.6.27-rc4-git1: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-08-23 18:07 ` [Bug #11141] no battery or DC status - Dell i1501 Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11207] VolanoMark regression with 2.6.27-rc1 Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11191] 2.6.26-git8: spinlock lockup in c1e_idle() Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11210] libata badness Rafael J. Wysocki
2008-08-23 22:23   ` Jeff Garzik
     [not found]     ` <48B08DD8.8010906-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>
2008-08-24 21:04       ` Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11215] INFO: possible recursive locking detected ps2_command Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11209] 2.6.27-rc1 process time accounting Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11220] Screen stays black after resume Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11219] KVM modules break emergency reboot Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11224] Only three cores found on quad-core machine Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11230] Kconfig no longer outputs a .config with freshly updated defconfigs Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11237] corrupt PMD after resume Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11264] Invalid op opcode in kernel/workqueue Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11271] BUG: fealnx in 2.6.27-rc1 Rafael J. Wysocki
2008-08-23 22:26   ` Jeff Garzik
2008-08-23 18:10 ` [Bug #11254] KVM: fix userspace ABI breakage Rafael J. Wysocki
2008-08-24 19:27   ` Adrian Bunk
     [not found]     ` <20080824192714.GC1627-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
2008-08-25 10:23       ` Avi Kivity
2008-08-23 18:10 ` [Bug #11279] 2.6.27-rc0 Power Bugs with HP/Compaq Laptops Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11282] Please fix x86 defconfig regression Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11272] BUG: parport_serial in 2.6.27-rc1 for NetMos Technology PCI 9835 Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11276] build error: CONFIG_OPTIMIZE_INLINING=y causes gcc 4.2 to do stupid things Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11336] 2.6.27-rc2:stall while mounting root fs Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11335] 2.6.27-rc2-git5 BUG: unable to handle kernel paging request Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11334] myri10ge: use ioremap_wc: compilation failure on ARM Rafael J. Wysocki
2008-08-24 12:26   ` Martin Michlmayr
     [not found]     ` <20080824122643.GG8772-u+sgIaa8TU6A7rR/f+Zz5kHK5LHFu9C3@public.gmane.org>
2008-08-24 21:05       ` Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11340] LTP overnight run resulted in unusable box Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rafael J. Wysocki
2008-08-23 20:10   ` Linus Torvalds
     [not found]     ` <alpine.LFD.1.10.0808231257310.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-23 20:15       ` Arjan van de Ven
     [not found]         ` <48B06FE6.8060404-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2008-08-25 12:07           ` Alan D. Brunelle
2008-08-23 20:17       ` Linus Torvalds
     [not found]         ` <alpine.LFD.1.10.0808231313170.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25 12:03           ` Alan D. Brunelle
     [not found]             ` <48B29F7B.6080405-VXdhtT5mjnY@public.gmane.org>
2008-08-25 12:22               ` Alan D. Brunelle
     [not found]                 ` <48B2A421.7080705-VXdhtT5mjnY@public.gmane.org>
2008-08-25 18:00                   ` Linus Torvalds
     [not found]                     ` <alpine.LFD.1.10.0808251019380.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25 18:09                       ` Linus Torvalds
     [not found]                         ` <alpine.LFD.1.10.0808251106270.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25 20:19                           ` Alan D. Brunelle
     [not found]                             ` <48B313E0.1000501-VXdhtT5mjnY@public.gmane.org>
2008-08-25 20:43                               ` Linus Torvalds
     [not found]                                 ` <alpine.LFD.1.10.0808251326500.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25 20:45                                   ` Arjan van de Ven
2008-08-25 20:52                                   ` Linus Torvalds
     [not found]                                     ` <alpine.LFD.1.10.0808251344250.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25 21:15                                       ` Linus Torvalds
     [not found]                                         ` <alpine.LFD.1.10.0808251401590.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26  7:22                                           ` Ingo Molnar
     [not found]                                             ` <20080826072220.GB31876-X9Un+BFzKDI@public.gmane.org>
2008-08-26  7:46                                               ` David Miller
     [not found]                                                 ` <20080826.004607.253712060.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-08-26  7:53                                                   ` Ingo Molnar
     [not found]                                                     ` <20080826075355.GA7596-X9Un+BFzKDI@public.gmane.org>
2008-08-26  8:36                                                       ` Yinghai Lu
     [not found]                                                         ` <86802c440808260136t3a33a9c8if53b6f70ab9df9e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-26 16:51                                                           ` Linus Torvalds
     [not found]                                                             ` <alpine.LFD.1.10.0808260939070.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 17:08                                                               ` Yinghai Lu
2008-09-25  1:50                                                               ` Rusty Russell
     [not found]                                                                 ` <200809251150.26760.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2008-09-25  8:55                                                                   ` Ingo Molnar
2008-09-25 15:42                                                                   ` Linus Torvalds
     [not found]                                                                     ` <alpine.LFD.1.10.0809250836270.3265-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-09-25 20:59                                                                       ` Subject: [RFC 1/1] cpumask: Provide new cpumask API Mike Travis
2008-09-26  5:25                                                                       ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rusty Russell
     [not found]                                                                         ` <200809261525.30258.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2008-09-26  5:53                                                                           ` Mike Travis
     [not found]                                                                             ` <48DC78F2.8060400-sJ/iWh9BUns@public.gmane.org>
2008-09-27 19:16                                                                               ` Ingo Molnar
     [not found]                                                                                 ` <20080927191653.GB18619-X9Un+BFzKDI@public.gmane.org>
2008-09-29 14:33                                                                                   ` Mike Travis
     [not found]                                                                                     ` <48E0E73A.40803-sJ/iWh9BUns@public.gmane.org>
2008-09-30 11:04                                                                                       ` Ingo Molnar
2008-09-30 16:14                                                                                         ` Mike Travis
     [not found]                                                                                           ` <48E2506C.7000406-sJ/iWh9BUns@public.gmane.org>
2008-09-30 16:46                                                                                             ` Linus Torvalds
     [not found]                                                                                               ` <alpine.LFD.2.00.0809300939450.3389-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-09-30 18:02                                                                                                 ` Mike Travis
     [not found]                                                                                                   ` <48E269B6.1080904-sJ/iWh9BUns@public.gmane.org>
2008-09-30 22:22                                                                                                     ` [RFC 1/1] cpumask: New cpumask API - take 2 - use unsigned longs Mike Travis
     [not found]                                                                                                       ` <48E2A691.7060407-sJ/iWh9BUns@public.gmane.org>
2008-10-01  0:45                                                                                                         ` Rusty Russell
2008-10-01  0:44                                                                                                 ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Rusty Russell
2008-08-26 19:11                                                       ` Mike Travis
2008-08-26 19:06                                                 ` Mike Travis
     [not found]                                                   ` <48B4542A.1050004-sJ/iWh9BUns@public.gmane.org>
2008-08-26 20:45                                                     ` David Miller
     [not found]                                                       ` <20080826.134535.193703558.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-08-29 12:42                                                         ` Jes Sorensen
     [not found]                                                           ` <48B7EEA2.7090300-sJ/iWh9BUns@public.gmane.org>
2008-08-29 16:14                                                             ` Linus Torvalds
     [not found]                                                               ` <alpine.LFD.1.10.0808290909020.3300-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-29 20:04                                                                 ` David Miller
2008-09-01 11:53                                                                 ` Jes Sorensen
2008-09-02 14:27                                                                 ` Jes Sorensen
2008-08-26 19:03                                             ` Mike Travis
     [not found]                                               ` <48B45387.8090205-sJ/iWh9BUns@public.gmane.org>
2008-08-26 19:40                                                 ` Linus Torvalds
2008-08-26 19:01                                           ` Mike Travis
     [not found]                                             ` <48B452F3.9040304-sJ/iWh9BUns@public.gmane.org>
2008-08-26 19:09                                               ` Linus Torvalds
2008-08-26 19:28                                                 ` Dave Jones
     [not found]                                                   ` <20080826192848.GA20653-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-08-26 20:01                                                     ` Mike Travis
2008-08-27  6:54                                                       ` Nick Piggin
2008-08-27  7:05                                                         ` David Miller
     [not found]                                                           ` <20080827.000506.177643294.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-08-27  7:47                                                             ` Nick Piggin
     [not found]                                                               ` <200808271747.14690.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
2008-08-27  8:44                                                                 ` David Miller
     [not found]                                                                   ` <20080827.014457.140528687.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-08-27 14:48                                                                     ` Mike Travis
2008-08-27 14:36                                                             ` Mike Travis
     [not found]                                                         ` <200808271654.32721.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
2008-08-27 14:35                                                           ` Mike Travis
     [not found]                                                 ` <alpine.LFD.1.10.0808261205530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 19:35                                                   ` Mike Travis
2008-08-25 21:30                                       ` Alan D. Brunelle
     [not found]                                         ` <48B32458.5020104-VXdhtT5mjnY@public.gmane.org>
2008-08-25 22:07                                           ` Christoph Lameter
     [not found]                                             ` <48B32D39.5040709-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2008-08-26  7:59                                               ` Ingo Molnar
     [not found]                                                 ` <20080826075937.GB7596-X9Un+BFzKDI@public.gmane.org>
2008-08-26 19:48                                                   ` Mike Travis
2008-08-26  1:11                                   ` Rusty Russell
     [not found]                                     ` <200808261111.19205.rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>
2008-08-26 17:35                                       ` Linus Torvalds
2008-08-26 18:30                                         ` Adrian Bunk
     [not found]                                           ` <20080826183051.GB10925-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
2008-08-26 18:40                                             ` Linus Torvalds
     [not found]                                               ` <alpine.LFD.1.10.0808261134530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 20:21                                                 ` Adrian Bunk
2008-08-26 20:41                                                   ` Linus Torvalds
2008-08-27 16:21                                                     ` Jamie Lokier
2008-08-26 18:47                                             ` Linus Torvalds
2008-08-26 19:02                                               ` Jamie Lokier
     [not found]                                                 ` <20080826190213.GA30255-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2008-08-26 19:18                                                   ` Linus Torvalds
     [not found]                                               ` <alpine.LFD.1.10.0808261144510.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 20:59                                                 ` Adrian Bunk
2008-08-26 21:04                                                   ` Linus Torvalds
     [not found]                                                     ` <alpine.LFD.1.10.0808261403360.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 22:54                                                       ` Parag Warudkar
     [not found]                                                         ` <f7848160808261554j2f4eaaa6i1ee8801ae75ca7bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-26 23:00                                                           ` David VomLehn
2008-08-26 23:45                                                             ` Adrian Bunk
2008-08-26 23:47                                                         ` Linus Torvalds
     [not found]                                                           ` <alpine.LFD.1.10.0808261644260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27  0:53                                                             ` Greg Ungerer
2008-08-27  1:08                                                               ` Parag Warudkar
2008-08-27  1:31                                                                 ` Greg Ungerer
     [not found]                                                                   ` <48B4AE68.4040205-XXXsiaCtIV5Wk0Htik3J/w@public.gmane.org>
2008-08-27  2:16                                                                     ` Parag Warudkar
2008-08-27  8:44                                                                       ` Bernd Petrovitsch
2008-08-27  0:58                                                             ` Parag Warudkar
     [not found]                                                               ` <f7848160808261758q7b84aab1m188c1ebb59304818-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-27  1:49                                                                 ` Linus Torvalds
     [not found]                                                                   ` <alpine.LFD.1.10.0808261837530.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27  2:36                                                                     ` Parag Warudkar
     [not found]                                                                       ` <f7848160808261936m18c69dc0r26f41850efae4b91-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-27  2:52                                                                         ` Linus Torvalds
2008-08-27  8:32                                                                       ` Alan Cox
2008-08-27  6:01                                                                     ` Paul Mackerras
     [not found]                                                                       ` <18612.60878.887716.452936-nUko2b1QN/1kfgV4h6NXRTJtLkR7yuzc@public.gmane.org>
2008-08-27 10:58                                                                         ` Arjan van de Ven
2008-08-27 15:18                                                                       ` Linus Torvalds
2008-08-27 11:58                                                                   ` Adrian Bunk
2008-08-27  9:00                                                               ` Bernd Petrovitsch
     [not found]                                                                 ` <1219827609.30209.29.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
2008-08-27 12:56                                                                   ` Parag Warudkar
2008-08-27 13:17                                                                     ` Bernd Petrovitsch
     [not found]                                                                       ` <1219843032.30209.51.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
2008-08-27 15:48                                                                         ` Jamie Lokier
2008-08-27 16:38                                                                           ` Bernd Petrovitsch
     [not found]                                                                             ` <1219855121.30209.112.camel-7sPfb3biEqGJZy4MaDjwDw@public.gmane.org>
2008-08-27 17:51                                                                               ` Jamie Lokier
2008-08-27 19:30                                                                                 ` Bernd Petrovitsch
2008-08-28  0:06                                                                                 ` Greg Ungerer
     [not found]                                                                           ` <20080827154805.GA25387-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2008-08-28  0:11                                                                             ` Greg Ungerer
2008-08-27  8:34                                                         ` Bernd Petrovitsch
2008-08-26 23:24                                                       ` Adrian Bunk
     [not found]                                                         ` <20080826232411.GC11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
2008-08-26 23:51                                                           ` Linus Torvalds
     [not found]                                                             ` <alpine.LFD.1.10.0808261648140.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27  0:23                                                               ` Adrian Bunk
2008-08-27  0:28                                                                 ` Linus Torvalds
     [not found]                                                                   ` <alpine.LFD.1.10.0808261726560.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27 11:58                                                                     ` Adrian Bunk
     [not found]                                                                       ` <20080827115829.GF11734-re2QNgSbS3j4D6uPqz5PAwR5/fbUUdgG@public.gmane.org>
2008-08-27 16:00                                                                         ` Paul Mundt
     [not found]                                                                           ` <20080827173544.GH11734@cs181140183.pp.htv.fi>
2008-08-28  0:32                                                                             ` Paul Mundt
2008-08-28  0:46                                                                               ` David Miller
     [not found]                                                                                 ` <20080827.174605.85608276.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-08-28  1:02                                                                                   ` Paul Mundt
     [not found]                                                                               ` <20080828003211.GA18893-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
2008-08-28 16:16                                                                                 ` Adrian Bunk
     [not found]                                                                           ` <20080827160052.GA15968-M7jkjyW5wf5g9hUCZPvPmw@public.gmane.org>
2008-08-27 17:35                                                                             ` Adrian Bunk
2008-08-28  1:05                                                                             ` Greg Ungerer
2008-08-27  8:25                                                           ` Alan Cox
2008-08-27 12:52                                                             ` Parag Warudkar
     [not found]                                                               ` <f7848160808270552u2ee66167x912a68e0bf8b25bf-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-27 13:21                                                                 ` Alan Cox
     [not found]                                                                   ` <20080827142142.303cdba8-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
2008-08-27 16:24                                                                     ` Parag Warudkar
     [not found]                                         ` <alpine.LFD.1.10.0808261019450.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 19:55                                           ` Jeff Garzik
     [not found]                                             ` <48B45FA2.8040603-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>
2008-08-26 20:06                                               ` e1000 horridness (was Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected) Linus Torvalds
     [not found]                                                 ` <alpine.LFD.1.10.0808261257210.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-26 20:14                                                   ` Kok, Auke
     [not found]                                                     ` <48B4641A.1020806-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2008-08-26 22:04                                                       ` Jeff Kirsher
2008-08-25 12:44           ` [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected Alan D. Brunelle
2008-08-25 13:13             ` Alan D. Brunelle
2008-08-25 18:02             ` Linus Torvalds
2008-08-25 14:05           ` Alan D. Brunelle
2008-08-23 18:10 ` [Bug #11343] SATA Cold Boot Problems with 2.6.27-rc[23] on nVidia 680i Rafael J. Wysocki
2008-08-23 22:34   ` Jeff Garzik
2008-08-23 18:10 ` [Bug #11356] Linux 2.6.27-rc3 - build failure: undefined reference to `.lockdep_count_forward_deps' Rafael J. Wysocki
2008-08-24  6:13   ` Frans Pop
     [not found]     ` <200808240813.56525.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
2008-08-24 21:10       ` Rafael J. Wysocki
2008-08-25 14:03       ` Adrian Bunk
2008-08-23 18:10 ` [Bug #11355] Regression in 2.6.27-rc2 when cross-building the kernel Rafael J. Wysocki
2008-08-24 21:34   ` Rafael J. Wysocki
     [not found]     ` <200808242334.05993.rjw-KKrjLPT3xs0@public.gmane.org>
2008-09-01  9:35       ` David Woodhouse
     [not found]         ` <1220261720.2982.51.camel-ZP4jZrcIevRpWr+L1FloEB2eb7JE58TQ@public.gmane.org>
2008-09-01 16:51           ` Larry Finger
     [not found]             ` <48BC1D8E.9050608-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
2008-09-01 17:37               ` David Woodhouse
2008-08-23 18:10 ` [Bug #11354] AMD Elan regression with 2.6.27-rc3 Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11357] Can not boot up with zd1211rw USB-Wlan Stick Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11358] net: forcedeth call restore mac addr in nv_shutdown path Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11360] mpc8xxx_wdt.c doesn't build modular Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11380] lockdep warning: cpu_add_remove_lock at:cpu_maps_update_begin+0x14/0x16 Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11361] my servers with nvidia mcp55 nic don't work with msi in second kernel by kexec Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11379] char/tpm: tpm_infineon no longer loaded for HP 2510p laptop Rafael J. Wysocki
2008-08-24  6:18   ` Frans Pop
     [not found]     ` <200808240818.09275.elendil-EIBgga6/0yRmR6Xm/wNWPw@public.gmane.org>
2008-08-24 21:12       ` Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11388] 2.6.27-rc3 warns about MTRR range; only 3 of 16gb of memory is usable Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11398] hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11401] pktcdvd: BUG, NULL pointer dereference in pkt_ioctl, bisected Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11382] e1000e: 2.6.27-rc1 corrupts EEPROM/NVM Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11403] 2.6.27-rc2 USB suspend regression Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11402] skbuff bug? Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11404] BUG: in 2.6.23-rc3-git7 in do_cciss_intr Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11405] 2.6.27-rc3 segfault on cold boot; not on warm boot Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11406] patch "x86: MOVE PCI IO ECS code to x86/pci" breaks CPU hotplug Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11409] build issue #564 for v2.6.27-rc4 : undefined reference to `NS8390p_init' Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11410] SLUB list_lock vs obj_hash.lock Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11413] get_rtc_time() triggers NMI watchdog in hpet_rtc_interrupt() Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11407] suspend: unable to handle kernel paging request Rafael J. Wysocki
2008-08-23 18:10 ` [Bug #11414] Random crashes with 2.6.27-rc3 on PPC Rafael J. Wysocki
2008-08-24 17:48 ` 2.6.27-rc4-git1: Reported regressions from 2.6.26 Linus Torvalds
     [not found]   ` <alpine.LFD.1.10.0808241030060.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-24 19:23     ` David Greaves
     [not found]       ` <48B1B526.2030100-FQ/kcb21CSxWk0Htik3J/w@public.gmane.org>
2008-08-25  0:51         ` Linus Torvalds
2008-08-24 18:03 ` Linus Torvalds
     [not found]   ` <alpine.LFD.1.10.0808241050180.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-24 18:43     ` Vegard Nossum
     [not found]       ` <19f34abd0808241143t6f5239d7o679135e9e974fe63-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-08-24 18:58         ` Linus Torvalds
     [not found]           ` <alpine.LFD.1.10.0808241152370.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25 13:03             ` Daniel J Blueman
2008-08-24 18:34 ` Linus Torvalds
     [not found]   ` <alpine.LFD.1.10.0808241120460.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27 20:17     ` Peter Osterlund
     [not found]       ` <m3k5e2qkk2.fsf-zq6IREYz3ykAvxtiuMwx3w@public.gmane.org>
2008-08-27 20:40         ` Linus Torvalds
     [not found]           ` <alpine.LFD.1.10.0808271335260.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27 20:45             ` Linus Torvalds
2008-08-28 13:52             ` Christoph Hellwig
     [not found]               ` <20080828135245.GA12410-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2008-09-02  7:26                 ` Jens Axboe
     [not found]                   ` <20080902072642.GX20055-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2008-09-03  2:06                     ` Al Viro
     [not found]                       ` <20080903020629.GS28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-09-04 10:13                         ` Jens Axboe
     [not found]                           ` <20080904101326.GD20055-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2008-09-15  5:30                             ` Al Viro
2008-08-27 22:08         ` Alan Cox
     [not found]           ` <20080827230828.4285022b-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
2008-08-27 22:38             ` Linus Torvalds
     [not found]               ` <alpine.LFD.1.10.0808271530350.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27 22:28                 ` Alan Cox
     [not found]                   ` <20080827232803.2ba8dd96-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
2008-08-27 23:00                     ` Linus Torvalds
     [not found]                       ` <alpine.LFD.1.10.0808271551380.3419-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-27 23:12                         ` Linus Torvalds
2008-08-28  0:35                           ` Linus Torvalds
2008-08-27 22:43                 ` David Miller
2008-08-27 22:45                 ` Alexey Dobriyan
2008-08-24 18:52 ` Linus Torvalds
2008-08-24 22:50   ` Sean Young
     [not found]   ` <alpine.LFD.1.10.0808241141470.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-25  0:16     ` H. Peter Anvin
2008-08-24 19:03 ` Linus Torvalds
     [not found]   ` <alpine.LFD.1.10.0808241201090.3363-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-08-24 19:23     ` Adrian Bunk
2008-08-24 21:40 ` Rafael J. Wysocki
2008-08-25  0:48 ` Benjamin Herrenschmidt
2008-08-25 11:40   ` Rafael J. Wysocki
  -- strict thread matches above, loose matches on Subject: below --
2008-11-16 17:38 2.6.28-rc5: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
2008-11-16 17:40 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-11-17  9:06   ` Ingo Molnar
     [not found]     ` <20081117090648.GG28786-X9Un+BFzKDI@public.gmane.org>
2008-11-17  9:14       ` David Miller
     [not found]         ` <20081117.011403.06989342.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-17 11:01           ` Ingo Molnar
2008-11-17 11:20             ` Eric Dumazet
     [not found]               ` <4921539B.2000002-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-11-17 16:11                 ` Ingo Molnar
     [not found]                   ` <20081117161135.GE12081-X9Un+BFzKDI@public.gmane.org>
2008-11-17 16:35                     ` Eric Dumazet
     [not found]                       ` <49219D36.5020801-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-11-17 17:08                         ` Ingo Molnar
     [not found]                           ` <20081117170844.GJ12081-X9Un+BFzKDI@public.gmane.org>
2008-11-17 17:25                             ` Ingo Molnar
     [not found]                               ` <20081117172549.GA27974-X9Un+BFzKDI@public.gmane.org>
2008-11-17 17:33                                 ` Eric Dumazet
     [not found]                                   ` <4921AAD6.3010603-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-11-17 17:38                                     ` Linus Torvalds
     [not found]                                       ` <alpine.LFD.2.00.0811170937540.3468-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-17 17:42                                         ` Eric Dumazet
2008-11-17 18:23                                         ` Ingo Molnar
     [not found]                                           ` <20081117182320.GA26844-X9Un+BFzKDI@public.gmane.org>
2008-11-17 18:33                                             ` Linus Torvalds
2008-11-17 18:49                                             ` Ingo Molnar
     [not found]                                               ` <20081117184951.GA5585-X9Un+BFzKDI@public.gmane.org>
2008-11-17 19:30                                                 ` Eric Dumazet
2008-11-17 19:39                                                 ` David Miller
     [not found]                                                   ` <20081117.113936.81699150.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-17 19:43                                                     ` Eric Dumazet
2008-11-17 19:55                                                     ` Linus Torvalds
     [not found]                                                       ` <alpine.LFD.2.00.0811171149100.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-17 20:16                                                         ` David Miller
     [not found]                                                           ` <20081117.121641.167690467.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-17 20:30                                                             ` Linus Torvalds
     [not found]                                                               ` <alpine.LFD.2.00.0811171218470.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-17 20:58                                                                 ` David Miller
     [not found]                                                                   ` <20081117.125826.193693115.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-18  9:44                                                                     ` Nick Piggin
     [not found]                                                                       ` <200811182044.11055.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
2008-11-18 15:58                                                                         ` Linus Torvalds
2008-11-19  4:31                                                                           ` Nick Piggin
     [not found]                                                                           ` <alpine.LFD.2.00.0811180731480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-20  9:14                                                                             ` David Miller
2008-11-20  9:06                                                                         ` David Miller
2008-11-18 12:29                                                     ` Mike Galbraith
2008-11-17 19:57                                                 ` Ingo Molnar
2008-11-17 20:47                                                 ` Ingo Molnar
     [not found]                                                   ` <20081117204743.GD12020-X9Un+BFzKDI@public.gmane.org>
2008-11-17 20:56                                                     ` Eric Dumazet
2008-11-17 22:19                                                 ` Ingo Molnar
2008-11-17 22:08                                               ` Ingo Molnar
     [not found]                                                 ` <20081117220828.GB6398-X9Un+BFzKDI@public.gmane.org>
2008-11-17 22:15                                                   ` Eric Dumazet
     [not found]                                                     ` <4921ED16.9050307-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-11-17 22:26                                                       ` Ingo Molnar
     [not found]                                                         ` <20081117222640.GA17880-X9Un+BFzKDI@public.gmane.org>
2008-11-17 22:39                                                           ` Eric Dumazet
2008-11-18  5:23                                                       ` David Miller
     [not found]                                                         ` <20081117.212352.77940634.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-18  8:45                                                           ` Ingo Molnar
2008-11-17 19:36                             ` David Miller
2008-11-17 19:31                     ` David Miller
     [not found]                       ` <20081117.113158.200497613.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-17 19:47                         ` Linus Torvalds
     [not found]                           ` <alpine.LFD.2.00.0811171134480.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-17 19:51                             ` David Miller
2008-11-17 19:53                             ` Ingo Molnar
2008-11-17 22:47                         ` Ingo Molnar
     [not found]             ` <20081117110119.GL28786-X9Un+BFzKDI@public.gmane.org>
2008-11-17 19:21               ` David Miller
     [not found]                 ` <20081117.112157.146825192.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-17 19:48                   ` Linus Torvalds
     [not found]                     ` <alpine.LFD.2.00.0811171147380.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-17 19:52                       ` David Miller
     [not found]                         ` <20081117.115258.227376348.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-17 19:57                           ` Linus Torvalds
     [not found]                             ` <alpine.LFD.2.00.0811171156080.18283-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-11-17 20:18                               ` David Miller
2008-11-19 19:43       ` Christoph Lameter
     [not found]         ` <Pine.LNX.4.64.0811191341570.23502-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
2008-11-19 20:14           ` Ingo Molnar
2008-11-20 23:52           ` Christoph Lameter
     [not found]             ` <Pine.LNX.4.64.0811201727070.9089-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
2008-11-21  8:30               ` Ingo Molnar
     [not found]                 ` <20081121083044.GL16242-X9Un+BFzKDI@public.gmane.org>
2008-11-21  8:51                   ` Eric Dumazet
     [not found]                     ` <49267694.1030506-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-11-21  9:05                       ` David Miller
     [not found]                         ` <20081121.010508.40225532.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-11-21 12:51                           ` Eric Dumazet
2008-11-21  9:18                       ` Ingo Molnar
2008-11-21  9:03                   ` David Miller
2008-11-21 16:11                   ` Christoph Lameter
     [not found]                     ` <Pine.LNX.4.64.0811210936580.25354-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
2008-11-21 18:06                       ` Christoph Lameter
     [not found]                         ` <Pine.LNX.4.64.0811211119550.27777-dRBSpnHQED8AvxtiuMwx3w@public.gmane.org>
2008-11-21 18:16                           ` Eric Dumazet
     [not found]                             ` <4926FB13.3080808-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-11-21 18:19                               ` Eric Dumazet
2008-11-09 19:40 2.6.28-rc3-git6: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
2008-11-09 19:43 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-11-02 16:47 2.6.28-rc2-git7: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
2008-11-02 16:49 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-10-25 21:04 2.6.28-rc1-git1: Reported regressions 2.6.26 -> 2.6.27 Rafael J. Wysocki
2008-10-25 21:07 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-10-04 17:28 2.6.27-rc8-git7: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-10-04 17:32 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-09-21 18:52 2.6.27-rc6-git6: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-09-21 18:54 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-09-12 18:59 2.6.27-rc6-git2: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-09-12 19:06 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-09-12 22:05   ` Christoph Lameter
     [not found]     ` <48CAE7A0.8000004-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2008-09-13 11:44       ` Mike Galbraith
     [not found]         ` <1221306287.5213.111.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-13 11:57           ` Mike Galbraith
2008-09-14  6:24           ` Mike Galbraith
     [not found]             ` <1221373494.4979.1.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-14  7:02               ` Mike Galbraith
2008-09-14 14:18           ` Christoph Lameter
     [not found]             ` <48CD1D25.9080301-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2008-09-14 19:51               ` Mike Galbraith
     [not found]                 ` <1221421907.4597.24.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-15 10:44                   ` Mike Galbraith
     [not found]                     ` <1221475440.4784.39.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-16 12:28                       ` Mike Galbraith
     [not found]                         ` <1221568105.5020.17.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-16 14:07                           ` Ilpo Järvinen
2008-09-17  4:39                             ` Mike Galbraith
2008-09-17  5:01                               ` Mike Galbraith
     [not found]                                 ` <1221627676.5125.3.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-17 10:40                                   ` Ingo Molnar
     [not found]                                     ` <20080917104044.GC18764-X9Un+BFzKDI@public.gmane.org>
2008-09-17 11:41                                       ` Mike Galbraith
     [not found]                                         ` <1221651701.5102.17.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-17 12:49                                           ` Ingo Molnar
     [not found]                                             ` <20080917124943.GA7738-X9Un+BFzKDI@public.gmane.org>
2008-09-17 13:11                                               ` Mike Galbraith
     [not found]                                                 ` <1221657111.5511.14.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-17 13:36                                                   ` Ilpo Järvinen
     [not found]                                                     ` <Pine.LNX.4.64.0809171629550.1034-x/A8LOkYjdVsRR2hCrRKtT03IgOmwywn@public.gmane.org>
2008-09-17 13:57                                                       ` Mike Galbraith
     [not found]                                                         ` <1221659858.8818.13.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-17 17:04                                                           ` Ilpo Järvinen
2008-09-18  7:12                                                           ` Mike Galbraith
     [not found]                                                             ` <1221721970.5003.9.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-18  7:25                                                               ` Mike Galbraith
     [not found]                                                                 ` <1221722733.5149.6.camel-YqMYhexLQo1vAv1Ojkdn7Q@public.gmane.org>
2008-09-18  7:58                                                                   ` Ilpo Järvinen
2008-09-17 14:47                                                   ` Eric Dumazet
     [not found]                                                     ` <48D11871.4090805-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2008-09-17 14:50                                                       ` Eric Dumazet
2008-09-17 18:16                                                       ` Mike Galbraith
2008-09-06 21:24 2.6.27-rc5-git8: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-09-06 21:30 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-08-30 19:46 2.6.27-rc5-git2: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-08-30 19:50 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
2008-08-16 19:00 2.6.27-rc3-git3: Reported regressions from 2.6.26 Rafael J. Wysocki
2008-08-16 19:02 ` [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28 Rafael J. Wysocki
