public inbox for linux-kernel@vger.kernel.org
* 2.6.12-rc6-mm1
@ 2005-06-07 11:29 Andrew Morton
  2005-06-07 14:24 ` 2.6.12-rc6-mm1 Wolfgang Wander
                   ` (7 more replies)
  0 siblings, 8 replies; 72+ messages in thread
From: Andrew Morton @ 2005-06-07 11:29 UTC (permalink / raw)
  To: linux-kernel


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/

- Added v9fs

- Various random fixes

- Probably a similar number of breakages



Changes since 2.6.12-rc5-mm2:


-fix-ide-scsi-eh-locking.patch
-ext3-fix-log_do_checkpoint-assertion-failure.patch
-ext3-fix-list-scanning-in-__cleanup_transaction.patch
-namei-fixes-01-19.patch
-namei-fixes-02-19.patch
-namei-fixes-03-19.patch
-namei-fixes-04-19.patch
-namei-fixes-05-19.patch
-namei-fixes-06-19.patch
-namei-fixes-07-19.patch
-namei-fixes-08-19.patch
-namei-fixes-09-19.patch
-namei-fixes-10-19.patch
-namei-fixes-11-19.patch
-namei-fixes-12-19.patch
-namei-fixes-13-19.patch
-namei-fixes-14-19.patch
-namei-fixes-15-19.patch
-namei-fixes-16-19.patch
-namei-fixes-17-19.patch
-namei-fixes-18-19.patch
-namei-fixes-19-19.patch
-ipmi-class_simple-fixes.patch
-gregkh-i2c-i2c-ali1563.patch
-git-ocfs-fix-for-shemminger-tcp-stuff.patch
-gregkh-pci-pci-hotplug-shpchp-_HPP-fix.patch
-gregkh-pci-pci-hotplug-shpchp-PERR-fix.patch
-gregkh-pci-pci-amd74xx-ids.patch
-gregkh-pci-pci-cpci-update.patch
-gregkh-usb-usb-sl811-hcd-fixes.patch
-gregkh-usb-usb-sl811_cs.patch
-gregkh-usb-usb-ftdi_sio-new-id.patch
-gregkh-usb-usb-serial-generic-init-fix.patch
-gregkh-usb-usb-ub_multi_lun.patch
-gregkh-usb-usb-remove_pwc_changelog.patch
-gregkh-usb-usb-add-new-wacom-device-to-usb-hid-core-list.patch
-gregkh-usb-usb-urb_documentation.patch
-gregkh-usb-usb-earthmate-hid-blacklist.patch
-gregkh-usb-usb-storage-trumpion.patch
-gregkh-usb-usb-modalias-shrink.patch
-gregkh-usb-usb-cp2101-flow-control.patch
-gregkh-usb-usb-usbatm-reduce-log-spam.patch
-gregkh-usb-usb-usbatm-avoid-oops-on-bind-failure.patch
-gregkh-usb-usb-usbatm-1-fix.patch
-usb-option-card-driver.patch
-usb-wacom-tablet-driver.patch
-atm-nicstar-remove-a-bunch-of-pointless-casts-of-null.patch
-fix-atm-build-with-o=.patch
-drivers-net-hamradio-baycom_eppc-cleanups.patch
-ppc32-apple-device-tree-bug-fix.patch
-ppc32-ppc64-cleanup-proc-device-tree.patch
-ppc64-cleanup-spr-definitions.patch
-ppc64-cleanup-iseries-runlight-support.patch
-ppc64-remove-decr_overclock.patch
-ppc64-fix-a-device-tree-bug-on-apples.patch
-i386-collect-host-bridge-resources.patch
-x86_64-collect-host-bridge-resources.patch
-allow-ev_abs-to-work-in-uinputc.patch
-serial-update-nec-vr4100-series-serial-support.patch

 Merged

+ppc32-add-linux-compilerh-to-asm-sigcontexth.patch
+include-linux-configh-before-testing-config_acpi.patch
+uml-make-the-emulated-iomem-driver-work-on-26.patch
+uml-compile-fixes-for-gcc-4.patch
+uml-fix-strace-f.patch
+uml-clean-up-error-path.patch
+uml-link-tt-mode-against-nptl.patch
+send_ipi_mask_sequence-warning-fix.patch
+ppc32-add-405ep-cpu_spec-entry.patch
+input-disable-scroll-feature-on-at-keyboards.patch

 Planned for 2.6.12

+x86_64-task_size-fixes-for-compatibility-mode-processes.patch

 x86_64 critical fixes (needs work)

+ia64-disable-preempt.patch

 Disable CONFIG_PREEMPT on ia64 (it has problems with floating-point
 save/restore)

+fix-up-macro-abuse-in-drivers-acpi-sleep-procc.patch

 ACPI cleanup

+git-arm.patch
+git-arm-smp.patch

 ARM git trees

-git-cpufreq.patch

 Empty

+fix-warning-in-powernow-k8c.patch

 Fix a cpufreq warning

+gregkh-driver-ipmi-class_simple-fixes.patch
+gregkh-driver-sysfs-permissions-01.patch
+gregkh-driver-sysfs-permissions-02.patch
+gregkh-driver-sysfs-permissions-03.patch
+gregkh-driver-dont-loose-devices-on-suspend-failure.patch

 New driver core patches

-bk-drm.patch
-bk-drm-via.patch

 DRM is moving to git

-update-drm-ioctl-compatibility-to-new-world-order.patch

 The code which this patches isn't there any more (it will come back)

+git-drm-initmap.patch
+git-drm-via.patch

 Some DRM git trees

+gregkh-i2c-i2c-Kconfig-update.patch
+gregkh-i2c-i2c-pcf8574-cleanup.patch
+gregkh-i2c-i2c-adm9240-docs.patch
+gregkh-i2c-i2c-device-attr-lm90.patch
+gregkh-i2c-i2c-device-attr-lm83.patch
+gregkh-i2c-i2c-device-attr-lm63.patch
+gregkh-i2c-i2c-device-attr-it87.patch
+gregkh-i2c-hwmon-01.patch
+gregkh-i2c-hwmon-02.patch
+gregkh-i2c-hwmon-03.patch

 i2c tree updates

+i2c-chips-need-hwmon.patch
+gregkh-i2c-hwmon-02-sparc64-fix.patch

 Fix a few things in the i2c tree

+sonypi-make-sure-that-input_work-is-not-running-when-unloading.patch

 sonypi fix

-git-libata-adma.patch
-git-libata-ahci-msi.patch
-git-libata-bridge-detect.patch
-git-libata-chs-support.patch
-git-libata-docs.patch
-git-libata-svw.patch
-git-libata-promise-sata-pata.patch
-git-libata-pdc2027x.patch

 Dropped the libata tree - it changes all the time and I can't work out wtf
 is going on.

-git-netdev-r8169.patch

 Too many rejects from this one.

+fix-recursive-ipw2200-dependencies.patch
+drivers-net-chelsio-cxgb2-use-the-dma_3264bit_mask-constants.patch
+drivers-net-wireless-ipw2100-use-the-dma_32bit_mask-constant.patch
+drivers-net-wireless-ipw2200-use-the-dma_32bit_mask-constant.patch
+fix-tulip-suspend-resume.patch

 Net driver fixes

+scalable-tcp-cleaned.patch

 "scalable TCP"

+git-serial.patch

 Serial subsystem tree

+gregkh-pci-pci-fix-routing-in-parent-bridge.patch
+gregkh-pci-pci-dma-bursting-advice.patch
+gregkh-pci-pci-collect-host-bridge-resources-01.patch
+gregkh-pci-pci-collect-host-bridge-resources-02.patch

 PCI subsystem tree updates

+gregkh-pci-pci-dma-bursting-advice-fix.patch

 Fix it

-git-scsi-rc-fixes.patch

 This is empty

+gregkh-usb-usb-usbatm-reduce-log-spam.patch
+gregkh-usb-usb-usbatm-avoid-oops-on-bind-failure.patch
+gregkh-usb-usb-usbatm-fix-gcc-2.95.x.patch
+gregkh-usb-usb-usbatm-kcalloc.patch
+gregkh-usb-usb-uhci-detect-invalid-ports.patch
+gregkh-usb-usb-export-getput_intf.patch
+gregkh-usb-usb-cdc-acm-reference-count-fix.patch
+gregkh-usb-usb-ehci-fix-page-pointer-allocate.patch
+gregkh-usb-usb-wireless-definitions.patch
+gregkh-usb-usb-usblp-race-fix.patch
+gregkh-usb-usb-stv680-creative-mini.patch
+gregkh-usb-usb-atiremote-sysfs-links.patch
+gregkh-usb-usb-gotemp.patch

 USB tree updates

+sparsemem-memory-model-fix-4.patch
+sparsemem-memory-model-fix-5.patch

 Fix sparsemem-memory-model.patch even more

+sparsemem-hotplug-base-fix.patch

 Fix sparsemem-hotplug-base.patch

-vm-merge_lru_pages.patch
-vm-page-cache-reclaim-core.patch
-vm-page-cache-reclaim-core-tidy.patch
-vm-reclaim_page_cache_node-syscall.patch
-vm-reclaim_page_cache_node-syscall-x86.patch
-vm-automatic-reclaim-through-mempolicy.patch
+vm-add-may_swap-flag-to-scan_control.patch
+vm-early-zone-reclaim.patch
+vm-early-zone-reclaim-tidy.patch
+vm-add-__gfp_noreclaim.patch
+vm-rate-limit-early-reclaim.patch

 These patches were updated

+node-local-per-cpu-pages-tidy-2-fix.patch

 Fix node-local-per-cpu-pages.patch some more.

+avoiding-mmap-fragmentation-revert-unneeded-64-bit-changes-vs-x86_64-task_size-fixes-for-compatibility-mode-processes.patch

 Fix a patch clash

+__mod_page_state-pass-unsigned-long-instead-of-unsigned.patch
+__read_page_state-pass-unsigned-long-instead-of-unsigned.patch

 Warning fixes

+add-oom-debug.patch

 Additional debug output when the box goes oom.

+periodically-drain-non-local-pagesets.patch
+periodically-drain-non-local-pagesets-fix.patch

 Shrink the per-cpu-pages caches occasionally

+ia64-uncached-alloc.patch
+sn2-xpc-build-patches.patch

 Special allocator for uncached pages

+shmem-restore-superblock-info.patch
+mbind-fix-verify_pages-pte_page.patch
+mbind-check_range-use-standard-ptwalk.patch
+dup_mmap-update-comment-on-new-vma.patch
+bad_page-clear-reclaim-and-slab.patch
+rme96xx-fix-pagereserved-range.patch
+get_user_pages-kill-get_page_map.patch
+do_wp_page-cannot-share-file-page.patch
+can_share_swap_page-use-page_mapcount.patch
+msync-check-pte-dirty-earlier.patch

 Various mm fixes

+sunzilog-warning-fixes.patch
+ppp-handle-misaligned-accesses.patch

 Net fixes

+ppc32-removed-dependency-on-config_cpm2-for-building.patch
+ppc32-converted-mpc10x-bridge-to-use-platform.patch
+cpm_uart-route-scc2-pins-for-the-stx-gp3-board.patch

 ppc32 updates

+ppc64-iseries-remove-iseries_proch.patch
+ppc64-iseries-header-file-white-space-cleanups.patch
+ppc64-iseries-more-header-file-white-space-cleanups.patch
+ppc64-iseries-obvious-code-simplifications.patch
+ppc64-iseries-remove-lpardatah.patch
+ppc64-iseries-eliminate-some-unused-inline-functions.patch
+ppc64-iseries-remove-hvcallcfgh.patch
+ppc64-iseries-cleanup-itlpqueueh.patch
+ppc64-iseries-tidy-up-some-includes-and-hvcallh.patch
+ppc64-iseries-misc-header-cleanups.patch
+update-ppc64-defconfig.patch
+ppc64-iseries-remove-iseries_pci_resetc.patch
+ppc64-iseries-iommuh-cleanups.patch
+ppc64-iseries-iseries_vpdinfoc-cleanups.patch
+ppc64-iseries-iseries_pcih-cleanups.patch
+ppc64-iseries-remove-ioretry-from-iseries_device_node.patch
+ppc64-iseries-remove-some-more-members-of.patch

 ppc64 updates

+x86-x86_64-pcibus_to_node-fix.patch

 Fix x86-x86_64-pcibus_to_node.patch

+mempool-bounce-buffer-restriction.patch

 Limit the amount of memory which can be used for bounce buffers

+arm-irqs_disabled-type-fix.patch

 ARM warning fix

+variable-overflow-after-hundreds-round-of-hotplug-cpu.patch

 CPU hotplug fix

+x86_64-change-init-sections-for-cpu-hotplug-support.patch
+x86_64-change-init-sections-for-cpu-hotplug-support-fix.patch
+x86_64-cpu-hotplug-support.patch
+x86_64-cpu-hotplug-sibling-map-cleanup.patch
+x86_64-dont-use-broadcast-shortcut-to-make-it-cpu-hotplug-safe.patch
+x86_64-provide-ability-to-choose-using-shortcuts-for-ipi-in-flat-mode.patch

 CPU hotplug for x86_64

+m32r-support-m3a-2170mappi-iii-platform-fix.patch
+m32r-support-m3a-2170mappi-iii-platform-fix-2.patch
+m32r-update-setup_xxxxxc.patch
+m32r-update-m32r_cfc-to-support-mappi-iii-fix.patch
+m32r-cleanup-arch-m32r-mm-extablec.patch
+m32r-remove-include-asm-m32r-m32102perih.patch
+m32r-update-defconfig-files.patch
+m32r-use-asm-generic-div64h.patch

 m32r fixes and updates

+s390-cio-max-channels-checks.patch
+s390-cio-documentation.patch
+s390-ifdefs-in-compat_ioctls.patch
+s390-kernel-stack-overflow-panic.patch
+s390-cmm-sender-parameter-visibility.patch
+s390-memory-detection-32gb.patch
+s390-pending-interrupt-after-ipl-from-reader.patch

 s/390 updates

+ecryptfs-export-user-key-type.patch

 Export a symbol

+x86_64-specific-function-return-probes.patch
+kprobes-ia64-cleanup-2.patch
+kprobes-ia64-cmp-ctype-unc-support.patch
+kprobes-ia64-safe-register-kprobe.patch
+kprobes-temporary-disarming-of-reentrant-probe-for-x86_64-fix.patch
+allow-a-jprobe-to-coexist-with-muliple-kprobes.patch

 kprobes updates

+cs4236-irq-handling-fix.patch

 OSS driver fix

+block-add-unlocked_ioctl-support-for-block-devices.patch

 Support lock_kernel-less ioctls on blockdevs

+pcdp-handle-tables-that-dont-supply-baud-rate.patch

 serial driver update

+stop-arch-i386-kernel-vsyscall-noteo-being-rebuilt-every-time.patch

 kbuild fix

+remove-f_error-field-from-struct-file.patch

 cleanup

+autofs4-avoid-panic-on-bind-mount-of-autofs-owned-directory.patch
+autofs4-post-expire-race-fix.patch
+autofs4-bad-lookup-fix.patch
+autofs4-subversion-bump-to-identify-these-changes.patch

 autofs4 updates

+rapidio-support-core-base.patch
+rapidio-support-core-includes.patch
+rapidio-support-core-enum.patch
+rapidio-support-ppc32.patch
+rapidio-support-net-driver.patch

 RapidIO driver

+dlm-lockspaces-callbacks-directory-dlm-consistent-ifdefs.patch
+dlm-lockspaces-callbacks-directory-fix-2-dlm-dont-repeat-include.patch
+dlm-lockspaces-callbacks-directory-fix-3.patch
+dlm-lockspaces-callbacks-directory-dlm-dont-free-lvb-twice.patch
+dlm-communication-dlm-dont-add-duplicate-node-addresses.patch
+dlm-recovery-dlm-timer-cant-be-global.patch
+dlm-recovery-dlm-clear-recovery-flags.patch
+dlm-device-interface-dlm-uncomment-unregister_lockspace.patch
+dlm-device-interface-dlm-newline-in-printks.patch
+dlm-debug-fs-dlm-consistent-ifdefs.patch

 Various fixes and updates to the DLM driver

+tuner-corec-improvments-and-ymec-tvision-tvf8533mf.patch

 v4l update

+oprofile-report-anonymous-region-samples.patch

 oprofile feature

+lockd-flush-signals-on-shutdown.patch
+nfs4-hold-filp-while-reading-or-writing.patch
+nfsd4-fix-probe_callback.patch
+nfsd4-nfs4_check_open_reclaim-cleanup.patch
+nfsd4-create-separate-laundromat-workqueue.patch
+nfsd4-simplify-lease-changing.patch
+nfsd4-delegation-recovery.patch
+nfsd4-rename-nfs4_state_init.patch
+nfsd4-clean-up-state-initialization.patch
+nfsd4-remove-nfs4_reclaim_init.patch
+nfsd4-idmap-initialization.patch
+nfsd4-setclientid-simplification.patch
+nfsd4-reboot-hash.patch
+nfsd4-add-find_unconf_by_str-functions-to-simplify-setclientid.patch
+nfsd4-grace-period-end.patch
+nfsd4-make-needlessly-global-code-static.patch
+nfsd4-fix-uncomfirmed-list.patch
+nfsd4-fix-setclientid_confirm-cases.patch
+nfsd4-fix-setclientid_confirm-error-return.patch
+nfsd4-setclientid_confirm-gotoectomy.patch
+nfsd4-setclientid_confirm-comments.patch
+nfsd4-miscellaneous-setclientid_confirm-cleanup.patch
+nfsd4-rename-state-list-fields.patch
+nfsd4-allow-multiple-lockowners.patch
+nfsd4-remove-cb_parsed.patch
+nfsd4-initialize-recovery-directory.patch
+nfsd4-reboot-recovery.patch
+nfsd4-reboot-dirname.patch

 nfsd updates

+isofs-show-hidden-files-add-granularity-for-assoc-hidden-files-flags.patch
+isofs-show-hidden-files-add-granularity-for-assoc-hidden-files-flags-tidy.patch
+isofs-show-hidden-files-add-granularity-for-assoc-hidden-files-flags-fix.patch

 isofs feature work

+numa-aware-slab-allocator-v5.patch

 The NUMA-aware slab allocator is back.  Needs ifdef-reduction work.

-periodically-scan-redzone-entries-and-slab-control-structures.patch
-slab-leak-detector.patch
-slab-leak-detector-warning-fixes.patch

 It broke these.

+numa-aware-slab-allocator-v3-__bad_size-fix.patch

 Fix it.

+sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch

 CPU scheduler fix

+v4l-add-support-for-pixelview-ultra-pro.patch
+dvico-fusionhdtv3-gold-t-documentation-fix.patch

 v4l updates

+kexec-code-cleanup.patch

 Make all the kexec patches resemble CodingStyle.

+v9fs-documentation-makefiles-configuration.patch
+v9fs-documentation-makefiles-configuration-fix.patch
+v9fs-vfs-file-dentry-and-directory-operations.patch
+v9fs-vfs-file-dentry-and-directory-operations-fix.patch
+v9fs-vfs-inode-operations.patch
+v9fs-vfs-superblock-operations-and-glue.patch
+v9fs-9p-protocol-implementation.patch
+v9fs-transport-modules.patch
+v9fs-debug-and-support-routines.patch
+v9fs-debug-and-support-routines-fix.patch

 The plan9 networked filesystem

+framebuffer-driver-for-arc-lcd-board.patch
+framebuffer-driver-for-arc-lcd-board-tidy.patch
+framebuffer-driver-for-arc-lcd-board-update.patch
+new-pci-id-for-chipsfb.patch

 fbdev updates

+modules-add-version-and-srcversion-to-sysfs-fix.patch
+modules-add-version-and-srcversion-to-sysfs-fix-2.patch

 Fix modules-add-version-and-srcversion-to-sysfs.patch

+fuse-device-functions-fuse-serious-information-leak-fix.patch

 FUSE fix

+remove-redundant-info-from-submittingpatches.patch

 Documentation update

-unexport-slab_reclaim_pages.patch

 Drop this due to some reject.



number of patches in -mm: 1397
number of changesets in external trees: 53
number of patches in -mm only: 1395
total patches: 1448


All 1397 patches:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/patch-list




* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-07 14:24 ` Wolfgang Wander
  2005-06-07 14:49   ` 2.6.12-rc6-mm1 Wolfgang Wander
  2005-06-07 14:48 ` 2.6.12-rc6-mm1 Brice Goglin
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 72+ messages in thread
From: Wolfgang Wander @ 2005-06-07 14:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Chen, Kenneth W

Andrew Morton wrote:

> +avoiding-mmap-fragmentation-revert-unneeded-64-bit-changes-vs-x86_64-task_size-fixes-for-compatibility-mode-processes.patch

As a heads-up.

This one breaks the fragmentation reduction patch in 32 bit emulation mode.
Our test case shows the standard 17 fragmented regions in /proc/self/maps (as in
the 2.6 standard kernel) vs the 2 regions in 2.6.12-rc5-mm2 (and before).
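
The test case itself is not attached to this message. As a rough sketch only, one way to count the regions in a maps listing, and the holes between them, could look like the following. The helper names `parse_maps`, `count_holes` and `summarize` are hypothetical, and the assumption is that "fragmented regions" refers to separate lines in `/proc/self/maps` with unmapped gaps between them.

```python
# Hypothetical helpers, not the test case referred to in the message above.
# Assumption: each line of /proc/<pid>/maps starts with "start-end" in hex.

def parse_maps(text):
    """Return a sorted list of (start, end) address ranges from maps-format text."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start_hex, end_hex = fields[0].split("-")
        regions.append((int(start_hex, 16), int(end_hex, 16)))
    return sorted(regions)


def count_holes(regions):
    """Count unmapped gaps between consecutive (sorted) mappings."""
    return sum(
        1
        for (_, prev_end), (next_start, _) in zip(regions, regions[1:])
        if next_start > prev_end
    )


def summarize(path="/proc/self/maps"):
    """Read a maps file and return (number of mappings, number of holes)."""
    with open(path) as f:
        regions = parse_maps(f.read())
    return len(regions), count_holes(regions)
```

Calling `summarize()` before and after a run of mixed-size mmap/munmap calls would give two pairs of numbers to compare across kernels; which allocation pattern to use is left open here.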

Somehow the new way of detecting 32 bit emulation mode seems to fail here.

I'll try to figure out a fix.

               Wolfgang





* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
  2005-06-07 14:24 ` 2.6.12-rc6-mm1 Wolfgang Wander
@ 2005-06-07 14:48 ` Brice Goglin
  2005-06-07 23:15 ` 2.6.12-rc6-mm1 Francois Romieu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Brice Goglin @ 2005-06-07 14:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> 
> - Added v9fs
> 
> - Various random fixes
> 
> - Probably a similar number of breakages

Hi Andrew,

I didn't see any breakage. But I get these two lines during boot:
yenta 0000:02:03.1: no resource of type 100 available, trying to continue...
yenta 0000:02:03.1: no resource of type 100 available, trying to continue...

Anyway, my PCMCIA slots seem to still work.

Brice


* Re: 2.6.12-rc6-mm1
  2005-06-07 14:24 ` 2.6.12-rc6-mm1 Wolfgang Wander
@ 2005-06-07 14:49   ` Wolfgang Wander
  0 siblings, 0 replies; 72+ messages in thread
From: Wolfgang Wander @ 2005-06-07 14:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Chen, Kenneth W

[-- Attachment #1: Type: text/plain, Size: 1053 bytes --]

Wolfgang Wander wrote:
> Andrew Morton wrote:
> 
>> +avoiding-mmap-fragmentation-revert-unneeded-64-bit-changes-vs-x86_64-task_size-fixes-for-compatibility-mode-processes.patch 
>>
> 
> 
> As a heads-up.
> 
> This one breaks the fragmentation reduction patch in 32 bit emulation mode.
> Our test case shows the standard 17 fragmented regions in 
> /proc/self/maps (as in
> the 2.6 standard kernel) vs the 2 regions in 2.6.12-rc5-mm2 (and before).
> 
> Somehow the new way of detecting 32 bit emulation mode seems to fail here.
> 
> I'll try to figure out a fix.
> 

Here is one possibility:

Since rc6 the difference between TASK_UNMAPPED_64 and TASK_UNMAPPED_32 is gone
and both are now merged into TASK_UNMAPPED_BASE.  Therefore we can no longer
check our local base against TASK_UNMAPPED_BASE to see if we are running in 32bit
emulation mode.  The appended patch uses other (hopefully the right) means.

Tested on x86_64 in 32-bit and 64-bit mode (64 bit fragments as desired, 32 bit
collapses as desired).

Signed-off-by: Wolfgang Wander <wwc@rentec.com>


[-- Attachment #2: avoiding-mmap-fragmentation-revert-unneeded-64-bit-changes-vs-x86_64-task_size-fixes-for-compatibility-mode-processes-fix.patch --]
[-- Type: text/x-patch, Size: 511 bytes --]

--- arch/x86_64/kernel/sys_x86_64.c~	2005-06-07 09:12:31.000000000 -0400
+++ arch/x86_64/kernel/sys_x86_64.c	2005-06-07 10:32:07.000000000 -0400
@@ -105,7 +105,8 @@ arch_get_unmapped_area(struct file *filp
 		    (!vma || addr + len <= vma->vm_start))
 			return addr;
 	}
-	if (begin != TASK_UNMAPPED_BASE && len <= mm->cached_hole_size) {
+	if (((flags & MAP_32BIT) || test_thread_flag(TIF_IA32))
+	    && len <= mm->cached_hole_size) {
 	        mm->cached_hole_size = 0;
 		mm->free_area_cache = begin;
 	}


* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
  2005-06-07 14:24 ` 2.6.12-rc6-mm1 Wolfgang Wander
  2005-06-07 14:48 ` 2.6.12-rc6-mm1 Brice Goglin
@ 2005-06-07 23:15 ` Francois Romieu
  2005-06-08  1:59 ` 2.6.12-rc6-mm1 Søren Lott
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 72+ messages in thread
From: Francois Romieu @ 2005-06-07 23:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton <akpm@osdl.org> :
[...]
> -git-netdev-r8169.patch
> 
>  Too many rejects from this one.

How did you generate git-netdev-r8169.patch ?

Jeff's 'upstream-2.6.13' includes all the pending r8169 changes and
nothing will be merged before 2.6.12 is out. Imho you can safely
ignore any r8169 change until 2.6.12 appears.

--
Ueimor


* 2.6.12-rc6-mm1
@ 2005-06-07 23:50 Martin J. Bligh
  2005-06-07 23:56 ` 2.6.12-rc6-mm1 Andrew Morton
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-07 23:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Wheeee! it actually compiles and boots for me on x86 ;-)

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png

Seems to show that perf is rather sucky on kernbench though.

baseline (-rc6) data is here:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/4760/kernbench.test/

-mm1 is here:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/4876/kernbench.test/

Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
If I factor it by 4x, I get:



     47796    10.9% total
     16644    30.5% buffered_rmqueue
     15574     7.7% default_idle
      2229   239.4% kmem_cache_free
      1782    11.1% zap_pte_range
      1752     0.0% inotify_inode_queue_event
      1467    36.3% release_pages
      1281    73.3% set_page_dirty
      1155    12.8% do_wp_page
       924     8.3% _spin_lock
       896     0.0% find_idlest_group
       828    21.7% free_hot_cold_page
       780     0.0% drain_remote_pages
       772     0.0% dput_recursive
       464     0.0% inotify_dentry_parent_queue_event
...
      -412    -8.1% __d_lookup
      -508   -98.4% find_idlest_cpu
      -542   -24.5% do_anonymous_page
      -549   -47.5% current_fs_time
      -580  -100.0% del_timer_sync
      -594   -86.6% dput
      -695   -31.4% __copy_user_intel
     -1461   -13.9% strnlen_user
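
The message does not show how the 4x factor was applied. A minimal sketch of the arithmetic, assuming the baseline profile was sampled at HZ=1000 and the -mm profile at HZ=250 (so the factor is 1000 / 250 = 4), might look like this. The function names and the sample numbers are made up for illustration and are not taken from the data above. Symbols with no baseline samples are reported as 0.0%, which is how the rows for new functions in the table above appear to be printed.

```python
# Illustrative sketch only; this is not the diffprofile script itself.

def rescale_ticks(ticks, sampled_hz, reference_hz):
    """Scale tick counts taken at sampled_hz so they compare with reference_hz."""
    factor = reference_hz / sampled_hz
    return {symbol: count * factor for symbol, count in ticks.items()}


def diff_profile(baseline, patched):
    """Return {symbol: (delta, percent_change)} for every symbol in either profile.

    Assumption: percent_change is 0.0 when the symbol has no baseline samples.
    """
    result = {}
    for symbol in set(baseline) | set(patched):
        old = baseline.get(symbol, 0)
        new = patched.get(symbol, 0)
        delta = new - old
        percent = 100.0 * delta / old if old else 0.0
        result[symbol] = (delta, percent)
    return result


# Made-up numbers: 1000 ticks at HZ=1000 versus 300 ticks at HZ=250.
baseline = {"some_function": 1000}
patched = rescale_ticks({"some_function": 300}, sampled_hz=250, reference_hz=1000)
print(diff_profile(baseline, patched))
```

Rescaling before differencing matters because a raw diff between profiles taken at different tick rates would show every function shrinking by roughly three quarters.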


Buggered if I know what that is from. I'm guessing scheduler, or the
HZ change. I guess I can rerun with the HZ set to 1000 ... you got any
experimental scheduler stuff in your tree?

Else I guess it's some memory allocator stuff maybe? 


* Re: 2.6.12-rc6-mm1
  2005-06-07 23:50 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-07 23:56 ` Andrew Morton
  2005-06-08  0:02   ` 2.6.12-rc6-mm1 Christoph Lameter
  2005-06-08  0:02   ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 2 replies; 72+ messages in thread
From: Andrew Morton @ 2005-06-07 23:56 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Christoph Lameter

"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
> Wheeee! it actually compiles and boots for me on x86 ;-)

We aim to please.

> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
> 
> Seems to show that perf is rather sucky on kernbench though.

CPU scheduler.

> baseline (-rc6) data is here:
> 
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/4760/kernbench.test/
> 
> -mm1 is here:
> 
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/4876/kernbench.test/
> 
> Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).

Oh crap, so it does.  That's wrong.

> If I factor it by 4x, I get:

Would it be possible to set it back to 100Hz, retest?


* Re: 2.6.12-rc6-mm1
  2005-06-07 23:56 ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-08  0:02   ` Christoph Lameter
  2005-06-08  0:08     ` 2.6.12-rc6-mm1 Andrew Morton
  2005-06-09  1:58     ` 2.6.12-rc6-mm1 Lee Revell
  2005-06-08  0:02   ` 2.6.12-rc6-mm1 Martin J. Bligh
  1 sibling, 2 replies; 72+ messages in thread
From: Christoph Lameter @ 2005-06-08  0:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

On Tue, 7 Jun 2005, Andrew Morton wrote:

> > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
> 
> Oh crap, so it does.  That's wrong.

Email by you and Linus indicated that 250 should be the default.


* Re: 2.6.12-rc6-mm1
  2005-06-07 23:56 ` 2.6.12-rc6-mm1 Andrew Morton
  2005-06-08  0:02   ` 2.6.12-rc6-mm1 Christoph Lameter
@ 2005-06-08  0:02   ` Martin J. Bligh
  1 sibling, 0 replies; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-08  0:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Christoph Lameter

>> Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
> 
> Oh crap, so it does.  That's wrong.
> 
>> If I factor it by 4x, I get:
> 
> Would it be possible to set it back to 100Hz, retest?

Sure. but you mean 1000, right?

M.



* Re: 2.6.12-rc6-mm1
  2005-06-08  0:02   ` 2.6.12-rc6-mm1 Christoph Lameter
@ 2005-06-08  0:08     ` Andrew Morton
  2005-06-08  3:17       ` 2.6.12-rc6-mm1 Nick Piggin
                         ` (2 more replies)
  2005-06-09  1:58     ` 2.6.12-rc6-mm1 Lee Revell
  1 sibling, 3 replies; 72+ messages in thread
From: Andrew Morton @ 2005-06-08  0:08 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: mbligh, linux-kernel

Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Tue, 7 Jun 2005, Andrew Morton wrote:
> 
> > > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
> > 
> > Oh crap, so it does.  That's wrong.
> 
> Email by you and Linus indicated that 250 should be the default.

Oh, OK. hrm.

Martin, it would be useful if you could determine whether the kernbench
slowdown was due to the 1000Hz->250Hz change, thanks.

I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)


* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
                   ` (2 preceding siblings ...)
  2005-06-07 23:15 ` 2.6.12-rc6-mm1 Francois Romieu
@ 2005-06-08  1:59 ` Søren Lott
  2005-06-08  5:53   ` 2.6.12-rc6-mm1 Jean Delvare
  2005-06-08 14:22 ` 2.6.12-rc6-mm1 Andy Whitcroft
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 72+ messages in thread
From: Søren Lott @ 2005-06-08  1:59 UTC (permalink / raw)
  To: Andrew Morton, gregkh; +Cc: linux-kernel

On Tuesday 07 June 2005 08:29, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/

[snip]

> +gregkh-i2c-i2c-Kconfig-update.patch
> +gregkh-i2c-i2c-pcf8574-cleanup.patch
> +gregkh-i2c-i2c-adm9240-docs.patch
> +gregkh-i2c-i2c-device-attr-lm90.patch
> +gregkh-i2c-i2c-device-attr-lm83.patch
> +gregkh-i2c-i2c-device-attr-lm63.patch
> +gregkh-i2c-i2c-device-attr-it87.patch
> +gregkh-i2c-hwmon-01.patch
> +gregkh-i2c-hwmon-02.patch
> +gregkh-i2c-hwmon-03.patch
>
>  i2c tree updates
>
> +i2c-chips-need-hwmon.patch
> +gregkh-i2c-hwmon-02-sparc64-fix.patch
>
>  Fix a few things in the i2c tree

[snip]

after those changes i don't get entries in /sys for my W83627THF chip. 
(p4c800-D, i875,ICH5)

relevant config parts:

CONFIG_HWMON=y
CONFIG_I2C=y
CONFIG_I2C_ISA=y
CONFIG_I2C_SENSOR=y
CONFIG_SENSORS_W83627HF=y

thanks.

-SL


* Re: 2.6.12-rc6-mm1
  2005-06-08  0:08     ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-08  3:17       ` Nick Piggin
  2005-06-08  3:33         ` 2.6.12-rc6-mm1 Con Kolivas
  2005-06-08 14:15       ` 2.6.12-rc6-mm1 Martin J. Bligh
  2005-06-09 23:56       ` 2.6.12-rc6-mm1 Martin J. Bligh
  2 siblings, 1 reply; 72+ messages in thread
From: Nick Piggin @ 2005-06-08  3:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, mbligh, lkml, Con Kolivas

On Tue, 2005-06-07 at 17:08 -0700, Andrew Morton wrote: 
> Christoph Lameter <clameter@engr.sgi.com> wrote:
> >
> > On Tue, 7 Jun 2005, Andrew Morton wrote:
> > 
> > > > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
> > > 
> > > Oh crap, so it does.  That's wrong.
> > 
> > Email by you and Linus indicated that 250 should be the default.
> 
> Oh, OK. hrm.
> 
> Martin, it would be useful if you could determine whether the kernbench
> slowdown was due to the 1000Hz->250Hz change, thanks.
> 
> I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)


I'm looking at some issues with the scheduler patches.

To start with, it looks like the smp-nice patches are broken. Even if
they weren't I think it might be a good idea just to put them on hold
until we work out what to do with the other sched patches... we're
only just starting to get some interesting tests (ie. regressions) being
run on -mm (at least that I've been made aware of). So give me a bit of
time to work through that.

Anyway, Con, this is what it is doing on a 64-way Altix running aim7:
(compare imbalances, task move rates, wakeup move rates, etc).


--- wakeup statistics ---
  269.174 task wakes / s
    31.704% of them from the local CPU
    14.190% of remote wakeups come from domain0
      0.000% are moved to the local CPU via passive load balancing
      26.660% are moved to the local CPU via affine wakeups
    46.672% of remote wakeups come from domain1
      10.359% are moved to the local CPU via passive load balancing
      0.000% are moved to the local CPU via affine wakeups
    39.012% of remote wakeups come from domain2
      10.659% are moved to the local CPU via passive load balancing
      0.000% are moved to the local CPU via affine wakeups

--- load balancing statistics ---
  for domain0
    4368.652 load balance calls / s move 137.083 tasks / s
      96.456% calls and 1.174% task moves came from idle balancing
        0.042% were imbalanced with an average imbalance of 566.708
        0.038% found an imbalance but failed
        6.165% of tasks moved were cache hot
      1.818% calls and 73.086% task moves came from busy balancing
        47.694% were imbalanced with an average imbalance of 335.932
        4.704% found an imbalance but failed
        0.140% of tasks moved were cache hot
      1.726% calls and 25.740% task moves came from new-idle balancing
        26.938% were imbalanced with an average imbalance of 198.054
        9.136% found an imbalance but failed
        0.151% of tasks moved were cache hot
    0.000 active balances / s  move 0.000 tasks / s
    0.000 exec balances / s  move 0.000 tasks / s
    0.000 fork balances / s  move 0.000 tasks / s
                                                                                                                                                             
  for domain1
    102.002 load balance calls / s move 180.344 tasks / s
      85.398% calls and 17.496% task moves came from idle balancing
        5.920% were imbalanced with an average imbalance of 386.172
        2.103% found an imbalance but failed
        0.920% of tasks moved were cache hot
      14.602% calls and 82.504% task moves came from busy balancing
        69.017% were imbalanced with an average imbalance of 702.928
        5.849% found an imbalance but failed
        0.075% of tasks moved were cache hot
      0.000% calls and 0.000% task moves came from new-idle balancing
    0.048 active balances / s  move 0.002 tasks / s
      %95.000 attempts failed
    0.000 exec balances / s  move 0.000 tasks / s
    0.000 fork balances / s  move 0.000 tasks / s
                                                                                                                                                             
  for domain2
    9.496 load balance calls / s move 13.070 tasks / s
      91.335% calls and 32.327% task moves came from idle balancing
        21.094% were imbalanced with an average imbalance of 115.513
        16.936% found an imbalance but failed
        2.978% of tasks moved were cache hot
      8.665% calls and 67.673% task moves came from busy balancing
        64.118% were imbalanced with an average imbalance of 503.867
        17.353% found an imbalance but failed
        0.383% of tasks moved were cache hot
      0.000% calls and 0.000% task moves came from new-idle balancing
    0.007 active balances / s  move 0.007 tasks / s
      %0.000 attempts failed
    0.000 exec balances / s  move 0.000 tasks / s
    0.000 fork balances / s  move 0.000 tasks / s

And this is what it looks like with smpnice #if'ed out: 
--- wakeup statistics ---
  331.734 task wakes / s
    25.492% of them from the local CPU
    13.601% of remote wakeups come from domain0
      0.000% are moved to the local CPU via passive load balancing
      1.674% are moved to the local CPU via affine wakeups
    44.484% of remote wakeups come from domain1
      3.139% are moved to the local CPU via passive load balancing
      0.000% are moved to the local CPU via affine wakeups
    42.088% of remote wakeups come from domain2
      0.000% are moved to the local CPU via passive load balancing
      0.000% are moved to the local CPU via affine wakeups
 
--- load balancing statistics ---
  for domain0
    3940.070 load balance calls / s move 3.671 tasks / s
      96.488% calls and 48.889% task moves came from idle balancing
        0.068% were imbalanced with an average imbalance of 1.132
        0.029% found an imbalance but failed
        3.135% of tasks moved were cache hot
      1.339% calls and 33.563% task moves came from busy balancing
        2.319% were imbalanced with an average imbalance of 1.037
        0.069% found an imbalance but failed
        0.228% of tasks moved were cache hot
      2.173% calls and 17.548% task moves came from new-idle balancing
        1.259% were imbalanced with an average imbalance of 1.008
        0.516% found an imbalance but failed
        3.057% of tasks moved were cache hot
    0.006 active balances / s  move 0.006 tasks / s
      %0.000 attempts failed
    0.000 exec balances / s  move 0.000 tasks / s
    0.000 fork balances / s  move 0.000 tasks / s
 
  for domain1
    86.378 load balance calls / s move 2.644 tasks / s
      94.236% calls and 89.468% task moves came from idle balancing
        4.116% were imbalanced with an average imbalance of 1.123
        1.597% found an imbalance but failed
        4.281% of tasks moved were cache hot
      5.764% calls and 10.532% task moves came from busy balancing
        6.667% were imbalanced with an average imbalance of 1.008
        1.130% found an imbalance but failed
        0.000% of tasks moved were cache hot
      0.000% calls and 0.000% task moves came from new-idle balancing
    0.082 active balances / s  move 0.017 tasks / s
      %79.310 attempts failed
    0.000 exec balances / s  move 0.000 tasks / s
    0.000 fork balances / s  move 0.000 tasks / s
 
  for domain2
    9.024 load balance calls / s move 0.343 tasks / s
      95.293% calls and 88.525% task moves came from idle balancing
        12.103% were imbalanced with an average imbalance of 1.003
        8.701% found an imbalance but failed
        14.815% of tasks moved were cache hot
      4.707% calls and 11.475% task moves came from busy balancing
        16.556% were imbalanced with an average imbalance of 1.000
        7.285% found an imbalance but failed
        21.429% of tasks moved were cache hot
      0.000% calls and 0.000% task moves came from new-idle balancing
    0.008 active balances / s  move 0.008 tasks / s
      %0.000 attempts failed
    0.000 exec balances / s  move 0.000 tasks / s
    0.000 fork balances / s  move 0.000 tasks / s
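
To compare the two dumps above without eyeballing them, the per-domain call and task-move rates can be pulled out with a few lines of Python. This is only a sketch: the function name and regular expressions are illustrative and not part of any schedstats tool, and the sample text is just the domain1 and domain2 headline lines copied from the two reports.

```python
import re

# Matches e.g. "102.002 load balance calls / s move 180.344 tasks / s"
RATE_RE = re.compile(r"([\d.]+) load balance calls / s\s+move ([\d.]+) tasks / s")
DOMAIN_RE = re.compile(r"for (domain\d+)")

# Headline lines copied from the reports above (domain1 and domain2 only).
WITH_SMPNICE = """\
  for domain1
    102.002 load balance calls / s move 180.344 tasks / s
  for domain2
    9.496 load balance calls / s move 13.070 tasks / s
"""
WITHOUT_SMPNICE = """\
  for domain1
    86.378 load balance calls / s move 2.644 tasks / s
  for domain2
    9.024 load balance calls / s move 0.343 tasks / s
"""


def task_move_rates(report):
    """Return {domain: (balance calls per second, tasks moved per second)}."""
    rates = {}
    domain = None
    for line in report.splitlines():
        m = DOMAIN_RE.search(line)
        if m:
            domain = m.group(1)
            continue
        m = RATE_RE.search(line)
        if m and domain is not None:
            rates[domain] = (float(m.group(1)), float(m.group(2)))
    return rates


if __name__ == "__main__":
    on = task_move_rates(WITH_SMPNICE)
    off = task_move_rates(WITHOUT_SMPNICE)
    for dom in sorted(on):
        # Ratio of task moves per second with smpnice vs. without it.
        print(dom, "task-move ratio:", round(on[dom][1] / off[dom][1], 1))
```

The call rates are close in both runs; it is the tasks moved per second (and the average imbalance figures) that differ by a large factor.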


-- 
SUSE Labs, Novell Inc.



Send instant messages to your online friends http://au.messenger.yahoo.com 


* Re: 2.6.12-rc6-mm1
  2005-06-08  3:17       ` 2.6.12-rc6-mm1 Nick Piggin
@ 2005-06-08  3:33         ` Con Kolivas
  2005-06-08  3:50           ` 2.6.12-rc6-mm1 Nick Piggin
  0 siblings, 1 reply; 72+ messages in thread
From: Con Kolivas @ 2005-06-08  3:33 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Christoph Lameter, mbligh, lkml

On Wed, 8 Jun 2005 01:17 pm, Nick Piggin wrote:
> On Tue, 2005-06-07 at 17:08 -0700, Andrew Morton wrote:
> > Christoph Lameter <clameter@engr.sgi.com> wrote:
> > > On Tue, 7 Jun 2005, Andrew Morton wrote:
> > > > > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
> > > >
> > > > Oh crap, so it does.  That's wrong.
> > >
> > > Email by you and Linus indicated that 250 should be the default.
> >
> > Oh, OK. hrm.
> >
> > Martin, it would be useful if you could determine whether the kernbench
> > slowdown was due to the 1000Hz->250Hz change, thanks.
> >
> > I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)
>
> I'm looking at some issues with the scheduler patches.
>
> To start with, it looks like the smp-nice patches are broken. Even if
> they weren't I think it might be a good idea just to put them on hold
> until we work out what to do with the other sched patches... 

I originally said I'd wait till the sched patches settled down before tackling
it, but it didn't look like that was ever going to happen, and broken nice on
SMP is a real bug biting people now, so I figured I should just tackle it
anyway. I don't mind if we just work on it later, though.

> Anyway, Con, this is what it is doing on a 64-way Altix running aim7:
> (compare imbalances, task move rates, wakeup move rates, etc).

Definitely different, I agree. As for the performance impact, the statistics
alone don't tell us if they're for good or evil, but we can look at it again
separately when we tackle smp nice again. It is a real issue for users now,
though, so it would be good if we can have a calmer period in the future to do
this (smp nice) by itself.

These are the four patches, Andrew:
sched-implement-nice-support-across-physical-cpus-on-smp.patch
sched-change_prio_bias_only_if_queued.patch
sched-account_rt_tasks_in_prio_bias.patch
sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch

The other HT patch by me is separate and a bugfix so please leave that in.

Cheers,
Con


* Re: 2.6.12-rc6-mm1
  2005-06-08  3:33         ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-08  3:50           ` Nick Piggin
  0 siblings, 0 replies; 72+ messages in thread
From: Nick Piggin @ 2005-06-08  3:50 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, Christoph Lameter, mbligh, lkml

On Wed, 2005-06-08 at 13:33 +1000, Con Kolivas wrote:
> On Wed, 8 Jun 2005 01:17 pm, Nick Piggin wrote:

> > To start with, it looks like the smp-nice patches are broken. Even if
> > they weren't I think it might be a good idea just to put them on hold
> > until we work out what to do with the other sched patches... 
> 
> I originally said I'd wait till the sched patches settled down before tackling
> it, but it didn't look like that was ever going to happen, and broken nice on
> SMP is a real bug biting people now, so I figured I should just tackle it
> anyway. I don't mind if we just work on it later, though.
> 

Well I agree with you that it would be nice to fix it. I
think your approach has good potential, and it is along
the same lines as what I had in mind.

> > Anyway, Con, this is what it is doing on a 64-way Altix running aim7:
> > (compare imbalances, task move rates, wakeup move rates, etc).
> 
> Definitely different, I agree. As for the performance impact, the statistics
> alone don't tell us if they're for good or evil, but we can look at it again
> separately when we tackle smp nice again. It is a real issue for users now,
> though, so it would be good if we can have a calmer period in the future to do
> this (smp nice) by itself.
> 

True. Fortunately this seems to only come up once a year or so.
Although I guess with the rise and rise of multi-threaded and
multi-core CPUs it could become a bigger issue.

> These are the four patches Andrew:
> sched-implement-nice-support-across-physical-cpus-on-smp.patch
> sched-change_prio_bias_only_if_queued.patch
> sched-account_rt_tasks_in_prio_bias.patch
> sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
> 

Thanks.

> The other HT patch by me is separate and a bugfix so please leave that in.
> 

Yep.


Nick

-- 
SUSE Labs, Novell Inc.



* Re: 2.6.12-rc6-mm1
  2005-06-08  1:59 ` 2.6.12-rc6-mm1 Søren Lott
@ 2005-06-08  5:53   ` Jean Delvare
  2005-06-08  7:08     ` 2.6.12-rc6-mm1 Søren Lott
  0 siblings, 1 reply; 72+ messages in thread
From: Jean Delvare @ 2005-06-08  5:53 UTC (permalink / raw)
  To: Søren Lott; +Cc: Andrew Morton, Greg KH, LKML, LM Sensors

Hi Soren,

> [snip]
> 
> > +gregkh-i2c-i2c-Kconfig-update.patch
> > +gregkh-i2c-i2c-pcf8574-cleanup.patch
> > +gregkh-i2c-i2c-adm9240-docs.patch
> > +gregkh-i2c-i2c-device-attr-lm90.patch
> > +gregkh-i2c-i2c-device-attr-lm83.patch
> > +gregkh-i2c-i2c-device-attr-lm63.patch
> > +gregkh-i2c-i2c-device-attr-it87.patch
> > +gregkh-i2c-hwmon-01.patch
> > +gregkh-i2c-hwmon-02.patch
> > +gregkh-i2c-hwmon-03.patch
> >
> >  i2c tree updates
> >
> > +i2c-chips-need-hwmon.patch
> > +gregkh-i2c-hwmon-02-sparc64-fix.patch
> >
> >  Fix a few things in the i2c tree
> 
> [snip]
> 
> after those changes i don't get entries in /sys for my W83627THF chip.
> 
> (p4c800-D, i875,ICH5)
> 
> relevant config parts:
> 
> CONFIG_HWMON=y
> CONFIG_I2C=y
> CONFIG_I2C_ISA=y
> CONFIG_I2C_SENSOR=y
> CONFIG_SENSORS_W83627HF=y

Which kernel are you upgrading from?

Is CONFIG_PNPACPI set? If it is, try without it.

If it doesn't work, please try reverting (in reverse order):
  gregkh-i2c-hwmon-01.patch
  gregkh-i2c-hwmon-02.patch
  gregkh-i2c-hwmon-03.patch
  i2c-chips-need-hwmon.patch
  gregkh-i2c-hwmon-02-sparc64-fix.patch
and see how it goes.
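
A minimal sketch of the revert step, assuming the five patches are listed above in the order they were applied and that the patch files sit in a local directory (the `patches` directory name and the helper function are hypothetical). It only builds the `patch -R` command lines, last-applied patch first, so they can be reviewed before being run from the top of the kernel tree.

```python
# Patches as listed above; assumed to be in the order they were applied.
APPLIED_ORDER = [
    "gregkh-i2c-hwmon-01.patch",
    "gregkh-i2c-hwmon-02.patch",
    "gregkh-i2c-hwmon-03.patch",
    "i2c-chips-need-hwmon.patch",
    "gregkh-i2c-hwmon-02-sparc64-fix.patch",
]


def revert_commands(applied, patch_dir="patches"):
    """Return one `patch -p1 -R` command per patch, last-applied first.

    patch_dir is a hypothetical location for the downloaded patch files.
    """
    return [
        ["patch", "-p1", "-R", "-i", f"{patch_dir}/{name}"]
        for name in reversed(applied)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in revert_commands(APPLIED_ORDER):
        print(" ".join(cmd))
```

Reverting in reverse order matters because later patches may touch lines that earlier ones introduced.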

Thanks,
-- 
Jean Delvare


* Re: 2.6.12-rc6-mm1
  2005-06-08  5:53   ` 2.6.12-rc6-mm1 Jean Delvare
@ 2005-06-08  7:08     ` Søren Lott
  0 siblings, 0 replies; 72+ messages in thread
From: Søren Lott @ 2005-06-08  7:08 UTC (permalink / raw)
  To: Jean Delvare; +Cc: Andrew Morton, Greg KH, LKML, LM Sensors

On Wednesday 08 June 2005 02:53, Jean Delvare wrote:
> Hi Soren,
Hi,
> Which kernel are you upgrading from?

from 2.6.12-rc5-mm2

> Is CONFIG_PNPACPI set? If it is, try without it.

nope, don't even have CONFIG_PNP set.

> If it doesn't work, please try reverting (in reverse order):
>   gregkh-i2c-hwmon-01.patch
>   gregkh-i2c-hwmon-02.patch
>   gregkh-i2c-hwmon-03.patch
>   i2c-chips-need-hwmon.patch
>   gregkh-i2c-hwmon-02-sparc64-fix.patch
> and see how it goes.

yeap, reverting these did the trick, all i2c entries in sysfs are back. :)

> Thanks,

thanks a lot.
cheers.

-SL


* Re: 2.6.12-rc6-mm1
  2005-06-08  0:08     ` 2.6.12-rc6-mm1 Andrew Morton
  2005-06-08  3:17       ` 2.6.12-rc6-mm1 Nick Piggin
@ 2005-06-08 14:15       ` Martin J. Bligh
  2005-06-09 23:56       ` 2.6.12-rc6-mm1 Martin J. Bligh
  2 siblings, 0 replies; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-08 14:15 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter; +Cc: linux-kernel



--Andrew Morton <akpm@osdl.org> wrote (on Tuesday, June 07, 2005 17:08:53 -0700):

> Christoph Lameter <clameter@engr.sgi.com> wrote:
>> 
>> On Tue, 7 Jun 2005, Andrew Morton wrote:
>> 
>> > > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
>> > 
>> > Oh crap, so it does.  That's wrong.
>> 
>> Email by you and Linus indicated that 250 should be the default.
> 
> Oh, OK. hrm.
> 
> Martin, it would be useful if you could determine whether the kernbench
> slowdown was due to the 1000Hz->250Hz change, thanks.
> 
> I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)

It is actually worse with HZ=1000 ... so I think we still have another problem,
probably with the scheduler patches. (The one marked -mm1+p4947 in blue is the
patched one.)

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png

I can back out various patches ... do all the scheduler patches start
with sched.* or something equally obvious? If not, a list of what to blat
would help me ... or I'll do a crapshoot, and see what falls out ;-)

M.



* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
                   ` (3 preceding siblings ...)
  2005-06-08  1:59 ` 2.6.12-rc6-mm1 Søren Lott
@ 2005-06-08 14:22 ` Andy Whitcroft
  2005-06-08 20:01   ` 2.6.12-rc6-mm1 Andrew Morton
  2005-06-09  4:27   ` 2.6.12-rc6-mm1 Andrey Panin
  2005-06-11 11:51 ` 2.6.12-rc6-mm1 Benoit Boissinot
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 72+ messages in thread
From: Andy Whitcroft @ 2005-06-08 14:22 UTC (permalink / raw)
  To: Andrew Morton, Andrey Panin; +Cc: linux-kernel

We've been seeing an early boot hang on IBM x-series (at least on an
x440) with -rc6-mm1.  Finally got hold of a box to go search for this
and it seems that backing out the three patches below fixes it.

 515  dmi-move-acpi-boot-quirk.patch
 516  dmi-move-acpi-sleep-quirk.patch
 517  dmi-remove-central-blacklist.patch

I am pretty sure it is actually the first one (that's where my bisection
search pointed), but I had to drop the other two to back it out.  Anyhow,
2.6.12-rc6-mm1 boots on an x440 with these backed out.
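
For anyone wanting to repeat this kind of search, here is a rough sketch of bisecting an ordered patch series. It is an illustration, not necessarily how the search above was run: `is_bad(k)` is a hypothetical callback that builds and boots a tree with only the first k patches applied and reports whether it hangs, and the sketch assumes a single culprit (good with 0 patches, bad with all of them).

```python
def first_bad_patch(n_patches, is_bad):
    """Return the 1-based index of the first patch that makes the tree bad.

    Assumes is_bad(0) is False, is_bad(n_patches) is True, and that once a
    prefix of the series is bad every longer prefix is bad as well.
    """
    good, bad = 0, n_patches
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # culprit is at or before mid
        else:
            good = mid   # culprit is after mid
    return bad


if __name__ == "__main__":
    # Simulated series where patch 515 introduces the hang.
    tests_run = []

    def simulated_is_bad(k):
        tests_run.append(k)
        return k >= 515

    print(first_bad_patch(1000, simulated_is_bad), "found in", len(tests_run), "boots")
```

Each step halves the range, so even a series of around a thousand patches needs only about ten test boots.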

Cheers.

-apw


* Re: 2.6.12-rc6-mm1
  2005-06-08 14:22 ` 2.6.12-rc6-mm1 Andy Whitcroft
@ 2005-06-08 20:01   ` Andrew Morton
  2005-06-08 23:14     ` 2.6.12-rc6-mm1 Martin J. Bligh
  2005-06-09  4:27   ` 2.6.12-rc6-mm1 Andrey Panin
  1 sibling, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-06-08 20:01 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: pazke, linux-kernel

Andy Whitcroft <apw@shadowen.org> wrote:
>
> We've been seeing an early boot hang on IBM x-series (at least on an
>  x440) with -rc6-mm1.  Finally got hold of a box to go search for this
>  and it seems that backing out the three patches below fixes it.
> 
>   515  dmi-move-acpi-boot-quirk.patch
>   516  dmi-move-acpi-sleep-quirk.patch
>   517  dmi-remove-central-blacklist.patch

Thanks for taking the time to do that - it helps enormously.

The patches aren't terribly important - I'll drop them if nobody sees the
problem.  It might be an incorrect __init/__initdata/etc marking.  But that
wouldn't cause an "early" boot hang...




* Re: 2.6.12-rc6-mm1
  2005-06-08 20:01   ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-08 23:14     ` Martin J. Bligh
  2005-06-08 23:22       ` 2.6.12-rc6-mm1 Andrew Morton
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-08 23:14 UTC (permalink / raw)
  To: Andrew Morton, Andy Whitcroft; +Cc: pazke, linux-kernel



--On Wednesday, June 08, 2005 13:01:17 -0700 Andrew Morton <akpm@osdl.org> wrote:

> Andy Whitcroft <apw@shadowen.org> wrote:
>> 
>> We've been seeing an early boot hang on IBM x-series (at least on an
>>  x440) with -rc6-mm1.  Finally got hold of a box to go search for this
>>  and it seems that backing out the three patches below fixes it.
>> 
>>   515  dmi-move-acpi-boot-quirk.patch
>>   516  dmi-move-acpi-sleep-quirk.patch
>>   517  dmi-remove-central-blacklist.patch
> 
> Thanks for taking the time to do that - it helps enormously.
> 
> The patches aren't terribly important - I'll drop them if nobody sees the
> problem.  It might be an incorrect __init/__initdata/etc marking.  But that
> wouldn't cause an "early" boot hang...

That does indeed make it boot. However ... once it's booted it seems
to hit another problem, a hang condition ;-( I suspect it's unrelated.
The box is still up and responsive, but cp spins.

I'm still chasing the other boot/hang double problem (amd64), so can't
really look at this right now, but if anyone has any bright ideas they
want me to try, or wants more info, let me know (machine is still hung
in that state).

Some snippets:

ps -ef:

root     10980 10979  0 09:02 ?        00:00:00 /bin/bash /usr/local/autobench/scripts/run test kernbench 32 5 -m 2
root     11060 10980  0 09:02 ?        00:00:00 /bin/bash /usr/local/autobench/scripts/getsysinfo before /usr/local/autobench/logs/k
root     11219 11060  0 09:02 ?        00:00:00 /bin/bash /usr/local/autobench/scripts/archive_dir /proc/scsi /usr/local/autobench/l
root     11221 11219 99 09:02 ?        04:13:26 cp -r /proc/scsi/aic7xxx /proc/scsi/device_info /proc/scsi/scsi /usr/local/autobench

alt+sysrq+t

getsysinfo    S CB5260CC     0 11060  10980 11219               (NOTLB)
d5fc1f40 00000082 fffffe00 cb5260cc 00000000 c011259b 2691b900 003d08e4
       080fa558 00000001 d5fc1f38 c04715c0 c0473080 bfcb43b8 d740e000 cb526020
       00000001 cb526020 00000007 d5fc1fbc 0008b824 26cec200 003d08e4 c02fc928
Call Trace:
 [<c011259b>] do_page_fault+0x193/0x60f
 [<c011d584>] do_wait+0x2a4/0x358
 [<c0115ff8>] default_wake_function+0x0/0x1c
 [<c0115ff8>] default_wake_function+0x0/0x1c
 [<c011d6c6>] sys_wait4+0x26/0x38
 [<c011d6ee>] sys_waitpid+0x16/0x1a
 [<c0102a19>] syscall_call+0x7/0xb
archive_dir   S CBB810CC     0 11219  11060 11221               (NOTLB)
d7793f40 00000082 fffffe00 cbb810cc 00000000 c011259b 28b70a00 003d08e4
       080fa158 00000001 d7793f38 c04715c0 c0473080 bfc51a68 c040e000 cbb81020
       00000001 cbb81020 00000007 d7793fbc 00000000 28b70a00 003d08e4 c02fc928
Call Trace:
 [<c011259b>] do_page_fault+0x193/0x60f
 [<c011d584>] do_wait+0x2a4/0x358
 [<c0115ff8>] default_wake_function+0x0/0x1c
 [<c0115ff8>] default_wake_function+0x0/0x1c
 [<c011d6c6>] sys_wait4+0x26/0x38
 [<c011d6ee>] sys_waitpid+0x16/0x1a
 [<c0102a19>] syscall_call+0x7/0xb
cp            R running     0 11221  11219                     (NOTLB)
sleep         S D77A1F68     0 11906   1409                     (NOTLB)
d77a1f58 00000086 0039a67c d77a1f68 bfade9d8 272d8698 b605a700 003d16b7
       d5c1e804 d6ecdbac d77a1f50 c04715c0 c0473080 d77a1fbc d6ecd814 d76d3020
       00000282 c0121f31 0039a67c c107d0e0 00000000 b605a700 003d16b7 d77a1f68
Call Trace:
 [<c0121f31>] lock_timer_base+0x19/0x3c
 [<c02ef4db>] schedule_timeout+0x7b/0x9c
 [<c0122904>] process_timeout+0x0/0xc
 [<c01229fb>] sys_nanosleep+0xdb/0x158
 [<c0102a19>] syscall_call+0x7/0xb
BUG: soft lockup detected on CPU#0!

Pid: 0, comm:              swapper
EIP: 0060:[<c02efcd9>] CPU: 0
EIP is at _spin_unlock_irqrestore+0x5/0x8
 EFLAGS: 00000292    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: c03b9b84 EBX: c03b9ad4 ECX: 0a000000 EDX: 00000292
ESI: 00000074 EDI: c040ffa4 EBP: d5c16000 DS: 007b ES: 007b
CR0: 8005003b CR2: 080f9008 CR3: 16dd0300 CR4: 000006b0
 [<c020e729>] __handle_sysrq+0x121/0x128
 [<c020e74f>] handle_sysrq+0x1f/0x24
 [<c021dda4>] receive_chars+0x16c/0x270
 [<c021e0a2>] serial8250_interrupt+0x66/0xe4
 [<c01320f0>] handle_IRQ_event+0x28/0x58
 [<c0132203>] __do_IRQ+0xe3/0x134
 [<c0104b4b>] do_IRQ+0x1b/0x28
 [<c01033d6>] common_interrupt+0x1a/0x20
 [<c0100bb0>] default_idle+0x0/0x2c
 [<c0100bd3>] default_idle+0x23/0x2c
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c01002c8>] rest_init+0x28/0x2c
 [<c0410899>] start_kernel+0x19d/0x1a0


alt+sysrq+p does weird stuff (is that new patch in your tree, Andrew?
It doesn't seem to interact well with the other NMI code)

Command> break
SysRq : Show Regs

Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 0
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: c040e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: b7e3f5a0 CR3: 16dd0300 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c01002c8>] rest_init+0x28/0x2c
 [<c0410899>] start_kernel+0x19d/0x1a0
 Uhhuh. NMI received for unknown reason 00 on CPU 1.
Uhhuh. NMI received for unknown reason 00 on CPU 16.
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 00 on CPU 3.
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 00 on CPU 17.
----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 16
EIP is at default_idle+0x23/0x2c
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 00 on CPU 2.
Uhhuh. NMI received for unknown reason 00 on CPU 18.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 00 on CPU 19.
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7420000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
CR0: 8005003b CR2: 00000000 CR3: 17771800 CR4: 000006b0
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 6.
Uhhuh. NMI received for unknown reason 00 on CPU 20.
 start_secondary+0x13d/0x140
Dazed and confused, but trying to continue
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 18
Uhhuh. NMI received for unknown reason 00 on CPU 10.
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7426000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f25d9c CR3: 00474000 CR4: 000006b0
 [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 29.
 cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 2
 EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7400000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: b7edeb00 CR3: 00474000 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 23.
 start_secondary+0x13d/0x140
Uhhuh. NMI received for unknown reason 00 on CPU 7.
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 3
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7402000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
Do you have a strange power saving mode enabled?
CR0: 8005003b CR2: b7f95438 CR3: 17771800 CR4: 000006b0
 [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 4.
 cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 17
Uhhuh. NMI received for unknown reason 00 on CPU 5.
Dazed and confused, but trying to continue
EIP is at default_idle+0x23/0x2c
Uhhuh. NMI received for unknown reason 00 on CPU 14.
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7424000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
 [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 9.
 cpu_idle+0x7b/0x8c
Uhhuh. NMI received for unknown reason 00 on CPU 25.
 [<c010e79d>] start_secondary+0x13d/0x140Dazed and confused, but trying to continue

----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 19
 EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7428000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f30d9c CR3: 00474000 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 13.
 start_secondary+0x13d/0x140
 Do you have a strange power saving mode enabled?
----------- IPI show regs -----------Uhhuh. NMI received for unknown reason 00 on CPU 8.
Uhhuh. NMI received for unknown reason 00 on CPU 11.
Dazed and confused, but trying to continue
Dazed and confused, but trying to continue
Dazed and confused, but trying to continue

Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 20
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 00 on CPU 22.
Uhhuh. NMI received for unknown reason 00 on CPU 26.
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
ESI: d742a000 EDI: c0470300 EBP: c0470300Uhhuh. NMI received for unknown reason 00 on CPU 30.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Do you have a strange power saving mode enabled?
Do you have a strange power saving mode enabled?
 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
Uhhuh. NMI received for unknown reason 00 on CPU 21.
 [<c0100ca3>]Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 cpu_idle+0x7b/0x8c
Dazed and confused, but trying to continue
 [<c010e79d>]Do you have a strange power saving mode enabled?
 start_secondary+0x13d/0x140
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 00 on CPU 27.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 00 on CPU 24.
 ----------- IPI show regs -----------
Pid: 11221, comm:                   cp
EIP: 0060:[<c02efbdc>] CPU: 5
Do you have a strange power saving mode enabled?
EIP is at _spin_lock_irqsave+0x14/0x20
 EFLAGS: 00000286    Not tainted  (2.6.12-rc6-mm1-autokern1)
Dazed and confused, but trying to continue
EAX: 00000286 EBX: d6ce4800 ECX: c03cabe0 EDX: c049ba84
ESI: ffffffea EDI: d55f8000 EBP: d55f8000 DS: 007b ES: 007b
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
CR0: 80050033 CR2: bfc7d2fc CR3: 16dd02e0 CR4: 000006b0
 [<c0270377>]Uhhuh. NMI received for unknown reason 00 on CPU 31.
Uhhuh. NMI received for unknown reason 00 on CPU 12.
 ahc_linux_proc_info+0x27/0x212
Do you have a strange power saving mode enabled?
Do you have a strange power saving mode enabled?
 [<c0149052>]Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 page_add_anon_rmap+0x62/0x68
 [<c0144358>]Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 00 on CPU 15.
Uhhuh. NMI received for unknown reason 00 on CPU 28.
Dazed and confused, but trying to continue
 do_anonymous_page+0x1f0/0x21c
 [<c0144370>]Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
 do_anonymous_page+0x208/0x21c
Dazed and confused, but trying to continue
 [<c01443d9>]Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Do you have a strange power saving mode enabled?
 do_no_page+0x55/0x3e8
 [<c01372b5>] prep_new_page+0x49/0x50
 [<c0137973>] buffered_rmqueue+0x16f/0x1d0
 [<c0137e1b>] __alloc_pages+0x3bb/0x3c8
 [<c0257cdb>] proc_scsi_read+0x2b/0x44
 [<c0182f28>] proc_file_read+0xec/0x200
 [<c0152ff9>] vfs_read+0x91/0x12c
 [<c01532e4>] sys_read+0x40/0x6c
 [<c0102a19>] syscall_call+0x7/0xb
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 7
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d740c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 4
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d7404000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: 080f9c48 CR3: 17771320 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c02efcae>] CPU: 30
EIP is at _spin_lock+0xa/0x10
 EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: c1050aa0 EBX: c1050aa0 ECX: d7463ea8 EDX: 00000003
ESI: c10d9620 EDI: c10d9fe0 EBP: d7463eb0 DS: 007b ES: 007b
CR0: 8005003b CR2: b7eea900 CR3: 00474000 CR4: 000006b0
 [<c011583b>] load_balance+0xcf/0x170
 [<c0115af5>] rebalance_tick+0xe1/0x104
 [<c0115d77>] scheduler_tick+0x97/0x318
 [<c01225b3>] update_process_times+0xef/0x100
 [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
 [<c0103464>] apic_timer_interrupt+0x1c/0x24
 [<c0100bb0>] default_idle+0x0/0x2c
 [<c0100bd3>] default_idle+0x23/0x2c
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c02efb6a>] CPU: 15
EIP is at _spin_trylock+0x6/0x14
 EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c1050aa0 ECX: 00000008 EDX: c1050aa0
ESI: c10875a0 EDI: c1087f60 EBP: d741fe84 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
 [<c0114fda>] double_lock_balance+0x12/0x48
 [<c01157e4>] load_balance+0x78/0x170
 [<c0115af5>] rebalance_tick+0xe1/0x104
 [<c0115d77>] scheduler_tick+0x97/0x318
 [<c01225b3>] update_process_times+0xef/0x100
 [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
 [<c0103464>] apic_timer_interrupt+0x1c/0x24
 [<c0100bb0>] default_idle+0x0/0x2c
 [<c0100bd3>] default_idle+0x23/0x2c
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 21
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d742c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 14
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d741c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: b7e64070 CR3: 00474000 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 27
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d745c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f66d9c CR3: 00474000 CR4: 000006b0
 [<c0100ca3>] cpu_idle+0x7b/0x8c
 [<c010e79d>] start_secondary+0x13d/0x140
 ----------- IPI show regs -----------
Pid: 0, comm:              swapper
EIP: 0060:[<c0100bd3>] CPU: 8
EIP is at default_idle+0x23/0x2c
 EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
ESI: d740e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
CR0: 8005003b CR2: 080f133c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 25
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7436000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f74d9c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c02efb6a>] CPU: 29
^M^@EIP is at _spin_trylock+0x6/0x14
^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000001 EBX: c1050aa0 ECX: 00000008 EDX: c1050aa0
^M^@ESI: c10d3ea0 EDI: c10d4860 EBP: d7461e84 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0114fda>] double_lock_balance+0x12/0x48
^M^@ [<c01157e4>] load_balance+0x78/0x170
^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
^M^@ [<c01225b3>] update_process_times+0xef/0x100
^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 31
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7464000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 24
^M^@EIP is at default_idle+0x23/0x2c
^M^@  EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7434000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f3cdd8 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 10
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7412000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7ea6920 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 26
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7438000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c01154a3>] CPU: 13
^M^@EIP is at find_busiest_group+0x103/0x2f8
^M^@ EFLAGS: 00000086    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000005 EBX: 00000005 ECX: c1050aa0 EDX: 00000000
^M^@ESI: c04813ac EDI: 00000200 EBP: d741be7c DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7e7e070 CR3: 00474000 CR4: 000006b0
^M^@ [<c01157a2>] load_balance+0x36/0x170
^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
^M^@ [<c01225b3>] update_process_times+0xef/0x100
^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 28
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d745e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c011e897>] CPU: 12
^M^@EIP is at __do_softirq+0x47/0x100
^M^@ EFLAGS: 00000006    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: c0470380 EBX: c0476020 ECX: 00000030 EDX: c1075ce0
^M^@ESI: 00000002 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f54000 CR3: 00474000 CR4: 000006b0
^M^@ [<c011e97f>] do_softirq+0x2f/0x34
^M^@ [<c011ea24>] irq_exit+0x34/0x38
^M^@ [<c010f601>] smp_apic_timer_interrupt+0xdd/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 9
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7410000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f1d900 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 11
^M^@ EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7414000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 23
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7432000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs ----------- 
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 6
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7408000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f3cdd8 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs ----------- 
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 22
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7430000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 1
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: c13fc000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7ee1d9c CR3: 17771640 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ^M






^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-08 23:14     ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-08 23:22       ` Andrew Morton
  2005-06-08 23:34         ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-06-08 23:22 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: apw, pazke, linux-kernel

"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>
> alt+sysrq+p does weird stuff (is that new patch in your tree Andrew?
>  doesn't seem to interact with the other NMI code well)

What patch?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-08 23:22       ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-08 23:34         ` Martin J. Bligh
  2005-06-09  7:17           ` 2.6.12-rc6-mm1 Kirill Korotaev
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-08 23:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: apw, pazke, linux-kernel, dev



--On Wednesday, June 08, 2005 16:22:47 -0700 Andrew Morton <akpm@osdl.org> wrote:

> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
>> 
>> alt+sysrq+p does weird stuff (is that new patch in your tree Andrew?
>>  doesn't seem to interact with the other NMI code well)
> 
> What patch?

Sorry.

nmi-lockup-and-altsysrq-p-dumping-calltraces-on-_all_-cpus.patch

It does seem to work. But probably needs some cleanup for the NMI
errors.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-08  0:02   ` 2.6.12-rc6-mm1 Christoph Lameter
  2005-06-08  0:08     ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-09  1:58     ` Lee Revell
  1 sibling, 0 replies; 72+ messages in thread
From: Lee Revell @ 2005-06-09  1:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel

On Tue, 2005-06-07 at 17:02 -0700, Christoph Lameter wrote:
> On Tue, 7 Jun 2005, Andrew Morton wrote:
> 
> > > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
> > 
> > Oh crap, so it does.  That's wrong.
> 
> Email by you and Linus indicated that 250 should be the default.

Wait, does that mean the default HZ is going to be changed in the 2.6.x
timeframe?  That's a big user-visible regression, as it makes the
sleep() resolution worse, and would force apps with tight timing
requirements to go back to using the RTC like on 2.4.

Unless, of course, the plan is to merge the high-res timers patch at the
same time.

Lee


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-08 14:22 ` 2.6.12-rc6-mm1 Andy Whitcroft
  2005-06-08 20:01   ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-09  4:27   ` Andrey Panin
  2005-06-09 13:12     ` 2.6.12-rc6-mm1 Andy Whitcroft
  1 sibling, 1 reply; 72+ messages in thread
From: Andrey Panin @ 2005-06-09  4:27 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Andrew Morton, linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 794 bytes --]

On 159, 06 08, 2005 at 03:22:57 +0100, Andy Whitcroft wrote:
> We've been seeing an early boot hang on IBM x-series (at least on an
> x440) with -rc6-mm1.  Finally got hold of a box to go search for this
> and it seems that backing out the three patches below fixes it.
> 
>  515  dmi-move-acpi-boot-quirk.patch
>  516  dmi-move-acpi-sleep-quirk.patch
>  517  dmi-remove-central-blacklist.patch
> 
> I am pretty sure it is actually the first one (that's where my bisection
> search pointed) but I had to drop the other two to back it out.  Anyhow,
> 2.6.12-rc6-mm1 boots on an x440 with these backed out.

Yeah, probably brown paper bag time... Please try the attached patch.

-- 
Andrey Panin		| Linux and UNIX system administrator
pazke@donpac.ru		| PGP key: wwwkeys.pgp.net

[-- Attachment #1.2: patch-stupid-dmi-bug --]
[-- Type: text/plain, Size: 978 bytes --]

diff -urdpNX /usr/share/dontdiff linux-2.6.12-rc6-mm1.vanilla/arch/i386/kernel/acpi/boot.c linux-2.6.12-rc6-mm1/arch/i386/kernel/acpi/boot.c
--- linux-2.6.12-rc6-mm1.vanilla/arch/i386/kernel/acpi/boot.c	2005-06-09 08:02:06.000000000 +0400
+++ linux-2.6.12-rc6-mm1/arch/i386/kernel/acpi/boot.c	2005-06-09 08:24:01.000000000 +0400
@@ -1040,6 +1040,7 @@ static struct dmi_system_id __initdata a
 		},
 	},
 #endif
+	{ }
 };
 
 #endif	/* __i386__ */
diff -urdpNX /usr/share/dontdiff linux-2.6.12-rc6-mm1.vanilla/arch/i386/kernel/acpi/sleep.c linux-2.6.12-rc6-mm1/arch/i386/kernel/acpi/sleep.c
--- linux-2.6.12-rc6-mm1.vanilla/arch/i386/kernel/acpi/sleep.c	2005-06-09 08:02:06.000000000 +0400
+++ linux-2.6.12-rc6-mm1/arch/i386/kernel/acpi/sleep.c	2005-06-09 08:24:15.000000000 +0400
@@ -108,6 +108,7 @@ static __initdata struct dmi_system_id a
 			DMI_MATCH(DMI_PRODUCT_NAME, "S4030CDT/4.3"),
 		},
 	},
+	{ }
 };
 
 static int __init acpisleep_dmi_init(void)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-08 23:34         ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-09  7:17           ` Kirill Korotaev
  2005-06-09 13:38             ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 1 reply; 72+ messages in thread
From: Kirill Korotaev @ 2005-06-09  7:17 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, apw, pazke, linux-kernel

> --On Wednesday, June 08, 2005 16:22:47 -0700 Andrew Morton <akpm@osdl.org> wrote:
> 
> 
>>"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>>
>>>alt+sysrq+p does weird stuff (is that new patch in your tree Andrew?
>>> doesn't seem to interact with the other NMI code well)
>>
>>What patch?
> 
> 
> Sorry.
> 
> nmi-lockup-and-altsysrq-p-dumping-calltraces-on-_all_-cpus.patch
> 
> It does seem to work. But probably needs some cleanup for the NMI
> errors.
If you let me know where the problem comes from, I can fix it and
clean it up.

Kirill


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-09  4:27   ` 2.6.12-rc6-mm1 Andrey Panin
@ 2005-06-09 13:12     ` Andy Whitcroft
  0 siblings, 0 replies; 72+ messages in thread
From: Andy Whitcroft @ 2005-06-09 13:12 UTC (permalink / raw)
  To: Andrey Panin; +Cc: Andrew Morton, linux-kernel, Martin J. Bligh

Andrey Panin wrote:

> Yeah, probably brown paper bag time... Please try the attached patch.

Ok.  I can confirm that linux-2.6.12-rc6-mm1 + just this fix boots fine
and works.  And yes, I did say works.  I can't understand why backing the
others out left us with the odd spin hang while this combination doesn't.
I've managed to run 4 sets of boot and kernbench (10 runs) without a hang.

/me feels there is something else ugly in here we don't want but
unrelated to this patch.

-apw

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-09  7:17           ` 2.6.12-rc6-mm1 Kirill Korotaev
@ 2005-06-09 13:38             ` Martin J. Bligh
  2005-06-10 12:12               ` 2.6.12-rc6-mm1 Kirill Korotaev
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-09 13:38 UTC (permalink / raw)
  To: Kirill Korotaev; +Cc: Andrew Morton, apw, pazke, linux-kernel



--Kirill Korotaev <dev@sw.ru> wrote (on Thursday, June 09, 2005 11:17:43 +0400):

>> --On Wednesday, June 08, 2005 16:22:47 -0700 Andrew Morton <akpm@osdl.org> wrote:
>> 
>> 
>>> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
>>> 
>>>> alt+sysrq+p does weird stuff (is that new patch in your tree Andrew?
>>>> doesn't seem to interact with the other NMI code well)
>>> 
>>> What patch?
>> 
>> 
>> Sorry.
>> 
>> nmi-lockup-and-altsysrq-p-dumping-calltraces-on-_all_-cpus.patch
>> 
>> It does seem to work. But probably needs some cleanup for the NMI
>> errors.
> If you let me know where the problem comes from, I can fix it and clean it up.

It gets a lot of the "dazed and confused" errors. Possibly you just need
to disable that part of the handler?



Command> break
^@SysRq : Show Regs
^M
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 0
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: c040e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7e3f5a0 CR3: 16dd0300 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c01002c8>] rest_init+0x28/0x2c
^M^@ [<c0410899>] start_kernel+0x19d/0x1a0
^M^@ Uhhuh. NMI received for unknown reason 00 on CPU 1.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 16.
^M^@Dazed and confused, but trying to continue
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 3.
^M^@Do you have a strange power saving mode enabled?
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 17.
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 16
^M^@EIP is at default_idle+0x23/0x2c
^M^@Dazed and confused, but trying to continue
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 2.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 18.
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 19.
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7420000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@CR0: 8005003b CR2: 00000000 CR3: 17771800 CR4: 000006b0
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 6.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 20.
^M^@ start_secondary+0x13d/0x140
^M^@Dazed and confused, but trying to continue
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 18
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 10.
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7426000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f25d9c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 29.
^M^@ cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 2
^M^@ EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7400000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7edeb00 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 23.
^M^@ start_secondary+0x13d/0x140
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 7.
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 3
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7402000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@Do you have a strange power saving mode enabled?
^M^@CR0: 8005003b CR2: b7f95438 CR3: 17771800 CR4: 000006b0
^M^@ [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 4.
^M^@ cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 17
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 5.
^M^@Dazed and confused, but trying to continue
^M^@EIP is at default_idle+0x23/0x2c
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 14.
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7424000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 9.
^M^@ cpu_idle+0x7b/0x8c
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 25.
^M^@ [<c010e79d>] start_secondary+0x13d/0x140Dazed and confused, but trying to continue
^M
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 19
^M^@ EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7428000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f30d9c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 13.
^M^@ start_secondary+0x13d/0x140
^M^@ Do you have a strange power saving mode enabled?
^M^@----------- IPI show regs -----------Uhhuh. NMI received for unknown reason 00 on CPU 8.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 11.
^M^@Dazed and confused, but trying to continue
^M^@Dazed and confused, but trying to continue
^M^@Dazed and confused, but trying to continue
^M
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 20
^M^@Dazed and confused, but trying to continue
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 22.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 26.
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@ESI: d742a000 EDI: c0470300 EBP: c0470300Uhhuh. NMI received for unknown reason 00 on CPU 30.
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Do you have a strange power saving mode enabled?
^M^@Do you have a strange power saving mode enabled?
^M^@ DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 21.
^M^@ [<c0100ca3>]Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@ cpu_idle+0x7b/0x8c
^M^@Dazed and confused, but trying to continue
^M^@ [<c010e79d>]Do you have a strange power saving mode enabled?
^M^@ start_secondary+0x13d/0x140
^M^@Do you have a strange power saving mode enabled?
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 27.
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 24.
^M^@ ----------- IPI show regs -----------
^M^@Pid: 11221, comm:                   cp
^M^@EIP: 0060:[<c02efbdc>] CPU: 5
^M^@Do you have a strange power saving mode enabled?
^M^@EIP is at _spin_lock_irqsave+0x14/0x20
^M^@ EFLAGS: 00000286    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@Dazed and confused, but trying to continue
^M^@EAX: 00000286 EBX: d6ce4800 ECX: c03cabe0 EDX: c049ba84
^M^@ESI: ffffffea EDI: d55f8000 EBP: d55f8000 DS: 007b ES: 007b
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@CR0: 80050033 CR2: bfc7d2fc CR3: 16dd02e0 CR4: 000006b0
^M^@ [<c0270377>]Uhhuh. NMI received for unknown reason 00 on CPU 31.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 12.
^M^@ ahc_linux_proc_info+0x27/0x212
^M^@Do you have a strange power saving mode enabled?
^M^@Do you have a strange power saving mode enabled?
^M^@ [<c0149052>]Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@ page_add_anon_rmap+0x62/0x68
^M^@ [<c0144358>]Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 15.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 28.
^M^@Dazed and confused, but trying to continue
^M^@ do_anonymous_page+0x1f0/0x21c
^M^@ [<c0144370>]Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@ do_anonymous_page+0x208/0x21c
^M^@Dazed and confused, but trying to continue
^M^@ [<c01443d9>]Do you have a strange power saving mode enabled?
^M^@Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@Do you have a strange power saving mode enabled?
^M^@ do_no_page+0x55/0x3e8
^M^@ [<c01372b5>] prep_new_page+0x49/0x50
^M^@ [<c0137973>] buffered_rmqueue+0x16f/0x1d0
^M^@ [<c0137e1b>] __alloc_pages+0x3bb/0x3c8
^M^@ [<c0257cdb>] proc_scsi_read+0x2b/0x44
^M^@ [<c0182f28>] proc_file_read+0xec/0x200
^M^@ [<c0152ff9>] vfs_read+0x91/0x12c
^M^@ [<c01532e4>] sys_read+0x40/0x6c
^M^@ [<c0102a19>] syscall_call+0x7/0xb
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 7
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d740c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 4
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7404000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 080f9c48 CR3: 17771320 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c02efcae>] CPU: 30
^M^@EIP is at _spin_lock+0xa/0x10
^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: c1050aa0 EBX: c1050aa0 ECX: d7463ea8 EDX: 00000003
^M^@ESI: c10d9620 EDI: c10d9fe0 EBP: d7463eb0 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7eea900 CR3: 00474000 CR4: 000006b0
^M^@ [<c011583b>] load_balance+0xcf/0x170
^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
^M^@ [<c01225b3>] update_process_times+0xef/0x100
^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs ----------- 
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c02efb6a>] CPU: 15
^M^@EIP is at _spin_trylock+0x6/0x14
^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c1050aa0 ECX: 00000008 EDX: c1050aa0
^M^@ESI: c10875a0 EDI: c1087f60 EBP: d741fe84 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0114fda>] double_lock_balance+0x12/0x48
^M^@ [<c01157e4>] load_balance+0x78/0x170
^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
^M^@ [<c01225b3>] update_process_times+0xef/0x100
^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 21
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d742c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 14
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d741c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7e64070 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 27
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d745c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f66d9c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 8
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d740e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 080f133c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 25
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7436000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f74d9c CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c02efb6a>] CPU: 29
^M^@EIP is at _spin_trylock+0x6/0x14
^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000001 EBX: c1050aa0 ECX: 00000008 EDX: c1050aa0
^M^@ESI: c10d3ea0 EDI: c10d4860 EBP: d7461e84 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0114fda>] double_lock_balance+0x12/0x48
^M^@ [<c01157e4>] load_balance+0x78/0x170
^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
^M^@ [<c01225b3>] update_process_times+0xef/0x100
^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 31
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7464000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 24
^M^@EIP is at default_idle+0x23/0x2c
^M^@  EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7434000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f3cdd8 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 10
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7412000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7ea6920 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 26
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7438000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c01154a3>] CPU: 13
^M^@EIP is at find_busiest_group+0x103/0x2f8
^M^@ EFLAGS: 00000086    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000005 EBX: 00000005 ECX: c1050aa0 EDX: 00000000
^M^@ESI: c04813ac EDI: 00000200 EBP: d741be7c DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7e7e070 CR3: 00474000 CR4: 000006b0
^M^@ [<c01157a2>] load_balance+0x36/0x170
^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
^M^@ [<c01225b3>] update_process_times+0xef/0x100
^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 28
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d745e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c011e897>] CPU: 12
^M^@EIP is at __do_softirq+0x47/0x100
^M^@ EFLAGS: 00000006    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: c0470380 EBX: c0476020 ECX: 00000030 EDX: c1075ce0
^M^@ESI: 00000002 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f54000 CR3: 00474000 CR4: 000006b0
^M^@ [<c011e97f>] do_softirq+0x2f/0x34
^M^@ [<c011ea24>] irq_exit+0x34/0x38
^M^@ [<c010f601>] smp_apic_timer_interrupt+0xdd/0xe4
^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
^M^@ [<c0100bb0>] default_idle+0x0/0x2c
^M^@ [<c0100bd3>] default_idle+0x23/0x2c
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 9
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7410000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f1d900 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 11
^M^@ EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7414000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 23
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7432000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs ----------- 
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 6
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7408000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7f3cdd8 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@----------- IPI show regs ----------- 
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 22
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: d7430000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
^M^@ Dazed and confused, but trying to continue
^M^@Do you have a strange power saving mode enabled?
^M^@----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 1
^M^@EIP is at default_idle+0x23/0x2c
^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
^M^@ESI: c13fc000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
^M^@CR0: 8005003b CR2: b7ee1d9c CR3: 17771640 CR4: 000006b0
^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
^M^@ [<c010e79d>] start_secondary+0x13d/0x140
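
A dump like the one above is easier to scan once it is reduced to one entry per CPU. The following is a minimal Python sketch of that reduction, not part of the original report: the function name eip_by_cpu and the three-block sample are illustrative assumptions, and the sample lines are copied from the dump with the literal "^M^@" console debris left in place.

```python
import re
from collections import Counter

# Three register blocks copied from the dump above (debris kept verbatim).
SAMPLE = """\
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c0100bd3>] CPU: 21
^M^@EIP is at default_idle+0x23/0x2c
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c02efb6a>] CPU: 29
^M^@EIP is at _spin_trylock+0x6/0x14
^M^@ ----------- IPI show regs -----------
^M^@Pid: 0, comm:              swapper
^M^@EIP: 0060:[<c01154a3>] CPU: 13
^M^@EIP is at find_busiest_group+0x103/0x2f8
"""

# "EIP: 0060:[<c0100bd3>] CPU: 21" carries the CPU number.
CPU_RE = re.compile(r"EIP: [0-9a-f]{4}:\[<[0-9a-f]+>\] CPU: (\d+)")
# "EIP is at default_idle+0x23/0x2c" carries the symbol name.
SYM_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def eip_by_cpu(text):
    """Map each CPU number to the symbol it was executing in its dump."""
    result = {}
    cpu = None
    for line in text.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpu = int(m.group(1))
            continue
        m = SYM_RE.search(line)
        if m and cpu is not None:
            result[cpu] = m.group(1)
            cpu = None  # wait for the next "CPU:" line before pairing again
    return result


summary = eip_by_cpu(SAMPLE)
counts = Counter(summary.values())
print(summary)
print(counts.most_common())
```

The pairing assumes each "CPU:" line is followed by its own "EIP is at" line. Where console output from several CPUs is interleaved, that assumption can fail and the result should be checked by hand.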

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-08  0:08     ` 2.6.12-rc6-mm1 Andrew Morton
  2005-06-08  3:17       ` 2.6.12-rc6-mm1 Nick Piggin
  2005-06-08 14:15       ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-09 23:56       ` Martin J. Bligh
  2005-06-10  7:02         ` 2.6.12-rc6-mm1 Ingo Molnar
  2 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-09 23:56 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter; +Cc: linux-kernel



--On Tuesday, June 07, 2005 17:08:53 -0700 Andrew Morton <akpm@osdl.org> wrote:

> Christoph Lameter <clameter@engr.sgi.com> wrote:
>> 
>> On Tue, 7 Jun 2005, Andrew Morton wrote:
>> 
>> > > Diffprofile is wacko (HZ seems to be defaulting to 250 in -mm).
>> > 
>> > Oh crap, so it does.  That's wrong.
>> 
>> Email by you and Linus indicated that 250 should be the default.
> 
> Oh, OK. hrm.
> 
> Martin, it would be useful if you could determine whether the kernbench
> slowdown was due to the 1000Hz->250Hz change, thanks.
> 
> I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)

Backed them all out ... performance thunks down to earth again, and is actually
the best I've seen it ever (probably 250Hz is helping, I used to run 100 in 
-mjb for better benefit).
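
For scale, the HZ values mentioned here translate into tick periods and interrupt counts as follows. This is only an arithmetic sketch in Python: the function names are made up for illustration, and the CPU count of 32 is an assumption based on the CPU numbers (0 to 31) that appear in the register dumps earlier in the thread.

```python
def tick_period_ms(hz):
    """Length of one timer tick in milliseconds for a given HZ."""
    return 1000.0 / hz


def ticks_per_second(hz, ncpus):
    """Timer ticks per second across all CPUs, assuming one tick source per CPU."""
    return hz * ncpus


ASSUMED_CPUS = 32  # assumption: CPU numbers 0..31 show up in the dumps

for hz in (100, 250, 1000):
    print(hz, tick_period_ms(hz), ticks_per_second(hz, ASSUMED_CPUS))
```

Under these assumptions, going from 1000Hz to 250Hz cuts the number of timer ticks by a factor of four, which is why the HZ change has to be separated from the scheduler patches when comparing kernbench numbers.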

the +5081 item is the one to look at
http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png

Patch I used was here:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/patches/nosched

But it was just everything under the "CPU scheduler" section of your series
file.

M.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-09 23:56       ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-10  7:02         ` Ingo Molnar
  2005-06-10 12:03           ` 2.6.12-rc6-mm1 Con Kolivas
  0 siblings, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2005-06-10  7:02 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, Christoph Lameter, linux-kernel


* Martin J. Bligh <mbligh@mbligh.org> wrote:

> > I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)
> 
> Backed them all out ... performance thunks down to earth again, and is 
> actually the best I've seen it ever (probably 250Hz is helping, I used 
> to run 100 in -mjb for better benefit).
> 
> the +5081 item is the one to look at
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
> 
> Patch I used was here:
> 
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/patches/nosched
> 
> But it was just everything under the "CPU scheduler" section of your 
> series file.

we know from Nick's testing that the patches up to and including 
dynamic-sched-domains-ia64-changes.patch are probably OK. So the 
candidates for the regression are:

 sched-implement-nice-support-across-physical-cpus-on-smp.patch
 sched-change_prio_bias_only_if_queued.patch
 sched-account_rt_tasks_in_prio_bias.patch
 consolidate-preempt-options-into-kernel-kconfigpreempt.patch
 enable-preempt_bkl-on-preemptsmp-too.patch
 sched-tweak-idle-thread-setup-semantics.patch
 sched-voluntary-kernel-preemption.patch
 sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
 sched-task_noninteractive.patch
 sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch

there are two feature patches in this:

 enable-preempt_bkl-on-preemptsmp-too.patch
 sched-voluntary-kernel-preemption.patch

so make sure you have PREEMPT_BKL and PREEMPT_VOLUNTARY disabled.

these ones should not impact your workload's functionality (unless they 
are buggy):

 sched-account_rt_tasks_in_prio_bias.patch
 consolidate-preempt-options-into-kernel-kconfigpreempt.patch
 sched-tweak-idle-thread-setup-semantics.patch
 sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch

and unless you are using separate nice levels, this one shouldn't make a 
difference in theory:

 sched-implement-nice-support-across-physical-cpus-on-smp.patch

which leaves the following 3 likely candidates:

 sched-change_prio_bias_only_if_queued.patch
 sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
 sched-task_noninteractive.patch

so if you could do a run with all 3 of the above unapplied, that would 
be a good starting point. (But any of the others might be it too, if 
they contain some sort of bug.)
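
The elimination above can be written out as plain set arithmetic. This is a small Python sketch for illustration only: the patch names are taken from the lists in this message, while the variable and function names are assumptions.

```python
# All patches after dynamic-sched-domains-ia64-changes.patch, in series order.
CANDIDATES = [
    "sched-implement-nice-support-across-physical-cpus-on-smp.patch",
    "sched-change_prio_bias_only_if_queued.patch",
    "sched-account_rt_tasks_in_prio_bias.patch",
    "consolidate-preempt-options-into-kernel-kconfigpreempt.patch",
    "enable-preempt_bkl-on-preemptsmp-too.patch",
    "sched-tweak-idle-thread-setup-semantics.patch",
    "sched-voluntary-kernel-preemption.patch",
    "sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch",
    "sched-task_noninteractive.patch",
    "sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch",
]

# Feature patches: ruled out by disabling PREEMPT_BKL and PREEMPT_VOLUNTARY.
FEATURE = {
    "enable-preempt_bkl-on-preemptsmp-too.patch",
    "sched-voluntary-kernel-preemption.patch",
}

# Should not affect this workload unless they are buggy.
NO_FUNCTIONAL_IMPACT = {
    "sched-account_rt_tasks_in_prio_bias.patch",
    "consolidate-preempt-options-into-kernel-kconfigpreempt.patch",
    "sched-tweak-idle-thread-setup-semantics.patch",
    "sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch",
}

# Only matters when separate nice levels are in use.
NICE_ONLY = {
    "sched-implement-nice-support-across-physical-cpus-on-smp.patch",
}


def narrow(candidates, *excluded_groups):
    """Return the candidates left after removing every excluded group."""
    excluded = set().union(*excluded_groups)
    return [p for p in candidates if p not in excluded]


likely = narrow(CANDIDATES, FEATURE, NO_FUNCTIONAL_IMPACT, NICE_ONLY)
for name in likely:
    print(name)
```

The exclusions are only as good as the "unless they are buggy" caveat, so the excluded groups remain second-round suspects if backing out the remaining three changes nothing.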

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10  7:02         ` 2.6.12-rc6-mm1 Ingo Molnar
@ 2005-06-10 12:03           ` Con Kolivas
  2005-06-10 14:19             ` 2.6.12-rc6-mm1 Con Kolivas
  0 siblings, 1 reply; 72+ messages in thread
From: Con Kolivas @ 2005-06-10 12:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Christoph Lameter

On Fri, 10 Jun 2005 17:02, Ingo Molnar wrote:
> * Martin J. Bligh <mbligh@mbligh.org> wrote:
> > > I'm assuming it was the CPU scheduler patches.  There are 36 of them ;)
> >
> > Backed them all out ... performance thunks down to earth again, and is
> > actually the best I've seen it ever (probably 250Hz is helping, I used
> > to run 100 in -mjb for better benefit).
> >
> > the +5081 item is the one to look at
> > > http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
> >
> > Patch I used was here:
> >
> > http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/patches/nosched
> >
> > But it was just everything under the "CPU scheduler" section of your
> > series file.
>
> we know from Nick's testing that the patches up to and including
> dynamic-sched-domains-ia64-changes.patch are probably OK. So the
> candidates for the regression are:
>
>  sched-implement-nice-support-across-physical-cpus-on-smp.patch
>  sched-change_prio_bias_only_if_queued.patch
>  sched-account_rt_tasks_in_prio_bias.patch
>  consolidate-preempt-options-into-kernel-kconfigpreempt.patch
>  enable-preempt_bkl-on-preemptsmp-too.patch
>  sched-tweak-idle-thread-setup-semantics.patch
>  sched-voluntary-kernel-preemption.patch
>  sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
>  sched-task_noninteractive.patch
>  sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch
>
> there are two feature patches in this:
>
>  enable-preempt_bkl-on-preemptsmp-too.patch
>  sched-voluntary-kernel-preemption.patch
>
> so make sure you have PREEMPT_BKL and PREEMPT_VOLUNTARY disabled.
>
> these ones should not impact your workload's functionality (unless they
> are buggy):
>
>  sched-account_rt_tasks_in_prio_bias.patch
>  consolidate-preempt-options-into-kernel-kconfigpreempt.patch
>  sched-tweak-idle-thread-setup-semantics.patch
>  sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch
>
> and unless you are using separate nice levels, this one shouldn't make a
> difference in theory:
>
>  sched-implement-nice-support-across-physical-cpus-on-smp.patch
>
> which leaves the following 3 likely candidates:
>
>  sched-change_prio_bias_only_if_queued.patch
>  sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch

These tend to run together, so just try adding my four patches together. In 
retrospect I guess they're likely candidates, because they also change the 
_ratio_ of balance, which they should not, so they are currently buggy as a 
group. That is easy enough to fix, but testing them together will make it easy 
to pinpoint the problem if they're responsible.

sched-implement-nice-support-across-physical-cpus-on-smp.patch
sched-change_prio_bias_only_if_queued.patch
sched-account_rt_tasks_in_prio_bias.patch
sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
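
Treating the four patches as one unit means applying them in series order or reverting them in the opposite order. The Python sketch below only builds the command lines and runs nothing. It assumes the list above is already in series order and that the patch files sit in a hypothetical "patches" directory; the function name is illustrative. The patch(1) options used are -p1, -R and -i.

```python
GROUP = [
    "sched-implement-nice-support-across-physical-cpus-on-smp.patch",
    "sched-change_prio_bias_only_if_queued.patch",
    "sched-account_rt_tasks_in_prio_bias.patch",
    "sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch",
]


def patch_commands(patches, revert=False, patch_dir="patches"):
    """Build patch(1) argument lists for applying or reverting a group.

    Applying walks the list in order; reverting walks it backwards and
    adds -R, so the most recently applied patch is removed first.
    """
    ordered = list(reversed(patches)) if revert else list(patches)
    flag = ["-R"] if revert else []
    return [["patch", "-p1", *flag, "-i", f"{patch_dir}/{name}"] for name in ordered]


for cmd in patch_commands(GROUP, revert=True):
    print(" ".join(cmd))
```

Reverting only this group may not apply cleanly if later patches in the series touch the same code, in which case the later patches would have to come off first.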

Con

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-09 13:38             ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-10 12:12               ` Kirill Korotaev
  0 siblings, 0 replies; 72+ messages in thread
From: Kirill Korotaev @ 2005-06-10 12:12 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, apw, pazke, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 24361 bytes --]

>>>>>alt+sysrq+p does weird stuff (is that new patch in your tree Andrew?
>>>>>doesn't seem to interact with the other NMI code well)
>>>>
>>>>What patch?
>>>
>>>
>>>Sorry.
>>>
>>>nmi-lockup-and-altsysrq-p-dumping-calltraces-on-_all_-cpus.patch
>>>
>>>It does seem to work. But probably needs some cleanup for the NMI
>>>errors.
>>
>>If you let me know where the problem comes from, I can fix it and clean it up.
> 
> 
> It gets a lot of the "dazed and confused" errors. Possibly you just need
> to disable that part of the handler?
Can you try this cleanup patch?
This fixes the problem for me, though I do not much like the way it does so...

Kirill
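
To check whether a change makes the "unknown reason" messages go away, it helps to count them per CPU before and after. The Python sketch below does that with a regular expression that tolerates messages glued onto the tail of another CPU's output. The sample lines are copied from the console log quoted in this message; the function name is an assumption made for illustration.

```python
import re

# Lines copied from the quoted console log, including two interleaved ones.
SAMPLE = """\
^M^@ Uhhuh. NMI received for unknown reason 00 on CPU 1.
^M^@Uhhuh. NMI received for unknown reason 00 on CPU 16.
^M^@Dazed and confused, but trying to continue
^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 6.
^M^@----------- IPI show regs -----------Uhhuh. NMI received for unknown reason 00 on CPU 8.
"""

# Captures the two-digit value printed after "reason" and the CPU number.
NMI_RE = re.compile(r"NMI received for unknown reason ([0-9a-f]{2}) on CPU (\d+)\.")


def unknown_nmi_cpus(text):
    """Return the sorted list of CPUs that reported an unknown NMI."""
    return sorted({int(cpu) for _reason, cpu in NMI_RE.findall(text)})


cpus = unknown_nmi_cpus(SAMPLE)
print(len(cpus), cpus)
```

Because the search is not anchored to the start of a line, it still finds messages that were printed in the middle of another CPU's stack trace.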

> Command> break
> ^@SysRq : Show Regs
> ^M
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 0
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: c040e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7e3f5a0 CR3: 16dd0300 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c01002c8>] rest_init+0x28/0x2c
> ^M^@ [<c0410899>] start_kernel+0x19d/0x1a0
> ^M^@ Uhhuh. NMI received for unknown reason 00 on CPU 1.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 16.
> ^M^@Dazed and confused, but trying to continue
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 3.
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 17.
> ^M^@----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 16
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@Dazed and confused, but trying to continue
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 2.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 18.
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 19.
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7420000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 17771800 CR4: 000006b0
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 6.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 20.
> ^M^@ start_secondary+0x13d/0x140
> ^M^@Dazed and confused, but trying to continue
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 18
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 10.
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7426000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f25d9c CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 29.
> ^M^@ cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 2
> ^M^@ EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7400000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7edeb00 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 23.
> ^M^@ start_secondary+0x13d/0x140
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 7.
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 3
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7402000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@CR0: 8005003b CR2: b7f95438 CR3: 17771800 CR4: 000006b0
> ^M^@ [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 4.
> ^M^@ cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 17
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 5.
> ^M^@Dazed and confused, but trying to continue
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 14.
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7424000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>]Uhhuh. NMI received for unknown reason 00 on CPU 9.
> ^M^@ cpu_idle+0x7b/0x8c
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 25.
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140Dazed and confused, but trying to continue
> ^M
> ^M^@----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 19
> ^M^@ EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7428000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f30d9c CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>]Uhhuh. NMI received for unknown reason 00 on CPU 13.
> ^M^@ start_secondary+0x13d/0x140
> ^M^@ Do you have a strange power saving mode enabled?
> ^M^@----------- IPI show regs -----------Uhhuh. NMI received for unknown reason 00 on CPU 8.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 11.
> ^M^@Dazed and confused, but trying to continue
> ^M^@Dazed and confused, but trying to continue
> ^M^@Dazed and confused, but trying to continue
> ^M
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 20
> ^M^@Dazed and confused, but trying to continue
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 22.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 26.
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ESI: d742a000 EDI: c0470300 EBP: c0470300Uhhuh. NMI received for unknown reason 00 on CPU 30.
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 21.
> ^M^@ [<c0100ca3>]Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ cpu_idle+0x7b/0x8c
> ^M^@Dazed and confused, but trying to continue
> ^M^@ [<c010e79d>]Do you have a strange power saving mode enabled?
> ^M^@ start_secondary+0x13d/0x140
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 27.
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 24.
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 11221, comm:                   cp
> ^M^@EIP: 0060:[<c02efbdc>] CPU: 5
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@EIP is at _spin_lock_irqsave+0x14/0x20
> ^M^@ EFLAGS: 00000286    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@Dazed and confused, but trying to continue
> ^M^@EAX: 00000286 EBX: d6ce4800 ECX: c03cabe0 EDX: c049ba84
> ^M^@ESI: ffffffea EDI: d55f8000 EBP: d55f8000 DS: 007b ES: 007b
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@CR0: 80050033 CR2: bfc7d2fc CR3: 16dd02e0 CR4: 000006b0
> ^M^@ [<c0270377>]Uhhuh. NMI received for unknown reason 00 on CPU 31.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 12.
> ^M^@ ahc_linux_proc_info+0x27/0x212
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ [<c0149052>]Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ page_add_anon_rmap+0x62/0x68
> ^M^@ [<c0144358>]Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 15.
> ^M^@Uhhuh. NMI received for unknown reason 00 on CPU 28.
> ^M^@Dazed and confused, but trying to continue
> ^M^@ do_anonymous_page+0x1f0/0x21c
> ^M^@ [<c0144370>]Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ do_anonymous_page+0x208/0x21c
> ^M^@Dazed and confused, but trying to continue
> ^M^@ [<c01443d9>]Do you have a strange power saving mode enabled?
> ^M^@Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@ do_no_page+0x55/0x3e8
> ^M^@ [<c01372b5>] prep_new_page+0x49/0x50
> ^M^@ [<c0137973>] buffered_rmqueue+0x16f/0x1d0
> ^M^@ [<c0137e1b>] __alloc_pages+0x3bb/0x3c8
> ^M^@ [<c0257cdb>] proc_scsi_read+0x2b/0x44
> ^M^@ [<c0182f28>] proc_file_read+0xec/0x200
> ^M^@ [<c0152ff9>] vfs_read+0x91/0x12c
> ^M^@ [<c01532e4>] sys_read+0x40/0x6c
> ^M^@ [<c0102a19>] syscall_call+0x7/0xb
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 7
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d740c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 4
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7404000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 080f9c48 CR3: 17771320 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c02efcae>] CPU: 30
> ^M^@EIP is at _spin_lock+0xa/0x10
> ^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: c1050aa0 EBX: c1050aa0 ECX: d7463ea8 EDX: 00000003
> ^M^@ESI: c10d9620 EDI: c10d9fe0 EBP: d7463eb0 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7eea900 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c011583b>] load_balance+0xcf/0x170
> ^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
> ^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
> ^M^@ [<c01225b3>] update_process_times+0xef/0x100
> ^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
> ^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
> ^M^@ [<c0100bb0>] default_idle+0x0/0x2c
> ^M^@ [<c0100bd3>] default_idle+0x23/0x2c
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@----------- IPI show regs ----------- 
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c02efb6a>] CPU: 15
> ^M^@EIP is at _spin_trylock+0x6/0x14
> ^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c1050aa0 ECX: 00000008 EDX: c1050aa0
> ^M^@ESI: c10875a0 EDI: c1087f60 EBP: d741fe84 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0114fda>] double_lock_balance+0x12/0x48
> ^M^@ [<c01157e4>] load_balance+0x78/0x170
> ^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
> ^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
> ^M^@ [<c01225b3>] update_process_times+0xef/0x100
> ^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
> ^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
> ^M^@ [<c0100bb0>] default_idle+0x0/0x2c
> ^M^@ [<c0100bd3>] default_idle+0x23/0x2c
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 21
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d742c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 14
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d741c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7e64070 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 27
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d745c000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f66d9c CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 8
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d740e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 080f133c CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 25
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7436000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f74d9c CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c02efb6a>] CPU: 29
> ^M^@EIP is at _spin_trylock+0x6/0x14
> ^M^@ EFLAGS: 00000046    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000001 EBX: c1050aa0 ECX: 00000008 EDX: c1050aa0
> ^M^@ESI: c10d3ea0 EDI: c10d4860 EBP: d7461e84 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0114fda>] double_lock_balance+0x12/0x48
> ^M^@ [<c01157e4>] load_balance+0x78/0x170
> ^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
> ^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
> ^M^@ [<c01225b3>] update_process_times+0xef/0x100
> ^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
> ^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
> ^M^@ [<c0100bb0>] default_idle+0x0/0x2c
> ^M^@ [<c0100bd3>] default_idle+0x23/0x2c
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 31
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7464000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 24
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@  EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7434000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f3cdd8 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 10
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7412000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7ea6920 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 26
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7438000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c01154a3>] CPU: 13
> ^M^@EIP is at find_busiest_group+0x103/0x2f8
> ^M^@ EFLAGS: 00000086    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000005 EBX: 00000005 ECX: c1050aa0 EDX: 00000000
> ^M^@ESI: c04813ac EDI: 00000200 EBP: d741be7c DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7e7e070 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c01157a2>] load_balance+0x36/0x170
> ^M^@ [<c0115af5>] rebalance_tick+0xe1/0x104
> ^M^@ [<c0115d77>] scheduler_tick+0x97/0x318
> ^M^@ [<c01225b3>] update_process_times+0xef/0x100
> ^M^@ [<c010f5f9>] smp_apic_timer_interrupt+0xd5/0xe4
> ^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
> ^M^@ [<c0100bb0>] default_idle+0x0/0x2c
> ^M^@ [<c0100bd3>] default_idle+0x23/0x2c
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 28
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d745e000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c011e897>] CPU: 12
> ^M^@EIP is at __do_softirq+0x47/0x100
> ^M^@ EFLAGS: 00000006    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: c0470380 EBX: c0476020 ECX: 00000030 EDX: c1075ce0
> ^M^@ESI: 00000002 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f54000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c011e97f>] do_softirq+0x2f/0x34
> ^M^@ [<c011ea24>] irq_exit+0x34/0x38
> ^M^@ [<c010f601>] smp_apic_timer_interrupt+0xdd/0xe4
> ^M^@ [<c0103464>] apic_timer_interrupt+0x1c/0x24
> ^M^@ [<c0100bb0>] default_idle+0x0/0x2c
> ^M^@ [<c0100bd3>] default_idle+0x23/0x2c
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 9
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7410000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f1d900 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 11
> ^M^@ EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7414000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ ----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 23
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7432000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@----------- IPI show regs ----------- 
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 6
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7408000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7f3cdd8 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@----------- IPI show regs ----------- 
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 22
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: d7430000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: 00000000 CR3: 00474000 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140
> ^M^@ Dazed and confused, but trying to continue
> ^M^@Do you have a strange power saving mode enabled?
> ^M^@----------- IPI show regs -----------
> ^M^@Pid: 0, comm:              swapper
> ^M^@EIP: 0060:[<c0100bd3>] CPU: 1
> ^M^@EIP is at default_idle+0x23/0x2c
> ^M^@ EFLAGS: 00000246    Not tainted  (2.6.12-rc6-mm1-autokern1)
> ^M^@EAX: 00000000 EBX: c0476020 ECX: c0100bb0 EDX: 003a2f43
> ^M^@ESI: c13fc000 EDI: c0470300 EBP: c0470300 DS: 007b ES: 007b
> ^M^@CR0: 8005003b CR2: b7ee1d9c CR3: 17771640 CR4: 000006b0
> ^M^@ [<c0100ca3>] cpu_idle+0x7b/0x8c
> ^M^@ [<c010e79d>] start_secondary+0x13d/0x140


[-- Attachment #2: altsysrq-p-cleanup --]
[-- Type: text/plain, Size: 1080 bytes --]

--- ./arch/i386/kernel/traps.c.xxx	2005-05-10 18:27:04.000000000 +0400
+++ ./arch/i386/kernel/traps.c	2005-06-10 14:18:32.000000000 +0400
@@ -574,6 +574,14 @@ void die_nmi (struct pt_regs *regs, cons
 	do_exit(SIGSEGV);
 }
 
+static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
+{
+	return 0;
+}
+ 
+static nmi_callback_t nmi_callback = dummy_nmi_callback;
+static nmi_callback_t nmi_ipi_callback = dummy_nmi_callback;
+ 
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = 0;
@@ -596,6 +604,9 @@ static void default_do_nmi(struct pt_reg
 			return;
 		}
 #endif
+		if (nmi_ipi_callback != dummy_nmi_callback)
+			return;
+
 		unknown_nmi_error(reason, regs);
 		return;
 	}
@@ -612,14 +623,6 @@ static void default_do_nmi(struct pt_reg
 	reassert_nmi();
 }
 
-static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
-{
-	return 0;
-}
- 
-static nmi_callback_t nmi_callback = dummy_nmi_callback;
-static nmi_callback_t nmi_ipi_callback = dummy_nmi_callback;
- 
 fastcall void do_nmi(struct pt_regs * regs, long error_code)
 {
 	int cpu;

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 12:03           ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-10 14:19             ` Con Kolivas
  2005-06-10 23:14               ` 2.6.12-rc6-mm1 J.A. Magallon
  2005-06-10 23:50               ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 2 replies; 72+ messages in thread
From: Con Kolivas @ 2005-06-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Christoph Lameter,
	Nick Piggin


[-- Attachment #1.1: Type: text/plain, Size: 2194 bytes --]

On Fri, 10 Jun 2005 22:03, Con Kolivas wrote:
> On Fri, 10 Jun 2005 17:02, Ingo Molnar wrote:
> > * Martin J. Bligh <mbligh@mbligh.org> wrote:
> > > > I'm assuming it was the CPU scheduler patches.  There are 36 of them
> > > > ;)

> > So the 
> > candidates for the regression are:
> >
> >  sched-implement-nice-support-across-physical-cpus-on-smp.patch
> >  sched-change_prio_bias_only_if_queued.patch
> >  sched-account_rt_tasks_in_prio_bias.patch
> >  consolidate-preempt-options-into-kernel-kconfigpreempt.patch
> >  enable-preempt_bkl-on-preemptsmp-too.patch
> >  sched-tweak-idle-thread-setup-semantics.patch
> >  sched-voluntary-kernel-preemption.patch
> >  sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
> >  sched-task_noninteractive.patch
> >  sched-run-sched_normal-tasks-with-real-time-tasks-on-smt-siblings.patch

> These tend to run together so just try adding my four patches together. In
> retrospect I guess they're likely candidates because they also change the
> _ratio_ of balance which they should not so they are buggy as a group
> currently. Easy enough to fix but it will make it easy to pinpoint the
> problem if they're responsible.
>
> sched-implement-nice-support-across-physical-cpus-on-smp.patch
> sched-change_prio_bias_only_if_queued.patch
> sched-account_rt_tasks_in_prio_bias.patch
> sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch

By the way it has already been decided to remove these patches from -mm 
pending the completion of current scheduler work. If they turn out to be 
responsible for this regression I apologise profusely :-|. 

It is clearer to me now that I have made a mistake with the priority biasing, 
and the following patch corrects it to the planned behaviour. This is 
academic at this stage as we won't be looking at this particular feature 
again in earnest until the other 32 scheduler patches (and any followups) go 
upstream. 

It's already known that schedstats data will be off without further code to 
understand smp nice as well (thanks Nick for pointing out the data)... more 
academic stuff but obviously something to consider when/if we get there.

Cheers,
Con

[-- Attachment #1.2: sched-correct_smp_nice_bias.patch --]
[-- Type: text/x-diff, Size: 1621 bytes --]

The priority biasing was off by multiplying the total load by the total
priority bias and this ruins the ratio of loads between runqueues. This
patch should correct the ratios of loads between runqueues to be proportional
to overall load.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

Index: linux-2.6.12-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.12-rc6-mm1.orig/kernel/sched.c	2005-06-10 23:56:56.000000000 +1000
+++ linux-2.6.12-rc6-mm1/kernel/sched.c	2005-06-10 23:59:57.000000000 +1000
@@ -978,7 +978,7 @@ static inline unsigned long __source_loa
 	else
 		source_load = min(cpu_load, load_now);
 
-	if (idle == NOT_IDLE || rq->nr_running > 1)
+	if (idle == NOT_IDLE || rq->nr_running > 1) {
 		/*
 		 * If we are busy rebalancing the load is biased by
 		 * priority to create 'nice' support across cpus. When
@@ -987,7 +987,10 @@ static inline unsigned long __source_loa
 		 * prevent idle rebalance from trying to pull tasks from a
 		 * queue with only one running task.
 		 */
-		source_load *= rq->prio_bias;
+		unsigned long prio_bias = rq->prio_bias / rq->nr_running;
+
+		source_load *= prio_bias;
+	}
 
 	return source_load;
 }
@@ -1011,8 +1014,11 @@ static inline unsigned long __target_loa
 	else
 		target_load = max(cpu_load, load_now);
 
-	if (idle == NOT_IDLE || rq->nr_running > 1)
-		target_load *= rq->prio_bias;
+	if (idle == NOT_IDLE || rq->nr_running > 1) {
+		unsigned long prio_bias = rq->prio_bias / rq->nr_running;
+
+		target_load *= prio_bias;
+	}
 
 	return target_load;
 }

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 14:19             ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-10 23:14               ` J.A. Magallon
  2005-06-10 23:59                 ` 2.6.12-rc6-mm1 Con Kolivas
  2005-06-10 23:50               ` 2.6.12-rc6-mm1 Martin J. Bligh
  1 sibling, 1 reply; 72+ messages in thread
From: J.A. Magallon @ 2005-06-10 23:14 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Christoph Lameter, Nick Piggin


On 06.10, Con Kolivas wrote:

> The priority biasing was off by multiplying the total load by the total
> priority bias and this ruins the ratio of loads between runqueues. This
> patch should correct the ratios of loads between runqueues to be proportional
> to overall load.
> 

2.6.12-rc6-mm1 + this patch just oopses nicely on boot.
I did not have a digital camera handy, but the first oops that fit on the
screen was this call chain:

kernel_thread_helper
init
init
do_basic_setup
usermodehelper_init
__create_workqueue
EIP in try_to_wake_up

After this, there was another with some do_div_error calls...

Something looks uninitialized the first time, or the integer arithmetic
is wrong. I really don't like a*(b/c); I much prefer (a*b)/c. It is more
common for b/c to truncate to 0 (because b<c) than for (a*b) to overflow.

So I tried both. With this, it boots again:

--- linux-2.6.11-jam24/kernel/sched.c.orig	2005-06-11 00:59:44.000000000 +0200
+++ linux-2.6.11-jam24/kernel/sched.c	2005-06-11 01:03:32.000000000 +0200
@@ -987,9 +987,10 @@
 		 * prevent idle rebalance from trying to pull tasks from a
 		 * queue with only one running task.
 		 */
-		unsigned long prio_bias = rq->prio_bias / rq->nr_running;
+		unsigned long prio_scale = (rq->nr_running > 0 ? 
+										rq->nr_running : 1);
 
-		source_load *= prio_bias;
+		source_load = (source_load*rq->prio_bias) / prio_scale;
 	}
 
 	return source_load;
@@ -1015,9 +1016,10 @@
 		target_load = max(cpu_load, load_now);
 
 	if (idle == NOT_IDLE || rq->nr_running > 1) {
-		unsigned long prio_bias = rq->prio_bias / rq->nr_running;
+		unsigned long prio_scale = (rq->nr_running > 0 ? 
+										rq->nr_running : 1);
 
-		target_load *= prio_bias;
+		target_load = (target_load*rq->prio_bias) / prio_scale;
 	}
 
 	return target_load;


Perhaps this:

	if (idle == NOT_IDLE || rq->nr_running > 1)

should be

	if (idle == NOT_IDLE && rq->nr_running > 1)

???

Hope this helps, thanks.

> Signed-off-by: Con Kolivas <kernel@kolivas.org>
> 
> Index: linux-2.6.12-rc6-mm1/kernel/sched.c
> ===================================================================
> --- linux-2.6.12-rc6-mm1.orig/kernel/sched.c	2005-06-10 23:56:56.000000000 +1000
> +++ linux-2.6.12-rc6-mm1/kernel/sched.c	2005-06-10 23:59:57.000000000 +1000
> @@ -978,7 +978,7 @@ static inline unsigned long __source_loa
>  	else
>  		source_load = min(cpu_load, load_now);
>  
> -	if (idle == NOT_IDLE || rq->nr_running > 1)
> +	if (idle == NOT_IDLE || rq->nr_running > 1) {
>  		/*
>  		 * If we are busy rebalancing the load is biased by
>  		 * priority to create 'nice' support across cpus. When
> @@ -987,7 +987,10 @@ static inline unsigned long __source_loa
>  		 * prevent idle rebalance from trying to pull tasks from a
>  		 * queue with only one running task.
>  		 */
> -		source_load *= rq->prio_bias;
> +		unsigned long prio_bias = rq->prio_bias / rq->nr_running;
> +
> +		source_load *= prio_bias;
> +	}
>  
>  	return source_load;
>  }
> @@ -1011,8 +1014,11 @@ static inline unsigned long __target_loa
>  	else
>  		target_load = max(cpu_load, load_now);
>  
> -	if (idle == NOT_IDLE || rq->nr_running > 1)
> -		target_load *= rq->prio_bias;
> +	if (idle == NOT_IDLE || rq->nr_running > 1) {
> +		unsigned long prio_bias = rq->prio_bias / rq->nr_running;
> +
> +		target_load *= prio_bias;
> +	}
>  
>  	return target_load;
>  }
> 

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.0 (Cooker) for i586
Linux 2.6.11-jam24 (gcc 4.0.0 (4.0.0-3mdk for Mandriva Linux release 2006.0))





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 14:19             ` 2.6.12-rc6-mm1 Con Kolivas
  2005-06-10 23:14               ` 2.6.12-rc6-mm1 J.A. Magallon
@ 2005-06-10 23:50               ` Martin J. Bligh
  2005-06-11  4:14                 ` 2.6.12-rc6-mm1 Martin J. Bligh
  1 sibling, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-10 23:50 UTC (permalink / raw)
  To: Con Kolivas, linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Christoph Lameter, Nick Piggin

>> These tend to run together so just try adding my four patches together. In
>> retrospect I guess they're likely candidates because they also change the
>> _ratio_ of balance which they should not so they are buggy as a group
>> currently. Easy enough to fix but it will make it easy to pinpoint the
>> problem if they're responsible.
>> 
>> sched-implement-nice-support-across-physical-cpus-on-smp.patch
>> sched-change_prio_bias_only_if_queued.patch
>> sched-account_rt_tasks_in_prio_bias.patch
>> sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
> 
> By the way it has already been decided to remove these patches from -mm 
> pending the completion of current scheduler work. If they turn out to be 
> responsible for this regression I apologise profusely :-|. 
> 
> It is clearer to me now that I have made a mistake with the priority biasing, 
> and the following patch corrects it to the planned behaviour. This is 
> academic at this stage as we won't be looking at this particular feature 
> again in earnest until the other 32 scheduler patches (and any followups) go 
> upstream. 
> 
> It's already known that schedstats data will be off without further code to 
> understand smp nice as well (thanks Nick for pointing out the data)... more 
> academic stuff but obviously something to consider when/if we get there.

OK, I backed out those 4, and the degradation mostly went away.
See http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png

and more specifically, see the +p5150 near the right hand side.
I don't think it's quite as good as mainline, but much closer.
I did this run with HZ=1000, and the one with no scheduler
patches at all with HZ=250, so I'll try to do a run that's more
directly comparable as well.

Thanks,

M.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 23:14               ` 2.6.12-rc6-mm1 J.A. Magallon
@ 2005-06-10 23:59                 ` Con Kolivas
  2005-06-11  0:18                   ` 2.6.12-rc6-mm1 Con Kolivas
  2005-06-11  0:32                   ` 2.6.12-rc6-mm1 J.A. Magallon
  0 siblings, 2 replies; 72+ messages in thread
From: Con Kolivas @ 2005-06-10 23:59 UTC (permalink / raw)
  To: J.A. Magallon
  Cc: linux-kernel, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Christoph Lameter, Nick Piggin


[-- Attachment #1.1: Type: text/plain, Size: 1816 bytes --]

On Sat, 11 Jun 2005 09:14, J.A. Magallon wrote:
> On 06.10, Con Kolivas wrote:
> > The priority biasing was off by multiplying the total load by the total
> > priority bias and this ruins the ratio of loads between runqueues. This
> > patch should correct the ratios of loads between runqueues to be
> > proportional to overall load.
>
> 2.6.12-rc6-mm1 + this patch just oopses nicely on boot.
> I did not have a digital camera handy, but the first oops that fit on the
> screen was this call chain:
>
> kernel_thread_helper
> init
> init
> do:base_setup
> usermodehelper_init
> __create_workqueue
> EIP in try_to_wake_up
>
> After this, there was another with some do_div_error calls...
>
> Something looks uninitialized the first time, or the integer arithmetic
> is wrong. I really don't like a*(b/c); I much prefer (a*b)/c. It is more
> common for b/c to truncate to 0 (because b<c) than for (a*b) to overflow.
>
> So I tried both. With this, it boots again:

Doh Doh DOH DOH!
I need a real swift kick up the bum. The point of the patch was to show what 
was wrong with the math, and I shouldn't have posted it without actually 
trying it.

> -		unsigned long prio_bias = rq->prio_bias / rq->nr_running;

rq->nr_running can often be 0 and rq->prio_bias by definition has to be larger 
than or equal to rq->nr_running.

> Perhaps this:
>
> 	if (idle == NOT_IDLE || rq->nr_running > 1)
>
> should be
>
> 	if (idle == NOT_IDLE && rq->nr_running > 1)

No, testing for rq->nr_running > 1 is only needed if we are balancing in an 
idle balance.

> Hope this helps, thanks.

Yes it does :\

Here is what the patch _should_ have been. (*same warnings with this patch 
about math demonstration and untested as should have been posted with the 
earlier one*)

Con

[-- Attachment #1.2: sched-correct_smp_nice_bias.patch --]
[-- Type: text/x-diff, Size: 1720 bytes --]

The priority biasing was off by multiplying the total load by the total
priority bias and this ruins the ratio of loads between runqueues. This
patch should correct the ratios of loads between runqueues to be proportional
to overall load. -2nd attempt.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

Index: linux-2.6.12-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.12-rc6-mm1.orig/kernel/sched.c	2005-06-10 23:56:56.000000000 +1000
+++ linux-2.6.12-rc6-mm1/kernel/sched.c	2005-06-11 09:55:56.000000000 +1000
@@ -978,7 +978,8 @@ static inline unsigned long __source_loa
 	else
 		source_load = min(cpu_load, load_now);
 
-	if (idle == NOT_IDLE || rq->nr_running > 1)
+	if (idle == NOT_IDLE || rq->nr_running > 1) {
+		unsigned long prio_bias = 1;
 		/*
 		 * If we are busy rebalancing the load is biased by
 		 * priority to create 'nice' support across cpus. When
@@ -987,7 +988,10 @@ static inline unsigned long __source_loa
 		 * prevent idle rebalance from trying to pull tasks from a
 		 * queue with only one running task.
 		 */
-		source_load *= rq->prio_bias;
+		if (rq->nr_running)
+			prio_bias = rq->prio_bias / rq->nr_running;
+		source_load *= prio_bias;
+	}
 
 	return source_load;
 }
@@ -1011,8 +1015,13 @@ static inline unsigned long __target_loa
 	else
 		target_load = max(cpu_load, load_now);
 
-	if (idle == NOT_IDLE || rq->nr_running > 1)
-		target_load *= rq->prio_bias;
+	if (idle == NOT_IDLE || rq->nr_running > 1) {
+		unsigned long prio_bias = 1;
+
+		if (rq->nr_running)
+			prio_bias = rq->prio_bias / rq->nr_running;
+		target_load *= prio_bias;
+	}
 
 	return target_load;
 }

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 23:59                 ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-11  0:18                   ` Con Kolivas
  2005-06-11  0:32                   ` 2.6.12-rc6-mm1 J.A. Magallon
  1 sibling, 0 replies; 72+ messages in thread
From: Con Kolivas @ 2005-06-11  0:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: J.A. Magallon, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Christoph Lameter, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 447 bytes --]

On Sat, 11 Jun 2005 09:59, Con Kolivas wrote:
> Here is what the patch _should_ have been. (*same warnings with this patch
> about math demonstration and untested as should have been posted with the
> earlier one*)

Ok I booted this patch and all seems fine. Thanks to those that tracked down 
this regression and the bugs, and apologies for the inconvenience. Looks like 
Martin's automated testbed is already paying off.

Cheers,
Con

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 23:59                 ` 2.6.12-rc6-mm1 Con Kolivas
  2005-06-11  0:18                   ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-11  0:32                   ` J.A. Magallon
  2005-06-11  0:48                     ` 2.6.12-rc6-mm1 Con Kolivas
  1 sibling, 1 reply; 72+ messages in thread
From: J.A. Magallon @ 2005-06-11  0:32 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Christoph Lameter, Nick Piggin


On 06.11, Con Kolivas wrote:
> 
> Here is what the patch _should_ have been. (*same warnings with this patch 
> about math demonstration and untested as should have been posted with the 
> earlier one*)
> 
> +	if (idle == NOT_IDLE || rq->nr_running > 1) {
> +		unsigned long prio_bias = 1;
> +		if (rq->nr_running)
> +			prio_bias = rq->prio_bias / rq->nr_running;
> +		source_load *= prio_bias;
> +	}
>  

Again... sorry, I am not trying to be picky, I just want to know whether it
is worth it or not...

Would something like this not be better:

	if (idle == NOT_IDLE || rq->nr_running > 1) {
		if (rq->nr_running)
			source_load = (source_load*rq->prio_bias) / rq->nr_running;
	}
  
wrt the integer math? Think of

100*( 5/5) vs  500/5
100*( 6/5) vs  600/5
100*( 7/5) vs  700/5
100*( 8/5) vs  800/5
100*( 9/5) vs  900/5
100*(10/5) vs 1000/5

--
J.A. Magallon <jamagallon()able!es>     \               Software is like sex:
werewolf!able!es                         \         It's better when it's free
Mandriva Linux release 2006.0 (Cooker) for i586
Linux 2.6.11-jam24 (gcc 4.0.0 (4.0.0-3mdk for Mandriva Linux release 2006.0))



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-11  0:32                   ` 2.6.12-rc6-mm1 J.A. Magallon
@ 2005-06-11  0:48                     ` Con Kolivas
  2005-06-11  0:52                       ` 2.6.12-rc6-mm1 Con Kolivas
  0 siblings, 1 reply; 72+ messages in thread
From: Con Kolivas @ 2005-06-11  0:48 UTC (permalink / raw)
  To: J.A. Magallon
  Cc: linux-kernel, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Christoph Lameter, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

On Sat, 11 Jun 2005 10:32, J.A. Magallon wrote:
> On 06.11, Con Kolivas wrote:
> > Here is what the patch _should_ have been. (*same warnings with this
> > patch about math demonstration and untested as should have been posted
> > with the earlier one*)
> >
> > +	if (idle == NOT_IDLE || rq->nr_running > 1) {
> > +		unsigned long prio_bias = 1;
> > +		if (rq->nr_running)
> > +			prio_bias = rq->prio_bias / rq->nr_running;
> > +		source_load *= prio_bias;
> > +	}
>
> Again... sorry, I don't try to be picky, just want to know if its worth or
> not...
>
> Would not be better something like:
>
> 	if (idle == NOT_IDLE || rq->nr_running > 1) {
> 		if (rq->nr_running)
> 			source_load = (source_load*rq->prio_bias) / rq->nr_running;
> 	}

I understand your concern, but by definition rq->nr_running will always be a 
factor of rq->prio_bias so integer math should be fine. Either will do.

Cheers,
Con

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-11  0:48                     ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-11  0:52                       ` Con Kolivas
  0 siblings, 0 replies; 72+ messages in thread
From: Con Kolivas @ 2005-06-11  0:52 UTC (permalink / raw)
  To: J.A. Magallon
  Cc: linux-kernel, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Christoph Lameter, Nick Piggin

On Sat, 11 Jun 2005 10:48, Con Kolivas wrote:
> On Sat, 11 Jun 2005 10:32, J.A. Magallon wrote:
> > On 06.11, Con Kolivas wrote:
> > > Here is what the patch _should_ have been. (*same warnings with this
> > > patch about math demonstration and untested as should have been posted
> > > with the earlier one*)
> > >
> > > +	if (idle == NOT_IDLE || rq->nr_running > 1) {
> > > +		unsigned long prio_bias = 1;
> > > +		if (rq->nr_running)
> > > +			prio_bias = rq->prio_bias / rq->nr_running;
> > > +		source_load *= prio_bias;
> > > +	}
> >
> > Again... sorry, I don't try to be picky, just want to know if its worth
> > or not...
> >
> > Would not be better something like:
> >
> > 	if (idle == NOT_IDLE || rq->nr_running > 1) {
> > 		if (rq->nr_running)
> > 			source_load = (source_load*rq->prio_bias) / rq->nr_running;
> > 	}
>
> I understand your concern, but by definition rq->nr_running will always be
> a factor of rq->prio_bias so integer math should be fine. Either will do.

Hmm. No you are right and I'm smoking crack, but integer math should still be 
accurate enough here. Let me think about the accuracy before spraying more 
patches like a fool.

Cheers,
Con

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-10 23:50               ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-11  4:14                 ` Martin J. Bligh
  2005-06-11  5:22                   ` 2.6.12-rc6-mm1 Con Kolivas
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-11  4:14 UTC (permalink / raw)
  To: Con Kolivas, linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Christoph Lameter, Nick Piggin



--"Martin J. Bligh" <mbligh@mbligh.org> wrote (on Friday, June 10, 2005 16:50:40 -0700):

>>> These tend to run together so just try adding my four patches together. In
>>> retrospect I guess they're likely candidates because they also change the
>>> _ratio_ of balance which they should not so they are buggy as a group
>>> currently. Easy enough to fix but it will make it easy to pinpoint the
>>> problem if they're responsible.
>>> 
>>> sched-implement-nice-support-across-physical-cpus-on-smp.patch
>>> sched-change_prio_bias_only_if_queued.patch
>>> sched-account_rt_tasks_in_prio_bias.patch
>>> sched-smp-nice-bias-busy-queues-on-idle-rebalance.patch
>> 
>> By the way it has already been decided to remove these patches from -mm 
>> pending the completion of current scheduler work. If they turn out to be 
>> responsible for this regression I apologise profusely :-|. 
>> 
>> It is clearer to me now that I have made a mistake with the priority biasing, 
>> and the following patch corrects it to the planned behaviour. This is 
>> academic at this stage as we won't be looking at this particular feature 
>> again in earnest until the other 32 scheduler patches (and any followups) go 
>> upstream. 
>> 
>> It's already known that schedstats data will be off without further code to 
>> understand smp nice as well (thanks Nick for pointing out the data)... more 
>> academic stuff but obviously something to consider when/if we get there.
> 
> OK, I backed out those 4, and the degradation mostly went away.
> See http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
> 
> and more specifically, see the +p5150 near the right hand side.
> I don't think it's quite as good as mainline, but much closer.
> I did this run with HZ=1000, and the one with no scheduler
> patches at all with HZ=250, so I'll try to do a run that's more
> directly comparable as well

OK, that makes it look much more like mainline. Looks like you were still
revising the details of your patch Con ... once you're ready, drop me a 
URL for it, and I'll make the system whack on that too.

M.

PS. Hmmm. I need to get better at identifying what +p5150 means in the
graphs, etc ;-( Maybe HTML explanation with embedded png image.



* Re: 2.6.12-rc6-mm1
  2005-06-11  4:14                 ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-11  5:22                   ` Con Kolivas
  2005-06-11  5:56                     ` 2.6.12-rc6-mm1 Martin J. Bligh
  2005-06-11 20:13                     ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 2 replies; 72+ messages in thread
From: Con Kolivas @ 2005-06-11  5:22 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin


On Sat, 11 Jun 2005 14:14, Martin J. Bligh wrote:
> --"Martin J. Bligh" <mbligh@mbligh.org> wrote (on Friday, June 10, 2005
> > OK, I backed out those 4, and the degradation mostly went away.
> > See
> > http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
> >
> > and more specifically, see the +p5150 near the right hand side.
> > I don't think it's quite as good as mainline, but much closer.
> > I did this run with HZ=1000, and the one with no scheduler
> > patches at all with HZ=250, so I'll try to do a run that's more
> > directly comparable as well
>
> OK, that makes it look much more like mainline. Looks like you were still
> revising the details of your patch Con ... once you're ready, drop me a
> URL for it, and I'll make the system whack on that too.

Great, thanks. Here are all the reconsidered changes rolled up into one patch 
that applies directly to 2.6.12-rc6-mm1 -purely for testing purposes-. I'd be 
very grateful to see how this performs; it has been boot and stress tested at 
this end. If it shows detriment I'll have to make the smp nice changes more 
complex.

Cheers,
Con

[-- Attachment #2: 2.6.12-rc6-mm1-mjbtest.patch --]
[-- Type: text/x-diff, Size: 1253 bytes --]

Index: linux-2.6.12-rc6-mm1/kernel/sched.c
===================================================================
--- linux-2.6.12-rc6-mm1.orig/kernel/sched.c	2005-06-10 23:56:56.000000000 +1000
+++ linux-2.6.12-rc6-mm1/kernel/sched.c	2005-06-11 11:48:09.000000000 +1000
@@ -978,7 +978,7 @@ static inline unsigned long __source_loa
 	else
 		source_load = min(cpu_load, load_now);
 
-	if (idle == NOT_IDLE || rq->nr_running > 1)
+	if (rq->nr_running > 1 || (idle == NOT_IDLE && rq->nr_running))
 		/*
 		 * If we are busy rebalancing the load is biased by
 		 * priority to create 'nice' support across cpus. When
@@ -987,7 +987,7 @@ static inline unsigned long __source_loa
 		 * prevent idle rebalance from trying to pull tasks from a
 		 * queue with only one running task.
 		 */
-		source_load *= rq->prio_bias;
+		source_load = source_load * rq->prio_bias / rq->nr_running;
 
 	return source_load;
 }
@@ -1011,8 +1011,8 @@ static inline unsigned long __target_loa
 	else
 		target_load = max(cpu_load, load_now);
 
-	if (idle == NOT_IDLE || rq->nr_running > 1)
-		target_load *= rq->prio_bias;
+	if (rq->nr_running > 1 || (idle == NOT_IDLE && rq->nr_running))
+		target_load = target_load * rq->prio_bias / rq->nr_running;
 
 	return target_load;
 }
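
As a rough illustration of what the change to the multiplier does, the sketch
below models the two formulas outside the kernel. `struct rq_model`, the
function names and the per-task bias of 20 are assumptions made for the
example, not code from kernel/sched.c.

```c
#include <assert.h>

/* Simplified model of a runqueue: only the two fields the patch reads. */
struct rq_model {
	unsigned long nr_running;	/* number of tasks on the queue */
	unsigned long prio_bias;	/* sum of the per-task bias values */
};

/* Before the patch: the load is scaled by the summed bias. */
static unsigned long load_summed_bias(const struct rq_model *rq,
				      unsigned long load)
{
	return load * rq->prio_bias;
}

/* After the patch: the load is scaled by the average bias per task. */
static unsigned long load_average_bias(const struct rq_model *rq,
				       unsigned long load)
{
	assert(rq->nr_running != 0);
	return load * rq->prio_bias / rq->nr_running;
}
```

For a queue with 1 task (load 128, bias 20) and a queue with 4 tasks (load
512, bias 80), the summed form reports 2560 and 40960, a 16:1 ratio, while the
averaged form reports 2560 and 10240, which keeps the 4:1 ratio of the
unbiased loads.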


* Re: 2.6.12-rc6-mm1
  2005-06-11  5:22                   ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-11  5:56                     ` Martin J. Bligh
  2005-06-11 20:13                     ` 2.6.12-rc6-mm1 Martin J. Bligh
  1 sibling, 0 replies; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-11  5:56 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin



--Con Kolivas <kernel@kolivas.org> wrote (on Saturday, June 11, 2005 15:22:30 +1000):

> On Sat, 11 Jun 2005 14:14, Martin J. Bligh wrote:
>> --"Martin J. Bligh" <mbligh@mbligh.org> wrote (on Friday, June 10, 2005
>> > OK, I backed out those 4, and the degradation mostly went away.
>> > See
>> > http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
>> > 
>> > and more specifically, see the +p5150 near the right hand side.
>> > I don't think it's quite as good as mainline, but much closer.
>> > I did this run with HZ=1000, and the one with no scheduler
>> > patches at all with HZ=250, so I'll try to do a run that's more
>> > directly comparable as well
>> 
>> OK, that makes it look much more like mainline. Looks like you were still
>> revising the details of your patch Con ... once you're ready, drop me a
>> URL for it, and I'll make the system whack on that too.
> 
> Great thanks. Here are rolled up all the reconsidered changes that apply 
> directly to 2.6.12-rc6-mm1 -purely for testing purposes-. I'd be very 
> grateful to see how this performed; it has been boot and stress tested at 
> this end. If it shows detriment I'll have to make the smp nice changes more 
> complex.

Kicked it off - should appear in a few hours as 
http://mbligh.org/abat/con_sched_test

M.



* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
                   ` (4 preceding siblings ...)
  2005-06-08 14:22 ` 2.6.12-rc6-mm1 Andy Whitcroft
@ 2005-06-11 11:51 ` Benoit Boissinot
  2005-06-18 22:39 ` 2.6.12-rc6-mm1 Richard Purdie
  2005-06-21 13:20 ` 2.6.12-rc6-mm1 Dominik Karall
  7 siblings, 0 replies; 72+ messages in thread
From: Benoit Boissinot @ 2005-06-11 11:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On 6/7/05, Andrew Morton <akpm@osdl.org> wrote:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> 
> - Added v9fs
> 
> - Various random fixes
> 
> - Probably a similar number of breakages
> 
I just had the following Oopses:

Unable to handle kernel paging request at virtual address 901a1960
printing eip:
c0139251
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: radeon drm tun snd_seq snd_pcm_oss snd_mixer_oss
snd_via82xx snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ipt_multiport
ipt_state ipt_limit ipt_MASQUERADE ipt_mark iptable_mangle ipt_MARK
ipt_REJECT iptable_filter iptable_nat ip_tables ip_conntrack_irc
ip_conntrack_ftp ip_conntrack skge 8139too mii usbcore ide_cd cdrom
CPU:    0
EIP:    0060:[<c0139251>]    Not tainted VLI
EFLAGS: 00010086   (2.6.12-rc6-mm1-arakou) 
EIP is at find_lock_page+0x21/0xb0
eax: 901a195c   ebx: 901a195c   ecx: d8a3b094   edx: 00000003
esi: 00109380   edi: c18e4b08   ebp: d822cb10   esp: d822cb00
ds: 007b   es: 007b   ss: 0068
Process emerge (pid: 31977, threadinfo=d822c000 task=cbb9d040)
Stack: c18e4b04 c1218060 00000000 00000050 d822cb34 c013930e 00000050 00109380 
        c18e4b04 c0333d04 00109380 c18e4a00 00001000 d822cb50 c0157986 d822cb50
 00109380 00109380 00001000 c18e4a00 d822cb70 c0157af5 00001000 d822cb70
Call Trace:
[<c0103d17>] show_stack+0x97/0xd0 
[<c0103ec5>] show_registers+0x155/0x1f0
[<c01040c1>] die+0xc1/0x140 
[<c01157ec>] do_page_fault+0x23c/0x6b5 
[<c010395f>] error_code+0x4f/0x54
[<c013930e>] find_or_create_page+0x2e/0xd0
[<c0157986>] grow_dev_page+0x26/0x110
[<c0157af5>] __getblk_slow+0x85/0x130
[<c0157e8b>] __getblk+0x3b/0x50
[<c01a788b>] search_by_key+0x9b/0xf40
[<c0195095>] reiserfs_read_locked_inode+0x65/0x110
[<c01951e9>] reiserfs_iget+0x79/0xa0
[<c0190330>] reiserfs_lookup+0xd0/0x130
[<c0161f80>] real_lookup+0xb0/0xd0
[<c01622be>] do_lookup+0x7e/0x90
[<c0162a06>] __link_path_walk+0x736/0xd50
[<c016306a>] link_path_walk+0x4a/0x110
[<c01633b4>] path_lookup+0x74/0x120
[<c01635ee>] __user_walk+0x2e/0x50
[<c015e240>] vfs_stat+0x20/0x50
[<c015e834>] sys_stat64+0x14/0x30
[<c0102e0f>] sysenter_past_esp+0x54/0x75
Code: c3 89 f6 8d bc 27 00 00 00 00 55 89 e5 57 56 89 d6 53 83 ec 04
89 45 f0 fa 8d 78 04 89 f2 89 f8 e8 35 04 0b 00 85 c0 89 c3 74 56 <ff>
40 04 0f ba 28 00 19 c0 85 c0 74 49 fb 0f ba 2b 00 19 c0 85


 <1>Unable to handle kernel paging request at virtual address 71ef2710
 printing eip:
c0157140
*pde = 00000000
Oops: 0000 [#2]
Modules linked in: radeon drm tun snd_seq snd_pcm_oss snd_mixer_oss
snd_via82xx snd_ac97_codec snd_pcm snd_timer snd_page_alloc
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ipt_multiport
ipt_state ipt_limit ipt_MASQUERADE ipt_mark iptable_mangle ipt_MARK
ipt_REJECT iptable_filter iptable_nat ip_tables ip_conntrack_irc
ip_conntrack_ftp ip_conntrack skge 8139too mii usbcore ide_cd cdrom
CPU:    0
EIP:    0060:[<c0157140>]    Not tainted VLI   
EFLAGS: 00010a16   (2.6.12-rc6-mm1-arakou)     
EIP is at __find_get_block_slow+0x90/0x140
eax: 00000000   ebx: 71ef26fc   ecx: cb96f0e7   edx: 00000001
esi: c1309f20   edi: 000f9df5   ebp: e35d5b98   esp: e35d5b74
ds: 007b   es: 007b   ss: 0068
Process vim (pid: 32081, threadinfo=e35d5000 task=c2cc55b0)  
Stack: df7fb6bc f6de8a4c f7cf12fc f6de8cec 00000002 c18e4584 dcb43d7c c18e4520 
       000f9df5 e35d5bac c0157e1c 00001000 000f9df5 c18e4520 e35d5bc0 c0157e6c 
       00003e94 0000003e 000f9df5 e35d5ce0 c01a788b 0000001e 0000001f e35d5bf0 
Call Trace:
 [<c0103d17>] show_stack+0x97/0xd0
 [<c0103ec5>] show_registers+0x155/0x1f0
 [<c01040c1>] die+0xc1/0x140
 [<c01157ec>] do_page_fault+0x23c/0x6b5 
 [<c010395f>] error_code+0x4f/0x54
 [<c0157e1c>] __find_get_block+0x6c/0xa0
 [<c0157e6c>] __getblk+0x1c/0x50
 [<c01a788b>] search_by_key+0x9b/0xf40  
 [<c018fc2c>] search_by_entry_key+0x1c/0x1f0
 [<c01901e0>] reiserfs_find_entry+0x90/0x110
 [<c01902d2>] reiserfs_lookup+0x72/0x130
 [<c0161f80>] real_lookup+0xb0/0xd0
 [<c01622be>] do_lookup+0x7e/0x90
 [<c0162a06>] __link_path_walk+0x736/0xd50
 [<c016306a>] link_path_walk+0x4a/0x110 
 [<c01633b4>] path_lookup+0x74/0x120
 [<c0163a09>] open_namei+0x79/0x5f0
 [<c0154c29>] filp_open+0x29/0x50
 [<c0154fac>] sys_open+0x3c/0xc0
 [<c0102e0f>] sysenter_past_esp+0x54/0x75
Code: 89 f0 e8 34 b8 fe ff 89 d8 83 c4 18 5b 5e 5f c9 c3 8b 06 f6 c4
08 0f 84 a4 00 00 00 8b 5e 0c ba 01 00 00 00 89 d9 90 8d 74 26 00 <3b>
7b 14 74 7b 8b 03 8b 5b 04 a8 10 b8 00 00 00 00 0f 44 d0 39

Bad page state at free_hot_cold_page (in process 'firefox-bin', page c1309360)
flags:0x40000000 mapping:00000000 mapcount:-1 count:0
Backtrace:
 [<c0103d67>] dump_stack+0x17/0x20
 [<c013cb52>] bad_page+0x72/0xb0
 [<c013d2da>] free_hot_cold_page+0x4a/0xe0
 [<c013da81>] __pagevec_free+0x31/0x40
 [<c0142a9d>] release_pages+0x9d/0x150
 [<c0142b68>] __pagevec_release+0x18/0x30
 [<c01430bb>] truncate_inode_pages_range+0x13b/0x300
 [<c014329a>] truncate_inode_pages+0x1a/0x20
 [<c016d8e2>] generic_delete_inode+0xb2/0xd0
 [<c016da1f>] generic_drop_inode+0xf/0x20
 [<c016da92>] iput+0x62/0x90
 [<c016494f>] sys_unlink+0xdf/0x110
 [<c0102e0f>] sysenter_past_esp+0x54/0x75
Trying to fix it up, but a reboot is needed

regards,

Benoit Boissinot


* Re: 2.6.12-rc6-mm1
  2005-06-11  5:22                   ` 2.6.12-rc6-mm1 Con Kolivas
  2005-06-11  5:56                     ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-11 20:13                     ` Martin J. Bligh
  2005-06-11 22:20                       ` 2.6.12-rc6-mm1 Con Kolivas
  1 sibling, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-11 20:13 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin



--Con Kolivas <kernel@kolivas.org> wrote (on Saturday, June 11, 2005 15:22:30 +1000):

> On Sat, 11 Jun 2005 14:14, Martin J. Bligh wrote:
>> --"Martin J. Bligh" <mbligh@mbligh.org> wrote (on Friday, June 10, 2005
>> > OK, I backed out those 4, and the degradation mostly went away.
>> > See
>> > http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
>> > 
>> > and more specifically, see the +p5150 near the right hand side.
>> > I don't think it's quite as good as mainline, but much closer.
>> > I did this run with HZ=1000, and the one with no scheduler
>> > patches at all with HZ=250, so I'll try to do a run that's more
>> > directly comparable as well
>> 
>> OK, that makes it look much more like mainline. Looks like you were still
>> revising the details of your patch Con ... once you're ready, drop me a
>> URL for it, and I'll make the system whack on that too.
> 
> Great thanks. Here are rolled up all the reconsidered changes that apply 
> directly to 2.6.12-rc6-mm1 -purely for testing purposes-. I'd be very 
> grateful to see how this performed; it has been boot and stress tested at 
> this end. If it shows detriment I'll have to make the smp nice changes more 
> complex.

It's much better ... but still a degradation - see point p5181 on:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png

Only really seems to hurt the NUMA box (the x440 one ... elm3b67 ... is 
still trying to find its ass with both hands). I'm not necessarily saying
it's a problem ... not sure what the benefits of the patch are, but it's a 
data point, at least ?

M.



* Re: 2.6.12-rc6-mm1
  2005-06-11 20:13                     ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-11 22:20                       ` Con Kolivas
  2005-06-11 23:27                         ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 1 reply; 72+ messages in thread
From: Con Kolivas @ 2005-06-11 22:20 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin

On Sun, 12 Jun 2005 06:13, Martin J. Bligh wrote:
> --Con Kolivas <kernel@kolivas.org> wrote (on Saturday, June 11, 2005
> > Great thanks. Here are rolled up all the reconsidered changes that apply
> > directly to 2.6.12-rc6-mm1 -purely for testing purposes-. I'd be very
> > grateful to see how this performed; it has been boot and stress tested at
> > this end. If it shows detriment I'll have to make the smp nice changes
> > more complex.
>
> It's much better ... but still a degradation - see point p5181 on:
>
> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
>
> Only really seems to hurt the NUMA box (the x440 one ... elm3b67 ... is
> still trying to find its ass with both hands). I'm not necessarily saying
> it's a problem ... not sure what the benefits of the patch are, but it's a
> data point, at least ?

Thanks a lot!

Just checking the numbering of the test runs with you. This is the blue line 
order as plotted on the graph:

5181 is with this patch
4947 is mm1?
5150 is mm1 with the 4 patches backed out
5081 is mm1 with the 4 patches backed out and Hz changed to 100?
5169 is ?

Cheers,
Con


* Re: 2.6.12-rc6-mm1
  2005-06-11 22:20                       ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-11 23:27                         ` Martin J. Bligh
  2005-06-11 23:47                           ` 2.6.12-rc6-mm1 Con Kolivas
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-11 23:27 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin



--Con Kolivas <kernel@kolivas.org> wrote (on Sunday, June 12, 2005 08:20:05 +1000):

> On Sun, 12 Jun 2005 06:13, Martin J. Bligh wrote:
>> --Con Kolivas <kernel@kolivas.org> wrote (on Saturday, June 11, 2005
>> > Great thanks. Here are rolled up all the reconsidered changes that apply
>> > directly to 2.6.12-rc6-mm1 -purely for testing purposes-. I'd be very
>> > grateful to see how this performed; it has been boot and stress tested at
>> > this end. If it shows detriment I'll have to make the smp nice changes
>> > more complex.
>> 
>> It's much better ... but still a degradation - see point p5181 on:
>> 
>> http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/perf/kernbench.moe.png
>> 
>> Only really seems to hurt the NUMA box (the x440 one ... elm3b67 ... is
>> still trying to find its ass with both hands). I'm not necessarily saying
>> it's a problem ... not sure what the benefits of the patch are, but it's a
>> data point, at least ?
> 
> Thanks a lot!
> 
> Just checking the numbering of the test runs with you. This is the blue line 
> order as plotted on the graph:
> 
> 5181 is with this patch
> 4947 is mm1?
> 5150 is mm1 with the 4 patches backed out
> 5081 is mm1 with the 4 patches backed out and Hz changed to 100?
> 5169 is ?

Until I get off my ass and write an html wrapper for the graphs, easiest
thing to do is just cross-reference to here:

http://ftp.kernel.org/pub/linux/kernel/people/mbligh/abat/regression_matrix.html

The +pXXXX numbers on the graph match the job numbers in the boxes. You 
can click on the patches down the left side, and see exactly what they 
were if you want.

M.




* Re: 2.6.12-rc6-mm1
  2005-06-11 23:27                         ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-11 23:47                           ` Con Kolivas
  2005-06-12  0:23                             ` 2.6.12-rc6-mm1 Martin J. Bligh
  0 siblings, 1 reply; 72+ messages in thread
From: Con Kolivas @ 2005-06-11 23:47 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin


On Sun, 12 Jun 2005 09:27, Martin J. Bligh wrote:
> >> not sure what the benefits of the patch are, 

I should have answered this. Since we moved to one runqueue per cpu with the 
current scheduler, 'nice' levels basically fall apart on SMP. Balancing tends 
to group the wrong tasks together, so 'nice' has no meaningful effect. Often 
on a 2 cpu machine, if we run 4 tasks (2 nice 0 and 2 nice 19), we end up 
with:

cpu 1: nice 19 + nice 19
cpu 2: nice 0 + nice 0

which means each nice 19 task gets half a cpu and each nice 0 task gets half a 
cpu which is lousy fairness. 

The smp nice patches should end up with
cpu 1: nice 0 + nice 19
cpu 2: nice 0 + nice 19

so that the nice 0 tasks get 95% of a cpu and nice 19 tasks get 5% of a cpu.

The patches should balance things as fairly as possible according to nice 
levels across cpus. As you can see this is clearly a bug in behaviour and has 
been a showstopper for many trying to move from 2.4.
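
The figures above can be reproduced with a small calculation, assuming CPU
time on one runqueue is split in proportion to a per-task weight. The weights
of 95 and 5 are chosen here only to match the percentages quoted above; they
are not scheduler constants, and `cpu_share_percent` is a hypothetical name.

```c
#include <assert.h>

/*
 * Percentage of one CPU that a task gets when it shares a runqueue with
 * exactly one other task, assuming time is split in proportion to weight.
 */
static unsigned int cpu_share_percent(unsigned int own_weight,
				      unsigned int other_weight)
{
	assert(own_weight + other_weight != 0);
	return 100 * own_weight / (own_weight + other_weight);
}
```

In the first placement each task is paired with a task of the same weight, so
cpu_share_percent(95, 95) and cpu_share_percent(5, 5) both give 50. In the
second placement cpu_share_percent(95, 5) gives 95 for the nice 0 task and
cpu_share_percent(5, 95) gives 5 for the nice 19 task.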

Cheers,
Con


* Re: 2.6.12-rc6-mm1
  2005-06-11 23:47                           ` 2.6.12-rc6-mm1 Con Kolivas
@ 2005-06-12  0:23                             ` Martin J. Bligh
  2005-06-12  5:19                               ` 2.6.12-rc6-mm1 Con Kolivas
  0 siblings, 1 reply; 72+ messages in thread
From: Martin J. Bligh @ 2005-06-12  0:23 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin



--Con Kolivas <kernel@kolivas.org> wrote (on Sunday, June 12, 2005 09:47:08 +1000):

> On Sun, 12 Jun 2005 09:27, Martin J. Bligh wrote:
>> >> not sure what the benefits of the patch are, 
> 
> I should have answered this. Since we moved to one runqueue per cpu with the 
> current scheduler, 'nice' levels basically fall apart on SMP. Balancing tends 
> to group together all the wrong tasks to have any meaningful 'nice' support 
> where often on a 2 cpu machine if we run 4 tasks, 2 nice 0 and 2 nice 19 we 
> end up with:
> 
> cpu 1: nice 19 + nice 19
> cpu 2: nice 0 + nice 0
> 
> which means each nice 19 task gets half a cpu and each nice 0 task gets half a 
> cpu which is lousy fairness. 
> 
> The smp nice patches should end up with
> cpu 1: nice 0 + nice 19
> cpu 2: nice 0 + nice 19
> 
> so that the nice 0 tasks get 95% of a cpu and nice 19 tasks get 5% of a cpu.
> 
> The patches should balance things as fairly as possible according to nice 
> levels across cpus. As you can see this is clearly a bug in behaviour and has 
> been a showstopper for many trying to move from 2.4.

Oh, right. that makes a lot of sense ... maybe just let it have an error
factor when migrating cross numa nodes (ie not be as strict)? Not sure 
that's really the problem, as I doubt anything in my test is actually 
niced anyway (assuming you're meaning static prio, not dynamic). In that
case, your changes should have no effect, right (from explanation, not
looking at the code ;-))

M.



* Re: 2.6.12-rc6-mm1
  2005-06-12  0:23                             ` 2.6.12-rc6-mm1 Martin J. Bligh
@ 2005-06-12  5:19                               ` Con Kolivas
  0 siblings, 0 replies; 72+ messages in thread
From: Con Kolivas @ 2005-06-12  5:19 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Christoph Lameter,
	Nick Piggin

On Sun, 12 Jun 2005 10:23, Martin J. Bligh wrote:
> --Con Kolivas <kernel@kolivas.org> wrote (on Sunday, June 12, 2005 09:47:08 +1000):
> > The patches should balance things as fairly as possible according to nice
> > levels across cpus. As you can see this is clearly a bug in behaviour and
> > has been a showstopper for many trying to move from 2.4.
>
> Oh, right. that makes a lot of sense ... maybe just let it have an error
> factor when migrating cross numa nodes (ie not be as strict)? Not sure
> that's really the problem, as I doubt anything in my test is actually
> niced anyway (assuming you're meaning static prio, not dynamic). In that
> case, your changes should have no effect, right (from explanation, not
> looking at the code ;-))

The balancing code is not really aware that the loads being returned are being 
altered and it was not clear whether this would be needed or not as it 
usually bases its decisions on ratios of load rather than absolute amounts. 
The tricky part is idle balancing where we don't want to try and pull from a 
queue that only has one running task and the patch has a "if single task 
running and idle balancing tell it only one task running and don't bias" 
feature. This may cause slight performance effects on numa as I guess the 
other nodes suddenly seem much more loaded and we normally wouldn't try 
balancing between nodes until there was a larger load discrepancy than 
between cpus. I'll think on this and see how much more nice-aware the 
balancing code needs to be for this to not have any effect.
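
The special case described above can be written as a small predicate. This is
a sketch that mirrors the condition in the test patch posted earlier in the
thread; the enum is a simplified stand-in for the kernel's idle type and
`bias_applies` is a hypothetical name.

```c
/* Simplified stand-in for the scheduler's idle state. */
enum idle_model { IDLE, NOT_IDLE };

/*
 * Returns 1 when the load of a queue should be biased by priority:
 * always for queues with two or more tasks, and for a single-task queue
 * only when the balancing cpu is busy. An empty queue is never biased.
 */
static int bias_applies(enum idle_model idle, unsigned long nr_running)
{
	return nr_running > 1 || (idle == NOT_IDLE && nr_running);
}
```

An idle rebalance looking at a queue with a single task gets
bias_applies(IDLE, 1) == 0, so the raw load is reported. A busy rebalance of
the same queue, or any rebalance of a queue with two or more tasks, gets 1.
Excluding the empty queue also avoids dividing by a zero task count.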

Cheers,
Con


* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
                   ` (5 preceding siblings ...)
  2005-06-11 11:51 ` 2.6.12-rc6-mm1 Benoit Boissinot
@ 2005-06-18 22:39 ` Richard Purdie
  2005-06-18 22:44   ` 2.6.12-rc6-mm1 Andrew Morton
  2005-06-18 23:18   ` 2.6.12-rc6-mm1 Russell King
  2005-06-21 13:20 ` 2.6.12-rc6-mm1 Dominik Karall
  7 siblings, 2 replies; 72+ messages in thread
From: Richard Purdie @ 2005-06-18 22:39 UTC (permalink / raw)
  To: Russell King; +Cc: LKML, Andrew Morton

On Tue, 2005-06-07 at 04:29 -0700, Andrew Morton wrote:
> +git-arm-smp.patch
> 
>  ARM git trees

The arm pxa255 based Zaurus won't resume from a suspend with the patches
from the above tree applied. The suspend looks normal and gets at least
as far as pxa_pm_enter(). After that, the device appears to be dead and
needs a battery removal to reset. I'm unsure if it actually suspends and
is failing to resume or is crashing in the latter suspend stages.

Is there some documentation on what the above patch is aiming to do
anywhere?

Richard





* Re: 2.6.12-rc6-mm1
  2005-06-18 22:39 ` 2.6.12-rc6-mm1 Richard Purdie
@ 2005-06-18 22:44   ` Andrew Morton
  2005-06-18 22:57     ` 2.6.12-rc6-mm1 Richard Purdie
  2005-06-18 23:18   ` 2.6.12-rc6-mm1 Russell King
  1 sibling, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-06-18 22:44 UTC (permalink / raw)
  To: Richard Purdie; +Cc: linux, linux-kernel

Richard Purdie <rpurdie@rpsys.net> wrote:
>
> On Tue, 2005-06-07 at 04:29 -0700, Andrew Morton wrote:
> > +git-arm-smp.patch
> > 
> >  ARM git trees
> 
> The arm pxa255 based Zaurus won't resume from a suspend with the patches
> from the above tree applied. The suspend looks normal and gets at least
> as far as pxa_pm_enter(). After that, the device appears to be dead and
> needs a battery removal to reset. I'm unsure if it actually suspends and
> is failing to resume or is crashing in the latter suspend stages.
> 
> Is there some documentation on what the above patch is aiming to do
> anywhere?

Did you apply just that patch, or are you talking about the whole -mm lineup?

If the latter, please test with only git-arm-smp.patch.


* Re: 2.6.12-rc6-mm1
  2005-06-18 22:44   ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-18 22:57     ` Richard Purdie
  2005-06-18 23:11       ` 2.6.12-rc6-mm1 Richard Purdie
  0 siblings, 1 reply; 72+ messages in thread
From: Richard Purdie @ 2005-06-18 22:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux, linux-kernel

On Sat, 2005-06-18 at 15:44 -0700, Andrew Morton wrote:
> Richard Purdie <rpurdie@rpsys.net> wrote:
> >
> > On Tue, 2005-06-07 at 04:29 -0700, Andrew Morton wrote:
> > > +git-arm-smp.patch
> > > 
> > >  ARM git trees
> > 
> > The arm pxa255 based Zaurus won't resume from a suspend with the patches
> > from the above tree applied. The suspend looks normal and gets at least
> > as far as pxa_pm_enter(). After that, the device appears to be dead and
> > needs a battery removal to reset. I'm unsure if it actually suspends and
> > is failing to resume or is crashing in the latter suspend stages.
> > 
> > Is there some documentation on what the above patch is aiming to do
> > anywhere?
> 
> Did you apply just that patch, or are you talking about the whole -mm lineup?
> 
> If the latter, please test with only git-arm-smp.patch.

Sorry, I wasn't clear. I had problems with the -mm lineup and tracked it
down to the above patch. With the above patch removed, -mm works fine. 

(I know there's a number of changes to the arm pxa suspend/resume code
in git-arm.patch but they're definitely not causing the problem.)

Richard



* Re: 2.6.12-rc6-mm1
  2005-06-18 22:57     ` 2.6.12-rc6-mm1 Richard Purdie
@ 2005-06-18 23:11       ` Richard Purdie
  0 siblings, 0 replies; 72+ messages in thread
From: Richard Purdie @ 2005-06-18 23:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux, linux-kernel

On Sat, 2005-06-18 at 23:57 +0100, Richard Purdie wrote:
> On Sat, 2005-06-18 at 15:44 -0700, Andrew Morton wrote:
> > > > +git-arm-smp.patch
> > > >  ARM git trees
> > > 
> > > The arm pxa255 based Zaurus won't resume from a suspend with the patches
> > > from the above tree applied. The suspend looks normal and gets at least
> > > as far as pxa_pm_enter(). After that, the device appears to be dead and
> > > needs a battery removal to reset. I'm unsure if it actually suspends and
> > > is failing to resume or is crashing in the latter suspend stages.
> > > 
> > > Is there some documentation on what the above patch is aiming to do
> > > anywhere?
> > 
> > Did you apply just that patch, or are you talking about the whole -mm lineup?
> > 
> > If the latter, please test with only git-arm-smp.patch.
> 
> Sorry, I wasn't clear. I had problems with the -mm lineup and tracked it
> down to the above patch. With the above patch removed, -mm works fine. 
> 
> (I know there's a number of changes to the arm pxa suspend/resume code
> in git-arm.patch but they're definitely not causing the problem.)

I meant to add that git-arm-smp.patch breaks suspend/resume, even
applied in isolation against 2.6.12-rc6.

Richard



* Re: 2.6.12-rc6-mm1
  2005-06-18 22:39 ` 2.6.12-rc6-mm1 Richard Purdie
  2005-06-18 22:44   ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-06-18 23:18   ` Russell King
  2005-06-19  1:20     ` 2.6.12-rc6-mm1 Richard Purdie
  1 sibling, 1 reply; 72+ messages in thread
From: Russell King @ 2005-06-18 23:18 UTC (permalink / raw)
  To: Richard Purdie; +Cc: LKML, Andrew Morton

On Sat, Jun 18, 2005 at 11:39:18PM +0100, Richard Purdie wrote:
> On Tue, 2005-06-07 at 04:29 -0700, Andrew Morton wrote:
> > +git-arm-smp.patch
> > 
> >  ARM git trees
> 
> The arm pxa255 based Zaurus won't resume from a suspend with the patches
> from the above tree applied. The suspend looks normal and gets at least
> as far as pxa_pm_enter(). After that, the device appears to be dead and
> needs a battery removal to reset. I'm unsure if it actually suspends and
> is failing to resume or is crashing in the latter suspend stages.

<grumble>Well, it's a bit late for this since (a) stuff has rapidly
moved on at rmk towers since 2.6.12 was released this morning, and
(b) I've just asked Linus to pull this.</grumble>

Thinking about what's probably happening, I suspect all the ARM suspend
and resume code needs to be reworked to save more state.  I'll try to
cook up a patch tomorrow to fix it, but I'll need you to provide
feedback.

Please note that you may see other ARM breakage over the next month
or so - I'm going to be concentrating on merging ARM SMP support,
and whatever bashing other people like yourself can give the kernel
will help ensure that problems are picked up quickly.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: 2.6.12-rc6-mm1
  2005-06-18 23:18   ` 2.6.12-rc6-mm1 Russell King
@ 2005-06-19  1:20     ` Richard Purdie
  2005-06-19  9:02       ` 2.6.12-rc6-mm1 Russell King
  0 siblings, 1 reply; 72+ messages in thread
From: Richard Purdie @ 2005-06-19  1:20 UTC (permalink / raw)
  To: Russell King; +Cc: LKML, Andrew Morton

On Sun, 2005-06-19 at 00:18 +0100, Russell King wrote: 
> On Sat, Jun 18, 2005 at 11:39:18PM +0100, Richard Purdie wrote:
> > On Tue, 2005-06-07 at 04:29 -0700, Andrew Morton wrote:
> > > +git-arm-smp.patch
> > > 
> > >  ARM git trees
> > 
> > The arm pxa255 based Zaurus won't resume from a suspend with the patches
> > from the above tree applied. The suspend looks normal and gets at least
> > as far as pxa_pm_enter(). After that, the device appears to be dead and
> > needs a battery removal to reset. I'm unsure if it actually suspends and
> > is failing to resume or is crashing in the latter suspend stages.
> 
> <grumble>Well, it's a bit late for this since (a) stuff has rapidly
> moved on at rmk towers since 2.6.12 was released this morning, and
> (b) I've just asked Linus to pull this.</grumble>

Please don't underestimate the time it takes to wade through all the
patches in the -mm tree, find the one causing the breakage, investigate
the patch and report it to the person concerned. I'm doing the Zaurus
work in my spare time and don't get paid for it. Just reflashing and
booting a new kernel probably takes ~15mins on the Zaurus.  The
copy/clearpage problem took a complete weekend to track down (as it was
showing up randomly) and then needed further evenings to debug your
patch, which is a large chunk of my free time. The Checked-By: line
didn't quite give the full picture.

I realise it's taken me a while to find enough time to test/debug this
kernel but at least you now know there's a problem...

> Thinking about what's probably happening, I suspect all the ARM suspend
> and resume code needs to be reworked to save more state.  I'll try to
> cook up a patch tomorrow to fix it, but I'll need you to provide
> feedback.

Ok, thanks. I'm happy to test any fixes/patches.

> Please note that you may see other ARM breakage over the next month
> or so - I'm going to be concentrating on merging ARM SMP support,
> and whatever bashing other people like yourself can give the kernel
> will help ensure that problems are picked up quickly.

In order to assist with that, can you publish these patches somewhere?
That way, I can apply them against a known good Zaurus kernel tree and
know straight away if they break anything (diff/patch format would be
preferable as my Zaurus trees are all patch based).

On a positive note, something in the later 2.6.12-rc kernels has made a
massive difference to the speed on the Zaurus - I suspect the removal of
the preempt locks on copy/clearpage. It boots up ~1.5x faster and the
speed gain will make a lot of people very happy :)

Richard


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-19  1:20     ` 2.6.12-rc6-mm1 Richard Purdie
@ 2005-06-19  9:02       ` Russell King
  2005-06-19  9:11         ` 2.6.12-rc6-mm1 Russell King
  0 siblings, 1 reply; 72+ messages in thread
From: Russell King @ 2005-06-19  9:02 UTC (permalink / raw)
  To: Richard Purdie; +Cc: LKML, Andrew Morton

On Sun, Jun 19, 2005 at 02:20:48AM +0100, Richard Purdie wrote:
> On Sun, 2005-06-19 at 00:18 +0100, Russell King wrote: 
> > Thinking about what's probably happening, I suspect all the ARM suspend
> > and resume code needs to be reworked to save more state.  I'll try to
> > cook up a patch tomorrow to fix it, but I'll need you to provide
> > feedback.
> 
> Ok, thanks. I'm happy to test any fixes/patches.

This should resolve the problem - we now rely on the stack pointer for
each CPU mode to remain constant throughout the running time of the
kernel, which includes across suspend/resume cycles.

--- a/arch/arm/mach-pxa/sleep.S
+++ b/arch/arm/mach-pxa/sleep.S
@@ -38,6 +38,16 @@ ENTRY(pxa_cpu_suspend)
 #endif
 	stmfd	sp!, {r2 - r12, lr}		@ save registers on stack
 
+	@ preserve IRQ, abort and undefined mode stack pointers
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | IRQ_MODE
+	mov	r4, sp
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | ABT_MODE
+	mov	r5, sp
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | UND_MODE
+	mov	r6, sp
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | SVC_MODE
+	stmfd	sp!, {r4 - r6}
+
 	@ get coprocessor registers
 	mrc	p14, 0, r3, c6, c0, 0		@ clock configuration, for turbo mode
 	mrc	p15, 0, r4, c15, c1, 0		@ CP access reg
@@ -229,6 +239,17 @@ resume_after_mmu:
 #ifdef CONFIG_XSCALE_CACHE_ERRATA
 	bl	cpu_xscale_proc_init
 #endif
+
+	@ restore IRQ, abort and undefined mode stack pointers
+	ldmfd	sp!, {r4 - r6}
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | IRQ_MODE
+	mov	sp, r4
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | ABT_MODE
+	mov	sp, r5
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | UND_MODE
+	mov	sp, r6
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | SVC_MODE
+
 	ldmfd	sp!, {r2, r3}
 #ifndef CONFIG_IWMMXT
 	mar	acc0, r2, r3
--- a/arch/arm/mach-sa1100/sleep.S
+++ b/arch/arm/mach-sa1100/sleep.S
@@ -37,6 +37,16 @@ ENTRY(sa1100_cpu_suspend)
 
 	stmfd	sp!, {r4 - r12, lr}		@ save registers on stack
 
+	@ preserve IRQ, abort and undefined mode stack pointers
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | IRQ_MODE
+	mov	r4, sp
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | ABT_MODE
+	mov	r5, sp
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | UND_MODE
+	mov	r6, sp
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | SVC_MODE
+	stmfd	sp!, {r4 - r6}
+
 	@ get coprocessor registers
 	mrc 	p15, 0, r4, c3, c0, 0		@ domain ID
 	mrc 	p15, 0, r5, c2, c0, 0		@ translation table base addr
@@ -210,6 +220,17 @@ sleep_save_sp:
 	.text
 resume_after_mmu:
 	mcr     p15, 0, r1, c15, c1, 2		@ enable clock switching
+
+	@ restore IRQ, abort and undefined mode stack pointers
+	ldmfd	sp!, {r4 - r6}
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | IRQ_MODE
+	mov	sp, r4
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | ABT_MODE
+	mov	sp, r5
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | UND_MODE
+	mov	sp, r6
+	msr	cpsr_c, #PSR_F_BIT | PSR_I_BIT | SVC_MODE
+
 	ldmfd	sp!, {r4 - r12, pc}		@ return to caller
 
 
> > Please note that you may see other ARM breakage over the next month
> > or so - I'm going to be concentrating on merging ARM SMP support,
> > and whatever bashing other people like yourself can give the kernel
> > will help ensure that problems are picked up quickly.
> 
> In order to assist with that, can you publish these patches somewhere?
> That way, I can apply them against a known good Zaurus kernel tree and
> know straight away if they break anything (diff/patch format would be
> preferable as my Zaurus trees are all patch based).

I'll see what I can do, but I'm going to be working fairly rapidly on
merging this, so expect roughly a patch each day.  Hopefully though,
the later patches will only affect the Integrator platform.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-19  9:02       ` 2.6.12-rc6-mm1 Russell King
@ 2005-06-19  9:11         ` Russell King
  2005-06-19 17:12           ` 2.6.12-rc6-mm1 Richard Purdie
  0 siblings, 1 reply; 72+ messages in thread
From: Russell King @ 2005-06-19  9:11 UTC (permalink / raw)
  To: Richard Purdie, LKML, Andrew Morton

On Sun, Jun 19, 2005 at 10:02:26AM +0100, Russell King wrote:
> On Sun, Jun 19, 2005 at 02:20:48AM +0100, Richard Purdie wrote:
> > On Sun, 2005-06-19 at 00:18 +0100, Russell King wrote: 
> > > Thinking about what's probably happening, I suspect all the ARM suspend
> > > and resume code needs to be reworked to save more state.  I'll try to
> > > cook up a patch tomorrow to fix it, but I'll need you to provide
> > > feedback.
> > 
> > Ok, thanks. I'm happy to test any fixes/patches.
> 
> This should resolve the problem - we now rely on the stack pointer for
> each CPU mode to remain constant throughout the running time of the
> kernel, which includes across suspend/resume cycles.

Actually, this patch is probably an all-round better solution.

--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -328,7 +328,7 @@ static void __init setup_processor(void)
  * cpu_init dumps the cache information, initialises SMP specific
  * information, and sets up the per-CPU stacks.
  */
-void __init cpu_init(void)
+void cpu_init(void)
 {
 	unsigned int cpu = smp_processor_id();
 	struct stack *stk = &stacks[cpu];
--- a/arch/arm/mach-pxa/pm.c
+++ b/arch/arm/mach-pxa/pm.c
@@ -133,6 +133,8 @@ static int pxa_pm_enter(suspend_state_t 
 	/* *** go zzz *** */
 	pxa_cpu_pm_enter(state);
 
+	cpu_init();
+
 	/* after sleeping, validate the checksum */
 	checksum = 0;
 	for (i = 0; i < SLEEP_SAVE_SIZE - 1; i++)
--- a/arch/arm/mach-sa1100/pm.c
+++ b/arch/arm/mach-sa1100/pm.c
@@ -88,6 +88,8 @@ static int sa11x0_pm_enter(suspend_state
 	/* go zzz */
 	sa1100_cpu_suspend();
 
+	cpu_init();
+
 	/*
 	 * Ensure not to come back here if it wasn't intended
 	 */
--- a/include/asm-arm/system.h
+++ b/include/asm-arm/system.h
@@ -104,6 +104,7 @@ extern void show_pte(struct mm_struct *m
 extern void __show_regs(struct pt_regs *);
 
 extern int cpu_architecture(void);
+extern void cpu_init(void);
 
 #define set_cr(x)					\
 	__asm__ __volatile__(				\

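To see why re-running cpu_init() after wake-up is enough, here is a minimal
user-space model (not kernel code). It assumes the banked IRQ, abort and
undefined mode stack pointers do not survive the sleep state, which is what
the earlier sleep.S patch guards against. The names model_cpu, model_stacks,
model_cpu_init(), model_sleep() and model_stacks_ok() are hypothetical, and
the three-word per-mode stack layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the three exception modes that own a banked SP. */
enum model_mode { MODEL_IRQ, MODEL_ABT, MODEL_UND, MODEL_NR_MODES };

/* Assumed layout: a few words of stack per exception mode. */
struct model_stacks {
	uint32_t irq[3];
	uint32_t abt[3];
	uint32_t und[3];
};

struct model_cpu {
	uintptr_t banked_sp[MODEL_NR_MODES];
};

/* Stands in for cpu_init(): point each banked SP at its per-mode stack. */
void model_cpu_init(struct model_cpu *cpu, struct model_stacks *stk)
{
	assert(cpu && stk);
	cpu->banked_sp[MODEL_IRQ] = (uintptr_t)&stk->irq[0];
	cpu->banked_sp[MODEL_ABT] = (uintptr_t)&stk->abt[0];
	cpu->banked_sp[MODEL_UND] = (uintptr_t)&stk->und[0];
}

/* Stands in for a sleep/wake cycle that does not preserve the banked SPs. */
void model_sleep(struct model_cpu *cpu)
{
	int i;

	for (i = 0; i < MODEL_NR_MODES; i++)
		cpu->banked_sp[i] = 0;
}

/* Returns 1 when every banked SP points at its own per-mode stack. */
int model_stacks_ok(const struct model_cpu *cpu, const struct model_stacks *stk)
{
	return cpu->banked_sp[MODEL_IRQ] == (uintptr_t)&stk->irq[0] &&
	       cpu->banked_sp[MODEL_ABT] == (uintptr_t)&stk->abt[0] &&
	       cpu->banked_sp[MODEL_UND] == (uintptr_t)&stk->und[0];
}
```

In this model, calling model_cpu_init() again after model_sleep() restores
the invariant, mirroring the cpu_init() call added after pxa_cpu_pm_enter()
and sa1100_cpu_suspend() in the patch above.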

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-19  9:11         ` 2.6.12-rc6-mm1 Russell King
@ 2005-06-19 17:12           ` Richard Purdie
  2005-06-19 17:39             ` 2.6.12-rc6-mm1 Russell King
  0 siblings, 1 reply; 72+ messages in thread
From: Richard Purdie @ 2005-06-19 17:12 UTC (permalink / raw)
  To: Russell King; +Cc: LKML, Andrew Morton

On Sun, 2005-06-19 at 10:11 +0100, Russell King wrote:
> On Sun, Jun 19, 2005 at 10:02:26AM +0100, Russell King wrote:
> > On Sun, Jun 19, 2005 at 02:20:48AM +0100, Richard Purdie wrote:
> > > On Sun, 2005-06-19 at 00:18 +0100, Russell King wrote: 
> > > > Thinking about what's probably happening, I suspect all the ARM suspend
> > > > and resume code needs to be reworked to save more state.  I'll try to
> > > > cook up a patch tomorrow to fix it, but I'll need you to provide
> > > > feedback.
> > > 
> > > Ok, thanks. I'm happy to test any fixes/patches.
> > 
> > This should resolve the problem - we now rely on the stack pointer for
> > each CPU mode to remain constant throughout the running time of the
> > kernel, which includes across suspend/resume cycles.
> 
> Actually, this patch is probably an all-round better solution.

This patch (the simpler of the two using cpu_init()) allows the pxa to
suspend/resume happily with the git-arm-smp.patch applied.

Richard


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-19 17:12           ` 2.6.12-rc6-mm1 Richard Purdie
@ 2005-06-19 17:39             ` Russell King
  2005-06-19 18:25               ` 2.6.12-rc6-mm1 Richard Purdie
  0 siblings, 1 reply; 72+ messages in thread
From: Russell King @ 2005-06-19 17:39 UTC (permalink / raw)
  To: Richard Purdie; +Cc: LKML, Andrew Morton

On Sun, Jun 19, 2005 at 06:12:38PM +0100, Richard Purdie wrote:
> This patch (the simpler of the two using cpu_init()) allows the pxa to
> suspend/resume happily with the git-arm-smp.patch applied.

Good.  Fix committed.

The next batched SMP patch can be found at www.home.arm.../~rmk/nightly;
I'm currently planning to send it to Linus tonight.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-19 17:39             ` 2.6.12-rc6-mm1 Russell King
@ 2005-06-19 18:25               ` Richard Purdie
  2005-06-19 18:56                 ` 2.6.12-rc6-mm1 Russell King
  0 siblings, 1 reply; 72+ messages in thread
From: Richard Purdie @ 2005-06-19 18:25 UTC (permalink / raw)
  To: Russell King; +Cc: LKML, Andrew Morton

On Sun, 2005-06-19 at 18:39 +0100, Russell King wrote:
> Good.  Fix committed.

Thanks.

> Next batched smp patch can be found at www.home.arm.../~rmk/nightly
> which I'm currently planning to go to Linus tonight.

I applied smp-20050619.patch to 2.6.12-rc6-mm1 + the last fix and the
Zaurus seems perfectly happy with it. Let me know as and when you have
further releases that need testing (a message to linux-arm-kernel might
be the best way to announce them?).

Richard


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-19 18:25               ` 2.6.12-rc6-mm1 Richard Purdie
@ 2005-06-19 18:56                 ` Russell King
  0 siblings, 0 replies; 72+ messages in thread
From: Russell King @ 2005-06-19 18:56 UTC (permalink / raw)
  To: Richard Purdie; +Cc: LKML, Andrew Morton

On Sun, Jun 19, 2005 at 07:25:59PM +0100, Richard Purdie wrote:
> On Sun, 2005-06-19 at 18:39 +0100, Russell King wrote:
> > Next batched smp patch can be found at www.home.arm.../~rmk/nightly
> > which I'm currently planning to go to Linus tonight.
> 
> I applied smp-20050619.patch to 2.6.12-rc6-mm1 + the last fix and the
> Zaurus seems perfectly happy with it. Let me know as and when you have
> further releases that need testing (a message to linux-arm-kernel might
> be the best way to announce them?).

Thanks for testing.  Most of the other patches are platform specific
so this may not be required.  However, if there are other changes to
non-platform-specific code, I'll try to point them out a couple of days
before they get merged.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
                   ` (6 preceding siblings ...)
  2005-06-18 22:39 ` 2.6.12-rc6-mm1 Richard Purdie
@ 2005-06-21 13:20 ` Dominik Karall
  2005-06-24 21:27   ` 2.6.12-rc6-mm1 Alexey Dobriyan
  2005-07-29  4:54   ` 2.6.12-rc6-mm1 Andrew Morton
  7 siblings, 2 replies; 72+ messages in thread
From: Dominik Karall @ 2005-06-21 13:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/

After looking at my dmesg output today, I saw the following error with
2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly when it
happens, because I haven't used mono lately. I just did an emerge mono on my
Gentoo system, so maybe that triggered the failure.

note: mono[26736] exited with preempt_count 1
scheduling while atomic: mono/0x10000001/26736

Call Trace:<ffffffff803e13ea>{schedule+122} <ffffffff8013197b>{vprintk+635}
       <ffffffff803e2738>{cond_resched+56} <ffffffff80164de3>{unmap_vmas+1587}
       <ffffffff8016a560>{exit_mmap+128} <ffffffff8012e7bf>{mmput+31}
       <ffffffff80133466>{do_exit+438} <ffffffff8013bf25>{__dequeue_signal+501}
       <ffffffff801340c8>{do_group_exit+280} <ffffffff8013e147>{get_signal_to_deliver+1575}
       <ffffffff8010de92>{do_signal+162} <ffffffff8012d1e0>{default_wake_function+0}
       <ffffffff8010e8e1>{sys_rt_sigreturn+577} <ffffffff8010eb3f>{sysret_signal+28}
       <ffffffff8010ee27>{ptregscall_common+103}

cheers,
dominik

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-21 13:20 ` 2.6.12-rc6-mm1 Dominik Karall
@ 2005-06-24 21:27   ` Alexey Dobriyan
  2005-07-29  4:54   ` 2.6.12-rc6-mm1 Andrew Morton
  1 sibling, 0 replies; 72+ messages in thread
From: Alexey Dobriyan @ 2005-06-24 21:27 UTC (permalink / raw)
  To: Dominik Karall; +Cc: Andrew Morton, linux-kernel

On Tuesday 21 June 2005 17:20, Dominik Karall wrote:
> After looking at my dmesg output today, I saw the following error with
> 2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly when it
> happens, because I haven't used mono lately. I just did an emerge mono on my
> Gentoo system, so maybe that triggered the failure.
> 
> note: mono[26736] exited with preempt_count 1
> scheduling while atomic: mono/0x10000001/26736

I've filed a bug at kernel bugzilla, so your report won't be lost.
See http://bugme.osdl.org/show_bug.cgi?id=4794

You can register at http://bugme.osdl.org/createaccount.cgi and add yourself
to CC list.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-06-21 13:20 ` 2.6.12-rc6-mm1 Dominik Karall
  2005-06-24 21:27   ` 2.6.12-rc6-mm1 Alexey Dobriyan
@ 2005-07-29  4:54   ` Andrew Morton
  2005-07-29 13:39     ` 2.6.12-rc6-mm1 Dominik Karall
  1 sibling, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-07-29  4:54 UTC (permalink / raw)
  To: Dominik Karall; +Cc: linux-kernel

Dominik Karall <dominik.karall@gmx.net> wrote:
>
> On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> 
> After looking at my dmesg output today, I saw the following error with
> 2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly when it
> happens, because I haven't used mono lately. I just did an emerge mono on my
> Gentoo system, so maybe that triggered the failure.
> 
> note: mono[26736] exited with preempt_count 1
> scheduling while atomic: mono/0x10000001/26736
> 
> Call Trace:<ffffffff803e13ea>{schedule+122} <ffffffff8013197b>{vprintk+635}
>        <ffffffff803e2738>{cond_resched+56} <ffffffff80164de3>{unmap_vmas+1587}
>        <ffffffff8016a560>{exit_mmap+128} <ffffffff8012e7bf>{mmput+31}
>        <ffffffff80133466>{do_exit+438} 
> <ffffffff8013bf25>{__dequeue_signal+501}
>        <ffffffff801340c8>{do_group_exit+280} 
> <ffffffff8013e147>{get_signal_to_deliver+1575}
>        <ffffffff8010de92>{do_signal+162} 
> <ffffffff8012d1e0>{default_wake_function+0}
>        <ffffffff8010e8e1>{sys_rt_sigreturn+577} 
> <ffffffff8010eb3f>{sysret_signal+28}
>        <ffffffff8010ee27>{ptregscall_common+103}
> 

A couple of people reported this, but all seems to have gone quiet.  Is it
fixed in later -mm's?   Is 2.6.13-rc4 running OK?

Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-07-29  4:54   ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-07-29 13:39     ` Dominik Karall
  2005-07-29 18:22       ` 2.6.12-rc6-mm1 Andrew Morton
  0 siblings, 1 reply; 72+ messages in thread
From: Dominik Karall @ 2005-07-29 13:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2179 bytes --]

On Friday 29 July 2005 06:54, Andrew Morton wrote:
> Dominik Karall <dominik.karall@gmx.net> wrote:
> > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> >
> > After looking at my dmesg output today, I saw the following error with
> > 2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly when it
> > happens, because I haven't used mono lately. I just did an emerge mono on
> > my Gentoo system, so maybe that triggered the failure.
> >
> > note: mono[26736] exited with preempt_count 1
> > scheduling while atomic: mono/0x10000001/26736
> >
> > Call Trace:<ffffffff803e13ea>{schedule+122}
> > <ffffffff8013197b>{vprintk+635} <ffffffff803e2738>{cond_resched+56}
> > <ffffffff80164de3>{unmap_vmas+1587} <ffffffff8016a560>{exit_mmap+128}
> > <ffffffff8012e7bf>{mmput+31} <ffffffff80133466>{do_exit+438}
> > <ffffffff8013bf25>{__dequeue_signal+501}
> >        <ffffffff801340c8>{do_group_exit+280}
> > <ffffffff8013e147>{get_signal_to_deliver+1575}
> >        <ffffffff8010de92>{do_signal+162}
> > <ffffffff8012d1e0>{default_wake_function+0}
> >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > <ffffffff8010eb3f>{sysret_signal+28}
> >        <ffffffff8010ee27>{ptregscall_common+103}
>
> A couple of people reported this, but all seems to have gone quiet.  Is it
> fixed in later -mm's?   Is 2.6.13-rc4 running OK?
>
> Thanks.

hi andrew!

I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an emerge mono 
right now to test it, and I got this one:
Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffff905f80 error 4

DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more info about 
the bug. Did I forget any debug option?

greets,
dominik

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-07-29 13:39     ` 2.6.12-rc6-mm1 Dominik Karall
@ 2005-07-29 18:22       ` Andrew Morton
  2005-07-29 21:19         ` 2.6.12-rc6-mm1 Dominik Karall
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-07-29 18:22 UTC (permalink / raw)
  To: Dominik Karall; +Cc: linux-kernel

Dominik Karall <dominik.karall@gmx.net> wrote:
>
> On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > >
> > > After looking at my dmesg output today, I saw the following error with
> > > 2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly when it
> > > happens, because I haven't used mono lately. I just did an emerge mono
> > > on my Gentoo system, so maybe that triggered the failure.
> > >
> > > note: mono[26736] exited with preempt_count 1
> > > scheduling while atomic: mono/0x10000001/26736
> > >
> > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > <ffffffff8013197b>{vprintk+635} <ffffffff803e2738>{cond_resched+56}
> > > <ffffffff80164de3>{unmap_vmas+1587} <ffffffff8016a560>{exit_mmap+128}
> > > <ffffffff8012e7bf>{mmput+31} <ffffffff80133466>{do_exit+438}
> > > <ffffffff8013bf25>{__dequeue_signal+501}
> > >        <ffffffff801340c8>{do_group_exit+280}
> > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > >        <ffffffff8010de92>{do_signal+162}
> > > <ffffffff8012d1e0>{default_wake_function+0}
> > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > <ffffffff8010eb3f>{sysret_signal+28}
> > >        <ffffffff8010ee27>{ptregscall_common+103}
> >
> > A couple of people reported this, but all seems to have gone quiet.  Is it
> > fixed in later -mm's?   Is 2.6.13-rc4 running OK?
> >
> > Thanks.
> 
> hi andrew!
> 
> I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an emerge mono 
> right now to test it, and I got this one:
> Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip 
> 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip 
> 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip 
> 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> 
> DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more info about 
> the bug. Did I forget any debug option?

Gee, I don't know how to find this one.  Do you know if the problem is
specific to -mm?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-07-29 18:22       ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-07-29 21:19         ` Dominik Karall
  2005-07-29 21:27           ` 2.6.12-rc6-mm1 Andrew Morton
  0 siblings, 1 reply; 72+ messages in thread
From: Dominik Karall @ 2005-07-29 21:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

On Friday 29 July 2005 20:22, Andrew Morton wrote:
> Dominik Karall <dominik.karall@gmx.net> wrote:
> > On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > > >
> > > > After looking at my dmesg output today, I saw the following error with
> > > > 2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly when it
> > > > happens, because I haven't used mono lately. I just did an emerge mono
> > > > on my Gentoo system, so maybe that triggered the failure.
> > > >
> > > > note: mono[26736] exited with preempt_count 1
> > > > scheduling while atomic: mono/0x10000001/26736
> > > >
> > > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > > <ffffffff8013197b>{vprintk+635} <ffffffff803e2738>{cond_resched+56}
> > > > <ffffffff80164de3>{unmap_vmas+1587} <ffffffff8016a560>{exit_mmap+128}
> > > > <ffffffff8012e7bf>{mmput+31} <ffffffff80133466>{do_exit+438}
> > > > <ffffffff8013bf25>{__dequeue_signal+501}
> > > >        <ffffffff801340c8>{do_group_exit+280}
> > > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > > >        <ffffffff8010de92>{do_signal+162}
> > > > <ffffffff8012d1e0>{default_wake_function+0}
> > > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > > <ffffffff8010eb3f>{sysret_signal+28}
> > > >        <ffffffff8010ee27>{ptregscall_common+103}
> > >
> > > A couple of people reported this, but all seems to have gone quiet.  Is
> > > it fixed in later -mm's?   Is 2.6.13-rc4 running OK?
> > >
> > > Thanks.
> >
> > hi andrew!
> >
> > I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an emerge
> > mono right now to test it, and I got this one:
> > Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> > Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip
> > 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> > Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip
> > 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> > Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip
> > 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> >
> > DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more info
> > about the bug. Did I forget any debug option?
>
> Gee, I don't know how to find this one.  Do you know if the problem is
> specific to -mm?

Tested with 2.6.13-rc4 and it seems to work. Didn't get any error.

So it seems to be -mm related. Do you suspect any patch which could cause the 
error?

dominik

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-07-29 21:19         ` 2.6.12-rc6-mm1 Dominik Karall
@ 2005-07-29 21:27           ` Andrew Morton
  2005-07-29 21:37             ` 2.6.12-rc6-mm1 Dominik Karall
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-07-29 21:27 UTC (permalink / raw)
  To: Dominik Karall; +Cc: linux-kernel

Dominik Karall <dominik.karall@gmx.net> wrote:
>
> On Friday 29 July 2005 20:22, Andrew Morton wrote:
> > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > > > >
> > > > > After looking at my dmesg output today, I saw the following error
> > > > > with 2.6.12-rc6-mm1; maybe it's useful to you. I don't know exactly
> > > > > when it happens, because I haven't used mono lately. I just did an
> > > > > emerge mono on my Gentoo system, so maybe that triggered the failure.
> > > > >
> > > > > note: mono[26736] exited with preempt_count 1
> > > > > scheduling while atomic: mono/0x10000001/26736
> > > > >
> > > > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > > > <ffffffff8013197b>{vprintk+635} <ffffffff803e2738>{cond_resched+56}
> > > > > <ffffffff80164de3>{unmap_vmas+1587} <ffffffff8016a560>{exit_mmap+128}
> > > > > <ffffffff8012e7bf>{mmput+31} <ffffffff80133466>{do_exit+438}
> > > > > <ffffffff8013bf25>{__dequeue_signal+501}
> > > > >        <ffffffff801340c8>{do_group_exit+280}
> > > > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > > > >        <ffffffff8010de92>{do_signal+162}
> > > > > <ffffffff8012d1e0>{default_wake_function+0}
> > > > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > > > <ffffffff8010eb3f>{sysret_signal+28}
> > > > >        <ffffffff8010ee27>{ptregscall_common+103}
> > > >
> > > > A couple of people reported this, but all seems to have gone quiet.  Is
> > > > it fixed in later -mm's?   Is 2.6.13-rc4 running OK?
> > > >
> > > > Thanks.
> > >
> > > hi andrew!
> > >
> > > I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an emerge
> > > mono right now to test it, and I got this one:
> > > Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> > > Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip
> > > 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> > > Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip
> > > 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> > > Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip
> > > 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> > >
> > > DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more info
> > > about the bug. Did I forget any debug option?
> >
> > Gee, I don't know how to find this one.  Do you know if the problem is
> > specific to -mm?
> 
> Tested with 2.6.13-rc4 and it seems to work. Didn't get any error.

Great, thanks for that.

> So it seems to be -mm related. Do you suspect any patch which could cause the 
> error?

I wouldn't know, sorry.  Possibly the scheduler patches, possibly an
x86_64-specific patch.  Is the problem repeatable?  If so, a binary search
would only take ten build-n-boots ;)
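
The "ten build-n-boots" estimate follows from halving the candidate range on
each test: ten halvings cover a series of up to 2^10 = 1024 patches. The size
of the -mm series is not stated here, so that figure is only what the
estimate implies. A minimal sketch in C, where bisect_steps(),
bisect_first_bad() and the fails_from() predicate are hypothetical names; a
real test would build and boot a kernel with the first n patches applied:

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case number of tests needed to isolate one bad patch among
 * n_patches, i.e. ceil(log2(n_patches)). */
unsigned bisect_steps(size_t n_patches)
{
	unsigned steps = 0;
	size_t rem;

	assert(n_patches > 0);
	for (rem = n_patches - 1; rem > 0; rem >>= 1)
		steps++;
	return steps;
}

/* Test callback: nonzero if the tree with the first n_applied patches
 * applied shows the failure. */
typedef int (*is_bad_fn)(size_t n_applied, void *ctx);

/* Returns the 1-based index of the first patch that introduces the failure.
 * Assumes the tree with no patches is good and with all patches is bad. */
size_t bisect_first_bad(size_t n_patches, is_bad_fn is_bad, void *ctx)
{
	size_t good = 0;          /* known good: first `good` patches applied */
	size_t bad = n_patches;   /* known bad: first `bad` patches applied */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Illustrative predicate: the failure appears once patch *ctx is applied. */
int fails_from(size_t n_applied, void *ctx)
{
	return n_applied >= *(size_t *)ctx;
}
```

This only works if the failure is repeatable, which is why that question
comes first: an intermittent failure can mark a bad midpoint as good and send
the search into the wrong half.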

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 2.6.12-rc6-mm1
  2005-07-29 21:27           ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-07-29 21:37             ` Dominik Karall
  2005-08-04 19:44               ` 2.6.12-rc6-mm1 Andrew Morton
  0 siblings, 1 reply; 72+ messages in thread
From: Dominik Karall @ 2005-07-29 21:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3597 bytes --]

On Friday 29 July 2005 23:27, Andrew Morton wrote:
> Dominik Karall <dominik.karall@gmx.net> wrote:
> > On Friday 29 July 2005 20:22, Andrew Morton wrote:
> > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > > > > >
> > > > > > After looking at my dmesg output today, I saw the following error
> > > > > > with 2.6.12-rc6-mm1; maybe it's useful to you. I don't know
> > > > > > exactly when it happens, because I haven't used mono lately. I
> > > > > > just did an emerge mono on my Gentoo system, so maybe that
> > > > > > triggered the failure.
> > > > > >
> > > > > > note: mono[26736] exited with preempt_count 1
> > > > > > scheduling while atomic: mono/0x10000001/26736
> > > > > >
> > > > > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > > > > <ffffffff8013197b>{vprintk+635}
> > > > > > <ffffffff803e2738>{cond_resched+56}
> > > > > > <ffffffff80164de3>{unmap_vmas+1587}
> > > > > > <ffffffff8016a560>{exit_mmap+128} <ffffffff8012e7bf>{mmput+31}
> > > > > > <ffffffff80133466>{do_exit+438}
> > > > > > <ffffffff8013bf25>{__dequeue_signal+501}
> > > > > >        <ffffffff801340c8>{do_group_exit+280}
> > > > > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > > > > >        <ffffffff8010de92>{do_signal+162}
> > > > > > <ffffffff8012d1e0>{default_wake_function+0}
> > > > > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > > > > <ffffffff8010eb3f>{sysret_signal+28}
> > > > > >        <ffffffff8010ee27>{ptregscall_common+103}
> > > > >
> > > > > A couple of people reported this, but all seems to have gone quiet.
> > > > >  Is it fixed in later -mm's?   Is 2.6.13-rc4 running OK?
> > > > >
> > > > > Thanks.
> > > >
> > > > hi andrew!
> > > >
> > > > I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an
> > > > emerge mono right now to test it, and I got this one:
> > > > Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> > > > Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> > > > Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> > > > Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> > > >
> > > > DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more
> > > > info about the bug. Did I forget any debug option?
> > >
> > > Gee, I don't know how to find this one.  Do you know if the problem is
> > > specific to -mm?
> >
> > Tested with 2.6.13-rc4 and it seems to work. Didn't get any error.
>
> Great, thanks for that.
>
> > So it seems to be -mm related. Do you suspect any patch which could cause
> > the error?
>
> I wouldn't know, sorry.  Possibly the scheduler patches, possibly an
> x86_64-specific patch.  Is the problem repeatable?  If so, a binary search
> would only take ten build-n-boots ;)

Yes, it is repeatable. I tested on latest -mm about 4 times. Ok, I will try
to find the right patch tomorrow, 10 build-n-boots would end up in morning ;)

btw, as the error occurred in 2.6.12-rc6-mm1 too, it must be an old patch which
wasn't merged to Linus' tree till now...hope there aren't a lot of them :)

* Re: 2.6.12-rc6-mm1
  2005-07-29 21:37             ` 2.6.12-rc6-mm1 Dominik Karall
@ 2005-08-04 19:44               ` Andrew Morton
  2005-08-04 22:28                 ` 2.6.12-rc6-mm1 Andrew Morton
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-08-04 19:44 UTC (permalink / raw)
  To: Dominik Karall; +Cc: linux-kernel

Dominik Karall <dominik.karall@gmx.net> wrote:
>
> On Friday 29 July 2005 23:27, Andrew Morton wrote:
> > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > On Friday 29 July 2005 20:22, Andrew Morton wrote:
> > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > > > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > > > > > >
> > > > > > > After looking in my dmesg output today, I saw following error
> > > > > > > > with 2.6.12-rc6-mm1, maybe it's useful to you. I don't know when
> > > > > > > it exactly happens, cause I never used mono last time, I just did
> > > > > > > an emerge mono on my gentoo system, maybe this forced the
> > > > > > > failure.
> > > > > > >
> > > > > > > note: mono[26736] exited with preempt_count 1
> > > > > > > scheduling while atomic: mono/0x10000001/26736
> > > > > > >
> > > > > > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > > > > > <ffffffff8013197b>{vprintk+635}
> > > > > > > <ffffffff803e2738>{cond_resched+56}
> > > > > > > <ffffffff80164de3>{unmap_vmas+1587}
> > > > > > > <ffffffff8016a560>{exit_mmap+128} <ffffffff8012e7bf>{mmput+31}
> > > > > > > <ffffffff80133466>{do_exit+438}
> > > > > > > <ffffffff8013bf25>{__dequeue_signal+501}
> > > > > > >        <ffffffff801340c8>{do_group_exit+280}
> > > > > > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > > > > > >        <ffffffff8010de92>{do_signal+162}
> > > > > > > <ffffffff8012d1e0>{default_wake_function+0}
> > > > > > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > > > > > <ffffffff8010eb3f>{sysret_signal+28}
> > > > > > >        <ffffffff8010ee27>{ptregscall_common+103}
> > > > > >
> > > > > > A couple of people reported this, but all seems to have gone quiet.
> > > > > >  Is it fixed in later -mm's?   Is 2.6.13-rc4 running OK?
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > hi andrew!
> > > > >
> > > > > I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an
> > > > > emerge mono right now to test it, and I got this one:
> > > > > Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> > > > > Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> > > > > Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> > > > > Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> > > > >
> > > > > DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more
> > > > > info about the bug. Did I forget any debug option?
> > > >
> > > > Gee, I don't know how to find this one.  Do you know if the problem is
> > > > specific to -mm?
> > >
> > > Tested with 2.6.13-rc4 and it seems to work. Didn't get any error.
> >
> > Great, thanks for that.
> >
> > > So it seems to be -mm related. Do you suspect any patch which could cause
> > > the error?
> >
> > > I wouldn't know, sorry.  Possibly the scheduler patches, possibly an
> > x86_64-specific patch.  Is the problem repeatable?  If so, a binary search
> > would only take ten build-n-boots ;)
> 
> Yes, it is repeatable. I tested on latest -mm about 4 times. Ok, I will try
> to find the right patch tomorrow, 10 build-n-boots would end up in morning ;)
>
> btw, as the error occurred in 2.6.12-rc6-mm1 too, it must be an old patch which
> wasn't merged to Linus' tree till now...hope there aren't a lot of them :)
> 

Any progress on this?  It kinda means that the whole of the -mm lineup is
stuck until we can identify the offending patch.  We have a couple of weeks
in which to do this but if you can identify the bad patch it'd help
enormously, thanks.


* Re: 2.6.12-rc6-mm1
  2005-08-04 19:44               ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-08-04 22:28                 ` Andrew Morton
  2005-08-04 22:44                   ` 2.6.12-rc6-mm1 Dominik Karall
  0 siblings, 1 reply; 72+ messages in thread
From: Andrew Morton @ 2005-08-04 22:28 UTC (permalink / raw)
  To: dominik.karall, linux-kernel; +Cc: Ingo Molnar

Andrew Morton <akpm@osdl.org> wrote:
>
> Dominik Karall <dominik.karall@gmx.net> wrote:
> >
> > On Friday 29 July 2005 23:27, Andrew Morton wrote:
> > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > On Friday 29 July 2005 20:22, Andrew Morton wrote:
> > > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > > On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > > > > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > > > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > > > > > > >
> > > > > > > > After looking in my dmesg output today, I saw following error
> > > > > > > > > with 2.6.12-rc6-mm1, maybe it's useful to you. I don't know when
> > > > > > > > it exactly happens, cause I never used mono last time, I just did
> > > > > > > > an emerge mono on my gentoo system, maybe this forced the
> > > > > > > > failure.
> > > > > > > >
> > > > > > > > note: mono[26736] exited with preempt_count 1
> > > > > > > > scheduling while atomic: mono/0x10000001/26736
> > > > > > > >
> > > > > > > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > > > > > > <ffffffff8013197b>{vprintk+635}
> > > > > > > > <ffffffff803e2738>{cond_resched+56}
> > > > > > > > <ffffffff80164de3>{unmap_vmas+1587}
> > > > > > > > <ffffffff8016a560>{exit_mmap+128} <ffffffff8012e7bf>{mmput+31}
> > > > > > > > <ffffffff80133466>{do_exit+438}
> > > > > > > > <ffffffff8013bf25>{__dequeue_signal+501}
> > > > > > > >        <ffffffff801340c8>{do_group_exit+280}
> > > > > > > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > > > > > > >        <ffffffff8010de92>{do_signal+162}
> > > > > > > > <ffffffff8012d1e0>{default_wake_function+0}
> > > > > > > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > > > > > > <ffffffff8010eb3f>{sysret_signal+28}
> > > > > > > >        <ffffffff8010ee27>{ptregscall_common+103}
> > > > > > >
> > > > > > > A couple of people reported this, but all seems to have gone quiet.
> > > > > > >  Is it fixed in later -mm's?   Is 2.6.13-rc4 running OK?
> > > > > > >
> > > > > > > Thanks.
> > > > > >
> > > > > > hi andrew!
> > > > > >
> > > > > > I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did an
> > > > > > emerge mono right now to test it, and I got this one:
> > > > > > Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> > > > > > Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> > > > > > Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> > > > > > Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> > > > > >
> > > > > > DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get more
> > > > > > info about the bug. Did I forget any debug option?
> > > > >
> > > > > Gee, I don't know how to find this one.  Do you know if the problem is
> > > > > specific to -mm?
> > > >
> > > > Tested with 2.6.13-rc4 and it seems to work. Didn't get any error.
> > >
> > > Great, thanks for that.
> > >
> > > > So it seems to be -mm related. Do you suspect any patch which could cause
> > > > the error?
> > >
> > > > I wouldn't know, sorry.  Possibly the scheduler patches, possibly an
> > > x86_64-specific patch.  Is the problem repeatable?  If so, a binary search
> > > would only take ten build-n-boots ;)
> > 
> > Yes, it is repeatable. I tested on latest -mm about 4 times. Ok, I will try
> > to find the right patch tomorrow, 10 build-n-boots would end up in morning ;)
> >
> > btw, as the error occurred in 2.6.12-rc6-mm1 too, it must be an old patch which
> > wasn't merged to Linus' tree till now...hope there aren't a lot of them :)
> > 
> 
> Any progress on this?  It kinda means that the whole of the -mm lineup is
> stuck until we can identify the offending patch.  We have a couple of weeks
> in which to do this but if you can identify the bad patch it'd help
> enormously, thanks.
> 

OK, Bartosz Taudul tells me that he's occasionally seeing this on stock
2.6.12 (thanks!).  So there's not a lot of point in doing the -mm bisection
search.

I think Ingo was planning on coming up with some infrastructure which would
allow us to debug this further.


* Re: 2.6.12-rc6-mm1
  2005-08-04 22:28                 ` 2.6.12-rc6-mm1 Andrew Morton
@ 2005-08-04 22:44                   ` Dominik Karall
  0 siblings, 0 replies; 72+ messages in thread
From: Dominik Karall @ 2005-08-04 22:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Ingo Molnar

On Friday 05 August 2005 00:28, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > On Friday 29 July 2005 23:27, Andrew Morton wrote:
> > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > On Friday 29 July 2005 20:22, Andrew Morton wrote:
> > > > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > > > On Friday 29 July 2005 06:54, Andrew Morton wrote:
> > > > > > > > Dominik Karall <dominik.karall@gmx.net> wrote:
> > > > > > > > > On Tuesday 07 June 2005 13:29, Andrew Morton wrote:
> > > > > > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc6/2.6.12-rc6-mm1/
> > > > > > > > >
> > > > > > > > > After looking in my dmesg output today, I saw following
> > > > > > > > > > error with 2.6.12-rc6-mm1, maybe it's useful to you. I
> > > > > > > > > don't know when it exactly happens, cause I never used mono
> > > > > > > > > last time, I just did an emerge mono on my gentoo system,
> > > > > > > > > maybe this forced the failure.
> > > > > > > > >
> > > > > > > > > note: mono[26736] exited with preempt_count 1
> > > > > > > > > scheduling while atomic: mono/0x10000001/26736
> > > > > > > > >
> > > > > > > > > Call Trace:<ffffffff803e13ea>{schedule+122}
> > > > > > > > > <ffffffff8013197b>{vprintk+635}
> > > > > > > > > <ffffffff803e2738>{cond_resched+56}
> > > > > > > > > <ffffffff80164de3>{unmap_vmas+1587}
> > > > > > > > > <ffffffff8016a560>{exit_mmap+128}
> > > > > > > > > <ffffffff8012e7bf>{mmput+31}
> > > > > > > > > <ffffffff80133466>{do_exit+438}
> > > > > > > > > <ffffffff8013bf25>{__dequeue_signal+501}
> > > > > > > > >        <ffffffff801340c8>{do_group_exit+280}
> > > > > > > > > <ffffffff8013e147>{get_signal_to_deliver+1575}
> > > > > > > > >        <ffffffff8010de92>{do_signal+162}
> > > > > > > > > <ffffffff8012d1e0>{default_wake_function+0}
> > > > > > > > >        <ffffffff8010e8e1>{sys_rt_sigreturn+577}
> > > > > > > > > <ffffffff8010eb3f>{sysret_signal+28}
> > > > > > > > >        <ffffffff8010ee27>{ptregscall_common+103}
> > > > > > > >
> > > > > > > > A couple of people reported this, but all seems to have gone
> > > > > > > > quiet. Is it fixed in later -mm's?   Is 2.6.13-rc4 running
> > > > > > > > OK?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > >
> > > > > > > hi andrew!
> > > > > > >
> > > > > > > I'm sorry, but it's not fixed in current 2.6.13-rc3-mm3. I did
> > > > > > > an emerge mono right now to test it, and I got this one:
> > > > > > > Jul 29 15:26:37 [kernel] note: mono[11138] exited with preempt_count 1
> > > > > > > Jul 29 15:26:50 [kernel] file[14627]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffe43b50 error 4
> > > > > > > Jul 29 15:26:50 [kernel] file[14633]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffffcc87a0 error 4
> > > > > > > Jul 29 15:26:51 [kernel] file[14669]: segfault at 00002aaaab453000 rip 00002aaaaaf652cf rsp 00007fffff905f80 error 4
> > > > > > >
> > > > > > > DEBUG_KERNEL/ PREEMPT/ SPINLOCK are enabled, but I didn't get
> > > > > > > more info about the bug. Did I forget any debug option?
> > > > > >
> > > > > > Gee, I don't know how to find this one.  Do you know if the
> > > > > > problem is specific to -mm?
> > > > >
> > > > > Tested with 2.6.13-rc4 and it seems to work. Didn't get any error.
> > > >
> > > > Great, thanks for that.
> > > >
> > > > > So it seems to be -mm related. Do you suspect any patch which could
> > > > > cause the error?
> > > >
> > > > I wouldn't know, sorry.  Possibly the scheduler patches, possibly an
> > > > x86_64-specific patch.  Is the problem repeatable?  If so, a binary
> > > > search would only take ten build-n-boots ;)
> > >
> > > Yes, it is repeatable. I tested on latest -mm about 4 times. Ok, I
> > > will try to find the right patch tomorrow, 10 build-n-boots would end
> > > up in morning ;)
> > >
> > > btw, as the error occurred in 2.6.12-rc6-mm1 too, it must be an old
> > > patch which wasn't merged to linus tree till now...hope there aren't a
> > > lot of them :)
> >
> > Any progress on this?  It kinda means that the whole of the -mm lineup is
> > stuck until we can identify the offending patch.  We have a couple of
> > weeks in which to do this but if you can identify the bad patch it'd help
> > enormously, thanks.
>
> OK, Bartosz Taudul tells me that he's occasionally seeing this on stock
> 2.6.12 (thanks!).  So there's not a lot of point in doing the -mm bisection
> search.
>
> I think Ingo was planning on coming up with some infrastructure which would
> allow us to debug this further.

I'm sorry that I couldn't do the tests earlier, but I had no time this week. I
did some tests now and noticed that the bug only occurs when KDE is
running...weird.
I'm going to continue testing tomorrow after work, exactly in 12 hours ;)

I will let you know if I have any news!

dominik

end of thread, other threads:[~2005-08-04 22:44 UTC | newest]

Thread overview: 72+ messages
2005-06-07 23:50 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-07 23:56 ` 2.6.12-rc6-mm1 Andrew Morton
2005-06-08  0:02   ` 2.6.12-rc6-mm1 Christoph Lameter
2005-06-08  0:08     ` 2.6.12-rc6-mm1 Andrew Morton
2005-06-08  3:17       ` 2.6.12-rc6-mm1 Nick Piggin
2005-06-08  3:33         ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-08  3:50           ` 2.6.12-rc6-mm1 Nick Piggin
2005-06-08 14:15       ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-09 23:56       ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-10  7:02         ` 2.6.12-rc6-mm1 Ingo Molnar
2005-06-10 12:03           ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-10 14:19             ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-10 23:14               ` 2.6.12-rc6-mm1 J.A. Magallon
2005-06-10 23:59                 ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-11  0:18                   ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-11  0:32                   ` 2.6.12-rc6-mm1 J.A. Magallon
2005-06-11  0:48                     ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-11  0:52                       ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-10 23:50               ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-11  4:14                 ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-11  5:22                   ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-11  5:56                     ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-11 20:13                     ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-11 22:20                       ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-11 23:27                         ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-11 23:47                           ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-12  0:23                             ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-12  5:19                               ` 2.6.12-rc6-mm1 Con Kolivas
2005-06-09  1:58     ` 2.6.12-rc6-mm1 Lee Revell
2005-06-08  0:02   ` 2.6.12-rc6-mm1 Martin J. Bligh
  -- strict thread matches above, loose matches on Subject: below --
2005-06-07 11:29 2.6.12-rc6-mm1 Andrew Morton
2005-06-07 14:24 ` 2.6.12-rc6-mm1 Wolfgang Wander
2005-06-07 14:49   ` 2.6.12-rc6-mm1 Wolfgang Wander
2005-06-07 14:48 ` 2.6.12-rc6-mm1 Brice Goglin
2005-06-07 23:15 ` 2.6.12-rc6-mm1 Francois Romieu
2005-06-08  1:59 ` 2.6.12-rc6-mm1 Søren Lott
2005-06-08  5:53   ` 2.6.12-rc6-mm1 Jean Delvare
2005-06-08  7:08     ` 2.6.12-rc6-mm1 Søren Lott
2005-06-08 14:22 ` 2.6.12-rc6-mm1 Andy Whitcroft
2005-06-08 20:01   ` 2.6.12-rc6-mm1 Andrew Morton
2005-06-08 23:14     ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-08 23:22       ` 2.6.12-rc6-mm1 Andrew Morton
2005-06-08 23:34         ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-09  7:17           ` 2.6.12-rc6-mm1 Kirill Korotaev
2005-06-09 13:38             ` 2.6.12-rc6-mm1 Martin J. Bligh
2005-06-10 12:12               ` 2.6.12-rc6-mm1 Kirill Korotaev
2005-06-09  4:27   ` 2.6.12-rc6-mm1 Andrey Panin
2005-06-09 13:12     ` 2.6.12-rc6-mm1 Andy Whitcroft
2005-06-11 11:51 ` 2.6.12-rc6-mm1 Benoit Boissinot
2005-06-18 22:39 ` 2.6.12-rc6-mm1 Richard Purdie
2005-06-18 22:44   ` 2.6.12-rc6-mm1 Andrew Morton
2005-06-18 22:57     ` 2.6.12-rc6-mm1 Richard Purdie
2005-06-18 23:11       ` 2.6.12-rc6-mm1 Richard Purdie
2005-06-18 23:18   ` 2.6.12-rc6-mm1 Russell King
2005-06-19  1:20     ` 2.6.12-rc6-mm1 Richard Purdie
2005-06-19  9:02       ` 2.6.12-rc6-mm1 Russell King
2005-06-19  9:11         ` 2.6.12-rc6-mm1 Russell King
2005-06-19 17:12           ` 2.6.12-rc6-mm1 Richard Purdie
2005-06-19 17:39             ` 2.6.12-rc6-mm1 Russell King
2005-06-19 18:25               ` 2.6.12-rc6-mm1 Richard Purdie
2005-06-19 18:56                 ` 2.6.12-rc6-mm1 Russell King
2005-06-21 13:20 ` 2.6.12-rc6-mm1 Dominik Karall
2005-06-24 21:27   ` 2.6.12-rc6-mm1 Alexey Dobriyan
2005-07-29  4:54   ` 2.6.12-rc6-mm1 Andrew Morton
2005-07-29 13:39     ` 2.6.12-rc6-mm1 Dominik Karall
2005-07-29 18:22       ` 2.6.12-rc6-mm1 Andrew Morton
2005-07-29 21:19         ` 2.6.12-rc6-mm1 Dominik Karall
2005-07-29 21:27           ` 2.6.12-rc6-mm1 Andrew Morton
2005-07-29 21:37             ` 2.6.12-rc6-mm1 Dominik Karall
2005-08-04 19:44               ` 2.6.12-rc6-mm1 Andrew Morton
2005-08-04 22:28                 ` 2.6.12-rc6-mm1 Andrew Morton
2005-08-04 22:44                   ` 2.6.12-rc6-mm1 Dominik Karall
