2.6.17-mm5

* 2.6.17-mm5
@ 2006-07-01 10:35 ` Andrew Morton
  2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
                     ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 10:35 UTC
  To: linux-kernel


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/


Nothing very exciting here - a few buggy patches were fixed or dropped.


Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first.  If we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.
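
The "around ten rebuilds" estimate for bisection follows from halving the
patch series at each step: with the 791 patches in this release, a binary
search needs at most ceil(log2(791)) = 10 test builds.  A minimal sketch in
Python, assuming a hypothetical is_bad() callback that stands in for "apply
the first n patches, rebuild, reboot and check for the bug".  The patch names
and the culprit index below are placeholders, not taken from the real series:

```python
import math


def first_bad_patch(patches, is_bad):
    """Return (index of the first bad patch, number of builds tested).

    is_bad(n) reports whether a tree with the first n patches applied shows
    the bug.  It is assumed to be False for n == 0 (the base kernel is good)
    and True for n == len(patches) (the full -mm tree is bad).
    """
    lo, hi = 0, len(patches)  # lo patches applied: good; hi applied: bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1            # one rebuild and reboot per probe
        if is_bad(mid):
            hi = mid          # bug already present: culprit is at or before mid
        else:
            lo = mid          # still good: culprit comes after mid
    return hi - 1, tests


if __name__ == "__main__":
    # Placeholder series of the same length as this release (791 patches).
    series = [f"patch-{i:03d}.patch" for i in range(791)]
    culprit = 412  # hypothetical index of the patch that introduced the bug
    idx, tests = first_bad_patch(series, lambda n: n > culprit)
    print("first bad patch:", series[idx])
    print("builds needed:", tests,
          "upper bound:", math.ceil(math.log2(len(series))))
```

The search only works if the bug is reproducible at every probe and the
series applies cleanly at each midpoint, which is why reporting first is
often cheaper.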



Changes since 2.6.17-mm4:


 origin.patch
 git-acpi.patch
 git-cpufreq.patch
 git-geode.patch
 git-gfs2.patch
 git-ia64.patch
 git-infiniband.patch
 git-jfs.patch
 git-klibc.patch
 git-hdrinstall2.patch
 git-libata-all.patch
 git-mtd.patch
 git-netdev-all.patch
 git-nfs.patch
 git-ocfs2.patch
 git-pcmcia-fixup.patch
 git-sas.patch
 git-scsi-misc.patch
 git-scsi-target.patch
 git-supertrak.patch
 git-watchdog.patch
 git-wireless.patch
 git-cryptodev.patch

 git trees.

-fix-sgivwfb-compile.patch
-generic_file_buffered_write-handle-zero-length-iovec-segments-stable.patch
-solve-config-broken-undefined-reference-to-online_page.patch
-sparc-register_cpu-build-fix.patch
-acpi-add-ibm-r60e-laptop-to-proc-idle-blacklist.patch
-drivers-acpi-scanc-make-acpi_bus_type-static.patch
-acpi_srat-needs-acpi.patch
-acpi-identify-which-device-is-not-power-manageable.patch
-the-scheduled-unexport-of-insert_resource.patch
-videocodec-make-1-bit-fields-unsigned.patch
-i2c-801-64bit-resource-fix.patch
-fs-jffs2-make-2-functions-static.patch
-mtd-fix-all-kernel-doc-warnings.patch
-mtd-kernel-doc-fixes-additions.patch
-af_unix-datagram-getpeersec.patch
-drivers-net-irda-mcs7780c-make-struct-mcs_driver-static.patch
-irda-fix-rcu-lock-pairing-on-error-path.patch
-kill-open-coded-offsetof-in-cm4000_csc-zero_dev.patch
-com20020_cs-more-device-support.patch
-git-pcmcia-xirc2ps_cs-fix-ooops-not-a-creditcard.patch
-git-powerpc.patch
-powerpc-fix-idr-locking-in-init_new_context.patch
-gregkh-pci-64bit-resource-c99-changes-for-struct-resource-declarations.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-sound-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-networks-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-pci-core-and-hotplug-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-mtd-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-ide-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-video-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-pcmcia-drivers.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-arch-and-core-code.patch
-gregkh-pci-64bit-resource-fix-up-printks-for-resources-in-misc-drivers.patch
-gregkh-pci-64bit-resource-introduce-resource_size_t-for-the-start-and-end-of-struct-resource.patch
-gregkh-pci-64bit-resource-change-resource-core-to-use-resource_size_t.patch
-gregkh-pci-64bit-resource-change-pci-core-and-arch-code-to-use-resource_size_t.patch
-gregkh-pci-64bit-resource-change-pnp-core-to-use-resource_size_t.patch
-gregkh-pci-64bit-resource-convert-a-few-remaining-drivers-to-use-resource_size_t-where-needed.patch
-gregkh-pci-64bit-resource-finally-enable-64bit-resource-sizes.patch
-gregkh-pci-i386-export-memory-more-than-4g-through-proc-iomem.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-changes-to-generic-pci-code.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-update-documentation-pci_txt.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-make-intel-e1000-driver-legacy-i-o-port-free.patch
-gregkh-pci-pci-legacy-i-o-port-free-driver-make-emulex-lpfc-driver-legacy-i-o-port-free.patch
-64bit-resource-convert-a-few-remaining-drivers-to-use-resource_size_t-where-needed-8139cp.patch
-bugfix-pci-legacy-i-o-port-free-driver.patch
-insert-identical-resources-above-existing-resources.patch
-clear-abnormal-poweroff-flag-on-via-southbridges-fix-resume.patch
-clear-abnormal-poweroff-flag-on-via-southbridges-fix-resume-fix.patch
-small-whitespace-cleanup-for-qlogic-driver.patch
-mpt_interrupt-should-return-irq_none-when.patch
-qla1280-fix-section-mismatch-warnings.patch
-ehci-fix-bogus-alteration-of-a-local-variable.patch
-ipaqc-bugfixes.patch
-ipaqc-timing-parameters.patch
-if-0-drivers-usb-input-hid-corechid_find_field_by_usage.patch
-usb-remove-empty-destructor-from-drivers-usb-mon-mon_textc.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h-s390-fix.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h-fix.patch
-zoned-vm-counters-create-vmstatc-h-from-page_allocc-h-fix-2.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-tidy.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-speedup.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-speedup-fix.patch
-zoned-vm-counters-basic-zvc-zoned-vm-counter-implementation-export-vm_stat.patch
-zoned-vm-counters-convert-nr_mapped-to-per-zone-counter.patch
-zoned-vm-counters-convert-nr_mapped-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_pagecache-to-per-zone-counter.patch
-zoned-vm-counters-remove-nr_file_mapped-from-scan-control-structure.patch
-zoned-vm-counters-remove-nr_file_mapped-from-scan-control-structure-fix.patch
-zoned-vm-counters-split-nr_anon_pages-off-from-nr_file_mapped.patch
-zoned-vm-counters-zone_reclaim-remove-proc-sys-vm-zone_reclaim_interval.patch
-zoned-vm-counters-conversion-of-nr_slab-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_slab-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_slab-to-per-zone-counter-fix-2.patch
-zoned-vm-counters-conversion-of-nr_pagetables-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_pagetables-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_dirty-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_dirty-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_writeback-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_writeback-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_unstable-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_unstable-to-per-zone-counter-nfs-fix.patch
-zoned-vm-counters-conversion-of-nr_unstable-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_bounce-to-per-zone-counter.patch
-zoned-vm-counters-conversion-of-nr_bounce-to-per-zone-counter-fix.patch
-zoned-vm-counters-conversion-of-nr_bounce-to-per-zone-counter-fix-2.patch
-zoned-vm-counters-remove-useless-struct-wbs.patch
-zoned-vm-counters-remove-read_page_state.patch
-use-zoned-vm-counters-for-numa-statistics-v3.patch
-light-weight-event-counters-v5.patch
-slab-consolidate-code-to-free-slabs-from-freelist.patch
-slab-consolidate-code-to-free-slabs-from-freelist-fix.patch
-selinux-extend-task_kill-hook-to-handle-signals-sent.patch
-selinux-add-security-hook-call-to-kill_proc_info_as_uid.patch
-selinux-update-usb-code-with-new-kill_proc_info_as_uid.patch
-add-smp_setup_processor_id.patch
-x86-dont-print-out-smp-info-on-up-kernels.patch
-keys-allow-in-kernel-key-requestor-to-pass-auxiliary-data-to-upcaller.patch
-keys-allow-in-kernel-key-requestor-to-pass-auxiliary-data-to-upcaller-try-2.patch
-cond_resched-fix.patch
-ufs-printk-fix.patch
-arch-i386-mach-visws-setupc-remove-dummy-function-calls.patch
-re-add-config_sound_sscape.patch
-remove-devinit-from-ioc4-pci_driver.patch
-deref-in-drivers-block-paride-pfc.patch
-chardev-gpio-for-scx200-pc-8736x-add-proper-kconfig-makefile-entries.patch
-edac-pci-device-to-device-cleanup.patch
-edac-mc-numbers-refactor-1-of-2.patch
-edac-mc-numbers-refactor-2-of-2.patch
-edac-probe1-cleanup-1-of-2.patch
-edac-probe1-cleanup-2-of-2.patch
-edac-maintainers-update.patch
-i4l-remove-unneeded-include-linux-isdn-tpamh.patch
-skb-leak-in-drivers-isdn-i4l-isdn_x25ifacec.patch
-knfsd-improve-the-test-for-cross-device-rename-in-nfsd.patch
-knfsd-fixing-missing-expkey-support-for-fsid-type-3.patch
-knfsd-remove-noise-about-filehandle-being-uptodate.patch
-knfsd-ignore-ref_fh-when-crossing-a-mountpoint.patch
-knfsd-nfsd4-fix-open_confirm-locking.patch
-knfsd-nfsd-call-nfsd_setuser-on-fh_compose-fix-nfsd4-permissions-problem.patch
-knfsd-nfsd4-remove-superfluous-grace-period-checks.patch
-knfsd-nfsd-fix-misplaced-fh_unlock-in-nfsd_link.patch
-knfsd-svcrpc-gss-simplify-rsc_parse.patch
-knfsd-nfsd4-fix-some-open-argument-tests.patch
-knfsd-nfsd4-fix-open-flag-passing.patch
-knfsd-svcrpc-simplify-nfsd-rpcsec_gss-integrity-code.patch
-knfsd-nfsd-mark-rqstp-to-prevent-use-of-sendfile-in-privacy-case.patch
-knfsd-svcrpc-gss-server-side-implementation-of-rpcsec_gss-privacy.patch
-drivers-md-raid5c-remove-an-unused-variable.patch
-genirq-rename-desc-handler-to-desc-chip.patch
-genirq-rename-desc-handler-to-desc-chip-power-fix.patch
-genirq-rename-desc-handler-to-desc-chip-ia64-fix.patch
-genirq-rename-desc-handler-to-desc-chip-ia64-fix-2.patch
-genirq-rename-desc-handler-to-desc-chip-terminate_irqs-fix.patch
-genirq-rename-desc-handler-to-desc-chip-sparc64-fix.patch
-genirq-sem2mutex-probe_sem-probing_active.patch
-genirq-cleanup-merge-irq_affinity-into-irq_desc.patch
-genirq-cleanup-merge-irq_affinity-into-irq_desc-sparc64-fix.patch
-genirq-cleanup-remove-irq_descp.patch
-genirq-cleanup-remove-irq_descp-fix.patch
-genirq-cleanup-remove-fastcall.patch
-genirq-cleanup-misc-code-cleanups.patch
-genirq-cleanup-reduce-irq_desc_t-use-mark-it-obsolete.patch
-genirq-cleanup-include-linux-irqh.patch
-genirq-cleanup-merge-irq_dir-smp_affinity_entry-into-irq_desc.patch
-genirq-cleanup-merge-pending_irq_cpumask-into-irq_desc.patch
-genirq-cleanup-turn-arch_has_irq_per_cpu-into-config_irq_per_cpu.patch
-genirq-debug-better-debug-printout-in-enable_irq.patch
-genirq-add-retrigger-irq-op-to-consolidate-hw_irq_resend.patch
-genirq-doc-comment-include-linux-irqh-structures.patch
-genirq-doc-handle_irq_event-and-__do_irq-comments.patch
-genirq-cleanup-no_irq_type-cleanups.patch
-genirq-doc-add-design-documentation.patch
-genirq-add-genirq-sw-irq-retrigger.patch
-genirq-add-irq_noprobe-support.patch
-genirq-add-irq_norequest-support.patch
-genirq-add-irq_noautoen-support.patch
-genirq-update-copyrights.patch
-genirq-core.patch
-genirq-core-revert-noisiness-on-spurious-interrupts.patch
-genirq-msi-fixes-2.patch
-genirq-add-irq-chip-support.patch
-genirq-add-irq-chip-support-fix.patch
-genirq-add-irq-chip-support-misroute-irq-dont-call-desc-chip-end.patch
-genirq-add-handle_bad_irq.patch
-genirq-add-irq-wake-power-management-support.patch
-genirq-add-sa_trigger-support.patch
-genirq-cleanup-no_irq_type-no_irq_chip-rename.patch
-genirq-more-verbose-debugging-on-unexpected-irq-vectors.patch
-genirq-ia64-build-fix.patch
-genirq-add-irq_type_sense_mask.patch
-genirq-add-irq-chip-support-fasteoi-handler-handle-interrupt-disabling.patch
-genirq-irq-document-what-an-irq-is.patch
-genirq-add-chip-eoi-fastack-fasteoi-core.patch
-genirq-add-chip-eoi-fastack-fasteoi-fix.patch

 Merged into mainline or a subsystem tree.

+pi-futex-fix-mm_struct-memory-leak.patch
+irq-use-sa_percpu_irq-not-irq_per_cpu-for-irqactionflags.patch
+irq-warning-message-cleanup.patch
+edac-bug-fix-module-names-quoted-in-sysfs.patch
+pi-futex-futex_wake-lockup-fix.patch
+acpi-identify-which-device-is-not-power-manageable.patch

 2.6.18-rc1 queue

-git-acpi-fixup.patch

 Unneeded.

-cpu_relax-use-in-acpi-lock.patch
-cpu_relax-use-in-acpi-lock-fix.patch

 Dropped.

+pnpacpi-support-shareable-interrupts.patch
+serial-allow-shared-8250_pnp-interrupts.patch

 pnpacpi fixes

-git-agpgart-fixup.patch

 Unneeded.

+gregkh-driver-driver-core-bus.c-cleanups.patch
+gregkh-driver-remove-kernel-power-pm.c-pm_unregister_all.patch
+gregkh-driver-the-scheduled-unexport-of-insert_resource.patch
+gregkh-driver-suspend-infrastructure-cleanup-and-extension.patch
+gregkh-driver-suspend-pci.patch

 Driver tree updates.

+gregkh-i2c-w1-fix-idle-check-loop-in-ds2482.patch
+gregkh-i2c-w1-remove-drivers-w1-w1.h.patch

 I2C tree updates

+ib-ipath-name-zero-counter-offsets-so-its-clear.patch
+ib-ipath-update-copyrights-and-other-strings-to.patch
+ib-ipath-share-more-common-code-between-rc-and-uc.patch
+ib-ipath-fix-an-indenting-problem.patch
+ib-ipath-fix-shared-receive-queues-for-rc.patch
+ib-ipath-allow-diags-on-any-unit.patch
+ib-ipath-update-some-comments-and-fix-typos.patch
+ib-ipath-remove-some-duplicate-code.patch
+ib-ipath-dont-allow-resources-to-be-created-with.patch
+ib-ipath-fix-some-memory-leaks-on-failure-paths.patch
+ib-ipath-return-an-error-for-unknown-multicast-gid.patch
+ib-ipath-report-correct-device-identification.patch
+ib-ipath-enforce-device-resource-limits.patch
+ib-ipath-removed-unused-field-ipath_kregvirt-from.patch
+ib-ipath-print-better-debug-info-when-handling.patch
+ib-ipath-enable-freeze-mode-when-shutting-down.patch
+ib-ipath-use-more-appropriate-gfp-flags.patch
+ib-ipath-use-vmalloc-to-allocate-struct.patch
+ib-ipath-memory-management-cleanups.patch
+ib-ipath-reduce-overhead-on-receive-interrupts.patch
+ib-ipath-fixed-bug-9776.patch
+ib-ipath-fix-lost-interrupts-on-ht-400.patch
+ib-ipath-disallow-send-of-invalid-packet-sizes.patch
+ib-ipath-dont-confuse-the-max-message-size-with.patch
+ib-ipath-removed-redundant-statements.patch
+ib-ipath-check-for-valid-lid-and-multicast-lids.patch
+ib-ipath-fixes-to-performance-get-counters-for-ib.patch
+ib-ipath-rc-receive-interrupt-performance-changes.patch
+ib-ipath-purge-sps_lid-and-sps_mlid-arrays.patch
+ib-ipath-drop-the-stats-sysfs-attribute-group.patch
+ib-ipath-support-more-models-of-infinipath-hardware.patch
+ib-ipath-read-write-correct-sizes-through-diag.patch
+ib-ipath-fix-a-bug-that-results-in-addresses-near.patch
+ib-ipath-remove-some-if-0-code-related-to.patch
+ib-ipath-ignore-receive-queue-size-if-srq-is.patch
+ib-ipath-namespace-cleanup-replace-ips-with-ipath.patch

 Infiniband updates

+ib-ipath-fixes-a-bug-where-our-delay-for-eeprom-no.patch

 Unpopular infiniband update

-revert-input-atkbd-fix-hangeul-hanja-keys.patch

 Dropped.

+if-0-drivers-usb-input-hid-corechid_find_field_by_usage.patch

 USB cleanup.

+ia64-kbuild-fix.patch

 Fix kbuild for ia64

-revert-ignore-makes-built-in-rules-variables.patch

 Unneeded.

+git-netdev-all-fixup.patch

 Fix reject due to git-netdev-all.patch

+8139cp-printk-fix.patch

 Fix printk warning

-ni5010-netcard-cleanup-fix.patch

 Folded into ni5010-netcard-cleanup.patch

+ixgb-add-pci-error-recovery-callbacks.patch
+e100-disable-device-on-pci-error.patch
+e1000-disable-device-on-pci-error.patch

 netdev updates

+fix-a-warning-in-ioatdma.patch
+ioat-fix-header-file-kernel-doc.patch
+ioat-fix-kernel-doc-in-source-files.patch

 IOAT driver fixlets

+fs-nfs-make-2-functions-static.patch

 NFS cleanup

+fix-implicit-declaration-on-cell.patch

 powerpc fix

-git-sas-sas_discover-build-fix.patch

 Dropped.

-serial-add-tsi108-8250-serial-support-fix.patch

 Folded into serial-add-tsi108-8250-serial-support.patch

+gregkh-pci-pci-poper-prototype-for-arch-i386-pci-pcbios.c-pcibios_sort.patch
+gregkh-pci-pci-clear-abnormal-poweroff-flag-on-via-southbridges-fix-resume.patch
+gregkh-pci-msi-merge-existing-msi-disabling-quirks.patch
+gregkh-pci-msi-rename-pci_cap_id_ht_irqconf-into-pci_cap_id_ht.patch
+gregkh-pci-msi-blacklist-pci-e-chipsets-depending-on-hypertransport-msi-capabality.patch
+gregkh-pci-msi-factorize-common-msi-detection-code-from-pci_enable_msi-and-msix.patch
+gregkh-pci-msi-stop-inheriting-bus-flags-and-check-root-chipset-bus-flags-instead.patch
+gregkh-pci-msi-drop-pci_msi_quirk.patch
+gregkh-pci-resources-insert-identical-resources-above-existing-resources.patch

 PCI tree updates

-drivers-scsi-qla2xxx-make-more-some-functions-static.patch

 Folded into drivers-scsi-qla2xxx-make-some-functions-static.patch

+stc-improve-sense-output.patch
+my-name-is-ingo-molnar-you-killed-my-make-allyesconfig-prepare-to-die.patch

 scsi fixes.

+gregkh-usb-usb-unusual_devs-entry-for-samsung-mp3-player.patch
+gregkh-usb-usbcore-fixes-for-hub_port_resume.patch
+gregkh-usb-usb-storage-us_fl_max_sectors_64-flag.patch
+gregkh-usb-usb-storage-uname-in-pr-sc-unneeded-message.patch
+gregkh-usb-usb-serial-visor-fix-race-in-open-close.patch
+gregkh-usb-usb-serial-ftdi_sio-prevent-userspace-dos.patch
+gregkh-usb-usb-kill-compiler-warning-in-quirk_usb_handoff_ohci.patch
+gregkh-usb-usb-fix-pointer-dereference-in-drivers-usb-misc-usblcd.patch
+gregkh-usb-usb-add-driver-for-non-composite-sierra-wireless-devices.patch
+gregkh-usb-usb-ehci-fix-bogus-alteration-of-a-local-variable.patch
+gregkh-usb-usb-ipaq.c-bugfixes.patch
+gregkh-usb-usb-ipaq.c-timing-parameters.patch
+gregkh-usb-usb-remove-empty-destructor-from-drivers-usb-mon-mon_text.c.patch
+gregkh-usb-usb-ohci-s3c2410.c-clock-now-usb-bus-host.patch
+gregkh-usb-usb-at91-udc-updates-mostly-power-management.patch
+gregkh-usb-usb-at91-ohci-updates-mostly-power-management.patch
+gregkh-usb-usb-ohci-controller-support-for-pnx4008.patch
+gregkh-usb-usb-move-linux-usb_otg.h-to-linux-usb-otg.h.patch
+gregkh-usb-usb-pxa2xx_udc-understands-gpio-based-vbus-sensing.patch
+gregkh-usb-usb-allow-compile-in-g_ether-fix-typo.patch

 USB updates

+kill-usb-kconfig-warning.patch

 Fix it.

-bcm43xx-opencoded-locking-fix.patch

 Folded into bcm43xx-opencoded-locking.patch

+x86_64-mm-defconfig-update.patch
+x86_64-mm-i386-up-generic-arch.patch
+x86_64-mm-i386-numa-summit-check.patch
+x86_64-mm-temp-revert-arch-perfmon.patch
+x86_64-mm-add-performance-counter-reservation-framework-for-up-kernels.patch
+x86_64-mm-utilize-performance-counter-reservation-framework-in-oprofile.patch
+x86_64-mm-add-smp-support-on-x86_64-to-reservation-framework.patch
+x86_64-mm-add-smp-support-on-i386-to-reservation-framework.patch
+x86_64-mm-cleanup-nmi-interrupt-path.patch
+x86_64-mm-rdtscp-macros.patch
+x86_64-mm-init-rdtscp.patch
+x86_64-mm-mce-amd-fix.patch

 x86-64 tree updates (partial - I dropped all the NMI changes because they
 don't apply and look like they wouldn't build if I fixed them all).

+zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o.patch
+zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o-tunable.patch
+zvc-zone_reclaim-leave-1%-of-unmapped-pagecache-pages-for-file-i-o-tunable-rename.patch

 NUMA memory reclaim tweak.

+mm-fixup-do_wp_page.patch

 MM fix

+mm-msync-cleanup-fix.patch

 Fix mm-msync-cleanup.patch

+mm-make-functions-static.patch

 MM cleanup

-lockdep-add-disable-enable_irq_lockdep-api-fix.patch

 Folded into lockdep-add-disable-enable_irq_lockdep-api.patch

-lockdep-stacktrace-subsystem-s390-support-fix.patch

 Folded into lockdep-stacktrace-subsystem-s390-support.patch

-lockdep-irqtrace-subsystem-x86_64-support-fix.patch
-lockdep-irqtrace-subsystem-x86_64-support-fix-2.patch

 Folded into lockdep-irqtrace-subsystem-x86_64-support.patch

-lockdep-core-improve-non-static-key-warning-message.patch
-lockdep-core-cleanups.patch
-lockdep-core-cleanups-2.patch

 Folded into lockdep-core.patch

-lockdep-annotate-vlan-net-device-as-being-a-special-class-fix.patch

 Folded into lockdep-annotate-vlan-net-device-as-being-a-special-class.patch

+lockdep-core-improve-bug-messages.patch
+lockdep-core-add-set_class_and_name.patch
+lockdep-core-add-set_class_and_name-fix.patch
+lockdep-annotate-blkdev-nesting-fix.patch
+lockdep-annotate-sk_locks.patch
+lockdep-annotate-sk_locks-fix.patch

 lockdep updates

+smp-alternatives-skip-with-up-kernels.patch

 x86 alternatives cleanup

-hpet-rtc-emulation-add-watchdog-timer.patch
+hpet-rtc-emulation-add-watchdog-timer-2.patch

 Updated version of this ancient patch.

-destroy-the-dentries-contributed-by-a-superblock-on-unmounting.patch
-destroy-the-dentries-contributed-by-a-superblock-on-unmounting-fix.patch

 Dropped.

+fix-is_err-threshold-value.patch
+rtc-class-driver-for-samsung-s3c-series-soc.patch
+rtc-class-driver-for-samsung-s3c-series-soc-tidy.patch
+hotcpu_notifier-fixes.patch
+add-___rodata-sections-to-asm-generic-sectionsh.patch
+add-___rodata-sections-to-asm-generic-sectionsh-fix.patch
+s390-put-sys_call_table-into-rodata-section-and-write-protect-it.patch
+reiserfs-update-ctime-and-mtime-on-expanding-truncate.patch
+kernel-doc-consistent-text-man-mode-output.patch
+fix-problem-with-atapi-dma-on-it8212-in-linux.patch
+kernel-doc-make-man-text-mode-function-output-same.patch
+fix-and-enable-edac-sysfs-operation.patch
+edac-new-opteron-athlon64-memory-controller-driver.patch
+edac-new-opteron-athlon64-memory-controller-driver-tidy.patch
+drivers-block-nbdc-compile-fix.patch
+pnp-suppress-request_irq-warning.patch

 Misc patches.

+per-task-delay-accounting-taskstats-interface-tidy.patch

 Tweak per-task-delay-accounting-taskstats-interface.patch

+jmicron-pci-identifiers.patch

 PCI IDs for IDE drivers

+fbdev-add-framebuffer-and-display-update-module-support.patch
+vt-remove-vt-specific-declarations-and-definitions-from.patch
+tty-remove-include-of-screen_infoh-from-ttyh.patch

 fbdev updates

+statistics-infrastructure-update-6.patch
+statistics-infrastructure-update-7.patch
+statistics-infrastructure-update-8.patch

 statistics updates

+genirq-convert-the-x86_64-architecture-to-irq-chips.patch
+genirq-add-chip-eoi-fastack-fasteoi-x86_64.patch
+genirq-convert-the-i386-architecture-to-irq-chips.patch
+genirq-convert-the-i386-architecture-to-irq-chips-fix-2.patch
+genirq-add-chip-eoi-fastack-fasteoi-x86.patch
+genirq-irq-convert-the-move_irq-flag-from-a-32bit-word-to-a-single-bit.patch
+genirq-irq-add-moved_masked_irq.patch
+genirq-x86_64-irq-reenable-migrating-irqs-to-other-cpus.patch
+genirq-x86_64-irq-reenable-migrating-irqs-to-other-cpus-fix.patch
+genirq-msi-simplify-msi-enable-and-disable.patch
+genirq-msi-simplify-msi-enable-and-disable-fix.patch
+genirq-msi-make-the-msi-boolean-tests-return-either-0-or-1.patch
+genirq-msi-implement-helper-functions-read_msi_msg-and-write_msi_msg.patch
+genirq-msi-refactor-the-msi_ops.patch
+genirq-msi-simplify-the-msi-irq-limit-policy.patch
+genirq-irq-add-a-dynamic-irq-creation-api.patch
+genirq-ia64-irq-dynamic-irq-support.patch
+genirq-ia64-irq-dynamic-irq-support-fix.patch
+genirq-i386-irq-dynamic-irq-support.patch
+genirq-i386-irq-dynamic-irq-support-fix.patch
+genirq-x86_64-irq-dynamic-irq-support.patch
+genirq-msi-make-the-msi-code-irq-based-and-not-vector-based.patch
+genirq-x86_64-irq-move-msi-message-composition-into-io_apicc.patch
+genirq-i386-irq-move-msi-message-composition-into-io_apicc.patch
+genirq-msi-only-build-msi-apicc-on-ia64.patch
+genirq-x86_64-irq-remove-the-msi-assumption-that-irq-==-vector.patch
+genirq-i386-irq-remove-the-msi-assumption-that-irq-==-vector.patch
+genirq-i386-irq-remove-the-msi-assumption-that-irq-==-vector-fix.patch
+genirq-i386-irq-remove-the-msi-assumption-that-irq-==-vector-fix-tidies.patch
+genirq-irq-remove-msi-hacks.patch
+genirq-irq-generalize-the-check-for-hardirq_bits.patch
+genirq-x86_64-irq-make-the-external-irq-handlers-report-their-vector-not-the-irq-number.patch
+genirq-x86_64-irq-make-vector_irq-per-cpu.patch
+genirq-x86_64-irq-kill-gsi_irq_sharing.patch
+genirq-x86_64-irq-kill-irq-compression.patch

 Restore the genirq implementation for various architectures.

-ro-bind-mounts-prepare-for-write-access-checks-collapse-if.patch
-ro-bind-mounts-r-o-bind-mount-prepwork-move-open_nameis-vfs_create.patch
-ro-bind-mounts-add-vfsmount-writer-count.patch
-ro-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir.patch
-ro-bind-mounts-elevate-write-count-during-entire-ncp_ioctl.patch
-ro-bind-mounts-elevate-write-count-during-entire-ncp_ioctl-tidy.patch
-ro-bind-mounts-sys_symlinkat-elevate-write-count-around-vfs_symlink.patch
-ro-bind-mounts-elevate-mount-count-for-extended-attributes.patch
-ro-bind-mounts-sys_linkat-elevate-write-count-around-vfs_link.patch
-ro-bind-mounts-mount_is_safe-add-comment.patch
-ro-bind-mounts-unix_find_other-elevate-write-count-for-touch_atime.patch
-ro-bind-mounts-elevate-write-count-over-calls-to-vfs_rename.patch
-ro-bind-mounts-tricky-elevate-write-count-files-are-opened.patch
-ro-bind-mounts-elevate-writer-count-for-do_sys_truncate.patch
-ro-bind-mounts-elevate-write-count-for-do_utimes.patch
-ro-bind-mounts-elevate-write-count-for-do_sys_utime-and-touch_atime.patch
-ro-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create.patch
-ro-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers.patch
-ro-bind-mounts-do_rmdir-elevate-write-count.patch
-ro-bind-mounts-elevate-writer-count-for-custom-struct-file.patch
-ro-bind-mounts-honor-r-w-changes-at-do_remount-time.patch

 Dropped.

+the-scheduled-removal-of-some-oss-drivers.patch

 Remove lots of OSS drivers

+make-more-file_operation-structs-static.patch
+make-more-file_operation-structs-static-fix.patch

 constify some file_operations structs.

-slab-leak-detector.patch

 Dropped.

+kernel-printkc-export_symbol_unused.patch
+mm-bootmemc-export_unused_symbol.patch
+mm-memoryc-export_unused_symbol.patch
+mm-mmzonec-export_unused_symbol.patch
+fs-read_writec-export_unused_symbol.patch
+export_unused_symbolgpl-unregister_die_notifier.patch
+kernel-softirqc-export_unused_symbol.patch

 Fiddle with exports.
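
The change list above uses a simple convention: a leading "+" marks a patch
added since the previous -mm release, a leading "-" marks one removed, and an
indented line after each group explains it.  A rough sketch of a parser for
that layout, in Python.  The Group class and parse_changes() are illustrative
names, and treating space-prefixed patch names (as in the "git trees" group)
as "refreshed" is an assumption about their meaning:

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    refreshed: list = field(default_factory=list)  # assumed meaning of " name.patch"
    note: str = ""


def parse_changes(text):
    """Split a "Changes since ..." list into groups of patches plus a note."""
    groups, cur = [], Group()
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        body = line[1:].strip()
        # A patch entry is a single token ending in ".patch"; notes such as
        # "Folded into foo.patch" contain spaces and are not entries.
        is_patch = body.endswith(".patch") and " " not in body
        if is_patch and line[0] in "+- ":
            if cur.note:              # a note closed the previous group
                groups.append(cur)
                cur = Group()
            target = {"+": cur.added, "-": cur.removed, " ": cur.refreshed}
            target[line[0]].append(body)
        else:
            # Notes may span several lines; join them with a space.
            cur.note = (cur.note + " " + line.strip()).strip()
    if cur.added or cur.removed or cur.refreshed or cur.note:
        groups.append(cur)
    return groups
```

For example, feeding it the "-cpu_relax-use-in-acpi-lock*.patch" lines and
their " Dropped." note yields one Group with two entries in removed and
note set to "Dropped.".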



All 791 patches:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/patch-list





* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
@ 2006-07-01 11:08   ` Reuben Farrelly
  2006-07-01 11:51     ` 2.6.17-mm5 Andrew Morton
  2006-07-01 18:03   ` 2.6.17-mm5 Ralf Hildebrandt
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Reuben Farrelly @ 2006-07-01 11:08 UTC
  To: Andrew Morton; +Cc: linux-kernel



On 1/07/2006 10:35 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/
> 
> 
> Nothing very exciting here - a few buggy patches were fixed or dropped.

Ouch:

Bootdata ok (command line is ro root=/dev/md0 panic=60 console=ttyS0,57600 single)
Linux version 2.6.17-mm5 (root@tornado.reub.net) (gcc version 4.1.1 20060629 (Red Hat 4.1.1-6)) #1 SMP Sat Jul 1 22:59:00 NZST 2006
BIOS-provided physical RAM map:
  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 000000003f670000 (usable)
  BIOS-e820: 000000003f670000 - 000000003f6e9000 (ACPI NVS)
  BIOS-e820: 000000003f6e9000 - 000000003f6ec000 (usable)
  BIOS-e820: 000000003f6ec000 - 000000003f6ff000 (ACPI data)
  BIOS-e820: 000000003f6ff000 - 000000003f700000 (usable)
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Setting APIC routing to flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 40000000 (gap: 3f700000:c0900000)
Built 1 zonelists.  Total pages: 254547
Kernel command line: ro root=/dev/md0 panic=60 console=ttyS0,57600 single
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Checking aperture...
Memory: 1015044k/1039360k available (2569k kernel code, 22788k reserved, 1660k data, 216k init)
Calibrating delay using timer specific routine.. 6006.40 BogoMIPS (lpj=12012800)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM1)
Freeing SMP alternatives: 28k freed
ACPI: Core revision 20060623
Using local APIC timer interrupts.
result 12500450
Detected 12.500 MHz APIC timer.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5999.87 BogoMIPS (lpj=11999755)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU1: Thermal monitoring enabled (TM1)
               Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 03
Brought up 2 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 3000.123 MHz processor.
migration_cost=4
checking if image is initramfs... it is
Freeing initrd memory: 877k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 9 10 *11 12)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 10 *11 12)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 9 *10 11 12)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 9 10 11 12) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 *9 10 11 12)
Intel 82802 RNG detected
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
PCI-GART: No AMD northbridge found.
PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0
PCI: Bridge: 0000:00:1c.0
   IO window: 2000-2fff
   MEM window: 48000000-480fffff
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.2
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.3
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.4
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.5
   IO window: disabled.
   MEM window: disabled.
   PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
   IO window: 1000-1fff
   MEM window: disabled.
   PREFETCH window: disabled.
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
audit: initializing netlink socket (disabled)
audit(1151751831.012:1): initialized
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered (default)
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
assign_interrupt_mode Found MSI capability
ACPI: Power Button (FF) [PWRF]
ACPI: Sleep Button (CM) [SLPB]
ACPI: Getting cpuindex for acpiid 0x3
ACPI: Getting cpuindex for acpiid 0x4
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ACPI: PCI Interrupt 0000:06:03.0[A] -> GSI 19 (level, low) -> IRQ 19
0000:06:03.0: ttyS1 at I/O 0x1000 (irq = 19) is a 16550A
0000:06:03.0: ttyS2 at I/O 0x1008 (irq = 19) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 4 RAM disks of 16384K size 1024 blocksize
Intel(R) PRO/1000 Network Driver - version 7.0.38-k4-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
e1000: 0000:01:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:13:20:60:b4:23
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH7: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
ICH7: chipset revision 1
ICH7: not 100% native mode: will probe irqs later
     ide0: BM-DMA at 0x30b0-0x30b7, BIOS settings: hda:DMA, hdb:pio
hda: PIONEER DVD-RW DVR-111D, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Unable to handle kernel NULL pointer dereference at 00000000000000ce RIP:
  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.17-mm5 #1
RIP: 0010:[<ffffffff80363a96>]  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
RSP: 0000:ffff81003f601b88  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81003ec659c8 RCX: 00000000481a0000
RDX: 00000000481a03ff RSI: ffff810037f9aa80 RDI: ffff81003ec65800
RBP: ffff81003f601b88 R08: 0000000000000000 R09: 0000000000000000
R10: ffff810037f9aa80 R11: 0000000000000040 R12: ffff81003ec65800
R13: 0000000000000000 R14: ffffffff805a0620 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff80685000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ce CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81003f600000, task ffff810001fb8740)
Stack:  ffff81003f601bf8 ffffffff80364909 ffff81003f601bc8 ffffffff8035dbee
  0000000000000000 0000000000000005 ffffffff804c8166 ffff81003ec65800
  ffff81003f601bf8 ffff81003ec659c8 ffff81003ec65800 0000000000000000
Call Trace:
  [<ffffffff80364909>] pci_enable_msi+0x19/0x2f2
  [<ffffffff8035dbee>] pci_request_region+0xce/0x180
  [<ffffffff803e8867>] ahci_init_one+0x88/0x93a
  [<ffffffff8026311d>] wait_for_completion+0xb2/0x112
  [<ffffffff80280b4f>] default_wake_function+0x0/0xf
  [<ffffffff80290dcc>] call_usermodehelper_keys+0xd4/0xe8
  [<ffffffff80290de0>] __call_usermodehelper+0x0/0x64
  [<ffffffff8025affa>] kobject_get+0x1a/0x24
  [<ffffffff8035ff1c>] pci_device_probe+0x4d/0x78
  [<ffffffff803aaa8f>] driver_probe_device+0x5c/0xb4
  [<ffffffff803aabc9>] __driver_attach+0x67/0xb9
  [<ffffffff803aab62>] __driver_attach+0x0/0xb9
  [<ffffffff803aa44f>] bus_for_each_dev+0x4f/0x79
  [<ffffffff803aa9bc>] driver_attach+0x1c/0x1e
  [<ffffffff803aa01a>] bus_add_driver+0x7a/0x143
  [<ffffffff803aae63>] driver_register+0x9f/0xa6
  [<ffffffff80280b6e>] wake_up_process+0x10/0x12
  [<ffffffff80360107>] __pci_register_driver+0x59/0x7e
  [<ffffffff806b7799>] ahci_init+0x12/0x14
  [<ffffffff80267ece>] init+0x14e/0x2c2
  [<ffffffff80227b67>] schedule_tail+0x37/0x9e
  [<ffffffff80260972>] child_rip+0x8/0x12
  [<ffffffff80267d80>] init+0x0/0x2c2
  [<ffffffff8026096a>] child_rip+0x0/0x12


Code: f6 80 ce 00 00 00 01 75 04 31 c0 eb 05 b8 ff ff ff ff 5d c3
RIP  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
  RSP <ffff81003f601b88>
CR2: 00000000000000ce
  <0>Kernel panic - not syncing: Attempted to kill init!
  <0>Rebooting in 60 seconds..

Hardware is listed at http://www.reub.net/files/kernel/system-hardware

Reuben



* Re: 2.6.17-mm5
  2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
@ 2006-07-01 11:51     ` Andrew Morton
  2006-07-01 12:31       ` 2.6.17-mm5 Reuben Farrelly
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 11:51 UTC (permalink / raw)
  To: Reuben Farrelly; +Cc: linux-kernel, Brice Goglin, Greg KH

On Sat, 01 Jul 2006 23:08:40 +1200
Reuben Farrelly <reuben-lkml@reub.net> wrote:

> 
> 
> On 1/07/2006 10:35 p.m., Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/
> > 
> > 
> > Nothing very exciting here - a few buggy patches were fixed or dropped.
> 
> Ouch:

Well I didn't say that new buggy patches weren't added.

>      ide0: BM-DMA at 0x30b0-0x30b7, BIOS settings: hda:DMA, hdb:pio
> hda: PIONEER DVD-RW DVR-111D, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> Unable to handle kernel NULL pointer dereference at 00000000000000ce RIP:
>   [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
> PGD 0
> Oops: 0000 [1] SMP
> last sysfs file:
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.17-mm5 #1
> RIP: 0010:[<ffffffff80363a96>]  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
> RSP: 0000:ffff81003f601b88  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff81003ec659c8 RCX: 00000000481a0000
> RDX: 00000000481a03ff RSI: ffff810037f9aa80 RDI: ffff81003ec65800
> RBP: ffff81003f601b88 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff810037f9aa80 R11: 0000000000000040 R12: ffff81003ec65800
> R13: 0000000000000000 R14: ffffffff805a0620 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffffffff80685000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000000000ce CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff81003f600000, task ffff810001fb8740)
> Stack:  ffff81003f601bf8 ffffffff80364909 ffff81003f601bc8 ffffffff8035dbee
>   0000000000000000 0000000000000005 ffffffff804c8166 ffff81003ec65800
>   ffff81003f601bf8 ffff81003ec659c8 ffff81003ec65800 0000000000000000
> Call Trace:
>   [<ffffffff80364909>] pci_enable_msi+0x19/0x2f2
>   [<ffffffff8035dbee>] pci_request_region+0xce/0x180
>   [<ffffffff803e8867>] ahci_init_one+0x88/0x93a
>   [<ffffffff8026311d>] wait_for_completion+0xb2/0x112
>   [<ffffffff80280b4f>] default_wake_function+0x0/0xf
>   [<ffffffff80290dcc>] call_usermodehelper_keys+0xd4/0xe8
>   [<ffffffff80290de0>] __call_usermodehelper+0x0/0x64
>   [<ffffffff8025affa>] kobject_get+0x1a/0x24
>   [<ffffffff8035ff1c>] pci_device_probe+0x4d/0x78
>   [<ffffffff803aaa8f>] driver_probe_device+0x5c/0xb4
>   [<ffffffff803aabc9>] __driver_attach+0x67/0xb9
>   [<ffffffff803aab62>] __driver_attach+0x0/0xb9
>   [<ffffffff803aa44f>] bus_for_each_dev+0x4f/0x79
>   [<ffffffff803aa9bc>] driver_attach+0x1c/0x1e
>   [<ffffffff803aa01a>] bus_add_driver+0x7a/0x143
>   [<ffffffff803aae63>] driver_register+0x9f/0xa6
>   [<ffffffff80280b6e>] wake_up_process+0x10/0x12
>   [<ffffffff80360107>] __pci_register_driver+0x59/0x7e
>   [<ffffffff806b7799>] ahci_init+0x12/0x14
>   [<ffffffff80267ece>] init+0x14e/0x2c2
>   [<ffffffff80227b67>] schedule_tail+0x37/0x9e
>   [<ffffffff80260972>] child_rip+0x8/0x12
>   [<ffffffff80267d80>] init+0x0/0x2c2
>   [<ffffffff8026096a>] child_rip+0x0/0x12
> 
> 
> Code: f6 80 ce 00 00 00 01 75 04 31 c0 eb 05 b8 ff ff ff ff 5d c3

It oopsed here:

static
int pci_msi_supported(struct pci_dev * dev)
{
	struct pci_dev *pdev;

	if (!pci_msi_enable || !dev || dev->no_msi)
		return -1;

	/* find root complex for our device */
	pdev = dev;
	while (pdev->bus && pdev->bus->self)
		pdev = pdev->bus->self;

	/* check its bus flags */
	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
		return -1;

	return 0;
}

pdev->subordinate is NULL.
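
The fault address fits that reading. The Code: bytes start with
f6 80 ce 00 00 00 01, which decodes as testb $0x1,0xce(%rax), and RAX is 0 in
the register dump, so CR2 (0xce) is simply the offset of the field being
tested added to a NULL base. A userspace sketch of that arithmetic follows;
toy_bus and its padding are invented to put a flag byte at 0xce and make no
claim about the real struct pci_bus layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen only so that bus_flags sits at 0xce. */
struct toy_bus {
	char pad[0xce];
	unsigned char bus_flags;
};

/*
 * Address the CPU would touch when reading bus->bus_flags.
 * With bus == NULL this is just the field offset, which is what
 * shows up in CR2 after the fault.
 */
uintptr_t toy_fault_address(const struct toy_bus *bus)
{
	return (uintptr_t)bus + offsetof(struct toy_bus, bus_flags);
}
```

Calling toy_fault_address(NULL) gives 0xce on the usual flat-address
platforms, matching the "NULL pointer dereference at 00000000000000ce" line.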

Two patch series touch that file.  The generic-irq wire-up and a couple of new
ones in Greg's tree.  I'd be suspecting
gregkh-pci-msi-stop-inheriting-bus-flags-and-check-root-chipset-bus-flags-instead.patch.


To confirm that, could you please test 2.6.17 plus
http://www.zip.com.au/~akpm/linux/patches/stuff/rf.bz2 with the same
.config?  That's everything up to but not including the genirq changes.


You may find that this gets things going again:

--- a/drivers/pci/msi.c~a
+++ a/drivers/pci/msi.c
@@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
 	while (pdev->bus && pdev->bus->self)
 		pdev = pdev->bus->self;
 
+	if (!pdev->subordinate)
+		return -1;
+
 	/* check its bus flags */
 	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
 		return -1;
_

Or disable CONFIG_PCI_MSI.
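
For anyone who wants to see the guard in isolation, here is a userspace
sketch of the walk with the extra check applied. The toy_* types and
TOY_BUS_FLAGS_NO_MSI are cut-down stand-ins for illustration, not the kernel's
definitions, and the assumption that the failing device sits directly on a
root bus is a guess from the trace rather than something the log confirms.

```c
#include <stddef.h>

#define TOY_BUS_FLAGS_NO_MSI 0x1u	/* stand-in for PCI_BUS_FLAGS_NO_MSI */

struct toy_dev;

struct toy_bus {
	struct toy_dev *self;		/* bridge leading to this bus, NULL on a root bus */
	unsigned int bus_flags;
};

struct toy_dev {
	struct toy_bus *bus;		/* bus the device sits on */
	struct toy_bus *subordinate;	/* bus behind the device, NULL if not a bridge */
	unsigned int no_msi;
};

/* Returns 0 if MSI may be used, -1 otherwise (same convention as above). */
int toy_msi_supported(struct toy_dev *dev)
{
	struct toy_dev *pdev;

	if (!dev || dev->no_msi)
		return -1;

	/* walk up until we reach a device that sits on a root bus */
	pdev = dev;
	while (pdev->bus && pdev->bus->self)
		pdev = pdev->bus->self;

	/* the walk can end on a device that is not a bridge at all */
	if (!pdev->subordinate)
		return -1;

	/* only now is it safe to look at the subordinate bus flags */
	if (pdev->subordinate->bus_flags & TOY_BUS_FLAGS_NO_MSI)
		return -1;

	return 0;
}
```

A device with no bridge above it never enters the loop, so pdev stays equal to
dev and its subordinate pointer is NULL; with the guard the function reports
"no MSI" instead of dereferencing it.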


* Re: 2.6.17-mm5
  2006-07-01 11:51     ` 2.6.17-mm5 Andrew Morton
@ 2006-07-01 12:31       ` Reuben Farrelly
  2006-07-01 13:06         ` 2.6.17-mm5 Brice Goglin
  0 siblings, 1 reply; 35+ messages in thread
From: Reuben Farrelly @ 2006-07-01 12:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Brice Goglin, Greg KH



On 1/07/2006 11:51 p.m., Andrew Morton wrote:
> On Sat, 01 Jul 2006 23:08:40 +1200
> Reuben Farrelly <reuben-lkml@reub.net> wrote:
> 
>>
>> On 1/07/2006 10:35 p.m., Andrew Morton wrote:
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm5/
>>>
>>>
>>> Nothing very exciting here - a few buggy patches were fixed or dropped.
>> Ouch:
> 
> Well I didn't say that new buggy patches weren't added.
> 
>>      ide0: BM-DMA at 0x30b0-0x30b7, BIOS settings: hda:DMA, hdb:pio
>> hda: PIONEER DVD-RW DVR-111D, ATAPI CD/DVD-ROM drive
>> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
>> Unable to handle kernel NULL pointer dereference at 00000000000000ce RIP:
>>   [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
>> PGD 0
>> Oops: 0000 [1] SMP
>> last sysfs file:
>> CPU 0
>> Modules linked in:
>> Pid: 1, comm: swapper Not tainted 2.6.17-mm5 #1
>> RIP: 0010:[<ffffffff80363a96>]  [<ffffffff80363a96>] pci_msi_supported+0x37/0x4b
>> RSP: 0000:ffff81003f601b88  EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: ffff81003ec659c8 RCX: 00000000481a0000
>> RDX: 00000000481a03ff RSI: ffff810037f9aa80 RDI: ffff81003ec65800
>> RBP: ffff81003f601b88 R08: 0000000000000000 R09: 0000000000000000
>> R10: ffff810037f9aa80 R11: 0000000000000040 R12: ffff81003ec65800
>> R13: 0000000000000000 R14: ffffffff805a0620 R15: 0000000000000000
>> FS:  0000000000000000(0000) GS:ffffffff80685000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 00000000000000ce CR3: 0000000000201000 CR4: 00000000000006e0
>> Process swapper (pid: 1, threadinfo ffff81003f600000, task ffff810001fb8740)
>> Stack:  ffff81003f601bf8 ffffffff80364909 ffff81003f601bc8 ffffffff8035dbee
>>   0000000000000000 0000000000000005 ffffffff804c8166 ffff81003ec65800
>>   ffff81003f601bf8 ffff81003ec659c8 ffff81003ec65800 0000000000000000
>> Call Trace:
>>   [<ffffffff80364909>] pci_enable_msi+0x19/0x2f2
>>   [<ffffffff8035dbee>] pci_request_region+0xce/0x180
>>   [<ffffffff803e8867>] ahci_init_one+0x88/0x93a
>>   [<ffffffff8026311d>] wait_for_completion+0xb2/0x112
>>   [<ffffffff80280b4f>] default_wake_function+0x0/0xf
>>   [<ffffffff80290dcc>] call_usermodehelper_keys+0xd4/0xe8
>>   [<ffffffff80290de0>] __call_usermodehelper+0x0/0x64
>>   [<ffffffff8025affa>] kobject_get+0x1a/0x24
>>   [<ffffffff8035ff1c>] pci_device_probe+0x4d/0x78
>>   [<ffffffff803aaa8f>] driver_probe_device+0x5c/0xb4
>>   [<ffffffff803aabc9>] __driver_attach+0x67/0xb9
>>   [<ffffffff803aab62>] __driver_attach+0x0/0xb9
>>   [<ffffffff803aa44f>] bus_for_each_dev+0x4f/0x79
>>   [<ffffffff803aa9bc>] driver_attach+0x1c/0x1e
>>   [<ffffffff803aa01a>] bus_add_driver+0x7a/0x143
>>   [<ffffffff803aae63>] driver_register+0x9f/0xa6
>>   [<ffffffff80280b6e>] wake_up_process+0x10/0x12
>>   [<ffffffff80360107>] __pci_register_driver+0x59/0x7e
>>   [<ffffffff806b7799>] ahci_init+0x12/0x14
>>   [<ffffffff80267ece>] init+0x14e/0x2c2
>>   [<ffffffff80227b67>] schedule_tail+0x37/0x9e
>>   [<ffffffff80260972>] child_rip+0x8/0x12
>>   [<ffffffff80267d80>] init+0x0/0x2c2
>>   [<ffffffff8026096a>] child_rip+0x0/0x12
>>
>>
>> Code: f6 80 ce 00 00 00 01 75 04 31 c0 eb 05 b8 ff ff ff ff 5d c3
> 
> It oopsed here:
> 
> static
> int pci_msi_supported(struct pci_dev * dev)
> {
> 	struct pci_dev *pdev;
> 
> 	if (!pci_msi_enable || !dev || dev->no_msi)
> 		return -1;
> 
> 	/* find root complex for our device */
> 	pdev = dev;
> 	while (pdev->bus && pdev->bus->self)
> 		pdev = pdev->bus->self;
> 
> 	/* check its bus flags */
> 	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> 		return -1;
> 
> 	return 0;
> }
> 
> pdev->subordinate is NULL.
> 
> Two patch series touch that file.  The generic-irq wire-up and a couple of new
> ones in Greg's tree.  I'd be suspecting
> gregkh-pci-msi-stop-inheriting-bus-flags-and-check-root-chipset-bus-flags-instead.patch.
> 
> 
> To confirm that, could you please test 2.6.17 plus
> http://www.zip.com.au/~akpm/linux/patches/stuff/rf.bz2 with the same
> .config?  That's everything up to but not including the genirq changes.

   CC      arch/x86_64/kernel/smp.o
   CC      arch/x86_64/kernel/smpboot.o
   AS      arch/x86_64/kernel/trampoline.o
   CC      arch/x86_64/kernel/apic.o
   CC      arch/x86_64/kernel/nmi.o
   CC      arch/x86_64/kernel/io_apic.o
arch/x86_64/kernel/io_apic.c: In function 'ioapic_register_intr':
arch/x86_64/kernel/io_apic.c:887: error: 'handle_fastack_irq' undeclared (first use in this function)
arch/x86_64/kernel/io_apic.c:887: error: (Each undeclared identifier is reported only once
arch/x86_64/kernel/io_apic.c:887: error: for each function it appears in.)
arch/x86_64/kernel/io_apic.c: In function 'setup_ExtINT_IRQ0_pin':
arch/x86_64/kernel/io_apic.c:992: error: 'handle_fastack_irq' undeclared (first use in this function)
arch/x86_64/kernel/io_apic.c: In function 'check_timer':
arch/x86_64/kernel/io_apic.c:1830: error: 'handle_fastack_irq' undeclared (first use in this function)
make[1]: *** [arch/x86_64/kernel/io_apic.o] Error 1
make: *** [arch/x86_64/kernel] Error 2
[root@tornado linux-2.6-mm-temp-mm5tester]#

No go :(

> You may find that this gets things going again:
> 
> --- a/drivers/pci/msi.c~a
> +++ a/drivers/pci/msi.c
> @@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
>  	while (pdev->bus && pdev->bus->self)
>  		pdev = pdev->bus->self;
>  
> +	if (!pdev->subordinate)
> +		return -1;
> +
>  	/* check its bus flags */
>  	if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>  		return -1;
> _
Yes it does.  (Until I then notice that my raid-1 is still broken, but that's 
another story, and to be expected...)

reuben



* Re: 2.6.17-mm5
  2006-07-01 12:31       ` 2.6.17-mm5 Reuben Farrelly
@ 2006-07-01 13:06         ` Brice Goglin
  2006-07-01 17:00           ` 2.6.17-mm5 Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: Brice Goglin @ 2006-07-01 13:06 UTC (permalink / raw)
  To: Reuben Farrelly; +Cc: Andrew Morton, linux-kernel, Greg KH

Reuben Farrelly wrote:
>>
>> It oopsed here:
>>
>> static
>> int pci_msi_supported(struct pci_dev * dev)
>> {
>>     struct pci_dev *pdev;
>>
>>     if (!pci_msi_enable || !dev || dev->no_msi)
>>         return -1;
>>
>>     /* find root complex for our device */
>>     pdev = dev;
>>     while (pdev->bus && pdev->bus->self)
>>         pdev = pdev->bus->self;
>>
>>     /* check its bus flags */
>>     if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>>         return -1;
>>
>>     return 0;
>> }
>>
>> pdev->subordinate is NULL.
>>
>
>> You may find that this gets things going again:
>>
>> --- a/drivers/pci/msi.c~a
>> +++ a/drivers/pci/msi.c
>> @@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
>>      while (pdev->bus && pdev->bus->self)
>>          pdev = pdev->bus->self;
>>  
>> +    if (!pdev->subordinate)
>> +        return -1;
>> +
>>      /* check its bus flags */
>>      if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
>>          return -1;
>> _
> Yes it does.

I was not expecting a root chipset without subordinate bus... Maybe we
should store the NO_MSI flags in the device itself instead of in its
subordinate bus (I would have to rework all my patches then). After all,
we don't inherit bus flags anymore, and I don't see why bus flags would
have been chosen initially except to help flags inheritance.
I am still convinced that checking the root chipset (bus) flags only is a
good idea since the root chipset is where MSIs are translated from PCI
messages into DMA (we don't care about MSI support in the bridges
between the chipset and the devices since they only forward PCI messages).
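
A rough sketch of what the per-device variant could look like, in userspace C.
Everything here is hypothetical: the sk_* types are simplified stand-ins, and
using a no_msi field on the root device for this purpose is an assumption
for illustration, not a description of any posted patch.

```c
#include <stddef.h>

struct sk_dev;

struct sk_bus {
	struct sk_dev *self;	/* bridge leading to this bus, NULL on a root bus */
};

struct sk_dev {
	struct sk_bus *bus;	/* bus the device sits on */
	unsigned int no_msi;	/* hypothetical: set by a quirk on a broken chipset */
};

/* Returns 0 if MSI may be used, -1 otherwise. */
int sk_msi_supported(struct sk_dev *dev)
{
	struct sk_dev *root;

	if (!dev || dev->no_msi)
		return -1;

	/* find the device on the root bus that our traffic goes through */
	root = dev;
	while (root->bus && root->bus->self)
		root = root->bus->self;

	/* the flag lives in the device, so no subordinate bus is needed */
	return root->no_msi ? -1 : 0;
}
```

With the flag in the device, a root-bus device that has no subordinate bus is
handled by the same test as any other, so no separate NULL check is required.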

Brice


* Re: 2.6.17-mm5
  2006-07-01 13:06         ` 2.6.17-mm5 Brice Goglin
@ 2006-07-01 17:00           ` Greg KH
  0 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2006-07-01 17:00 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Reuben Farrelly, Andrew Morton, linux-kernel

On Sat, Jul 01, 2006 at 09:06:14AM -0400, Brice Goglin wrote:
> Reuben Farrelly wrote:
> >>
> >> It oopsed here:
> >>
> >> static
> >> int pci_msi_supported(struct pci_dev * dev)
> >> {
> >>     struct pci_dev *pdev;
> >>
> >>     if (!pci_msi_enable || !dev || dev->no_msi)
> >>         return -1;
> >>
> >>     /* find root complex for our device */
> >>     pdev = dev;
> >>     while (pdev->bus && pdev->bus->self)
> >>         pdev = pdev->bus->self;
> >>
> >>     /* check its bus flags */
> >>     if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> >>         return -1;
> >>
> >>     return 0;
> >> }
> >>
> >> pdev->subordinate is NULL.
> >>
> >
> >> You may find that this gets things going again:
> >>
> >> --- a/drivers/pci/msi.c~a
> >> +++ a/drivers/pci/msi.c
> >> @@ -913,6 +913,9 @@ int pci_msi_supported(struct pci_dev * d
> >>      while (pdev->bus && pdev->bus->self)
> >>          pdev = pdev->bus->self;
> >>  
> >> +    if (!pdev->subordinate)
> >> +        return -1;
> >> +
> >>      /* check its bus flags */
> >>      if (pdev->subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
> >>          return -1;
> >> _
> > Yes it does.
> 
> I was not expecting a root chipset without subordinate bus... Maybe we
> should store the NO_MSI flags in the device itself instead of in its
> subordinate bus (I would have to rework all my patches then).

If that solves this issue, I guess so.

> After all,
> we don't inherit bus flags anymore, and I don't see why bus flags would
> have been chosen initially except to help flags inheritance.
> I am still convinced that checking the root chipset (bus) flags only is a
> good idea since the root chipset is where MSIs are translated from PCI
> messages into DMA (we don't care about MSI support in the bridges
> between the chipset and the devices since they only forward PCI messages).

Yes, I agree with that, just be able to handle the above issue too :)

thanks,

greg k-h


* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
  2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
@ 2006-07-01 18:03   ` Ralf Hildebrandt
       [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Ralf Hildebrandt @ 2006-07-01 18:03 UTC (permalink / raw)
  To: linux-kernel


Starting with -mm4 and now with -mm5 I see:

> Jul  1 19:54:29 knarzkiste kernel: powernow-k8: Found 1 AMD Turion(tm) 64 Mobile Technology ML-30 processors (version 2.00.00)
> Jul  1 19:54:29 knarzkiste kernel: ACPI: Invalid package argument
> Jul  1 19:54:29 knarzkiste kernel: ACPI Exception (acpi_processor-0272): AE_BAD_PARAMETER, Invalid _PSS data [20060623]
> Jul  1 19:54:29 knarzkiste kernel: powernow-k8:    0 : fid 0x0 (800 MHz), vid 0x12
> Jul  1 19:54:29 knarzkiste kernel: powernow-k8:    1 : fid 0x8 (1600 MHz), vid 0x4
> Jul  1 19:54:29 knarzkiste kernel: powernow-k8: ph2 null fid transition 0x8

I'm not sure whether the "ACPI: Invalid package argument" and "ACPI Exception" messages are indicative of a real problem.

> Jul  1 19:54:15 knarzkiste kernel: CPU: AMD Turion(tm) 64 Mobile Technology ML-30 stepping 02
> Jul  1 19:54:15 knarzkiste kernel: Checking 'hlt' instruction... OK.
> Jul  1 19:54:15 knarzkiste kernel: ACPI: Core revision 20060623
> Jul  1 19:54:15 knarzkiste kernel: ENABLING IO-APIC IRQs

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums)         Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                 send no mail to spamtrap@charite.de


* Re: 2.6.17-mm5
       [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
@ 2006-07-01 21:30     ` Andrew Morton
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
                         ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-01 21:30 UTC (permalink / raw)
  To: Grant Wilson; +Cc: linux-kernel, Neil Brown, linux-scsi

On Sat, 1 Jul 2006 15:24:19 +0100
Grant Wilson <grant.wilson@zen.co.uk> wrote:

> More RAID1 problems - OOPS on shutdown.

Thanks.  Please copy the mailing lists on these reports - I'm not an MD,
SCSI or SATA developer, and this is in their area.

> [   37.482699] md: Autodetecting RAID arrays.
> [   37.547908] md: autorun ...
> [   37.566449] md: considering sdb2 ...
> [   37.589664] md:  adding sdb2 ...
> [   37.610757] md:  adding sda2 ...
> [   37.632116] md: created md1
> [   37.650587] md: bind<sda2>
> [   37.668571] md: bind<sdb2>
> [   37.686541] md: running: <sdb2><sda2>
> [   37.710807] raid1: raid set md1 active with 2 out of 2 mirrors
> [   37.747557] md: ... autorun DONE.
> [   37.784444] EXT3-fs: INFO: recovery required on readonly filesystem.
> [   37.824275] EXT3-fs: write access will be enabled during recovery.
> [   38.814113] kjournald starting.  Commit interval 5 seconds
> [   38.848761] EXT3-fs: sdc1: orphan cleanup on readonly fs
> [   38.985436] EXT3-fs: sdc1: 7 orphan inodes deleted
> [   39.015845] EXT3-fs: recovery complete.
> [   39.072168] EXT3-fs: mounted filesystem with ordered data mode.
> [   44.693986] Adding 995988k swap on /dev/sda1.  Priority:-1 extents:1 across:995988k
> [   44.744558] Adding 995988k swap on /dev/sdb1.  Priority:-2 extents:1 across:995988k
> [   44.966034] EXT3 FS on sdc1, internal journal
> [   49.305350] device-mapper: ioctl: 4.8.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> [   64.091331] raid1: Disk failure on sdb2, disabling device. 
> [   64.091333] 	Operation continuing on 1 devices
> [   64.212624] RAID1 conf printout:
> [   64.233951]  --- wd:1 rd:2
> [   64.252195]  disk 0, wo:0, o:1, dev:sda2
> [   64.277712]  disk 1, wo:1, o:0, dev:sdb2
> [   64.305627] RAID1 conf printout:
> [   64.326977]  --- wd:1 rd:2
> [   64.345220]  disk 0, wo:0, o:1, dev:sda2
> [

Which device drivers are being used for these disks?

> [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> [  155.292808] CPU 0 
> [  155.304968] Modules linked in: dm_mod evdev
> [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> [  155.966085] Call Trace:
> [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> [  156.542083] 
> [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 

The barrier code is in there again.

mddev->pers is NULL in md_error(), so the test of
!mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
now being exposed by the new barrier-handling problem.


This should get you further, but...

From: Andrew Morton <akpm@osdl.org>

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/md/md.c |    2 ++
 1 file changed, 2 insertions(+)

diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
--- a/drivers/md/md.c~md-oops-workaround
+++ a/drivers/md/md.c
@@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
 		__builtin_return_address(0),__builtin_return_address(1),
 		__builtin_return_address(2),__builtin_return_address(3));
 */
+	if (!mddev->pers)
+		return;
 	if (!mddev->pers->error_handler)
 		return;
 	mddev->pers->error_handler(mddev,rdev);
_
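
The same guard as a standalone userspace sketch, for illustration only. The
toy_* types are simplified stand-ins and not the md structures; the oops is
consistent with this shape, since the Code: bytes begin 48 8b 40 48
(mov 0x48(%rax),%rax) with RAX == 0 and CR2 == 0x48, i.e. a load of a member
through a NULL pointer.

```c
#include <stddef.h>

struct toy_mddev;

struct toy_rdev {
	int faulty;
};

struct toy_pers {
	void (*error_handler)(struct toy_mddev *mddev, struct toy_rdev *rdev);
};

struct toy_mddev {
	struct toy_pers *pers;	/* may be NULL when no personality is attached */
};

/* Returns 1 if the handler ran, 0 if the error was silently dropped. */
int toy_md_error(struct toy_mddev *mddev, struct toy_rdev *rdev)
{
	if (!mddev->pers)			/* the added check */
		return 0;
	if (!mddev->pers->error_handler)	/* the existing check */
		return 0;
	mddev->pers->error_handler(mddev, rdev);
	return 1;
}

/* Example handler: just marks the device as failed. */
void toy_mark_faulty(struct toy_mddev *mddev, struct toy_rdev *rdev)
{
	(void)mddev;
	rdev->faulty = 1;
}
```

Note that the guard only avoids the crash: an error reported against an array
with no personality is dropped, which is why this is a workaround rather than
a fix.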



* Re: 2.6.17-mm5
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
@ 2006-07-01 22:26       ` James Bottomley
  2006-07-01 22:32         ` 2.6.17-mm5 Neil Brown
  2006-07-01 22:54       ` 2.6.17-mm5 Jeff Garzik
  2006-07-27 21:02       ` 2.6.17-mm5 Ming Zhang
  2 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2006-07-01 22:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> On Sat, 1 Jul 2006 15:24:19 +0100
> Grant Wilson <grant.wilson@zen.co.uk> wrote:
> 
> > More RAID1 problems - OOPS on shutdown.

Actually, is there any more of the trace, like what was going on just
before the oops?

It looks very like a lifetime issue (i.e. md thinks the array is dead
and has torn it down, but there's still an outstanding command).  It
would be nice to know what the outstanding command might have been.

James


> Thanks.  Please copy the mailing lists on these reports - I'm not an MD,
> SCSI or SATA developer, and this is in their area.
> 
> > [   37.482699] md: Autodetecting RAID arrays.
> > [   37.547908] md: autorun ...
> > [   37.566449] md: considering sdb2 ...
> > [   37.589664] md:  adding sdb2 ...
> > [   37.610757] md:  adding sda2 ...
> > [   37.632116] md: created md1
> > [   37.650587] md: bind<sda2>
> > [   37.668571] md: bind<sdb2>
> > [   37.686541] md: running: <sdb2><sda2>
> > [   37.710807] raid1: raid set md1 active with 2 out of 2 mirrors
> > [   37.747557] md: ... autorun DONE.
> > [   37.784444] EXT3-fs: INFO: recovery required on readonly filesystem.
> > [   37.824275] EXT3-fs: write access will be enabled during recovery.
> > [   38.814113] kjournald starting.  Commit interval 5 seconds
> > [   38.848761] EXT3-fs: sdc1: orphan cleanup on readonly fs
> > [   38.985436] EXT3-fs: sdc1: 7 orphan inodes deleted
> > [   39.015845] EXT3-fs: recovery complete.
> > [   39.072168] EXT3-fs: mounted filesystem with ordered data mode.
> > [   44.693986] Adding 995988k swap on /dev/sda1.  Priority:-1 extents:1 across:995988k
> > [   44.744558] Adding 995988k swap on /dev/sdb1.  Priority:-2 extents:1 across:995988k
> > [   44.966034] EXT3 FS on sdc1, internal journal
> > [   49.305350] device-mapper: ioctl: 4.8.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
> > [   64.091331] raid1: Disk failure on sdb2, disabling device. 
> > [   64.091333] 	Operation continuing on 1 devices
> > [   64.212624] RAID1 conf printout:
> > [   64.233951]  --- wd:1 rd:2
> > [   64.252195]  disk 0, wo:0, o:1, dev:sda2
> > [   64.277712]  disk 1, wo:1, o:0, dev:sdb2
> > [   64.305627] RAID1 conf printout:
> > [   64.326977]  --- wd:1 rd:2
> > [   64.345220]  disk 0, wo:0, o:1, dev:sda2
> > [
> 
> Which device drivers are being used for these disks?
> 
> > [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> > [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> > [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> > [  155.292808] CPU 0 
> > [  155.304968] Modules linked in: dm_mod evdev
> > [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> > [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> > [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> > [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> > [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> > [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> > [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> > [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> > [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> > [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> > [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> > [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> > [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> > [  155.966085] Call Trace:
> > [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> > [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> > [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> > [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> > [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> > [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> > [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> > [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> > [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> > [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> > [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> > [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> > [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> > [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> > [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> > [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> > [  156.542083] 
> > [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 
> 
> The barrier code is in there again.
> 
> mddev->pers is NULL in md_error(), so the test of
> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 
> 
> This should get you further, but...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
> --- a/drivers/md/md.c~md-oops-workaround
> +++ a/drivers/md/md.c
> @@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
>  		__builtin_return_address(0),__builtin_return_address(1),
>  		__builtin_return_address(2),__builtin_return_address(3));
>  */
> +	if (!mddev->pers)
> +		return;
>  	if (!mddev->pers->error_handler)
>  		return;
>  	mddev->pers->error_handler(mddev,rdev);
> _
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
@ 2006-07-01 22:32         ` Neil Brown
  2006-07-01 22:56           ` 2.6.17-mm5 Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Neil Brown @ 2006-07-01 22:32 UTC (permalink / raw)
  To: James Bottomley; +Cc: Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Saturday July 1, James.Bottomley@SteelEye.com wrote:
> On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> > On Sat, 1 Jul 2006 15:24:19 +0100
> > Grant Wilson <grant.wilson@zen.co.uk> wrote:
> > 
> > > More RAID1 problems - OOPS on shutdown.
> 
> Actually, is there any more of the trace, like what was going on just
> before the oops?
> 
> It looks very like a lifetime issue (i.e. md thinks the array is dead
> and has torn it down, but there's still an outstanding command).  It
> would be nice to know what the outstanding command might have been.

md writes the superblock after tearing down the array, which is
admittedly a bit careless.

The problem seems to be simply that on some hardware at least,
BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.
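
As a rough illustration of why a late superblock completion can trip over a
torn-down array, here is a minimal user-space sketch of a guarded error
dispatch.  The demo_* types and names are hypothetical stand-ins, not the md
structures; only the idea of checking the personality pointer before checking
its handler is taken from the workaround quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the md structures; not the kernel's definitions. */
struct demo_array;

struct demo_personality {
	/* Optional per-personality error callback. */
	void (*error_handler)(struct demo_array *array, int member);
};

struct demo_array {
	struct demo_personality *pers;	/* NULL once the array is torn down */
	int errors_seen;
};

/* Example handler: just counts the failures it is told about. */
static void demo_count_error(struct demo_array *array, int member)
{
	(void)member;
	array->errors_seen++;
}

/*
 * Report a failed member.  Returns 1 if a handler ran, 0 if the report
 * was dropped because the personality or its handler is missing.
 */
static int demo_report_error(struct demo_array *array, int member)
{
	if (!array || !array->pers)
		return 0;	/* late completion after teardown */
	if (!array->pers->error_handler)
		return 0;
	array->pers->error_handler(array, member);
	return 1;
}
```

Dropping the report silently only hides the symptom, of course; the write
that arrives after teardown is still there.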

NeilBrown

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
@ 2006-07-01 22:54       ` Jeff Garzik
  2006-07-27 21:02       ` 2.6.17-mm5 Ming Zhang
  2 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, Jul 01, 2006 at 02:30:47PM -0700, Andrew Morton wrote:
> Grant Wilson <grant.wilson@zen.co.uk> wrote:
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 

Also, would be nice to re-test without preempt.

Disabling preempt _continues_ to fix (bandaid?) problems...

	Jeff




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 22:32         ` 2.6.17-mm5 Neil Brown
@ 2006-07-01 22:56           ` Jeff Garzik
  2006-07-02  0:10             ` 2.6.17-mm5 James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2006-07-01 22:56 UTC (permalink / raw)
  To: Neil Brown
  Cc: James Bottomley, Andrew Morton, Grant Wilson, linux-kernel,
	linux-scsi

On Sun, Jul 02, 2006 at 08:32:28AM +1000, Neil Brown wrote:
> The problem seems to be simply that on some hardware at least,
> BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.

Could be that <whatever device> is choking on FLUSH CACHE (ATA)
or SYNCHRONIZE CACHE (SCSI).

That's one possible reason why EIO may result from a barrier...

	Jeff




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 22:56           ` 2.6.17-mm5 Jeff Garzik
@ 2006-07-02  0:10             ` James Bottomley
  0 siblings, 0 replies; 35+ messages in thread
From: James Bottomley @ 2006-07-02  0:10 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Neil Brown, Andrew Morton, Grant Wilson, linux-kernel, linux-scsi

On Sat, 2006-07-01 at 18:56 -0400, Jeff Garzik wrote:
> On Sun, Jul 02, 2006 at 08:32:28AM +1000, Neil Brown wrote:
> > The problem seems to be simply that on some hardware at least,
> > BIO_RW_BARRIER writes result in an EIO.  Don't know why yet.
> 
> Could be that <whatever device> is choking on FLUSH CACHE (ATA)
> or SYNCHRONIZE CACHE (SCSI).
> 
> That's one possible reason why EIO may result from a barrier...

There is no barrier implementation on SCSI (basically you can't maintain
barriers in the face of TCQ, so only depth one devices can do it and
hence all the scsi drivers turn it off), so it must be a FLUSH CACHE.

This one looks like it went down via prepare_flush rather than
issue_flush, so the normal error printing that issue_flush does is
skipped.  This patch should tell us what the error on the command was.
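
For anyone reading the resulting output, the four fields printed are all
packed into the one 32-bit result word.  The sketch below is a user-space
illustration of that unpacking; the demo_* helpers are hypothetical and only
mirror the shifts and masks of the status_byte(), msg_byte(), host_byte() and
driver_byte() macros as I remember them from include/scsi/scsi.h, so check
them against your tree.

```c
#include <assert.h>

/* Hypothetical helpers; the shifts mirror the kernel macros as recalled. */
static unsigned int demo_status_byte(unsigned int result)
{
	return (result >> 1) & 0x7f;	/* SCSI status, stored shifted right by one */
}

static unsigned int demo_msg_byte(unsigned int result)
{
	return (result >> 8) & 0xff;
}

static unsigned int demo_host_byte(unsigned int result)
{
	return (result >> 16) & 0xff;
}

static unsigned int demo_driver_byte(unsigned int result)
{
	return (result >> 24) & 0xff;
}

/* A result of zero means every field is clear, i.e. nothing was reported. */
static int demo_result_is_clean(unsigned int result)
{
	return result == 0;
}
```

So a result of 0x08000002, for example, would decode to driver = 08 and
status = 1, with the message and host bytes clear.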

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3d04a9f..3e3e3b7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1162,7 +1162,20 @@ static int scsi_issue_flush_fn(request_q
 
 static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
 {
+	int res = cmd->result;
+	struct scsi_sense_hdr sshdr;
+
 	BUG_ON(!blk_pc_request(cmd->request));
+	if (res) {
+		printk(KERN_ERR "REQ_BLOCK_PC FAILED for ");
+		__scsi_print_command(cmd->cmnd);
+		printk(KERN_ERR "FAILED\n  status = %x, message = %02x, "
+		       "host = %d, driver = %02x\n  ",
+		       status_byte(res), msg_byte(res),
+		       host_byte(res), driver_byte(res));
+		if (scsi_command_normalize_sense(cmd, &sshdr))
+			scsi_print_sense_hdr("sd", &sshdr);
+	}
 	/*
 	 * This will complete the whole command with uptodate=1 so
 	 * as far as the block layer is concerned the command completed


James



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
                     ` (2 preceding siblings ...)
       [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
@ 2006-07-02 10:03   ` Andy Whitcroft
  2006-07-02 10:14     ` 2.6.17-mm5 Andrew Morton
  2006-07-03  0:47   ` 2.6.17-mm5 Theodore Tso
  2006-07-03  7:32   ` 2.6.17-mm5 Heiko Carstens
  5 siblings, 1 reply; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-02 10:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Seems that we have some kind of scheduler balance panic; I want to say it
is back, as this seems very familiar.  It seems to be affecting the
multi-node NUMA-Q systems here.  The single-node ones appear unaffected.

Nothing jumps out of the patch list.  Any suggestions as to what to rip
out? :)

-apw

divide error: 0000 [#1]
8K_STACKS SMP
last sysfs file:
Modules linked in:
CPU:    3
EIP:    0060:[<c0112b6e>]    Not tainted VLI
EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1)
EIP is at find_busiest_group+0x1a3/0x47c
eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
esi: 00000000   edi: e7677264   ebp: e74a3ec8   esp: e74a3e58
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=e74a2000 task=e7485030 task.ti=e74a2000)
Stack: e7677264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000
       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000
       00000000 00000200 00000020 00000080 00000000 00000000 e7677260 c13dc960
Call Trace:
 [<c0119020>] vprintk+0x5f/0x213
 [<c0112efb>] load_balance+0x54/0x1d6
 [<c011332d>] rebalance_tick+0xc5/0xe3
 [<c01137a3>] scheduler_tick+0x2cb/0x2d3
 [<c01215b4>] update_process_times+0x51/0x5d
 [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
 [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
 [<c01006c0>] default_idle+0x0/0x59
 [<c01006f1>] default_idle+0x31/0x59
 [<c0100791>] cpu_idle+0x64/0x79
Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b
EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e74a3e58
 <0>Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 10:03   ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-02 10:14     ` Andrew Morton
  2006-07-02 10:40       ` 2.6.17-mm5 Andy Whitcroft
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2006-07-02 10:14 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

On Sun, 02 Jul 2006 11:03:16 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> Seems that we have some kind of scheduler balance panic; I want to say it
> is back, as this seems very familiar.  It seems to be affecting the
> multi-node NUMA-Q systems here.  The single-node ones appear unaffected.
> 
> Nothing jumps out of the patch list.  Any suggestions as to what to rip
> out? :)
> 
> -apw
> 
> divide error: 0000 [#1]
> 8K_STACKS SMP
> last sysfs file:
> Modules linked in:
> CPU:    3
> EIP:    0060:[<c0112b6e>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1)
> EIP is at find_busiest_group+0x1a3/0x47c
> eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: e7677264   ebp: e74a3ec8   esp: e74a3e58
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, ti=e74a2000 task=e7485030 task.ti=e74a2000)
> Stack: e7677264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000
>        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000
>        00000000 00000200 00000020 00000080 00000000 00000000 e7677260 c13dc960
> Call Trace:
>  [<c0119020>] vprintk+0x5f/0x213
>  [<c0112efb>] load_balance+0x54/0x1d6
>  [<c011332d>] rebalance_tick+0xc5/0xe3
>  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>  [<c01215b4>] update_process_times+0x51/0x5d
>  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>  [<c01006c0>] default_idle+0x0/0x59
>  [<c01006f1>] default_idle+0x31/0x59
>  [<c0100791>] cpu_idle+0x64/0x79
> Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b
> EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e74a3e58
>  <0>Kernel panic - not syncing: Fatal exception in interrupt

Well there are only a handful of divides in find_busiest_group().  Wanna
have a poke around in gdb and work out which one you're hitting?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 10:14     ` 2.6.17-mm5 Andrew Morton
@ 2006-07-02 10:40       ` Andy Whitcroft
  2006-07-02 11:14         ` 2.6.17-mm5 Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-02 10:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> On Sun, 02 Jul 2006 11:03:16 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
> 
>>Seems that we have some kind of scheduler balance panic; I want to say it
>>is back, as this seems very familiar.  It seems to be affecting the
>>multi-node NUMA-Q systems here.  The single-node ones appear unaffected.
>>
>>Nothing jumps out of the patch list.  Any suggestions as to what to rip
>>out? :)
>>
>>-apw
>>
>>divide error: 0000 [#1]
>>8K_STACKS SMP
>>last sysfs file:
>>Modules linked in:
>>CPU:    3
>>EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1)
>>EIP is at find_busiest_group+0x1a3/0x47c
>>eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>esi: 00000000   edi: e7677264   ebp: e74a3ec8   esp: e74a3e58
>>ds: 007b   es: 007b   ss: 0068
>>Process swapper (pid: 0, ti=e74a2000 task=e7485030 task.ti=e74a2000)
>>Stack: e7677264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000
>>       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000
>>       00000000 00000200 00000020 00000080 00000000 00000000 e7677260 c13dc960
>>Call Trace:
>> [<c0119020>] vprintk+0x5f/0x213
>> [<c0112efb>] load_balance+0x54/0x1d6
>> [<c011332d>] rebalance_tick+0xc5/0xe3
>> [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>> [<c01215b4>] update_process_times+0x51/0x5d
>> [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>> [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>> [<c01006c0>] default_idle+0x0/0x59
>> [<c01006f1>] default_idle+0x31/0x59
>> [<c0100791>] cpu_idle+0x64/0x79
>>Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b
>>EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e74a3e58
>> <0>Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> Well there are only a handful of divides in find_busiest_group().  Wanna
> have a poke around in gdb and work out which one you're hitting?

Sure I'll see what information I can get on this one.

-apw

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 10:40       ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-02 11:14         ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-02 11:14 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-kernel

On Sun, 02 Jul 2006 11:40:26 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> > Well there are only a handful of divides in find_busiest_group().  Wanna
> > have a poke around in gdb and work out which one you're hitting?
> 
> Sure I'll see what information I can get on this one.

Easy way:

Set CONFIG_DEBUG_INFO, do:

make kernel/sched.o
gdb kernel/sched.o
(gdb) p find_busiest_group
$1 = {struct sched_group *(struct sched_domain *, int, long unsigned int *, 
    enum idle_type, int *)} 0xff0 <find_busiest_group>
(gdb) l *(0xff0 + 0x1a3)
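
The arithmetic behind that last command: an oops location of the form
symbol+offset/size gives the byte offset of the faulting instruction inside
the function and the function's total size, and the offset is added to
wherever gdb says the function starts in the object file.  A small sketch of
that calculation follows; the demo_* names are made up for illustration, and
0xff0 is simply the start address from the session above, so it will differ
in other builds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE" oops location. */
struct demo_oops_loc {
	char symbol[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total size of the function in bytes */
};

/* Parse e.g. "find_busiest_group+0x1a3/0x47c"; returns 1 on success. */
static int demo_parse_oops_loc(const char *text, struct demo_oops_loc *loc)
{
	return sscanf(text, "%63[^+]+%lx/%lx",
		      loc->symbol, &loc->offset, &loc->size) == 3;
}

/* Address to hand to gdb's "l *ADDR", given the symbol's start address. */
static unsigned long demo_fault_address(unsigned long symbol_start,
					const struct demo_oops_loc *loc)
{
	return symbol_start + loc->offset;
}
```

With the values above this comes to 0xff0 + 0x1a3 = 0x1193.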


^ permalink raw reply	[flat|nested] 35+ messages in thread

* 2.6.17-mm5
@ 2006-07-02 23:27 Martin J. Bligh
  2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Martin J. Bligh @ 2006-07-02 23:27 UTC (permalink / raw)
  To: akpm, linux-kernel

Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch

divide error: 0000 [#1]
8K_STACKS SMP 
last sysfs file: 
Modules linked in:
CPU:    1
EIP:    0060:[<c0112b6e>]    Not tainted VLI
EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
EIP is at find_busiest_group+0x1a3/0x47c
eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
       00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
Call Trace:
 [<c0119020>] vprintk+0x5f/0x213
 [<c0112efb>] load_balance+0x54/0x1d6
 [<c011332d>] rebalance_tick+0xc5/0xe3
 [<c01137a3>] scheduler_tick+0x2cb/0x2d3
 [<c01215b4>] update_process_times+0x51/0x5d
 [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
 [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
 [<c01006c0>] default_idle+0x0/0x59
 [<c01006f1>] default_idle+0x31/0x59
 [<c0100791>] cpu_idle+0x64/0x79
Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-02 23:27 2.6.17-mm5 Martin J. Bligh
@ 2006-07-02 23:41 ` Andrew Morton
  2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
  2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
  0 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-02 23:41 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andy Whitcroft

On Sun, 02 Jul 2006 16:27:55 -0700
"Martin J. Bligh" <mbligh@mbligh.org> wrote:

> Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
> 
> divide error: 0000 [#1]
> 8K_STACKS SMP 
> last sysfs file: 
> Modules linked in:
> CPU:    1
> EIP:    0060:[<c0112b6e>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
> EIP is at find_busiest_group+0x1a3/0x47c
> eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
> Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
> Call Trace:
>  [<c0119020>] vprintk+0x5f/0x213
>  [<c0112efb>] load_balance+0x54/0x1d6
>  [<c011332d>] rebalance_tick+0xc5/0xe3
>  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>  [<c01215b4>] update_process_times+0x51/0x5d
>  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>  [<c01006c0>] default_idle+0x0/0x59
>  [<c01006f1>] default_idle+0x31/0x59
>  [<c0100791>] cpu_idle+0x64/0x79
> Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
> EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58

Yes, Andy's reporting that too.  I asked him to identify the file-n-line
and he ran away on me.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
                     ` (3 preceding siblings ...)
  2006-07-02 10:03   ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-03  0:47   ` Theodore Tso
  2006-07-03  7:32   ` 2.6.17-mm5 Heiko Carstens
  5 siblings, 0 replies; 35+ messages in thread
From: Theodore Tso @ 2006-07-03  0:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

The following patch is needed to fix UML compilation in -mm5 given
that alternatives_smp_module_add and alternatives_smp_module_del are
null inline functions if !CONFIG_SMP.
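
For background, the clash looks roughly like the self-contained sketch below.
All demo_* names and DEMO_CONFIG_SMP are hypothetical; the real declarations
live in the kernel headers and are not reproduced here.  When the header
already supplies empty inline stubs for the non-SMP case, a second,
unconditional definition in a .c file is a redefinition, so the .c side needs
the same guard.

```c
#include <assert.h>

/* Hypothetical module record; counts registered SMP regions. */
struct demo_module {
	int smp_regions;
};

#ifdef DEMO_CONFIG_SMP
/* "Header" side, SMP: prototypes only, definitions follow below. */
void demo_smp_module_add(struct demo_module *mod);
void demo_smp_module_del(struct demo_module *mod);
#else
/* "Header" side, !SMP: empty inline stubs so callers need no #ifdef. */
static inline void demo_smp_module_add(struct demo_module *mod) { (void)mod; }
static inline void demo_smp_module_del(struct demo_module *mod) { (void)mod; }
#endif

#ifdef DEMO_CONFIG_SMP
/* ".c" side: without this guard the stubs above would be redefined. */
void demo_smp_module_add(struct demo_module *mod) { mod->smp_regions++; }
void demo_smp_module_del(struct demo_module *mod) { mod->smp_regions--; }
#endif

/* Caller is identical in both configurations. */
static int demo_load_unload(struct demo_module *mod)
{
	demo_smp_module_add(mod);
	demo_smp_module_del(mod);
	return mod->smp_regions;
}
```

The sketch builds with or without -DDEMO_CONFIG_SMP; removing the second
#ifdef breaks the build without it, which is the shape of the UML failure.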

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Index: linux-2.6.17-mm5/arch/um/kernel/um_arch.c
===================================================================
--- linux-2.6.17-mm5.orig/arch/um/kernel/um_arch.c	2006-07-02 20:37:17.000000000 -0400
+++ linux-2.6.17-mm5/arch/um/kernel/um_arch.c	2006-07-02 20:38:08.000000000 -0400
@@ -495,6 +495,7 @@
 {
 }
 
+#ifdef CONFIG_SMP
 void alternatives_smp_module_add(struct module *mod, char *name,
 				 void *locks, void *locks_end,
 				 void *text,  void *text_end)
@@ -504,3 +505,4 @@
 void alternatives_smp_module_del(struct module *mod)
 {
 }
+#endif

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch] sched: fix macro -> inline function conversion bug
  2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
@ 2006-07-03  5:25   ` Ingo Molnar
  2006-07-03  5:42     ` Andrew Morton
  2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2006-07-03  5:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel, Andy Whitcroft


* Andrew Morton <akpm@osdl.org> wrote:

> On Sun, 02 Jul 2006 16:27:55 -0700
> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> 
> > Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
> > 
> > divide error: 0000 [#1]
> > 8K_STACKS SMP 
> > last sysfs file: 
> > Modules linked in:
> > CPU:    1
> > EIP:    0060:[<c0112b6e>]    Not tainted VLI
> > EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
> > EIP is at find_busiest_group+0x1a3/0x47c
> > eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> > esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
> > ds: 007b   es: 007b   ss: 0068
> > Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
> > Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
> >        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
> >        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
> > Call Trace:
> >  [<c0119020>] vprintk+0x5f/0x213
> >  [<c0112efb>] load_balance+0x54/0x1d6
> >  [<c011332d>] rebalance_tick+0xc5/0xe3
> >  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
> >  [<c01215b4>] update_process_times+0x51/0x5d
> >  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
> >  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
> >  [<c01006c0>] default_idle+0x0/0x59
> >  [<c01006f1>] default_idle+0x31/0x59
> >  [<c0100791>] cpu_idle+0x64/0x79
> > Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
> > EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
> 
> Yes, Andy's reporting that too.  I asked him to identify the 
> file-n-line and he ran away on me.

i checked the scheduler queue and nothing jumped out at me, except the 
cleanup bug fixed by the patch below. (which should be harmless in this 
particular case - nr_running should never be smaller than 0 or larger 
than ~4 billion. A fix is warranted nevertheless.)
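
To show what the int version does to a count that does not fit, here is a
small user-space sketch.  The demo_* names are hypothetical, and the
out-of-range conversion to int is implementation-defined in C; the comments
assume the usual wrap-around behaviour of gcc on common targets.

```c
#include <assert.h>

/* Narrow variant: the argument has already been squeezed into an int. */
static inline int demo_minus_1_or_zero_int(int n)
{
	return n > 0 ? n - 1 : 0;
}

/* Wide variant: same type as the counter it is applied to. */
static inline unsigned long demo_minus_1_or_zero_ulong(unsigned long n)
{
	return n > 0 ? n - 1 : 0;
}

/*
 * Make the conversion explicit.  A value above INT_MAX typically wraps
 * to a negative int, so the narrow helper reports 0.
 */
static unsigned long demo_via_int(unsigned long nr_running)
{
	return (unsigned long)demo_minus_1_or_zero_int((int)nr_running);
}
```

For small counts the two variants agree, which is why this cannot explain
the divide error by itself.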

	Ingo

-------------->
Subject: sched: fix macro -> inline function conversion bug
From: Ingo Molnar <mingo@elte.hu>

nr_running is long, not int.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -2480,7 +2480,7 @@ find_busiest_queue(struct sched_group *g
  */
 #define MAX_PINNED_INTERVAL	512
 
-static inline int minus_1_or_zero(int n)
+static inline unsigned long minus_1_or_zero(unsigned long n)
 {
 	return n > 0 ? n - 1 : 0;
 }

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
@ 2006-07-03  5:42     ` Andrew Morton
  2006-07-03  6:03       ` Ingo Molnar
  2006-07-03  6:06       ` Peter Williams
  0 siblings, 2 replies; 35+ messages in thread
From: Andrew Morton @ 2006-07-03  5:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: mbligh, linux-kernel, apw

On Mon, 3 Jul 2006 07:25:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > On Sun, 02 Jul 2006 16:27:55 -0700
> > "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> > 
> > > Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
> > > 
> > > divide error: 0000 [#1]
> > > 8K_STACKS SMP 
> > > last sysfs file: 
> > > Modules linked in:
> > > CPU:    1
> > > EIP:    0060:[<c0112b6e>]    Not tainted VLI
> > > EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
> > > EIP is at find_busiest_group+0x1a3/0x47c
> > > eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
> > > esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
> > > ds: 007b   es: 007b   ss: 0068
> > > Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
> > > Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
> > >        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
> > >        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
> > > Call Trace:
> > >  [<c0119020>] vprintk+0x5f/0x213
> > >  [<c0112efb>] load_balance+0x54/0x1d6
> > >  [<c011332d>] rebalance_tick+0xc5/0xe3
> > >  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
> > >  [<c01215b4>] update_process_times+0x51/0x5d
> > >  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
> > >  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
> > >  [<c01006c0>] default_idle+0x0/0x59
> > >  [<c01006f1>] default_idle+0x31/0x59
> > >  [<c0100791>] cpu_idle+0x64/0x79
> > > Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
> > > EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
> > 
> > Yes, Andy's reporting that too.  I asked him to identify the 
> > file-n-line and he ran away on me.
> 
> i checked the scheduler queue and nothing jumped out at me, except the 
> cleanup bug fixed by the patch below. (which should be harmless in this 
> particular case - nr_running should never be smaller than 0 or larger 
> than ~4 billion. A fix is warranted nevertheless.)

Did you work out which divide is getting the div-by-zero?  I stared at it
a bit and wasn't sure - am getting wildly different code generation over
here.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  5:42     ` Andrew Morton
@ 2006-07-03  6:03       ` Ingo Molnar
  2006-07-03  6:08         ` Ingo Molnar
  2006-07-03  6:06       ` Peter Williams
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2006-07-03  6:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mbligh, linux-kernel, apw


* Andrew Morton <akpm@osdl.org> wrote:

> > i checked the scheduler queue and nothing jumped out at me, except 
> > the cleanup bug fixed by the patch below. (which should be harmless 
> > in this particular case - nr_running should never be smaller than 0 
> > or larger than ~4 billion. A fix is warranted nevertheless.)
> 
> Did you work out which divide is getting the div-by-zero?  I stared 
> at it a bit and wasn't sure - am getting wildly different code 
> generation over here.

my bet is on sched-group-cpu-power-setup-cleanup.patch.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  5:42     ` Andrew Morton
  2006-07-03  6:03       ` Ingo Molnar
@ 2006-07-03  6:06       ` Peter Williams
  1 sibling, 0 replies; 35+ messages in thread
From: Peter Williams @ 2006-07-03  6:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, mbligh, linux-kernel, apw

Andrew Morton wrote:
> On Mon, 3 Jul 2006 07:25:39 +0200
> Ingo Molnar <mingo@elte.hu> wrote:
> 
>> * Andrew Morton <akpm@osdl.org> wrote:
>>
>>> On Sun, 02 Jul 2006 16:27:55 -0700
>>> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
>>>
>>>> Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
>>>>
>>>> divide error: 0000 [#1]
>>>> 8K_STACKS SMP 
>>>> last sysfs file: 
>>>> Modules linked in:
>>>> CPU:    1
>>>> EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>>> EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
>>>> EIP is at find_busiest_group+0x1a3/0x47c
>>>> eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>>> esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
>>>> ds: 007b   es: 007b   ss: 0068
>>>> Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
>>>> Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>>>>        ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>>>>        00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
>>>> Call Trace:
>>>>  [<c0119020>] vprintk+0x5f/0x213
>>>>  [<c0112efb>] load_balance+0x54/0x1d6
>>>>  [<c011332d>] rebalance_tick+0xc5/0xe3
>>>>  [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>>>>  [<c01215b4>] update_process_times+0x51/0x5d
>>>>  [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>>>>  [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>>>>  [<c01006c0>] default_idle+0x0/0x59
>>>>  [<c01006f1>] default_idle+0x31/0x59
>>>>  [<c0100791>] cpu_idle+0x64/0x79
>>>> Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
>>>> EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
>>> Yes, Andy's reporting that too.  I asked him to identify the 
>>> file-n-line and he ran away on me.
>> i checked the scheduler queue and nothing jumped out at me, except the 
>> cleanup bug fixed by the patch below. (which should be harmless in this 
>> particular case - nr_running should never be smaller than 0 or larger 
>> than ~4 billion. A fix is warranted nevertheless.)
> 
> Did you work out which divide is getting the div-by-zero?  I stared at it
> a bit and wasn't sure - am getting wildly different code generation over
> here.

As far as I can see, all the divides in find_busiest_group(), except those 
that rely on group->cpu_power being non-zero, are protected against 
divide by zero.  This suggests that the initialization of 
the scheduler group data is the place to look.
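
The posted oops seems to fit this: the register dump has ecx at 0, and the
marked opcode bytes (f7 f1) decode to a divide by ecx, preceded by a shift
left by 7.  A minimal user-space sketch of that kind of scaling, with a guard
added purely for illustration, is below.  The demo_* names are hypothetical,
and the scale of 128 is an assumption based on that shift by 7, not something
taken from the source.

```c
#include <assert.h>

/* Assumed scale factor; a shift left by 7 is a multiply by 128. */
#define DEMO_SCHED_LOAD_SCALE 128UL

/* Hypothetical stand-in for the group record; only the divisor matters. */
struct demo_sched_group {
	unsigned long cpu_power;	/* must be non-zero before balancing */
};

/*
 * Scale a raw load by the group's power.  Sets *ok to 0 and returns 0
 * if cpu_power was never initialized, instead of faulting on the divide.
 */
static unsigned long demo_avg_load(const struct demo_sched_group *group,
				   unsigned long load, int *ok)
{
	if (group->cpu_power == 0) {
		*ok = 0;
		return 0;
	}
	*ok = 1;
	return (load * DEMO_SCHED_LOAD_SCALE) / group->cpu_power;
}
```

A guard like this would only paper over the problem; the real fix is to make
sure every group gets a non-zero cpu_power at setup time.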

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  6:03       ` Ingo Molnar
@ 2006-07-03  6:08         ` Ingo Molnar
  2006-07-05 19:36           ` Siddha, Suresh B
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2006-07-03  6:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mbligh, linux-kernel, apw


* Ingo Molnar <mingo@elte.hu> wrote:

> > Did you work out which divide is getting the div-by-zero?  I stared 
> > at it a bit and wasn't sure - am getting wildly different code 
> > generation over here.
> 
> my bet is on sched-group-cpu-power-setup-cleanup.patch.

in particular, we don't seem to initialize ->cpu_power properly. Martin, 
does the patch below solve your crash?

	Ingo

-------------->
Subject: sched: group cpu power setup cleanup, fix
From: Ingo Molnar <mingo@elte.hu>

- fix missing initialization of ->cpu_power
- clean up the cleanup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h |    2 +-
 kernel/sched.c        |    9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -636,7 +636,7 @@ enum idle_type
 	((sched_mc_power_savings || sched_smt_power_savings) ?	\
 					SD_POWERSAVINGS_BALANCE : 0)
 
-#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
+#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
 
 
 struct sched_group {
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1292,7 +1292,7 @@ static int sched_balance_self(int cpu, i
 		cpu = new_cpu;
 nextlevel:
 		sd = sd->child;
-		if (sd && sd->flags & flag)
+		if (test_sd_flag(sd, flag))
 			goto nextlevel;
 		/* while loop will break here if sd == NULL */
 	}
@@ -6224,6 +6224,7 @@ static int cpu_to_allnodes_group(int cpu
 {
 	return cpu_to_node(cpu);
 }
+
 static void init_numa_sched_groups_power(struct sched_group *group_head)
 {
 	struct sched_group *sg = group_head;
@@ -6314,8 +6315,12 @@ static void init_sched_groups_power(int 
 	struct sched_domain *child;
 	struct sched_group *group;
 
-	if (!sd || !sd->groups || (cpu != first_cpu(sd->groups->cpumask)))
+	WARN_ON(!sd || !sd->groups);
+
+	if (cpu != first_cpu(sd->groups->cpumask)) {
+		sd->groups->cpu_power = SCHED_LOAD_SCALE;
 		return;
+	}
 
 	child = sd->child;
 

* Re: 2.6.17-mm5
  2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
                     ` (4 preceding siblings ...)
  2006-07-03  0:47   ` 2.6.17-mm5 Theodore Tso
@ 2006-07-03  7:32   ` Heiko Carstens
  5 siblings, 0 replies; 35+ messages in thread
From: Heiko Carstens @ 2006-07-03  7:32 UTC (permalink / raw)
  To: Andrew Morton, Martin Peschke; +Cc: linux-kernel

  LD      .tmp_vmlinux1
drivers/s390/built-in.o(.text+0x587f2): In function `zfcp_ccw_set_online':
: undefined reference to `statistic_create'
drivers/s390/built-in.o(.text+0x58838): In function `zfcp_ccw_set_online':
: undefined reference to `statistic_remove'
drivers/s390/built-in.o(.text+0x58954): In function `zfcp_ccw_set_offline':
: undefined reference to `statistic_remove'
drivers/s390/built-in.o(.text+0x603e0): In function `zfcp_erp_thread':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x60676): In function `zfcp_erp_thread':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x62000): In function `zfcp_qdio_response_handler':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x622b2): In function `zfcp_qdio_sbals_from_sg':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x6258a): In function `zfcp_qdio_sbals_from_scsicmnd':
: undefined reference to `statistic_add'
drivers/s390/built-in.o(.text+0x6280c): more undefined references to `statistic_add' follow
make: *** [.tmp_vmlinux1] Error 1

I guess a couple of do {} while (0) defines are missing in
include/linux/statistic.h for the !CONFIG_STATISTICS case. Martin?
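
The kind of stub being suggested could look like the sketch below. Only the three function names come from the linker errors above; every signature here is hypothetical, since the real prototypes in the statistics patch are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* CONFIG_STATISTICS is left undefined here so the stub branch is used. */
#ifdef CONFIG_STATISTICS
/* Hypothetical prototypes; the real ones live in the statistics patch. */
int statistic_create(void *iface, const char *name);
void statistic_remove(void *iface);
void statistic_add(void *iface, int idx, long value, long incr);
#else
/* Stubs that compile away, so callers need no #ifdef of their own. */
static inline int statistic_create(void *iface, const char *name)
{
	(void)iface;
	(void)name;
	return 0;
}
#define statistic_remove(iface)			do { } while (0)
#define statistic_add(iface, idx, value, incr)	do { } while (0)
#endif

/* Caller in the style of a driver's set_online path. */
static int set_online_sketch(void)
{
	int rc = statistic_create(NULL, "sketch");

	assert(rc == 0);
	statistic_add(NULL, 0, 42L, 1L);
	statistic_remove(NULL);
	return rc;
}
```

A call whose result is used, such as the create call here, needs an inline function rather than a do {} while (0) macro, because the macro form cannot appear in an expression.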

* Re: 2.6.17-mm5
  2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
  2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
@ 2006-07-03  8:23   ` Andy Whitcroft
  2006-07-03 14:19     ` 2.6.17-mm5 Andy Whitcroft
  1 sibling, 1 reply; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-03  8:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

Andrew Morton wrote:
> On Sun, 02 Jul 2006 16:27:55 -0700
> "Martin J. Bligh" <mbligh@mbligh.org> wrote:
> 
> 
>>Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
>>
>>divide error: 0000 [#1]
>>8K_STACKS SMP 
>>last sysfs file: 
>>Modules linked in:
>>CPU:    1
>>EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
>>EIP is at find_busiest_group+0x1a3/0x47c
>>eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
>>ds: 007b   es: 007b   ss: 0068
>>Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
>>Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>>       ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>>       00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
>>Call Trace:
>> [<c0119020>] vprintk+0x5f/0x213
>> [<c0112efb>] load_balance+0x54/0x1d6
>> [<c011332d>] rebalance_tick+0xc5/0xe3
>> [<c01137a3>] scheduler_tick+0x2cb/0x2d3
>> [<c01215b4>] update_process_times+0x51/0x5d
>> [<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>> [<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>> [<c01006c0>] default_idle+0x0/0x59
>> [<c01006f1>] default_idle+0x31/0x59
>> [<c0100791>] cpu_idle+0x64/0x79
>>Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
>>EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
> 
> 
> Yes, Andy's reporting that too.  I asked him to identify the file-n-line
> and he ran away on me.

I went away to debug it, but then had to skip out to a BBQ.  It's
definitely the cpu_power on the group being zero.

group->cpu_power ZERO => c3150920

/me gives Ingo's patch a spin.

-apw

* Re: 2.6.17-mm5
  2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
@ 2006-07-03 14:19     ` Andy Whitcroft
  0 siblings, 0 replies; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-03 14:19 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Andy Whitcroft, Andrew Morton, Martin J. Bligh, linux-kernel,
	Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 3284 bytes --]

Andy Whitcroft wrote:
> Andrew Morton wrote:
> 
>>On Sun, 02 Jul 2006 16:27:55 -0700
>>"Martin J. Bligh" <mbligh@mbligh.org> wrote:
>>
>>
>>
>>>Panic on NUMA-Q (mm4 was fine). Presumably some new scheduler patch
>>>
>>>divide error: 0000 [#1]
>>>8K_STACKS SMP 
>>>last sysfs file: 
>>>Modules linked in:
>>>CPU:    1
>>>EIP:    0060:[<c0112b6e>]    Not tainted VLI
>>>EFLAGS: 00010046   (2.6.17-mm5-autokern1 #1) 
>>>EIP is at find_busiest_group+0x1a3/0x47c
>>>eax: 00000000   ebx: 00000007   ecx: 00000000   edx: 00000000
>>>esi: 00000000   edi: e75ff264   ebp: e7405ec8   esp: e7405e58
>>>ds: 007b   es: 007b   ss: 0068
>>>Process swapper (pid: 0, ti=e7404000 task=c13f8560 task.ti=e7404000)
>>>Stack: e75ff264 00000010 c0119020 00000000 00000000 00000000 00000000 00000000 
>>>      ffffffff 00000000 00000000 00000001 00000001 00000001 00000080 00000000 
>>>      00000000 00000200 00000020 00000080 00000000 00000000 e75ff260 c1364960 
>>>Call Trace:
>>>[<c0119020>] vprintk+0x5f/0x213
>>>[<c0112efb>] load_balance+0x54/0x1d6
>>>[<c011332d>] rebalance_tick+0xc5/0xe3
>>>[<c01137a3>] scheduler_tick+0x2cb/0x2d3
>>>[<c01215b4>] update_process_times+0x51/0x5d
>>>[<c010c224>] smp_apic_timer_interrupt+0x5a/0x61
>>>[<c0102d5b>] apic_timer_interrupt+0x1f/0x24
>>>[<c01006c0>] default_idle+0x0/0x59
>>>[<c01006f1>] default_idle+0x31/0x59
>>>[<c0100791>] cpu_idle+0x64/0x79
>>>Code: 00 5b 83 f8 1f 89 c6 5f 0f 8e 63 ff ff ff 8b 45 e0 8b 55 e8 01 45 dc 8b 4a 08 89 c2 01 4d d4 c1 e2 07 89 d0 31 d2 89 ce c1 ee 07 <f7> f1 83 7d 9c 00 89 45 e0 74 17 89 45 d8 8b 55 e8 8b 4d a4 8b 
>>>EIP: [<c0112b6e>] find_busiest_group+0x1a3/0x47c SS:ESP 0068:e7405e58
>>
>>
>>Yes, Andy's reporting that too.  I asked him to identify the file-n-line
>>and he ran away on me.
> 
> 
> I went away to debug it, but then had to skip out to a BBQ.  It's
> definitely the cpu_power on the group being zero.
> 
> group->cpu_power ZERO => c3150920
> 
> /me gives Ingo's patch a spin.

OK.  That hasn't fixed it either.

Confirmed that it is caused by the patch below; backing it out sorts things out:

    sched-group-cpu-power-setup-cleanup.patch

Did a fair bit of analysis of this problem, and it seems that the issue
is where we initialise the NUMA domains.  For each cpu we initialise a
domain spanning the whole machine, but ordered starting at the node in
which we start.  In the original code we used the following to
initialise each of these groups:

        for (i = 0; i < MAX_NUMNODES; i++)
                init_numa_sched_groups_power(sched_group_nodes[i]);

init_numa_sched_groups_power iterated over all of the groups within the
domain and added up power based on the physical packages.  Now we use:

        for_each_cpu_mask(i, *cpu_map) {
                sd = &per_cpu(node_domains, i);
                init_sched_groups_power(i, sd);
        }

init_sched_groups_power only thinks in terms of the first group of the
domain, which leaves the subsequent groups in the domain at power 0.
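
The difference between the two initialisation styles can be sketched on a small circular list of groups. This is an illustration only: `group_sketch`, `LOAD_SCALE` and the helper names are hypothetical, and a flat power value is used where the real init_numa_sched_groups_power adds up power per physical package.

```c
#include <assert.h>
#include <stddef.h>

#define LOAD_SCALE 128UL	/* assumed value, stands in for SCHED_LOAD_SCALE */

struct group_sketch {
	struct group_sketch *next;	/* circular list, like a domain's groups */
	unsigned long cpu_power;
};

/* Mirrors the failing pattern: only the head group gets a power value. */
static void init_first_group_only(struct group_sketch *head)
{
	head->cpu_power = LOAD_SCALE;
}

/* Mirrors the older pattern: walk the whole ring once. */
static void init_every_group(struct group_sketch *head)
{
	struct group_sketch *g = head;

	assert(head != NULL);
	do {
		g->cpu_power = LOAD_SCALE;
		g = g->next;
	} while (g != head);
}

/* Count groups on the ring that would trigger a divide by zero. */
static int count_zero_power(const struct group_sketch *head)
{
	const struct group_sketch *g = head;
	int zeros = 0;

	do {
		if (g->cpu_power == 0)
			zeros++;
		g = g->next;
	} while (g != head);
	return zeros;
}
```

On a three-group ring, the first helper leaves two groups at power 0, and the load balancer only needs to visit one of them to hit the divide error.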

I've tried reverting just that part of the change (as attached) and that
also seems to fix things.  However, are we correct to ignore the
subsequent groups in all the other cases?  I am also not sure whether this
will change the purpose of the patch.  It seems unlikely but ...

Suresh I guess I'll punt to you :).

-apw

[-- Attachment #2: sched-revert-numa-domain-init --]
[-- Type: text/plain, Size: 1025 bytes --]

sched revert numa domain init

It seems that the scheduler domains for the NUMA nodes aren't being
initialised correctly.  Their cpu_power is left at 0, leading to a
panic in load balancing.  It seems that just reverting this part of
the change below is enough to fix booting of NUMA systems.

   sched-group-cpu-power-setup-cleanup.patch

Not sure if this changes the purpose of that patch.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 sched.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)
diff -upN reference/kernel/sched.c current/kernel/sched.c
--- reference/kernel/sched.c
+++ current/kernel/sched.c
@@ -6656,10 +6656,8 @@ printk(KERN_WARNING "init CPU domains\n"
 
 printk(KERN_WARNING "init NUMA domains\n");
 #ifdef CONFIG_NUMA
-	for_each_cpu_mask(i, *cpu_map) {
-		sd = &per_cpu(node_domains, i);
-		init_sched_groups_power(i, sd);
-	}
+        for (i = 0; i < MAX_NUMNODES; i++)
+		init_numa_sched_groups_power(sched_group_nodes[i]);
 
 	init_numa_sched_groups_power(sched_group_allnodes);
 #endif

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-03  6:08         ` Ingo Molnar
@ 2006-07-05 19:36           ` Siddha, Suresh B
  2006-07-05 20:02             ` Ingo Molnar
  2006-07-06  8:27             ` Andy Whitcroft
  0 siblings, 2 replies; 35+ messages in thread
From: Siddha, Suresh B @ 2006-07-05 19:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, mbligh, linux-kernel, apw

Martin, Andy: Can you please try the appended patch on top of 2.6.17-mm5?

thanks,
suresh

On Mon, Jul 03, 2006 at 08:08:32AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > Did you work out which divide is getting the div-by-zero?  I stared 
> > > at it a bit and wasn't sure - am getting wildly different code 
> > > generation over here.
> > 
> > my bet is on sched-group-cpu-power-setup-cleanup.patch.
> 
> in particular, we don't seem to initialize ->cpu_power properly. Martin, 
> does the patch below solve your crash?
> 
>  		sd = sd->child;
> -		if (sd && sd->flags & flag)
> +		if (test_sd_flag(sd, flag))

There is a bug in my patch. Appended patch fixes this.

> -	if (!sd || !sd->groups || (cpu != first_cpu(sd->groups->cpumask)))
> +	WARN_ON(!sd || !sd->groups);
> +
> +	if (cpu != first_cpu(sd->groups->cpumask)) {
> +		sd->groups->cpu_power = SCHED_LOAD_SCALE;
>  		return;

This is also not correct and will corrupt the cpu_power of some of the groups.
NUMA sched group setup is somewhat different from the other domains like
HT and SMP. Appended patch has the correct fix.

--

- go back to original numa sched group power initialization
- fix the sched_balance_self code
- some cleanup as suggested by Ingo.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

--- linux-2.6.17mm5/kernel/sched.c~	2006-07-05 10:15:27.274721992 -0700
+++ linux-2.6.17mm5/kernel/sched.c	2006-07-05 10:34:01.072399008 -0700
@@ -1292,7 +1292,7 @@ static int sched_balance_self(int cpu, i
 		cpu = new_cpu;
 nextlevel:
 		sd = sd->child;
-		if (sd && sd->flags & flag)
+		if (sd && !(sd->flags & flag))
 			goto nextlevel;
 		/* while loop will break here if sd == NULL */
 	}
@@ -5534,7 +5534,7 @@ static void cpu_attach_domain(struct sch
 
 	if (sd && sd_degenerate(sd)) {
 		sd = sd->parent;
-		if(sd)
+		if (sd)
 			sd->child = NULL;
 	}
 
@@ -6224,6 +6224,7 @@ static int cpu_to_allnodes_group(int cpu
 {
 	return cpu_to_node(cpu);
 }
+
 static void init_numa_sched_groups_power(struct sched_group *group_head)
 {
 	struct sched_group *sg = group_head;
@@ -6314,7 +6315,9 @@ static void init_sched_groups_power(int 
 	struct sched_domain *child;
 	struct sched_group *group;
 
-	if (!sd || !sd->groups || (cpu != first_cpu(sd->groups->cpumask)))
+	WARN_ON(!sd || !sd->groups);
+
+	if (cpu != first_cpu(sd->groups->cpumask))
 		return;
 
 	child = sd->child;
@@ -6596,10 +6599,8 @@ static int build_sched_domains(const cpu
 	}
 
 #ifdef CONFIG_NUMA
-	for_each_cpu_mask(i, *cpu_map) {
-		sd = &per_cpu(node_domains, i);
-		init_sched_groups_power(i, sd);
-	}
+	for (i = 0; i < MAX_NUMNODES; i++)
+		init_numa_sched_groups_power(sched_group_nodes[i]);
 
 	init_numa_sched_groups_power(sched_group_allnodes);
 #endif
--- linux-2.6.17mm5/include/linux/sched.h~	2006-07-05 10:18:10.014981712 -0700
+++ linux-2.6.17mm5/include/linux/sched.h	2006-07-05 10:30:55.889551080 -0700
@@ -636,7 +636,7 @@ enum idle_type
 	((sched_mc_power_savings || sched_smt_power_savings) ?	\
 					SD_POWERSAVINGS_BALANCE : 0)
 
-#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
+#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
 
 
 struct sched_group {

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 19:36           ` Siddha, Suresh B
@ 2006-07-05 20:02             ` Ingo Molnar
  2006-07-05 21:09               ` Siddha, Suresh B
  2006-07-06  8:27             ` Andy Whitcroft
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2006-07-05 20:02 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Andrew Morton, mbligh, linux-kernel, apw


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> -		if (sd && sd->flags & flag)
> +		if (sd && !(sd->flags & flag))

use test_sd_flag() here, as i did in my fix patch.

> -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)

remove the 'sd' check in test_sd_flag. In the other cases we know that 
there's an sd. (it's usually a sign of spaghetti code if tests like this 
include a check for the existence of the object checked)

	Ingo

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 20:02             ` Ingo Molnar
@ 2006-07-05 21:09               ` Siddha, Suresh B
  2006-07-05 21:17                 ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Siddha, Suresh B @ 2006-07-05 21:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Siddha, Suresh B, Andrew Morton, mbligh, linux-kernel, apw

On Wed, Jul 05, 2006 at 10:02:45PM +0200, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> 
> > -		if (sd && sd->flags & flag)
> > +		if (sd && !(sd->flags & flag))
> 
> use test_sd_flag() here, as i did in my fix patch.
> 
> > -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> > +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
> 
> remove the 'sd' check in test_sd_flag. In the other cases we know that 
> there's an sd. (it's usually a sign of spaghetti code if tests like this 
> include a check for the existence of the object checked)

In other cases, we are passing sd->parent as the first argument to
test_sd_flag(). We know that there is a 'sd' but not sure about sd->parent or
sd->child.

thanks,
suresh

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 21:09               ` Siddha, Suresh B
@ 2006-07-05 21:17                 ` Ingo Molnar
  2006-07-05 21:21                   ` Siddha, Suresh B
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2006-07-05 21:17 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Andrew Morton, mbligh, linux-kernel, apw


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> On Wed, Jul 05, 2006 at 10:02:45PM +0200, Ingo Molnar wrote:
> > 
> > * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > 
> > > -		if (sd && sd->flags & flag)
> > > +		if (sd && !(sd->flags & flag))
> > 
> > use test_sd_flag() here, as i did in my fix patch.
> > 
> > > -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> > > +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
> > 
> > remove the 'sd' check in test_sd_flag. In the other cases we know that 
> > there's an sd. (it's usually a sign of spaghetti code if tests like this 
> > include a check for the existence of the object checked)
> 
> In other cases, we are passing sd->parent as the first argument to 
> test_sd_flag(). We know that there is a 'sd' but not sure about 
> sd->parent or sd->child.

ok. But the first issue above should be fixed.

	Ingo

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 21:17                 ` Ingo Molnar
@ 2006-07-05 21:21                   ` Siddha, Suresh B
  0 siblings, 0 replies; 35+ messages in thread
From: Siddha, Suresh B @ 2006-07-05 21:21 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Siddha, Suresh B, Andrew Morton, mbligh, linux-kernel, apw

On Wed, Jul 05, 2006 at 11:17:02PM +0200, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> 
> > On Wed, Jul 05, 2006 at 10:02:45PM +0200, Ingo Molnar wrote:
> > > 
> > > * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > > 
> > > > -		if (sd && sd->flags & flag)
> > > > +		if (sd && !(sd->flags & flag))
> > > 
> > > use test_sd_flag() here, as i did in my fix patch.
> > > 
> > > > -#define test_sd_flag(sd, flag)	((sd && sd->flags & flag) ? 1 : 0)
> > > > +#define test_sd_flag(sd, flag)	((sd && (sd->flags & flag)) ? 1 : 0)
> > > 
> > > remove the 'sd' check in test_sd_flag. In the other cases we know that 
> > > there's an sd. (it's usually a sign of spaghetti code if tests like this 
> > > include a check for the existence of the object checked)
> > 
> > In other cases, we are passing sd->parent as the first argument to 
> > test_sd_flag(). We know that there is a 'sd' but not sure about 
> > sd->parent or sd->child.
> 
> ok. But the first issue above should be fixed.

I can't simply change it to test_sd_flag(). In sched_balance_self(), paths for
sd == 0 and a 'flag' not set in sd->flags are different.

I can change that piece of code to (sd && !test_sd_flag(sd, flag)) though..
but that is not clean, right?
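
The two distinct outcomes in that loop can be sketched as a helper that steps down the child chain. This is a simplified reading of the patched condition, `sd && !(sd->flags & flag)`, not the actual sched_balance_self() code; `level_sketch` and `next_level_with_flag` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the sched domain hierarchy. */
struct level_sketch {
	struct level_sketch *child;
	int flags;
};

/*
 * Step down from sd to the nearest child level that has `flag` set,
 * skipping levels without it.  Returns NULL when the chain runs out,
 * which is the case the caller's loop treats as "stop".
 */
static struct level_sketch *next_level_with_flag(struct level_sketch *sd,
						 int flag)
{
	assert(sd != NULL);
	do {
		sd = sd->child;
	} while (sd && !(sd->flags & flag));
	return sd;
}
```

A single 0-or-1 result from test_sd_flag() cannot tell "no child left" apart from "child lacks the flag", yet the first case must end the descent while the second must continue it.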

thanks,
suresh

* Re: [patch] sched: fix macro -> inline function conversion bug
  2006-07-05 19:36           ` Siddha, Suresh B
  2006-07-05 20:02             ` Ingo Molnar
@ 2006-07-06  8:27             ` Andy Whitcroft
  1 sibling, 0 replies; 35+ messages in thread
From: Andy Whitcroft @ 2006-07-06  8:27 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Ingo Molnar, Andrew Morton, mbligh, linux-kernel

Siddha, Suresh B wrote:
> Martin, Andy: Can you please try the appended patch on top of 2.6.17-mm5?

Submitted, will let you know.

-apw

* Re: 2.6.17-mm5
  2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
  2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
  2006-07-01 22:54       ` 2.6.17-mm5 Jeff Garzik
@ 2006-07-27 21:02       ` Ming Zhang
  2 siblings, 0 replies; 35+ messages in thread
From: Ming Zhang @ 2006-07-27 21:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grant Wilson, linux-kernel, Neil Brown, linux-scsi

On Sat, 2006-07-01 at 14:30 -0700, Andrew Morton wrote:
> On Sat, 1 Jul 2006 15:24:19 +0100
<...>

> 
> > [  155.123022] Unable to handle kernel NULL pointer dereference at 0000000000000048 RIP: 
> > [  155.155867]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.200353] PGD 77954067 PUD 726e5067 PMD 0 
> > [  155.226233] Oops: 0000 [1] PREEMPT SMP 
> > [  155.249516] last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> > [  155.292808] CPU 0 
> > [  155.304968] Modules linked in: dm_mod evdev
> > [  155.330331] Pid: 0, comm: swapper Not tainted 2.6.17-mm5 #1
> > [  155.363697] RIP: 0010:[<ffffffff8047157a>]  [<ffffffff8047157a>] md_error+0x45/0x91
> > [  155.409638] RSP: 0018:ffffffff807a0c50  EFLAGS: 00010046
> > [  155.441445] RAX: 0000000000000000 RBX: ffff81007aa34708 RCX: 000000000000003f
> > [  155.484216] RDX: 00000000fffffffb RSI: ffff81007a821d28 RDI: ffff81007aa34708
> > [  155.526989] RBP: ffffffff807a0c60 R08: 0000000000000000 R09: ffff81007aac43b0
> > [  155.569759] R10: ffffffff804221e5 R11: 0000000000000058 R12: ffff81007aac4ab0
> > [  155.612533] R13: ffff81007aac43b0 R14: ffff81007aac4ab0 R15: 00000000fffffffb
> > [  155.655303] FS:  00002aeb361606d0(0000) GS:ffffffff80a46000(0000) knlGS:0000000000000000
> > [  155.703791] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > [  155.738195] CR2: 0000000000000048 CR3: 0000000070997000 CR4: 00000000000006e0
> > [  155.780969] Process swapper (pid: 0, threadinfo ffffffff80a64000, task ffffffff80696a00)
> > [  155.829404] Stack:  ffff81007a821d28 ffff81007aa34708 ffffffff807a0c80 ffffffff804728d9
> > [  155.877840]  ffff81007a821d28 ffff81007aa34708 ffffffff807a0cc0 ffffffff8047409c
> > [  155.922535]  00001000807a0d00 ffff81007aac4ab0 00000000fffffffb ffff81007aac4ab0
> > [  155.966085] Call Trace:
> > [  155.982416]  [<ffffffff804728d9>] super_written+0x30/0x65
> > [  156.015292]  [<ffffffff8047409c>] super_written_barrier+0xc4/0xd1
> > [  156.052297]  [<ffffffff8023a5a5>] bio_endio+0x56/0x5b
> > [  156.082688]  [<ffffffff8022d21b>] __end_that_request_first+0x1c9/0x4c9
> > [  156.122068]  [<ffffffff8024a0d6>] end_that_request_first+0xc/0xe
> > [  156.158343]  [<ffffffff8036a692>] blk_ordered_complete_seq+0x7c/0x8b
> > [  156.196705]  [<ffffffff8036a6d1>] post_flush_end_io+0x30/0x35
> > [  156.231419]  [<ffffffff8036a5b5>] end_that_request_last+0xd9/0xf6
> > [  156.268215]  [<ffffffff80422204>] scsi_end_request+0xad/0xd7
> > [  156.302573]  [<ffffffff80422637>] scsi_io_completion+0x3e1/0x3f0
> > [  156.339004]  [<ffffffff8042266c>] scsi_blk_pc_done+0x26/0x28
> > [  156.373357]  [<ffffffff8041d11e>] scsi_finish_command+0xa9/0xb2
> > [  156.409264]  [<ffffffff804229f9>] scsi_softirq_done+0xf4/0xfd
> > [  156.444143]  [<ffffffff80237f66>] blk_done_softirq+0x70/0x7f
> > [  156.478323]  [<ffffffff80211366>] __do_softirq+0x67/0xf4
> > [  156.510224]  [<ffffffff8025f95e>] call_softirq+0x1e/0x28
> > [  156.542083] 
> > [  156.542083] Code: 48 8b 40 48 48 85 c0 74 3f ff d0 f0 0f ba ab e0 01 00 00 03 
> 
> The barrier code is in there again.
> 
> mddev->pers is NULL in md_error(), so the test of


I'm curious: how did you find out that the cause is "mddev->pers is NULL"?

thanks!
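
One way to read the oops quoted above, offered as a sketch rather than as the method actually used: the Code bytes begin with `48 8b 40 48`, which decodes to `mov 0x48(%rax),%rax`, RAX is 0 in the register dump, and the faulting address in CR2 is 0x48. That pattern is what a load of a structure member through a NULL base pointer looks like. The layout below is hypothetical; the padding is chosen only so that the handler lands at offset 0x48, and nothing here is taken from the real md headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: the padding stands in for whatever fields precede
 * the handler, sized so the handler sits at offset 0x48 as in the oops.
 */
struct pers_sketch {
	char other_fields[0x48];
	void (*error_handler)(void);
};

/* Address the CPU would touch when loading ->error_handler via `base`. */
static uintptr_t handler_load_address(uintptr_t base)
{
	return base + offsetof(struct pers_sketch, error_handler);
}
```

With a base of 0 the load address equals the member offset, so a small CR2 value together with a zero base register usually points at the pointer that was dereferenced rather than at the member being read.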


> !mddev->pers->error_handler oopsed.  Perhaps this is a real MD bug which is
> now being exposed by the new barrier-handling problem.
> 
> 
> This should get you further, but...
> 
> From: Andrew Morton <akpm@osdl.org>
> 
> Cc: Neil Brown <neilb@suse.de>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
> 
>  drivers/md/md.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff -puN drivers/md/md.c~md-oops-workaround drivers/md/md.c
> --- a/drivers/md/md.c~md-oops-workaround
> +++ a/drivers/md/md.c
> @@ -4586,6 +4586,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t
>  		__builtin_return_address(0),__builtin_return_address(1),
>  		__builtin_return_address(2),__builtin_return_address(3));
>  */
> +	if (!mddev->pers)
> +		return;
>  	if (!mddev->pers->error_handler)
>  		return;
>  	mddev->pers->error_handler(mddev,rdev);
> _
> 


end of thread, other threads:[~2006-07-27 21:02 UTC | newest]

Thread overview: 35+ messages
2006-07-02 23:27 2.6.17-mm5 Martin J. Bligh
2006-07-02 23:41 ` 2.6.17-mm5 Andrew Morton
2006-07-03  5:25   ` [patch] sched: fix macro -> inline function conversion bug Ingo Molnar
2006-07-03  5:42     ` Andrew Morton
2006-07-03  6:03       ` Ingo Molnar
2006-07-03  6:08         ` Ingo Molnar
2006-07-05 19:36           ` Siddha, Suresh B
2006-07-05 20:02             ` Ingo Molnar
2006-07-05 21:09               ` Siddha, Suresh B
2006-07-05 21:17                 ` Ingo Molnar
2006-07-05 21:21                   ` Siddha, Suresh B
2006-07-06  8:27             ` Andy Whitcroft
2006-07-03  6:06       ` Peter Williams
2006-07-03  8:23   ` 2.6.17-mm5 Andy Whitcroft
2006-07-03 14:19     ` 2.6.17-mm5 Andy Whitcroft
     [not found] <20060701175444.958D6E00608B@knarzkiste.dyndns.org>
2006-07-01 10:35 ` 2.6.17-mm5 Andrew Morton
2006-07-01 11:08   ` 2.6.17-mm5 Reuben Farrelly
2006-07-01 11:51     ` 2.6.17-mm5 Andrew Morton
2006-07-01 12:31       ` 2.6.17-mm5 Reuben Farrelly
2006-07-01 13:06         ` 2.6.17-mm5 Brice Goglin
2006-07-01 17:00           ` 2.6.17-mm5 Greg KH
2006-07-01 18:03   ` 2.6.17-mm5 Ralf Hildebrandt
     [not found]   ` <20060701142419.GB28750@tlg.swandive.local>
2006-07-01 21:30     ` 2.6.17-mm5 Andrew Morton
2006-07-01 22:26       ` 2.6.17-mm5 James Bottomley
2006-07-01 22:32         ` 2.6.17-mm5 Neil Brown
2006-07-01 22:56           ` 2.6.17-mm5 Jeff Garzik
2006-07-02  0:10             ` 2.6.17-mm5 James Bottomley
2006-07-01 22:54       ` 2.6.17-mm5 Jeff Garzik
2006-07-27 21:02       ` 2.6.17-mm5 Ming Zhang
2006-07-02 10:03   ` 2.6.17-mm5 Andy Whitcroft
2006-07-02 10:14     ` 2.6.17-mm5 Andrew Morton
2006-07-02 10:40       ` 2.6.17-mm5 Andy Whitcroft
2006-07-02 11:14         ` 2.6.17-mm5 Andrew Morton
2006-07-03  0:47   ` 2.6.17-mm5 Theodore Tso
2006-07-03  7:32   ` 2.6.17-mm5 Heiko Carstens
