* -mm merge plans for 2.6.24
@ 2007-10-01 21:22 Andrew Morton
From: Andrew Morton @ 2007-10-01 21:22 UTC
To: linux-kernel
When replying, please rewrite the Subject: appropriately and attempt to cc the
relevant developers and mailing lists, thanks.
consolidate-ptrace_detach.patch
slow-down-printk-during-boot.patch
slow-down-printk-during-boot-fix-2.patch
slow-down-printk-during-boot-fix-3.patch
slow-down-printk-during-boot-fix-4.patch
clockevents-fix-bogus-next_event-reset-for-oneshot-broadcast-devices.patch
Merge
exit-acpi-processor-module-gracefully-if-acpi-is-disabled.patch
acpi-enable-c3-power-state-on-dell-inspiron-8200.patch
acpi-add-reboot-mechanism.patch
hibernation-make-sure-that-acpi-is-enabled-in-acpi_hibernation_finish.patch
acpi-clean-up-acpi_enter_sleep_state_prep.patch
acpi-sbs-fix-sbs-add-alarm-patch.patch
acpi-suppress-uninitialized-var-warning.patch
acpi-fix-bdc-handling-in-drivers-acpi-sleep-procc.patch
Send to lenb
sound-snd_register_device_for_dev-fix.patch
Send to perex & tiwai
working-3d-dri-intel-agpko-resume-for-i815-chip.patch
fix-use-after-free--double-free-bug-in-amd_create_gatt_pages--amd_free_gatt_pages.patch
generic-ac97-mixer-modem-oss-use-list_for_each_entry.patch
Send to airlied
documentation-arm-00-index-add-missing-entries.patch
at91-remove-at91_lcdch.patch
make-power-supply-class-available-for-arm-architecture.patch
Send to rmk
fix-auditscc-kernel-doc.patch
Send to viro
fs-cifs-connectc-kmalloc-memset-conversion-to-kzalloc.patch
Send to sfrench
cpufreq-move-policys-governor-initialisation-out-of-low-level-drivers-into-cpufreq-core.patch
cpufreq-allow-ondemand-and-conservative-cpufreq-governors-to-be-used-as-default.patch
allow-ondemand-and-conservative-cpufreq-governors-to-be-used-as-default-kconfig-fix.patch
cpufreq-mark-hotplug-notifier-callback-as-__cpuinit.patch
cpufreq-implement-config_cpu_freq-stub-for.patch
cpufreq_stats-misc-cpuinit-section-annotations.patch
Send to davej
git-powerpc.patch
powerpc-vdso-install-unstripped-copies-on-disk.patch
powerpc-vdso-install-unstripped-copies-on-disk-update.patch
Send to paulus
sky-cpu-and-nexus-code-style-improvement.patch
sky-cpu-and-nexus-include-ioh.patch
sky-cpu-and-nexus-check-for-platform_get_resource-ret.patch
sky-cpu-and-nexus-check-for-create_proc_entry-ret-code.patch
sky-cpu-use-c99-style-for-struct-init.patch
sky-cpu-and-nexus-get-rid-of-useless-null-init.patch
sky-cpu-and-nexus-use-seq_file-single_open-on-proc-interface.patch
I don't think this driver is maintained or used by anyone any more and
Paul's reaction was along the lines of "wtf". Might drop these patches.
powerpc-proper-defconfig-for-crosscompiles.patch
powerpc-proper-defconfig-for-crosscompiles-fix.patch
powerpc-ptrace-check_full_regs.patch
Send to paulus
revert-gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
fix-gregkh-driver-kobject-remove-the-static-array-for-the-name.patch
fix-3-gregkh-driver-kobject-remove-the-static-array-for-the-name.patch
fix-2--gregkh-driver-drivers-clean-up-direct-setting-of-the-name-of-a-kset.patch
fix-gregkh-driver-drivers-clean-up-direct-setting-of-the-name-of-a-kset.patch
make-kobject-dynamic-allocation-check-use-kallsyms_lookup.patch
kobject-temporarily-save-k_name-on-cleanup-for-debug-message.patch
Send to Greg
mga_dma-return-err-not-just-zero-from-mga_do_cleanup_dma.patch
drm-via-invalid-device-ids-removal.patch
Send to airlied
dvb_en_50221-convert-to-kthread-api.patch
fix-mux-setup-for-composite-sound-on-avertv-307.patch
v4l-stk11xx-add-a-new-webcam-driver.patch
v4l-stk11xx-use-array_size-in-another-2-cases.patch
v4l-stk11xx-use-retval-from-stk11xx_check_device.patch
v4l-stk11xx-add-static-to-tables.patch
bw-qcam-use-data_reverse-instead-of-manually-poking-the-control-register-fix.patch
git-dvb-build-fix.patch
Send to mchehab
fix-amd-mips-alchemy-au1550-i2c-interface.patch
bfin_twi-remove-useless-twi_lock-mutex.patch
Send to khali
drivers-hid-hid-debugc-add-kern_debug-prefix-fix-typo-constify-fix.patch
Send to jkosina
ia64-tree-wide-misc-__cpuinitdata-init-exit.patch
ia64-tree-wide-misc-__cpuinitdata-init-exit-fix.patch
ia64-perfmon-remove-exit_pfm_fs.patch
Send to Tony
infiniband-work-around-gcc-slub-problem.patch
reresend to Roland.
hdaps-switch-to-using-input-polldev.patch
adbhid-produce-all-capslock-key-events.patch
keyboard-capsshift-lock.patch
console-keyboard-events-and-accessibility.patch
console-keyboard-events-and-accessibility-fix.patch
console-keyboard-events-and-accessibility-fix-2.patch
first-stab-at-elantech-touchpad-driver-for-26226-testers.patch
first-stab-at-elantech-touchpad-driver-for-26226-testers-fix.patch
make-wistron-btns-recognize-special-keys-on-medion-wim2160-notebooks.patch
Send to dtor
applesmc-for-mac-pro-2-x-quad-core.patch
I have a note that this needs updating
git-jg-misc-fix.patch
git-jg-warning-fixes.patch
Send to jgarzik
include-linux-kbuild-remove-duplicate-entries.patch
tristate-choices-with-mixed-tristate-and-boolean.patch
menuconfig-distinguish-between-selected-by-another-options-and-comments.patch
Send to Sam
pata_acpi-restore-driver.patch
pata_acpi-rework-the-acpi-drivers-based-upon-experience.patch
pata_acpi-use-ata_sff_port_start.patch
libata-implement-ata_wait_after_reset.patch
libata-correct-handling-of-srst-reset-sequences.patch
libata-add-a-drivers-ide-style-dma-disable.patch
ata-pata_marvell-use-ioread-for-iomap-ped-memory.patch
drivers-ata-pata_ixp4xx_cfc-ioremap-return-code-check.patch
Send to jgarzik
libata-add-human-readable-error-value-decoding-v3.patch
I think I own this now. Will send to jgarzik then drop it if it doesn't
stick.
scsi-expose-an-support-to-user-space.patch
libata-expose-an-to-user-space.patch
libata-add-a-horkage-entry-for-drq-mishandling-atapi.patch
ahci-add-mcp79-support-to-ahci-driver.patch
ahci-add-mcp79-support-to-ahci-driver-update.patch
libata_scsi-fix-transfer-lengths.patch
Send to jgarzik
libata-fix-hopefully-all-the-remaining-problems-with.patch
Another stuck-in-mm-for-no-obvious-reasons ata patch.
ide-arm-hack.patch
fix-ide-ide-hook-acpi-psx-method-to-ide-power-on-off.patch
fix-ide-ide-remove-ide-dma-check.patch
Send to Bart
mips-add-gpio-support-to-the-bcm947xx-platform.patch
mips-replace-config_usb_ohci-with-config_usb_ohci_hcd-in-a-few-overlooked-files.patch
Send to Ralf
mmc-fix-gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct.patch
Send to drzeus
gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct-vs-git-mmc.patch
Send to whichever of Greg and drzeus ends up needing it.
git-mtd-vs-powerpc.patch
Send to whichever of dwmw2 and paulus ends up needing it.
mtd-fix-ctrl-alt-del-cant-reboot-for-intel-flash-bug.patch
remove-fs-jffs2-ioctlc.patch
blackfin-on-chip-nand-flash-controller-driver.patch
Send to dwmw2
git-net-fixup.patch
git-net-fix-ax25-build.patch
git-net-more-bustage.patch
dgrs-remove-from-build-config-and-maintainer-list.patch
ipgc-doesnt-compile-with-with-config_highmem64g.patch
ipgc-doesnt-compile-with-with-config_highmem64g-fix.patch
git-net-sctp-hack.patch
Sort this stuff out with davem
phy-fixed-driver-rework-release-path-and-update.patch
pci-x-pci-express-read-control-interfaces-e1000.patch
drivers-net-cxgb3-xgmacc-remove-dead-code.patch
avoid-possible-null-pointer-deref-in-3c359-driver.patch
skge-remove-broken-and-unused-phy_m_pc_mdi_xmode-macro.patch
fix-a-potential-null-pointer-dereference-in-uli526x_interrupt.patch
phylib-spinlock-fixes-for-softirqs.patch
forcedeth-power-down-phy-when-interface-is-down.patch
forcedeth-no-link-is-informational.patch
phylib-irq-event-workqueue-handling-fixes.patch
phylib-fix-an-interrupt-loop-potential-when-halting.patch
clean-up-redundant-phy-write-line-for-uli526x-ethernet.patch
ax88796-add-93cx6-eeprom-support.patch
Send to jgarzik. (Actually they're already sent)
wol-bugfix-for-3c59xc.patch
I have a note that this "needs confirmation". Sort this out with klassert.
3x59x-fix-pci-resource-management.patch
Ditto
update-smc91x-driver-with-arm-versatile-board-info.patch
This has been stuck in -mm for over a year. I've forgotten why.
git-net-vs-git-nfs.patch
Send to whichever of davem and Trond ends up needing it
git-nfs-vs-git-unionfs.patch
Send to whichever of Erez Zadok and Trond ends up needing it
git-nfs-make-nfs_wb_page_priority-static.patch
Send to Trond
clean-up-duplicate-includes-in-fs-ntfs.patch
Send to Anton
pa-risc-use-page-allocator-instead-of-slab-allocator.patch
parisc-extern-inline-static-inline.patch
Send to parisc dudes
# pcmcia-delete-obsolete-pcmcia_ioctl-feature.patch: Dominik-only
pcmcia-delete-obsolete-pcmcia_ioctl-feature.patch
This is stuck waiting for Dominik to resurface.
use-menuconfig-objects-pcmcia.patch
pxa2xx-pcmcia-timing-issue-on-ipaq-h5550.patch
move-a-few-definitions-to-au1000_xxs1500c.patch
move-a-few-definitions-to-au1000_xxs1500c-fix.patch
pcmcia-cistpl-use-get_unaligned-in-cis-parsing.patch
add-support-for-pcmcia-card-sierra-wireless-ac850.patch
introduce-dma_mask_none-as-a-signal-for-unable-to-do.patch
pcmcia-use-dma_mask_none-for-the-default-for-all.patch
Will re-review and merge
pcmcia-pccard-deadlock-fix.patch
I have a note that akpm didn't like this. Will retain pending Dominik's
return.
serial_txx9-cleanup-includes.patch
serial-keep-the-dtr-setting-for-serial-console.patch
8250_pci-autodetect-mainpine-cards.patch
8250_pci-autodetect-mainpine-cards-fix.patch
provide-stubs-for-enable_irq_wake-and-disable_irq_wake.patch
wake-up-from-a-serial-port.patch
serial_txx9-use-upf_fixed_port.patch
Serial stuff: will re-review and merge
revert-gregkh-pci-pci_bridge-device.patch
pci-remove-irritating-try-pci=assign-busses-warning.patch
fix-ide-legacy-mode-resources.patch
fix-ide-legacy-mode-resources-fix.patch
Send to Greg
rt-ptracer-can-monopolize-cpu-was-cpu-hotplug-and-real-time.patch
some-proc-entries-are-missed-in-sched_domain-sys_ctl-debug.patch
Send to mingo
sh-cleanup-struct-irqaction-initializers.patch
sh64-cleanup-struct-irqaction-initializers.patch
Send to lethal
git-scsi-misc-arcmsr-build-fix.patch
pci-error-recovery-symbios-scsi-base-support.patch
pci-error-recovery-symbios-scsi-first-failure.patch
drivers-scsi-pcmcia-nsp_csc-remove-kernel-24-code.patch
remove-dead-references-to-module_parm-macro.patch
fix-drivers-scsi-fdomainc-config_pci=n-warnings.patch
nsp32_restart_autoscsi-remove-error-check.patch
fix-a-potential-null-pointer-deref-in-the-aic7xxx-ahc_print_register-function.patch
add-includes-to-scsi_transport_iscsih.patch
scsi-send-media-state-change-modification-events.patch
fix-section-mismatch-in-the-adaptec-dpt-scsi-raid-driver.patch
advansys-printk-fix.patch
drivers-scsi-immc-fix-check-after-use.patch
scsi-early-detection-of-medium-not-present-updated.patch
mpt-fusion-shut-up-uninitialized-variable.patch
mptbase-reset-ioc-initiator-during-pci-resume.patch
scsi-use-notifier-chain-for-asynchronous-event.patch
Send to jejb
x86-64-pci-gart-iommu-sg-chaining-zeroes-wrong-sg.patch
Send to jens
sparc-fix-build-due-to-termios-changes.patch
Send to davem
partially-fix-up-the-lookup_one_noperm-mess.patch
Will merge
fix-gregkh-usb-usb-sisusb2vga-convert-printk-to-dev_-macros.patch
usb-gadget-ether-prevent-oops-caused-by-error-interrupt-race.patch
drivers-usb-misc-sisusbvga-sisusbc-kill-two-unused-variables.patch
Send to Greg
git-net-vs-git-wireless.patch
git-wireless-vs-gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct.patch
net-add-ath5k-wireless-driver-fix.patch
Wireless damage control. Will see what happens.
revert-x86_64-mm-cpa-einval.patch
fix-x86_64-mm-sched-clock-share.patch
agp-fix-race-condition-between-unmapping-and-freeing-pages.patch
x86_64-mce-poll-at-idle_start-and-printk-fix.patch
fix-x86_64-mm-unwinder.patch
geode-mfgpt-support-for-geode-class-machines.patch
geode-mfgpt-clock-event-device-support.patch
x86_64-add-acpi-reboot-option.patch
i386-convert-mm_context_t-semaphore-to-a-mutex.patch
dma-use-dev_to_node-to-get-node-for-device-in-dma_alloc_pages.patch
x86-make-io-apic-not-connected-pin-print-complete.patch
intel_cacheinfo-misc-section-annotation-fixes.patch
intel_cacheinfo-call-cache_add_dev-from-cache_sysfs_init.patch
x86-use-num_online_nodes-to-get-physical-cpus-numbers-for.patch
i386-stop-bogus-nmi-softlockup-warnings-in-show_mem.patch
voyager-include-asm-smph-to-fix-compile-error.patch
x86-64-disable-local-apic-timer-use-on-amd-systems-with-c1e.patch
clockevents-remove-unused-inline-function.patch
clockevents-allow-build-without-runtime-use.patch
x86_64-consolidate-tsc-calibration.patch
i386-prepare-sharing-hpet-code.patch
i386-hpet-add-x8664-hpet-bits.patch
i386-prepare-sharing-pit-code.patch
x86_64-use-i386-i8253-h.patch
x86_64-preparatory-apic-set-lvtt.patch
x86_64-apic-remove-bogus-pit-synchronization.patch
x86_64-apic-shuffle-calibration-around.patch
x86_64-apic-calibration-remove-divisor.patch
x86_64-apic-change-setup-calling-convention.patch
x86_64-apic-remove-nested-irq-disable.patch
x86_64-prep-idle-loop-for-dynticks.patch
x86_64-apic-add-clockevents-functions.patch
x86_64-convert-to-clockevents.patch
x86_64-remove-unused-code.patch
x86_64-cleanup-apic-c.patch
x86_64-cleanup-apic-c-fix.patch
x86_64-cleanup-apic-c-fix-2.patch
jiffies-remove-unused-macros.patch
acpi-remove-the-useless-ifdef-code.patch
i386-pit-remove-the-useless-ifdefs.patch
i386-hpet-sharing-optimize.patch
ich-force-hpet-make-generic-time-capable-of-switching-broadcast-timer.patch
ich-force-hpet-restructure-hpet-generic-clock-code.patch
ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable.patch
ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable-fix.patch
ich-force-hpet-late-initialization-of-hpet-after-quirk.patch
ich-force-hpet-ich5-quirk-to-force-detect-enable.patch
ich-force-hpet-ich5-quirk-to-force-detect-enable-fix.patch
ich-force-hpet-ich5-fix-a-bug-with-suspend-resume.patch
ich-force-hpet-add-ich7_0-pciid-to-quirk-list.patch
x86-fix-cpu_to_node-references.patch
x86-convert-cpu_core_map-to-be-a-per-cpu-variable.patch
convert-cpu_sibling_map-to-be-a-per-cpu-variable.patch
convert-cpu_sibling_map-to-a-per_cpu-data-array-ia64.patch
# convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch: busted
convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch
convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix.patch
convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix-2.patch
convert-cpu_sibling_map-to-a-per_cpu-data-array-sparc64.patch
x86-convert-x86_cpu_to_apicid-to-be-a-per-cpu-variable.patch
x86-convert-cpu_llc_id-to-be-a-per-cpu-variable.patch
x86-acpi-use-cpu_physical_id.patch
i386-visws-extern-inline-static-inline.patch
i386-cleanup-struct-irqaction-initializers.patch
x86_64-cleanup-struct-irqaction-initializers.patch
asm-i386-ioh-fix-constness.patch
optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft.patch
optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft-fix.patch
hpet-force-enable-on-vt8235-37-chipsets.patch
x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron.patch
x86_64-check-and-enable-mmconfig-for-amd-family-10h-opteron.patch
x86_64-check-and-enable-mmconfig-for-amd-family-10h-opteron-fix.patch
x86_64-set-cfg_size-for-amd-family-10h-in-case-mmconfig-is.patch
x86_64-set-cfg_size-for-amd-family-10h-in-case-mmconfig-is-fix.patch
voyager-dont-try-to-support-unprocessor-builds.patch
x86_64-nx-bit-handling-in-change_page_attr.patch
x86-64-calgary-fix-calgary=disable=busnum-for-calioc2.patch
x86-64-calgary-get-rid-of-translate_phb.patch
x86_64-vdso-linker-script-cleanup.patch
x86_64-vdso-put-vars-in-rodata.patch
x86-convert-cpuinfo_x86-array-to-a-per_cpu-array.patch
x86_64-nmi_watchdog-fix-to-be-more-like-i386.patch
x86_64-nmi_watchdog-fix-to-be-more-like-i386-fix.patch
pci-use-pci=bfsort-for-hp-dl385-g2-dl585-g2.patch
Send to Andi
kgdb-fix-help-text.patch
kgdb-fix-docbook-and-kernel-doc-typos.patch
disable-kgdb-on-ppc.patch
Send to Jason
vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
Hold
sparsemem-clean-up-spelling-error-in-comments.patch
sparsemem-record-when-a-section-has-a-valid-mem_map.patch
sparsemem-record-when-a-section-has-a-valid-mem_map-fix.patch
generic-virtual-memmap-support-for-sparsemem.patch
generic-virtual-memmap-support-for-sparsemem-fix.patch
generic-virtual-memmap-support-for-sparsemem-remove-excess-debugging.patch
generic-virtual-memmap-support-for-sparsemem-simplify-initialisation-code-and-reduce-duplication.patch
generic-virtual-memmap-support-for-sparsemem-pull-out-the-vmemmap-code-into-its-own-file.patch
generic-virtual-memmap-support-vmemmap-generify-initialisation-via-helpers.patch
x86_64-sparsemem_vmemmap-2m-page-size-support.patch
x86_64-sparsemem_vmemmap-2m-page-size-support-ensure-end-of-section-memmap-is-initialised.patch
x86_64-sparsemem_vmemmap-vmemmap-x86_64-convert-to-new-helper-based-initialisation.patch
ia64-sparsemem_vmemmap-16k-page-size-support.patch
ia64-sparsemem_vmemmap-16k-page-size-support-convert-to-new-helper-based-initialisation.patch
sparc64-sparsemem_vmemmap-support.patch
sparc64-sparsemem_vmemmap-support-vmemmap-convert-to-new-config-options.patch
ppc64-sparsemem_vmemmap-support.patch
ppc64-sparsemem_vmemmap-support-vmemmap-ppc64-convert-vmm_-macros-to-a-real-function.patch
ppc64-sparsemem_vmemmap-support-vmemmap-ppc64-convert-vmm_-macros-to-a-real-function-fix.patch
ppc64-sparsemem_vmemmap-support-convert-to-new-config-options.patch
virtual memmap: merge
slubcearly_kmem_cache_node_alloc-shouldnt-be.patch
during-vm-oom-condition-kill-all-threads-in-process-group.patch
clean-up-duplicate-includes-in-include-linux-memory_hotplugh.patch
clean-up-duplicate-includes-in-mm.patch
Merge
readahead-compacting-file_ra_state.patch
readahead-mmap-read-around-simplification.patch
readahead-combine-file_ra_stateprev_index-prev_offset-into-prev_pos.patch
readahead-combine-file_ra_stateprev_index-prev_offset-into-prev_pos-fix.patch
readahead-combine-file_ra_stateprev_index-prev_offset-into-prev_pos-fix-2.patch
radixtree-introduce-radix_tree_next_hole.patch
readahead-basic-support-of-interleaved-reads.patch
readahead-remove-the-local-copy-of-ra-in-do_generic_mapping_read.patch
readahead-remove-several-readahead-macros.patch
readahead-remove-the-limit-max_sectors_kb-imposed-on-max_readahead_kb.patch
filemap-trivial-code-cleanups.patch
filemap-convert-some-unsigned-long-to-pgoff_t.patch
Merge
vm-dont-run-touch_buffer-during-buffercache-lookups.patch
This is like
vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch.
An interesting VM patch, probably a bugfix, but nobody knows if it makes
things better or worse. Will remain stranded in -mm.
slub-direct-pass-through-of-page-size-or-higher-kmalloc.patch
Merge
hugetlb-allow-extending-ftruncate-on-hugetlbfs.patch
I've a note here that David Gibson had issues with this. Repolled him.
remove-zero_page.patch
Linus dislikes it. Probably drop it.
mm-use-lockless-radix-tree-probe.patch
mm-improve-find_lock_page.patch
mm-clarify-__add_to_swap_cache-locking.patch
mm-clarify-__add_to_swap_cache-locking-fix.patch
radix-tree-use-indirect-bit.patch
Merge
move-mm_struct-and-vm_area_struct.patch
move-mm_struct-and-vm_area_struct-fix.patch
slub-slob-use-unlikely-for-kfreezero_or_null_ptr-check.patch
calculation-of-pgoff-in-do_linear_fault-uses-mixed.patch
slab-allocators-fail-if-ksize-is-called-with-a-null-parameter.patch
mm-add-end_buffer_read-helper-function.patch
fs-fix-nobh-error-handling.patch
fix-the-max-path-calculation-in-radix-treec.patch
fix-the-max-path-calculation-in-radix-treec-update.patch
mm-no-need-to-cast-vmalloc-return-value-in-zone_wait_table_init.patch
# use-vm_read-write-exec-to-set-vm_page_prot.patch: Hugh wanted changes
use-vm_read-write-exec-to-set-vm_page_prot.patch
prevent-kswapd-from-freeing-excessive-amounts-of-lowmem.patch
mem-policy-add-mpol_f_mems_allowed-get_mempolicy-flag.patch
Merge
mm-use-pagevec-to-rotate-reclaimable-page.patch
mm-use-pagevec-to-rotate-reclaimable-page-fix.patch
mm-use-pagevec-to-rotate-reclaimable-page-fix-2.patch
mm-use-pagevec-to-rotate-reclaimable-page-fix-function-declaration.patch
mm-use-pagevec-to-rotate-reclaimable-page-fix-bug-at-include-linux-mmh220.patch
mm-use-pagevec-to-rotate-reclaimable-page-kill-redundancy-in-rotate_reclaimable_page.patch
mm-use-pagevec-to-rotate-reclaimable-page-move_tail_pages-into-lru_add_drain.patch
I guess I'll merge this. Would be nice to have wider performance testing
but I guess it'll be easy enough to undo.
mm-revert-kernel_ds-buffered-write-optimisation.patch
revert-81b0c8713385ce1b1b9058e916edcf9561ad76d6.patch
revert-6527c2bdf1f833cc18e8f42bd97973d583e4aa83.patch
mm-clean-up-buffered-write-code.patch
mm-debug-write-deadlocks.patch
mm-trim-more-holes.patch
mm-buffered-write-cleanup.patch
mm-write-iovec-cleanup.patch
mm-fix-pagecache-write-deadlocks.patch
mm-buffered-write-iterator.patch
fs-fix-data-loss-on-error.patch
fs-introduce-write_begin-write_end-and-perform_write-aops.patch
introduce-write_begin-write_end-aops-important-fix.patch
introduce-write_begin-write_end-aops-fix2.patch
deny-partial-write-for-loop-dev-fd.patch
mm-restore-kernel_ds-optimisations.patch
implement-simple-fs-aops.patch
implement-simple-fs-aops-fix.patch
block_dev-convert-to-new-aops.patch
ext2-convert-to-new-aops.patch
ext2-convert-to-new-aops-fix.patch
ext2-convert-to-new-aops-fix2.patch
ext3-convert-to-new-aops.patch
ext3-convert-to-new-aops-fix.patch
ext3-convert-to-new-aops-fix-fix.patch
ext4-convert-to-new-aops.patch
ext4-convert-to-new-aops-fix.patch
ext4-convert-to-new-aops-fix-fix.patch
xfs-convert-to-new-aops.patch
gfs2-convert-to-new-aops.patch
gfs2-convert-to-new-aops-fix.patch
fs-new-cont-helpers.patch
fat-convert-to-new-aops.patch
#adfs-convert-to-new-aops.patch
hfs-convert-to-new-aops.patch
hfsplus-convert-to-new-aops.patch
hpfs-convert-to-new-aops.patch
bfs-convert-to-new-aops.patch
qnx4-convert-to-new-aops.patch
reiserfs-use-generic-write.patch
reiserfs-convert-to-new-aops.patch
reiserfs-convert-to-new-aops-fix.patch
reiserfs-convert-to-new-aops-fix2.patch
reiserfs-use-generic_cont_expand_simple.patch
with-reiserfs-no-longer-using-the-weird-generic_cont_expand-remove-it-completely.patch
nfs-convert-to-new-aops.patch
git-nfs-vs-nfs-convert-to-new-aops.patch
git-nfs-vs-nfs-convert-to-new-aops-fix.patch
smb-convert-to-new-aops.patch
fuse-convert-to-new-aops.patch
hostfs-convert-to-new-aops.patch
hostfs-convert-to-new-aops-fix.patch
hostfs-convert-to-new-aops-fix-fix.patch
jffs2-convert-to-new-aops.patch
ufs-convert-to-new-aops.patch
ufs-convert-to-new-aops-fix.patch
ufs-convert-to-new-aops-fix2.patch
udf-convert-to-new-aops.patch
udf-convert-to-new-aops-fix.patch
sysv-convert-to-new-aops.patch
sysv-convert-to-new-aops-fix.patch
sysv-convert-to-new-aops-fix2.patch
minix-convert-to-new-aops.patch
minix-convert-to-new-aops-fix.patch
minix-convert-to-new-aops-fix2.patch
jfs-convert-to-new-aops.patch
fs-adfs-convert-to-new-aops.patch
fs-affs-convert-to-new-aops.patch
affs-convert-to-new-aops-fix.patch
affs-convert-to-new-aops-fix-fix.patch
ocfs2-convert-to-new-aops.patch
fs-remove-some-aop_truncated_page.patch
Merge
memoryless-nodes-generic-management-of-nodemasks-for-various-purposes.patch
memoryless-nodes-generic-management-of-nodemasks-for-various-purposes-fix.patch
memoryless-nodes-introduce-mask-of-nodes-with-memory.patch
memoryless-nodes-introduce-mask-of-nodes-with-memory-fix.patch
# update-n_high_memory-node-state-for-memory-hotadd.patch: fold
update-n_high_memory-node-state-for-memory-hotadd.patch
update-n_high_memory-node-state-for-memory-hotadd-fix.patch
memoryless-nodes-fix-interleave-behavior-for-memoryless-nodes.patch
memoryless-nodes-oom-use-n_high_memory-map-instead-of-constructing-one-on-the-fly.patch
memoryless-nodes-no-need-for-kswapd.patch
memoryless-nodes-slab-support.patch
memoryless-nodes-slub-support.patch
memoryless-nodes-uncached-allocator-updates.patch
memoryless-nodes-allow-profiling-data-to-fall-back-to-other-nodes.patch
memoryless-nodes-update-memory-policy-and-page-migration.patch
memoryless-nodes-add-n_cpu-node-state.patch
memoryless-nodes-add-n_cpu-node-state-move-setup-of-n_cpu-node-state-mask.patch
memoryless-nodes-drop-one-memoryless-node-boot-warning.patch
memoryless-nodes-fix-gfp_thisnode-behavior.patch
memoryless-nodes-use-n_high_memory-for-cpusets.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2-3.patch
fix-panic-of-cpu-online-with-memory-less-node.patch
Merge
categorize-gfp-flags.patch
categorize-gfp-flags-fix.patch
make-swappiness-safer-to-use.patch
Merge
flush-cache-before-installing-new-page-at-migraton.patch
flush-icache-before-set_pte-on-ia64-flush-icache-at-set_pte.patch
flush-icache-before-set_pte-on-ia64-flush-icache-at-set_pte-fix.patch
flush-icache-before-set_pte-on-ia64-flush-icache-at-set_pte-fix-update.patch
Merge
add-a-bitmap-that-is-used-to-track-flags-affecting-a-block-of-pages.patch
split-the-free-lists-for-movable-and-unmovable-allocations.patch
choose-pages-from-the-per-cpu-list-based-on-migration-type.patch
add-a-configure-option-to-group-pages-by-mobility.patch
drain-per-cpu-lists-when-high-order-allocations-fail.patch
move-free-pages-between-lists-on-steal.patch
group-short-lived-and-reclaimable-kernel-allocations.patch
group-high-order-atomic-allocations.patch
do-not-group-pages-by-mobility-type-on-low-memory-systems.patch
bias-the-placement-of-kernel-pages-at-lower-pfns.patch
be-more-agressive-about-stealing-when-migrate_reclaimable-allocations-fallback.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2-fix.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2-fix-fix.patch
bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch
remove-page_group_by_mobility.patch
dont-group-high-order-atomic-allocations.patch
fix-calculation-in-move_freepages_block-for-counting-pages.patch
do-not-depend-on-max_order-when-grouping-pages-by-mobility.patch
print-out-statistics-in-relation-to-fragmentation-avoidance-to-proc-pagetypeinfo.patch
grouping pages by mobility patches: merge
mm-page_allocc-make-code-static.patch
Merge
maps2-uninline-some-functions-in-the-page-walker.patch
maps2-eliminate-the-pmd_walker-struct-in-the-page-walker.patch
maps2-remove-vma-from-args-in-the-page-walker.patch
maps2-propagate-errors-from-callback-in-page-walker.patch
maps2-add-callbacks-for-each-level-to-page-walker.patch
maps2-move-the-page-walker-code-to-lib.patch
maps2-simplify-interdependence-of-proc-pid-maps-and-smaps.patch
maps2-move-clear_refs-code-to-task_mmuc.patch
maps2-regroup-task_mmu-by-interface.patch
maps2-make-proc-pid-smaps-optional-under-config_embedded.patch
maps2-make-proc-pid-clear_refs-option-under-config_embedded.patch
maps2-add-proc-pid-pagemap-interface.patch
maps2-add-proc-pid-pagemap-interface-fix-proc-pid-pagemap-return-length-calculation.patch
maps2-add-proc-pid-pagemap-interface-fix-proc-pid-pagemap-end-address-calculation.patch
maps2-add-proc-pid-pagemap-interface-fix-proc-pid-pagemap-header-copy-to-userspace.patch
maps2-add-proc-kpagemap-interface.patch
mmaps2-vma-out-of-mem_size_stats.patch
maps2-make-proc-pid-smaps-optional-under-config_embeddedpatch.patch
maps2-make-proc-pid-smaps-optional-under-config_embeddedpatch-fix.patch
argh. STILL waiting for the updates to this. It was 96% ready for
2.6.22, 97% ready for 2.6.23 and we really don't want to merge 98% ready
stuff into 2.6.24.
maps-pssproportional-set-size-accounting-in-smaps.patch
Merge
slub-avoid-page-struct-cacheline-bouncing-due-to-remote-frees-to-cpu-slab.patch
slub-do-not-use-page-mapping.patch
slub-do-not-use-page-mapping-fix.patch
slub-move-page-offset-to-kmem_cache_cpu-offset.patch
slub-avoid-touching-page-struct-when-freeing-to-per-cpu-slab.patch
slub-avoid-touching-page-struct-when-freeing-to-per-cpu-slab-fix.patch
slub-place-kmem_cache_cpu-structures-in-a-numa-aware-way.patch
slub-optimize-cacheline-use-for-zeroing.patch
Merge
#
# slub && antifrag
#
have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
slub-exploit-page-mobility-to-increase-allocation-order.patch
slub-reduce-antifrag-max-order.patch
I think this stuff is in the "mm stuff we don't want to merge" category.
If so, I really should have dropped it ages ago.
slub-slab-validation-move-tracking-information-alloc-outside-of-melstuff.patch
Not sure.
breakout-page_order-to-internalh-to-avoid-special-knowledge-of-the-buddy-allocator.patch
Merge, if it applies.
memory-unplug-v7-memory-hotplug-cleanup.patch
memory-unplug-v7-page-isolation.patch
memory-unplug-v7-page-offline.patch
memory-unplug-v7-page-offline-fix.patch
memory-unplug-v7-ia64-interface.patch
fix-memory-hot-remove-not-configured-case.patch
fix-memory-hot-remove-not-configured-case-fix.patch
memory-hotplug-hot-add-with-sparsemem-vmemmap.patch
memory-hotplug-hot-add-with-sparsemem-vmemmap-update.patch
Merge
hugetlbfs-read-support.patch
hugetlbfs-read-support-fix.patch
hugetlbfs-read-support-fix-2.patch
Dunno. Probably merge.
mm-shmemc-make-3-functions-static.patch
mm-mempolicyc-cleanups.patch
mm-mempolicyc-cleanups-fix.patch
mm-vmstatc-cleanups.patch
Merge
add-node-states-sysfs-class-attributes-v5.patch
Merge
nfs-remove-congestion_end.patch
lib-percpu_counter_add.patch
lib-percpu_counter_sub.patch
lib-percpu_counter-variable-batch.patch
lib-make-percpu_counter_add-take-s64.patch
lib-percpu_counter_set.patch
lib-percpu_counter_sum_positive.patch
lib-percpu_count_sum.patch
lib-percpu_counter_init-error-handling.patch
lib-percpu_counter_init_irq.patch
mm-bdi-init-hooks.patch
mm-scalable-bdi-statistics-counters.patch
mm-count-reclaimable-pages-per-bdi.patch
mm-count-writeback-pages-per-bdi.patch
mm-expose-bdi-statistics-in-sysfs.patch
lib-floating-proportions.patch
mm-per-device-dirty-threshold.patch
mm-per-device-dirty-threshold-warning-fix.patch
mm-per-device-dirty-threshold-fix.patch
mm-dirty-balancing-for-tasks.patch
mm-dirty-balancing-for-tasks-warning-fix.patch
debug-sysfs-files-for-the-current-ratio-size-total.patch
Merge
slub-simplify-irq-off-handling.patch
slab-api-remove-useless-ctor-parameter-and-reorder-parameters.patch
slab-api-remove-useless-ctor-parameter-and-reorder-parameters-fix.patch
slab-api-remove-useless-ctor-parameter-and-reorder-parameters-fix-2.patch
slab-api-remove-useless-ctor-parameter-and-reorder-parameters-vs-unionfs.patch
Merge
oom-move-prototypes-to-appropriate-header-file.patch
oom-move-prototypes-to-appropriate-header-file-fix.patch
oom-move-constraints-to-enum.patch
oom-change-all_unreclaimable-zone-member-to-flags.patch
oom-change-all_unreclaimable-zone-member-to-flags-fix.patch
oom-add-per-zone-locking.patch
oom-serialize-out-of-memory-calls.patch
oom-add-oom_kill_allocating_task-sysctl.patch
oom-suppress-extraneous-stack-and-memory-dump.patch
oom-compare-cpuset-mems_allowed-instead-of-exclusive.patch
oom-do-not-take-callback_mutex.patch
oom-do-not-take-callback_mutex-fix.patch
oom-prevent-including-schedh-in-header-file.patch
oom-add-header-file-to-kbuild-as-unifdef.patch
oom-convert-zone_scan_lock-from-mutex-to-spinlock.patch
mm-test-and-set-zone-reclaim-lock-before-starting.patch
mm-test-and-set-zone-reclaim-lock-before-starting-cleanup.patch
mm-document-tree_lock-zonelock-lockorder.patch
Merge
security-convert-lsm-into-a-static-interface.patch
security-convert-lsm-into-a-static-interface-fix.patch
security-convert-lsm-into-a-static-interface-fix-2.patch
security-convert-lsm-into-a-static-interface-fix-2-fix.patch
security-convert-lsm-into-a-static-interface-fix-unionfs.patch
security-convert-lsm-into-a-static-interface-vs-fix-null-pointer-dereference-in-__vm_enough_memory.patch
Merge
ifdef-struct-task_structsecurity.patch
Merge
implement-file-posix-capabilities.patch
implement-file-posix-capabilities-fix.patch
file-capabilities-introduce-cap_setfcap.patch
file-capabilities-get_file_caps-cleanups.patch
file-caps-update-selinux-xattr-hooks.patch
file-capabilities-clear-caps-cleanup.patch
file-capabilities-clear-caps-cleanup-fix.patch
file-capabilities-change-xattr-format-v2.patch
file-capabilities-change-fe-to-a-bool.patch
#
file-caps-clean-up-for-linux-capabilityh.patch
capabilityh-remove-include-of-currenth.patch
file-capabilities-clear-fcaps-on-inode-change.patch
file-capabilities-clear-fcaps-on-inode-change-fix.patch
capabilities-reset-current-pdeath_signal-when-increasing-capabilities.patch
Have been nursing this along for nearly a year. I think it's ready now.
Will repoll people.
security-cleanups.patch
Merge
remove-frv-usage-of-flush_tlb_pgtables.patch
include-asm-frv-thread_infoh-kmalloc-memset-conversion-to-kzalloc.patch
frv-cleanup-struct-irqaction-initializers.patch
blackfin-enable-arbitary-speed-serial-setting.patch
m68knommu-remove-unused-config-symbol-config_disktel.patch
cleanup-arch-alpha-makefile.patch
alpha-convert-to-generic-sys_ptrace.patch
alpha-beautify-vmlinuxlds.patch
include-asm-m32r-thread_infoh-kmalloc-memset-conversion-to-kzalloc.patch
m32r-cleanup-struct-irqaction-initializers.patch
m32r-serial-remove-m32r_sio_share_irqs.patch
m32r-convert-to-generic-sys_ptrace.patch
cris-cleanup-struct-irqaction-initializers.patch
tty-bring-the-old-cris-driver-back-somewhere-into-the.patch
v850-cleanup-struct-irqaction-initializers.patch
Misc arch patches. Merge.
make-kernel-power-maincsuspend_enter-static.patch
pm-move-definition-of-struct-pm_ops-to-suspendh.patch
pm-rename-struct-pm_ops-and-related-things.patch
pm-rework-struct-platform_suspend_ops.patch
pm-make-suspend_ops-static.patch
pm-rework-struct-hibernation_ops.patch
pm-rename-hibernation_ops-to-platform_hibernation_ops.patch
freezer-document-relationship-with-memory-shrinking.patch
freezer-do-not-sync-filesystems-from-freeze_processes.patch
freezer-prevent-new-tasks-from-inheriting-tif_freeze-set.patch
freezer-introduce-freezer-firendly-waiting-macros.patch
freezer-introduce-freezer-firendly-waiting-macros-fix.patch
freezer-do-not-send-signals-to-kernel-threads.patch
unexport-pm_power_off_prepare.patch
pm_trace-displays-the-wrong-time-from-the-rtc.patch
freezer-be-more-verbose.patch
freezer-use-wait-queue-instead-of-busy-looping.patch
freezer-measure-freezing-time.patch
serial-turn-serial-console-suspend-a-boot-rather-than-compile-time-option.patch
serial-turn-serial-console-suspend-a-boot-rather-than-compile-time-option-update.patch
s2ram-kill-old-debugging-junk.patch
hibernation-arbitrary-boot-kernel-support-generic-code-rev-2.patch
hibernation-arbitrary-boot-kernel-support-on-x86_64-rev-2.patch
hibernation-pass-cr3-in-the-image-header-on-x86_64-rev-2.patch
hibernation-use-temporary-page-tables-for-kernel-text-mapping-on-x86_64.patch
hibernation-check-if-acpi-is-enabled-during-restore-in-the-right-place.patch
hibernation-enter-platform-hibernation-state-in-a-consistent-way-rev-4.patch
hibernation-enter-platform-hibernation-state-in-a-consistent-way-rev-4-fix.patch
Power management: merge
uml-move-userspace-code-to-userspace-file.patch
uml-tidy-recently-moved-code.patch
uml-fix-error-cleanup-ordering.patch
uml-console-subsystem-tidying.patch
uml-fix-console-writing-bugs.patch
uml-console-tidying.patch
uml-stop-using-libc-asm-pageh.patch
uml-fix-an-ipv6-libc-vs-kernel-symbol-clash.patch
uml-fix-nonremovability-of-watchdog.patch
uml-stop-specially-protecting-kernel-stacks.patch
uml-stop-saving-process-fp-state.patch
uml-stop-saving-process-fp-state-fix.patch
uml-physmem-code-tidying.patch
uml-add-vde-networking-support.patch
uml-remove-unnecessary-hostfs_getattr.patch
uml-throw-out-config_mode_tt.patch
uml-remove-sysdep-threadh.patch
uml-style-fixes-pass-1.patch
uml-throw-out-choose_mode.patch
uml-style-fixes-pass-2.patch
uml-remove-code-made-redundant-by-choose_mode-removal.patch
uml-style-fixes-pass-3.patch
uml-remove-__u64-usage-from-physical-memory-subsystem.patch
uml-get-rid-of-do_longjmp.patch
uml-fold-mmu_context_skas-into-mm_context.patch
uml-rename-pt_regs-general-purpose-register-file.patch
uml-rename-pt_regs-general-purpose-register-file-fix.patch
uml-free-ldt-state-on-process-exit.patch
uml-remove-os_-usage-from-userspace-files.patch
uml-replace-clone-with-fork.patch
uml-fix-inlines.patch
uml-userspace-files-should-call-libc-directly.patch
uml-clean-up-tlb-flush-path.patch
uml-remove-unneeded-if-from-hostfs.patch
uml-fix-hostfs-style.patch
uml-dont-use-glibc-asm-userh.patch
uml-floating-point-signal-delivery-fixes.patch
uml-ptrace-floating-point-fixes.patch
uml-coredumping-floating-point-fixes.patch
uml-sysrq-and-mconsole-fixes.patch
uml-style-fixes-in-fp-code.patch
uml-eliminate-floating-point-state-from-register-file.patch
uml-remove-unneeded-void-cast.patch
uml-remove-unused-file.patch
uml-more-idiomatic-parameter-parsing.patch
uml-eliminate-hz.patch
uml-fix-timer-switching.patch
uml-simplify-interval-setting.patch
uml-separate-timer-initialization.patch
uml-generic_time-support.patch
uml-generic_clockevents-support.patch
uml-clocksource-support.patch
uml-clocksource-support-fix.patch
uml-tickless-support.patch
uml-tickless-support-fix.patch
uml-eliminate-interrupts-in-the-idle-loop.patch
uml-time-build-fix.patch
uml-eliminate-sigalrm.patch
uml-use-sec_per_sec-constants.patch
uml-network-formatting.patch
uml-network-driver-mtu-cleanups.patch
uml-correctly-handle-skb-allocation-failures.patch
uml-correctly-handle-skb-allocation-failures-fix.patch
Merge
i-oat-new-device-ids.patch
i-oat-rename-the-source-file.patch
i-oat-code-cleanup-from-checkpatch-output.patch
i-oat-split-pci-startup-from-dma-handling-code.patch
i-oat-add-support-for-msi-and-msi-x.patch
i-oat-add-support-for-msi-and-msi-x-fix.patch
dca-add-direct-cache-access-driver.patch
i-oat-add-dca-services.patch
Merge
deprecate-smbfs-in-favour-of-cifs.patch
Will re-poll sfrench on this.
cpuset-remove-sched-domain-hooks-from-cpusets.patch
Paul continues to wibble over this. Hold, I guess.
clone-flag-clone_parent_tidptr-leaves-invalid-results-in-memory.patch
Eric B had issues with this. Repolled him.
cache-pipe-buf-page-address-for-non-highmem-arch.patch
This isn't very popular. Will probably drop.
drivers-pmc-msp71xx-gpio-char-driver.patch
david-b didn't like this. Repolled.
fs-reiserfs-cleanups.patch
use-list_head-in-binfmt-handling-update.patch
make-unregister_binfmt-return-void.patch
immunize-rcu_dereference-against-crazy-compiler-writers.patch
remove-workaround-for-unimmunized-rcu_dereference-from-mce_log.patch
softlockup-use-cpu_clock-instead-of-sched_clock.patch
fix-the-softlockup-watchdog-to-actually-work.patch
softlockup-make-asm-irq_regsh-available-on-every-platform.patch
softlockup-improve-debug-output.patch
softlockup-improve-debug-output-fix.patch
softlockup-watchdog-style-cleanups.patch
softlockup-add-a-proc-tuning-parameter.patch
softlockup-add-a-proc-tuning-parameter-fix.patch
slab_panic-more-proc-posix-timers-shmem.patch
zisofs-use-mutex-instead-of-semaphore.patch
force-erroneous-inclusions-of-compiler-h-files-to-be-errors.patch
force-erroneous-inclusions-of-compiler-h-files-to-be-errors-fix.patch
driver-for-the-atmel-on-chip-ssc-on-at32ap-and-at91.patch
driver-for-the-atmel-on-chip-ssc-on-at32ap-and-at91-fix.patch
unexport-asm-shmparamh.patch
ext2-statfs-improvement-for-block-and-inode-free-count.patch
kill-declare_mutex_locked.patch
add-kernel-notifierc.patch
add-kernel-notifierc-fix.patch
add-kernel-notifierc-fix-2.patch
nbd-use-list_for_each_entry_safe-to-make-it-more-consolidated-and-readable.patch
nbd-change-a-parameters-type-to-remove-a-memcpy-call.patch
fs-romfs-inodec-trivial-improvements.patch
fs-mark-nibblemap-const.patch
kconfig-make-instrumentation-support-non-experimental.patch
faster-ext2_clear_inode.patch
remove-unneded-lock_kernel-in-driver-block-loopc.patch
do_sys_poll-simplify-playing-with-on-stack-data.patch
do_sys_poll-simplify-playing-with-on-stack-data-fix.patch
do_poll-return-eintr-when-signalled.patch
fs-proc-mmuc-headers-butchery.patch
i386-mark-pit_clockevent-static.patch
fs-use-kmem_cache_zalloc-instead.patch
pcmcia-compactflash-driver-for-pa-semi-electra-boards.patch
pcmcia-compactflash-driver-for-pa-semi-electra-boards-fix.patch
remove-sysctlh-from-fsh.patch
clean-up-duplicate-includes-in-drivers-char.patch
clean-up-duplicate-includes-in-drivers-w1.patch
clean-up-duplicate-includes-in-fs.patch
clean-up-duplicate-includes-in-fs-ecryptfs.patch
clean-up-duplicate-includes-in-kernel.patch
time-simplify-smp_call_function_single-call-sequence.patch
convert-ill-defined-log2-to-ilog2.patch
ext2-show-all-mount-options.patch
ext3-show-all-mount-options.patch
ext4-show-all-mount-options.patch
remove-unsafe-from-module-struct.patch
report-the-per-irq-statistics-on-allarches.patch
fix-config_debug_shirq-trigger-on-free_irq.patch
fs-remove-the-unused-mempages-parameter.patch
remove-unused-bh-in-calls-to-ext234_get_group_desc.patch
add-in-sunos-41x-compatible-mode-for-ufs.patch
add-in-sunos-41x-compatible-mode-for-ufs-fix.patch
add-in-sunos-41x-compatible-mode-for-ufs-fix-2.patch
ufs-implement-show_options.patch
argv_split-allow-argv_split-to-handle-null-pointer-in-argcp-parameter-gracefully.patch
core_pattern-ignore-rlimit_core-if-core_pattern-is-a-pipe.patch
core_pattern-ignore-rlimit_core-if-core_pattern-is-a-pipe-fix.patch
core_pattern-allow-passing-of-arguments-to-user-mode-helper-when-core_pattern-is-a-pipe.patch
core_pattern-fix-up-a-few-miscellaneous-bugs.patch
core_pattern-fix-up-a-few-miscellaneous-bugs-fix.patch
epcac-reformat-comments-and-coding-style-improvements.patch
#fs-partitions-checkc-add-add_partition-error-handling.patch
add-sys-module-name-notes.patch
kernel-rtmutex-debugc-cleanups.patch
fs-afs-possible-cleanups.patch
lib-ioremapc-should-include-linux-ioh.patch
ipc-shmc-make-2-functions-static.patch
printk-add-interfaces-for-external-access-to-the-log-buffer.patch
printk-add-interfaces-for-external-access-to-the-log-buffer-fix.patch
printk-add-interfaces-for-external-access-to-the-log-buffer-fix-2.patch
drivers-char-consolemapc-kmalloc-memset-conversion-to-kzalloc.patch
doc-firmware_sample_firmware_classc-kmalloc-memset-conversion-to-kzalloc.patch
fs-autofs4-inodec-kmalloc-memset-conversion-to-kzalloc.patch
drivers-char-ip2-ip2mainc-kmalloc-memset-conversion-to-kzalloc.patch
tpm_tis-fix-interrupt-probing.patch
pi-futex-set-pf_exiting-without-taking-pi_lock.patch
do_sigaction-remove-now-unneeded-recalc_sigpending.patch
deprecate-aout-elf-interpreters.patch
deprecate-aout-elf-interpreters-fix.patch
handle-the-multi-threaded-inits-exit-properly.patch
tweak-proc-ipmi-removal.patch
ufs-move-non-layout-parts-of-ufs_fsh-to-fs-ufs.patch
ufs-fix-sun-state-fix-mount-check-in-ufs_fill_super.patch
#msleep-with-hrtimers.patch: overflow bug
#msleep-with-hrtimers.patch
add-linux-elfcore-compath.patch
x86_64-use-linux-elfcore-compath.patch
powerpc-use-linux-elfcore-compath.patch
avoid-a-small-unlikely-memory-leak-in-proc_read_escd.patch
wait_task_zombie-remove-unneeded-child-signal-check.patch
wait_task_zombie-fix-2-3-races-vs-forget_original_parent.patch
exit_notify-dont-take-tasklist-for-tif_sigpending-re-targeting.patch
zap_other_threads-dont-optimize-thread_group_empty-case.patch
wait_task_zombie-dont-fight-with-non-existing-race-with-a-dying-ptracee.patch
__group_complete_signal-eliminate-unneeded-wakeup-of-group_exit_task.patch
wait_task_stopped-continued-remove-unneeded-p-signal-=-null-check.patch
do-not-export-usr-include-scsi-in-make-headers_install.patch
add-mmf_dump_elf_headers.patch
ext2-ext3-ext4-add-block-bitmap-validation.patch
ext2-ext3-ext4-add-block-bitmap-validation-fix.patch
aoe-remove-unecessary-wrapper-function.patch
unicode-diacritics-support.patch
unicode-diacritics-support-s390-fix.patch
mxser-remove-use-of-dead-tty_flipbuf_size-definition.patch
jsm-remove-further-unneeded-crud.patch
jsm-remove-further-unneeded-crud-fix.patch
remove-consolemaph-from-header-exports.patch
lib-sortc-optimization.patch
x86_64-efi-boot-support-efi-frame-buffer-driver.patch
x86_64-efi-boot-support-efi-boot-document.patch
vfs-check-nanoseconds-in-utimensat.patch
fix-execute-checking-in-permission.patch
exec-remove-unnecessary-check-for-mnt_noexec.patch
clean-out-unused-code-in-dentry-pruning.patch
include-linux-typesh-in-if_fddih.patch
pie-executable-randomization.patch
pie-executable-randomization-fix.patch
pie-executable-randomization-fix-2.patch
pie-executable-randomization-fix-3.patch
i386-and-x86_64-randomize-brk-2.patch
cramfs-error-message-about-endianess.patch
remove-strict-ansi-check-from-__u64-in-asm-typesh.patch
shrink-struct-task_structoomkilladj.patch
remove-struct-task_structio_wait.patch
ext2-4-use-is_power_of_2.patch
limit-minixfs-printks-on-corrupted-dir-i_size.patch
kernel-time-timekeepingc-cleanups.patch
make-fs-libfscsimple_commit_write-static.patch
allow-disabling-dnotify-without-embedded.patch
seqfile-merge-duplite-code-to-seq_open_private.patch
# use-erestartnohand-if-poll-is-interrupted-by-a-signal.patch: tricky
use-erestart_restartblock-if-poll-is-interrupted-by-a-signal.patch
use-erestart_restartblock-if-poll-is-interrupted-by-a-signal-fix.patch
use-num_possible_cpus-instead-of-nr_cpus-for-timer.patch
make-rcutorture-rng-use-temporal-entropy.patch
aio-account-i-o-wait-time-properly.patch
fix-f_version-type-should-be-u64-instead-of-unsigned-long.patch
exec-simplify-sighand-switching.patch
exec-simplify-the-new-sighand-allocation.patch
exec-consolidate-2-fast-paths.patch
exec-rt-sub-thread-can-livelock-and-monopolize-cpu-on-exec.patch
do_sigaction-dont-worry-about-signal_pending.patch
jbd-remove-printk-from-j_assert-macros.patch
jbd2-remove-printk-from-j_assert-macros.patch
autofs4-reinstate-negatitive-timeout-of-mount-fails.patch
autofs4-reinstate-negatitive-timeout-of-mount-fails-fix.patch
add-stack-checking-for-blackfin.patch
binfmt_flat-warning-fixes.patch
console-events-and-accessibility.patch
console-events-and-accessibility-fix.patch
add-vmcoreinfo.patch
add-vmcore-cleanup-the-coding-style-according-to-andrews-comments.patch
add-vmcore-add-nodemask_ts-size-and-nr_free_pagess-value-to-vmcoreinfo_data.patch
add-vmcore-use-the-existing-ia64_tpa-instead-of-asm-code.patch
add-vmcore-add-a-prefix-vmcoreinfo_-to-the-vmcoreinfo-macros.patch
maintainters-use-our-mail-list-as-blackfin-arch-maintainters.patch
shrink-task_struct-if-config_futex=n.patch
ttyh-remove-dead-define.patch
fix-a-trivial-typo-in-scripts-checkstackpl.patch
move-preempt_notifiers-into-an-always-included-kconfig.patch
floppy-tolerate-dma-channel-unavailability.patch
cleanup-floppyh.patch
codingstyle-relax-the-80-cole-rule.patch
script-to-check-for-undefined-kconfig-symbols.patch
nbd-set-uninitialized-devices-to-size-0.patch
nbd-allow-hung-network-i-o-to-be-cancelled.patch
cciss-fix-error-reporting-for-sg_io.patch
drop-some-headers-from-mmh.patch
remove-include-asm-ipch.patch
n_hdlcc-fix-check-after-use.patch
kernel-sys_nic-add-dummy-sys_ni_syscall-prototype.patch
make-kernel-profilectime_hook-static.patch
drivers-block-ccissc-fix-check-after-use.patch
#track-accurate-idle-time-with-tick_schedidle_sleeptime.patch: needs acks
track-accurate-idle-time-with-tick_schedidle_sleeptime.patch
track-accurate-idle-time-with-tick_schedidle_sleeptime-fix.patch
remove-valueless-definition-of-hard-selected-ramfs-option.patch
local_t-documentation-update-2.patch
atomic_opstxt-mention-local_t.patch
local_t-update-documentation.patch
docs-ramdisk-initrd-initramfs-corrections.patch
remove-final-traces-of-long-deprecated-ramdisk-kernel.patch
send-quota-messages-via-netlink.patch
send-quota-messages-via-netlink-fix.patch
send-quota-messages-via-netlink-fix-fix.patch
make-dmapool-code-use-__set_current_state.patch
add-a-rounddown_pow_of_two-routine-to-log2h.patch
add-a-rounddown_pow_of_two-routine-to-log2hpatch-fix.patch
fix-discrepancy-between-vdso-based-gettimeofday-and-sys_gettimeofday.patch
handle-recursive-calls-to-bust_spinlocks.patch
store-__setup_str_-in-a-more-compact-way.patch
constify-string-array-kparam-tracking-structures.patch
avoid-negative-and-full-width-shifts-in-radix-treec.patch
add-config_vt_unicode.patch
update-checkpatchpl-to-version-010.patch
i2o-fix-defined-but-not-used-build-warnings.patch
i2o-fix-defined-but-not-used-build-warnings-fix.patch
ipc-namespace-remove-config-ipc-ns-fix.patch
spelling-fix-weired-weird.patch
mutex-documentation-is-unclear-about-software-interrupts-tasklets-and-timers.patch
dcache-trivial-comment-fix.patch
procfs-detect-duplicate-names.patch
procfs-detect-duplicate-names-fix.patch
procfs-detect-duplicate-names-fix-fix-2.patch
remove-dma_cache_wbackinvwback_inv-functions.patch
maintainers-linux-omap-list-is-subscribers-only.patch
try-to-reap-reiserfs-pages-left-around-by-invalidatepage.patch
keys-make-request_key-and-co-fundamentally-asynchronous.patch
keys-make-request_key-and-co-fundamentally-asynchronous-update.patch
keys-make-request_key-and-co-fundamentally-asynchronous-vs-git-mmc.patch
keys-missing-word-in-documentation.patch
make-the-pr_-family-of-macros-in-kernelh-complete.patch
doc-about-email-clients-for-linux-patches.patch
jbd-slab-cleanups.patch
jbd-slab-cleanups-2.patch
jbd-slab-cleanups-3.patch
reiserfs-fix-kernel-panic-on-corrupted-directory.patch
lib-iomapcbad_io_access-print-0x-hex-prefix.patch
lk201-remove-obsolete-driver.patch
shrink_dcache_sb-speedup.patch
add-consts-where-appropriate-in-fs-nls.patch
reiserfs-workaround-for-dead-loop-in-finish_unfinished.patch
reiserfs-workaround-for-dead-loop-in-finish_unfinished-fix.patch
unify-dma_bit_mask-definitions-v31.patch
delete-gcc-295-compatible-structure-definition.patch
fs-isofs-nameic-remove-uninitialized-local-vars-warning.patch
ide-cd-is-unmaintained.patch
tty-expose-new-methods-needed-for-drivers-to-get-termios.patch
tty-expose-new-methods-needed-for-drivers-to-get-termios-fix.patch
kernel-printkc-concerns-about-the-console-handover.patch
atomic_opstxt-has-incorrect-misleading-and-insufficient-information.patch
udf-code-style-fixup-v3.patch
userc-deinline.patch
userc-ifdef-mq_bytes.patch
userc-ifdef-mq_bytes-fix.patch
remove-unused-member-from-nsproxy.patch
use-kmem_cache-macro-to-create-the-nsproxy-cache.patch
jbd-ext3-cleanups-convert-to-kzalloc.patch
vfs-use-the-predefined-d_unhashed-inline-function-instead.patch
move-kasprintfo-to-obj-y.patch
#increase-at_vector_size-to-terminate-saved_auxv-properly.patch: Tony wanted enhancements
increase-at_vector_size-to-terminate-saved_auxv-properly.patch
change-inotifyfs-magic-as-the-same-magic-is-used-for-futexfs-v2.patch
delay-creation-of-khcvd-thread.patch
hvc-console-is-also-used-by-iseries-so-add-that-to-hvc_driver-help.patch
lockdep-give-each-filesystem-its-own-inode-lock-class.patch
menuconfig-transform-nls-and-dlm-menus.patch
menuconfig-transform-network-filesystems-menu.patch
fs-udf-ballocc-mark-a-variable-as-uninitialized_var.patch
jbd-config_jbd_debug-cannot-create-proc-entry.patch
jbd-config_jbd_debug-cannot-create-proc-entry-fix.patch
jbd-fix-commit-code-to-properly-abort-journal.patch
jbd-fix-jbd-warnings-when-compiling-with-config_jbd_debug.patch
dont-truncate-proc-pid-environ-at-4096-characters.patch
fix-wrong-filename-reference-in-drivers-testingtxt.patch
anon-inodes-use-open-coded-atomic_inc-for-the-shared-inode.patch
ncr53c8xx-remove-deprecated-irq-flags-sa_.patch
completely-remove-deprecated-irq-flags-sa_.patch
compile-handle_percpu_irq-even-for-uniprocessor-kernels.patch
fs-correct-sus-compliance-for-open-of-large-file-without.patch
ext3-remove-ifdef-config_ext3_index.patch
rename-signalfd_siginfo-fields.patch
break-elf_platform-and-stack-pointer-randomization-dependency.patch
spin_lock_unlocked-cleanups.patch
binfmt_flat-minimum-support-for-the-blackfin-relocations.patch
binfmt_flat-minimum-support-for-the-blackfin-relocations-checkpatch-fixes.patch
The infamous misc. Will re-review and will merge basically all of them.
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
writeback-fix-comment-use-helper-function.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch
writeback-fix-periodic-superblock-dirty-inode-flushing.patch
introduce-i_sync.patch
introduce-i_sync-fix.patch
writeback-remove-unnecessary-wait-in-throttle_vm_writeout.patch
Merge
sync_sb_inodes-propagate-errors.patch
Unready
#
# spi
#
clean-up-duplicate-includes-in-drivers-spi.patch
omap2-mcspi-code-cleanup.patch
spi-driver-runtime-footprint-shrinkage.patch
Merge
revert-faster-ext2_clear_inode.patch
ext2-reservations.patch
ext2-reservations-fix-for-percpu_counter-changes.patch
fix-for-ext2-reservation.patch
remove-fs-ext2-balloccreserve_blocks.patch
ext2-balloc-use-io_error-label.patch
This is surviving Google QA. Will merge.
#
# kprobes
#
kprobes-support-kretprobe-blacklist.patch
Merge
#
# i4l
#
gigaset-remove-pointless-locking.patch
use-mutex-instead-of-semaphore-in-isdn-subsystem-common-functions.patch
fix-possible-null-deref-on-low-memory-condition-in-capidrvcsend_message.patch
isdn-guard-against-a-potential-null-pointer-dereference-in-old_capi_manufacturer.patch
isdn-hisax-hfc_usbc-fix-check-after-use.patch
Merge
#
# nfsd
#
fs-nfsd-exportc-make-3-functions-static.patch
Send to neilb and bfields
ecryptfs-add-key-list-structure-search-keyring.patch
ecryptfs-use-list_for_each_entry_safe-when-wiping-auth-toks.patch
ecryptfs-kmem_cache-objects-for-multiple-keys-init-exit-functions.patch
ecryptfs-fix-tag-1-parsing-code.patch
ecryptfs-fix-tag-3-parsing-code.patch
ecryptfs-fix-tag-11-parsing-code.patch
ecryptfs-fix-tag-11-writing-code.patch
ecryptfs-update-comment-and-debug-statement.patch
ecryptfs-printk-warning-fixes.patch
ecryptfs-remove-unnecessary-bug_on.patch
ecryptfs-collapse-flag-set-into-one-statement.patch
ecryptfs-grammatical-fix-destruct-to-destroy.patch
ecryptfs-comments-for-some-structs.patch
ecryptfs-kerneldoc-fixes-for-cryptoc-and-keystorec.patch
ecryptfs-remove-unnecessary-variable-initializations.patch
ecryptfs-make-needlessly-global-symbols-static.patch
ecryptfs-use-generic_file_splice_read.patch
ecryptfs-remove-header_extent_size.patch
ecryptfs-remove-header_extent_size-fix.patch
ecryptfs-remove-assignments-in-if-statements.patch
ecryptfs-fix-error-handling.patch
ecryptfs-read_writec-routines.patch
ecryptfs-replace-encrypt-decrypt-and-inode-size-write.patch
ecryptfs-set-up-and-destroy-persistent-lower-file.patch
ecryptfs-update-metadata-read-write-functions.patch
ecryptfs-update-metadata-read-write-functions-cleanup.patch
ecryptfs-make-open-truncate-and-setattr-use-persistent-file.patch
ecryptfs-convert-mmap-functions-to-use-persistent-file.patch
ecryptfs-convert-mmap-functions-to-use-persistent-file-fix.patch
ecryptfs-fix-data-types.patch
ecryptfs-initialize-persistent-lower-file-on-inode-create.patch
ecryptfs-remove-unused-functions-and-kmem_cache.patch
ecryptfs-replace-magic-numbers.patch
ecryptfs-clean-up-page-flag-handling.patch
Merge
rtc-periodic-irq-fix.patch
rtc_irq_set_freq-requires-power-of-two-and-associated-kerneldoc.patch
no-need-to-convert-file-private_data-to-rtc-device.patch
rtc-make-rtc-ds1553-driver-hotplug-aware-take-3.patch
rtc-make-rtc-ds1742-driver-hotplug-aware-take-2.patch
rtc-pcf8583-check-for-i2c-adapter-functionality.patch
rtc-rtc-class-driver-for-the-ds1374.patch
rtc-fix-readback-from-sys-class-rtc-rtc-wakealarm.patch
Merge
unprivileged-mounts-add-user-mounts-to-the-kernel.patch
unprivileged-mounts-allow-unprivileged-umount.patch
unprivileged-mounts-account-user-mounts.patch
unprivileged-mounts-propagate-error-values-from-clone_mnt.patch
unprivileged-mounts-allow-unprivileged-bind-mounts.patch
unprivileged-mounts-put-declaration-of-put_filesystem-in-fsh.patch
unprivileged-mounts-allow-unprivileged-mounts.patch
unprivileged-mounts-allow-unprivileged-mounts-fix-subtype-handling.patch
unprivileged-mounts-allow-unprivileged-fuse-mounts.patch
unprivileged-mounts-propagation-inherit-owner-from-parent.patch
unprivileged-mounts-propagation-inherit-owner-from-parent-fix-for-git-audit.patch
unprivileged-mounts-add-no-submounts-flag.patch
Need input from VFS guys on this.
fbdev-export-fb_destroy_modelist.patch
connector-change-connectors-max-message-size.patch
uvesafb-add-connector-entries.patch
uvesafb-the-driver-core.patch
uvesafb-the-driver-core-uvesafb-set-the-refresh-rate-to-60hz-if-nocrtc-is-used.patch
uvesafb-the-driver-core-uvesafb-always-use-mutexes-when-accessing-uvfb_tasks.patch
uvesafb-the-driver-core-uvesafb-fix-a-typo-in-a-warning.patch
uvesafb-the-driver-core-uvesafb-use-visual_truecolor-as-the-default-visual.patch
uvesafb-the-driver-core-uvesafb-use-the-default-refresh-rate-if-the-monitor-limits-are-not-set.patch
uvesafb-the-driver-core-uvesafb-try-to-set-mode-with-default-timings-if-setting-it-with-our-own-timings-failed.patch
uvesafb-the-driver-core-dont-access-vga-registers-directly-when-running-on-non-x86.patch
uvesafb-documentation.patch
uvesafb-documentation-uvesafb-add-info-about-pmipal-yrap-and-ypan-being-available-only-on-x86.patch
pm3fb-copyarea-and-partial-imageblit-suppor.patch
skeletonfb-wrong-field-name-fix.patch
pm3fb-header-file-reduction.patch
pm3fb-imageblit-improved.patch
pm3fb-3-small-fixes.patch
pm3fb-improvements-and-cleanups.patch
pm3fb-mtrr-support-and-noaccel-option.patch
pm3fb-mtrr-support-and-noaccel-option-make-pm3fb_init-static-again.patch
pm2fb-mtrr-support-and-noaccel-option.patch
pm2fb-mtrr-support-and-noaccel-option-pm2fb-lowsyncs-section-mismatch-fix.patch
pm2fb-accelerated-imageblit.patch
pm2fb-source-code-improvements.patch
pm2fb-permedia-2v-initialization-fixes.patch
pm2fb-accelerated-24-bit-fillrect.patch
sm501fb-update-suspend-and-resume-code.patch
sm501fb-call-fb-suspend-function-during-suspend-and-resume.patch
sm501fb-ensure-panel-interface-is-not-tristated-when-setup.patch
mbxfb-improvements-and-new-features.patch
pxafb-add-support-for-other-palette-formats.patch
tridentfb-coding-style-improvement.patch
tdfxfb-coding-style-improvement.patch
tdfxfb-3-fixes.patch
tdfxfb-palette-fixes.patch
radeon_driver_vblank_do_wait-static.patch
unexport-fb_prepare_logo.patch
fbdev-fix-incorrect-timings-in-some-modedb-entries.patch
tdfxfb-code-improvements.patch
tdfxfb-hardware-cursor.patch
tdfxfb-mtrr-support.patch
tdfxfb-mtrr-support-fix.patch
tdfxfb-mtrr-support-fix-2.patch
pm2fb-checkpatch-fixes.patch
pm3fb-checkpatch-fixes.patch
drivers-video-geode-lxfb_corec-fix-lxfb_setup-warning.patch
fbdev-fb_create_modedb-non-static-int-first-=-1.patch
fbdev-fb_create_modedb-non-static-int-first-=-1-fix.patch
pm2fb-permedia-2v-hardware-cursor-support.patch
pm3fb-hardware-cursor-support.patch
s3c2410fb-code-cleanup.patch
s3c2410fb-remove-fb_info-pointer-from-s3c2410fb_info.patch
s3c2410fb-multi-display-support.patch
s3c2410fb-add-margin-fields-to-s3c2410fb_display.patch
s3c2410fb-use-new-margin-fields.patch
s3c2410fb-remove-lcdcon3-register-from-s3c2410fb_display.patch
s3c2410fb-add-vertical-margins-fields-to-s3c2410fb_display.patch
s3c2410fb-use-vertical-margins-values.patch
s3c2410fb-add-pulse-length-fields-to-s3c2410fb_display.patch
s3c2410fb-remove-lcdcon2-and-lcdcon3-register-fields.patch
s3c2410fb-fix-missing-registers-offset.patch
s3c2410fb-byte-ordering-fixes.patch
atyfb-atyfb-unshare-pseudo_palette.patch
fbcon-convert-struct-font_desc-to-use-iso-c-initializers.patch
fbcon-convert-struct-font_desc-to-use-iso-c-initializers-update.patch
vt-fix-warnings-in-selectionh.patch
fbdev-change-asm-uaccessh-to-linux-uaccessh.patch
s3c2410fb-source-code-improvements.patch
s3c2410fb-adds-pixclock-to-s3c2410fb_display.patch
s3c2410fb-removes-lcdcon1-register-value-from-s3c2410fb_display.patch
s3c2410fb-make-use-of-default_display-settings.patch
cirrusfb-checkpatchpl-cleanup.patch
cirrusfb-checkpatchpl-cleanup-ppc-fix.patch
cirrusfb-remove-typedefs.patch
cirrusfb-remove-fields-from-cirrusfb_info.patch
cirrusfb-code-improvements.patch
cirrusfb-code-improvement-2nd-part.patch
pm3fb-header-file-cleanup.patch
pm2fb-hardware-cursor-support-for-the-permedia2.patch
pm2fb-panning-and-hardware-cursor-fixes.patch
vfb-make-virtual-framebuffer-mmapable.patch
intel-fb-support-for-interlaced-video-modes.patch
fbdev-find-mode-with-the-highest-safest-refresh-rate-in-fb_find_mode.patch
nvidiafb-add-boot-option-to-reverse-i2c-port-assignment.patch
fbdev-support-for-byte-reversed-framebuffer-formats.patch
ps3-fix-black-and-white-stripes.patch
ps3fb-fix-spurious-mode-change-failures.patch
fbdev-update-documentation-fb-00-index.patch
tdfxfb-replace-busy-waiting-with-cpu_relax.patch
pm2fb-replace-busy-waiting-with-cpu_relax.patch
pm3fb-replace-busy-waiting-with-cpu_relax.patch
tdfxfb-checkpatch-fixes.patch
drivers-video-kconfig-fix-fb_pmagb_b-dependencies.patch
export-font_vga_8x16.patch
radeonfb-xpress-200m-rc410-support-patch.patch
drivers-video-pmag-ba-fbc-improve-diagnostics.patch
drivers-video-pmag-ba-fbc-improve-diagnostics-fix.patch
intel-fb-whitespace-bracket-and-other-clean-ups.patch
intel-fb-obvious-changes-and-corrections.patch
intel-fb-force-even-line-count-in-interlaced-mode.patch
intel-fb-more-interlaced-mode-support.patch
video-gfx-fix-menu-ordering.patch
Merge
md-software-raid-autodetect-dev-list-not-array.patch
md-software-raid-autodetect-dev-list-not-array-fix.patch
bitmaph-remove-dead-artifacts.patch
Merge subject to acks
cpu-hotplug-slab-cleanup-cpuup_callback.patch
cpu-hotplug-slab-fix-memory-leak-in-cpu-hotplug-error-path.patch
cpu-hotplug-cpu-deliver-cpu_up_canceled-only-to-notify_oked-callbacks-with-cpu_up_prepare.patch
cpu-hotplug-topology-remove-topology_dev_map.patch
cpu-hotplug-thermal_throttle-fix-cpu-hotplug-error-handling.patch
cpu-hotplug-msr-fix-cpu-hotplug-error-handling.patch
cpu-hotplug-mce-fix-cpu-hotplug-error-handling.patch
cpu-hotplug-intel_cacheinfo-fix-cpu-hotplug-error-handling.patch
cpu-hotplug-intel_cacheinfo-fix-cpu-hotplug-error-handling-fix-a-section-mismatch-warning.patch
Merge
do-cpu_dead-migrating-under-read_locktasklist-instead-of-write_lock_irqtasklist.patch
migration_callcpu_dead-use-spin_lock_irq-instead-of-task_rq_lock.patch
Merge
floppy-do-a-very-minimal-style-cleanup-of-the-floppy-driver.patch
floppy-remove-dead-commented-out-code-from-floppy-driver.patch
floppy-remove-register-keyword-use-from-floppy-driver.patch
Merge
intel-iommu-dmar-detection-and-parsing-logic.patch
intel-iommu-pci-generic-helper-function.patch
intel-iommu-clflush_cache_range-now-takes-size-param.patch
intel-iommu-iova-allocation-and-management-routines.patch
intel-iommu-intel-iommu-driver.patch
intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
intel-iommu-intel-iommu-cmdline-option-forcedac.patch
intel-iommu-dmar-fault-handling-support.patch
intel-iommu-iommu-gfx-workaround.patch
intel-iommu-iommu-gfx-workaround-kconfig-fix.patch
intel-iommu-iommu-floppy-workaround.patch
intel-iommu-iommu-floppy-workaround-kconfig-fix.patch
intel-iommu-optimize-sg-map-unmap-calls.patch
Merge
fuse-update-backing_dev_info-congestion-state.patch
fuse-fix-reserved-request-wake-up.patch
fuse-add-reference-counting-to-fuse_file.patch
fuse-truncate-on-spontaneous-size-change.patch
fuse-fix-page-invalidation.patch
fuse-set-i_nlink-to-sane-value-after-mount.patch
fuse-refresh-stale-attributes-in-fuse_permission.patch
fuse-fix-permission-checking-on-sticky-directories.patch
fuse-fix-permission-checking-on-sticky-directories-fix.patch
fuse-fix-permission-checking-on-sticky-directories-fix-setting-i_mode-bits.patch
fuse-cleanup-in-release.patch
fuse-no-abort-on-interrupt.patch
fuse-no-enoent-from-fuse-device-read.patch
fuse-clean-up-execute-permission-checking.patch
Merge
peterz-vs-ext4-mballoc-core.patch
64-bit-i_version-afs-fixes.patch
jbd2-ext4-cleanups-convert-to-kzalloc.patch
jbd2-fix-commit-code-to-properly-abort-journal.patch
jbd2-debug-code-cleanup.patch
ext4-remove-ifdef-config_ext4_index.patch
Send to tytso
pnp-make-pnpacpi_suspend-handle-errors.patch
pnp-dont-fail-device-init-if-no-dma-channel.patch
fix-very-high-interrupt-rate-for-irq8-rtc-unless-pnpacpi=off.patch
pnp-remove-null-pointer-checks.patch
pnp-simplify-pnp-card-error-handling.patch
pnp-use-dev_info-dev_err-etc-in-core.patch
pnp-use-dev_info-in-system-driver.patch
pnp-simplify-pnpbios-insert_device.patch
pnp-add-debug-message-for-adding-new-device.patch
Merge
ecryptfs-allow-lower-fs-to-interpret-attr_kill_sid.patch
knfsd-only-set-attr_kill_sid-if-attr_mode-isnt-being-explicitly-set.patch
reiserfs-turn-of-attr_kill_sid-at-beginning-of-reiserfs_setattr.patch
unionfs-fix-unionfs_setattr-to-handle-attr_kill_sid.patch
vfs-make-notify_change-pass-attr_kill_sid-to-setattr-operations.patch
nfs-if-attr_kill_sid-bits-are-set-then-skip-mode-change.patch
cifs-ignore-mode-change-if-its-just-for-clearing-setuid-setgid-bits.patch
Merge
r-o-bind-mounts-filesystem-helpers-for-custom-struct-files.patch
r-o-bind-mounts-rearrange-may_open-to-be-r-o-friendly.patch
r-o-bind-mounts-give-permission-a-local-mnt-variable.patch
r-o-bind-mounts-create-cleanup-helper-svc_msnfs.patch
r-o-bind-mounts-stub-functions.patch
r-o-bind-mounts-elevate-write-count-opend-files.patch
r-o-bind-mounts-elevate-write-count-for-some-ioctls.patch
r-o-bind-mounts-elevate-writer-count-for-chown-and-friends.patch
r-o-bind-mounts-make-access-use-mnt-check.patch
r-o-bind-mounts-elevate-mnt-writers-for-callers-of-vfs_mkdir.patch
r-o-bind-mounts-elevate-write-count-during-entire-ncp_ioctl.patch
r-o-bind-mounts-elevate-write-count-during-entire-ncp_ioctl-fix.patch
r-o-bind-mounts-elevate-write-count-for-link-and-symlink-calls.patch
r-o-bind-mounts-elevate-mount-count-for-extended-attributes.patch
r-o-bind-mounts-elevate-write-count-for-file_update_time.patch
r-o-bind-mounts-unix_find_other-elevate-write-count-for-touch_atime.patch
r-o-bind-mounts-elevate-write-count-over-calls-to-vfs_rename.patch
r-o-bind-mounts-nfs-check-mnt-instead-of-superblock-directly.patch
r-o-bind-mounts-elevate-writer-count-for-do_sys_truncate.patch
r-o-bind-mounts-elevate-write-count-for-do_utimes.patch
r-o-bind-mounts-elevate-write-count-for-do_utimes-touch-command-causes-oops.patch
r-o-bind-mounts-elevate-write-count-for-do_sys_utime-and-touch_atime.patch
r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create.patch
r-o-bind-mounts-sys_mknodat-elevate-write-count-for-vfs_mknod-create-fix.patch
r-o-bind-mounts-elevate-mnt-writers-for-vfs_unlink-callers.patch
r-o-bind-mounts-do_rmdir-elevate-write-count.patch
r-o-bind-mounts-track-number-of-mount-writers.patch
r-o-bind-mounts-track-number-of-mount-writers-make-lockdep-happy-with-r-o-bind-mounts.patch
r-o-bind-mounts-honor-r-w-changes-at-do_remount-time.patch
ext2-reservations-fix-for-r-o-bind-mounts-take-writer-count-v2.patch
make-reiserfs-stop-using-struct-file-for-internal.patch
Doesn't seem ready yet
revoke-special-mmap-handling.patch
revoke-special-mmap-handling-vs-fault-vs-invalidate.patch
revoke-core-code.patch
slab-api-remove-useless-ctor-parameter-and-reorder-parameters-vs-revoke.patch
revoke-support-for-ext2-and-ext3.patch
revoke-add-documentation.patch
revoke-wire-up-i386-system-calls.patch
fs-introduce-write_begin-write_end-and-perform_write-aops-revoke.patch
fs-introduce-write_begin-write_end-and-perform_write-aops-revoke-fix.patch
revoke-vs-git-block.patch
Not sure - opinions sought.
clean-up-duplicate-includes-in-documentation.patch
documentation-make-headers_installtxt.patch
documentation-add-entries-to-filesystems-00-index-for-several-untracked-files.patch
add-a-missing-00-index-file-for-documentation-vm.patch
add-a-missing-00-index-file-for-documentation-vm-fix.patch
add-a-00-index-file-to-documentation-mips.patch
add-a-00-index-file-to-documentation-sysctl.patch
add-a-00-index-file-to-documentation-telephony.patch
kernel-doc-fix-doc-blocks-and-html.patch
documentation-delete-unreferenced-xterm-linuxxpm-file.patch
express-relocatability-of-kernel-on-x86_64-in-documentation.patch
express-relocatability-of-kernel-on-x86_64-in.patch
express-new-elf32-mechanisms-in-documentation.patch
add-reset_devices-to-the-recommended-parameters.patch
Merge
sysctl-core-stop-using-the-unnecessary-ctl_table-typedef.patch
sysctl-factor-out-sysctl_data.patch
sysct-mqueue-remove-the-binary-sysctl-numbers.patch
sysctl-remove-binary-sysctl-support-where-it-clearly-doesnt-work.patch
sysctl-fix-neighbour-table-sysctls.patch
sysctl-ipv6-route-flushing-kill-binary-path.patch
sysctl-remove-broken-sunrpc-debug-binary-sysctls.patch
sysctl-x86_64-remove-unnecessary-binary-paths.patch
sysctl-remove-broken-cdrom-binary-sysctls.patch
sysctl-remove-broken-cdrom-binary-sysctls-update.patch
sysctl-ipv4-remove-binary-sysctl-paths-where-they-are-broken.patch
sysctl-remove-the-binary-interface-for-aio-nr-aio-max-nr-acpi_video_flags.patch
sysctl-parport-remove-binary-paths.patch
sysctl-parport-remove-binary-paths-fix.patch
sysctl-simplify-the-pty-sysctl-logic.patch
sysctl-remove-broken-netfilter-binary-sysctls.patch
sysctl-remove-the-cad_pid-binary-sysctl-path.patch
sysctl-properly-register-the-irda-binary-sysctl-numbers.patch
sysctl-error-on-bad-sysctl-tables.patch
sysctl-error-on-bad-sysctl-tables-kernel-sysctl_checkc-must-include-linux-stringh.patch
sysctl-update-sysctl_check_table.patch
sysctl-update-sysctl_checks-list-of-binary-paths.patch
sysctl-update-sysctl_check_table-sysctl-update-sysctl_check-to-handle-compiled-out-code.patch
sysctl-for-irda-update-sysctl_checks-list-of-binary-paths.patch
sysctl-deprecate-sys_sysctl-in-a-user-space-visible-fashion.patch
sysctl-deprecate-sys_sysctl-in-a-user-space-visible-fashion-fix.patch
Merge
v3-file-capabilities-alter-behavior-of-cap_setpcap.patch
This is part of implement-file-posix-capabilities.patch, but the patch is
all tangled up with intervening patches. I've repolled the security guys.
char-mxser_new-upgrade-to-110.patch
char-mxser_new-move-to-pci_vdevice.patch
char-mxser_new-remove-useless-comments-in-mxser_cards.patch
mxser-remove-commented-crap.patch
mxser-fix-compiler-warning-when-building-withoug-config_pci.patch
mxser-fix-compiler-warning-when-building-withoug-config_pci-fix.patch
Merge
cpuset-zero-malloc-revert-the-old-cpuset-fix.patch
task-containersv11-basic-task-container-framework.patch
task-containersv11-basic-task-container-framework-fix.patch
task-containersv11-basic-task-container-framework-containers-fix-refcount-bug.patch
task-containersv11-basic-task-container-framework-fix-cgroup_create_dir-comments.patch
task-containersv11-add-tasks-file-interface.patch
add-cgroup-write_uint-helper-method.patch
task-containersv11-add-fork-exit-hooks.patch
task-containersv11-add-container_clone-interface.patch
task-containersv11-add-container_clone-interface-containers-fix-refcount-bug.patch
task-containersv11-add-procfs-interface.patch
task-containersv11-add-procfs-interface-containers-bdi-init-hooks.patch
task-containersv11-shared-container-subsystem-group-arrays.patch
task-containersv11-shared-container-subsystem-group-arrays-avoid-lockdep-warning.patch
task-containersv11-shared-container-subsystem-group-arrays-include-fix.patch
task-containersv11-automatic-userspace-notification-of-idle-containers.patch
task-containersv11-make-cpusets-a-client-of-containers.patch
task-containersv11-example-cpu-accounting-subsystem.patch
task-containersv11-simple-task-container-debug-info-subsystem.patch
task-containers-enable-containers-by-default-in-some-configs.patch
Merge
add-containerstats-v3.patch
add-containerstats-v3-fix.patch
Merge
containers-implement-namespace-tracking-subsystem.patch
containers-implement-namespace-tracking-subsystem-fix-order-of-container-subsystems-in-init-kconfig.patch
Merge
pid-namespaces-round-up-the-api.patch
pid-namespaces-make-get_pid_ns-return-the-namespace-itself.patch
pid-namespaces-dynamic-kmem-cache-allocator-for-pid-namespaces.patch
pid-namespaces-dynamic-kmem-cache-allocator-for-pid-namespaces-fix.patch
pid-namespaces-define-and-use-task_active_pid_ns-wrapper.patch
pid-namespaces-rename-child_reaper-function.patch
pid-namespaces-use-task_pid-to-find-leaders-pid.patch
pid-namespaces-define-is_global_init-and-is_container_init.patch
pid-namespaces-define-is_global_init-and-is_container_init-fix.patch
pid-namespaces-define-is_global_init-and-is_container_init-m32r-fix.patch
pid-namespaces-define-is_global_init-and-is_container_init-kernel-pidc-remove-unused-exports.patch
pid-namespaces-define-is_global_init-and-is_container_init-fix-capabilityc-to-work-with-threaded-init.patch
pid-namespaces-define-is_global_init-and-is_container_init-versus-x86_64-mm-i386-show-unhandled-signals-v3.patch
pid-namespaces-move-alloc_pid-to-copy_process.patch
Merge
make-access-to-tasks-nsproxy-lighter.patch
make-access-to-tasks-nsproxy-lighterpatch-breaks-unshare.patch
make-access-to-tasks-nsproxy-lighter-update-get_net_ns_by_pid.patch
Merge
workqueue-debug-flushing-deadlocks-with-lockdep.patch
workqueue-debug-work-related-deadlocks-with-lockdep.patch
Merge
fs-file_tablec-use-list_for_each_entry-instead-of-list_for_each.patch
fs-eventpollc-use-list_for_each_entry-instead-of-list_for_each.patch
fs-superc-use-list_for_each_entry-instead-of-list_for_each.patch
fs-superc-use-list_for_each_entry-instead-of-list_for_each-fix.patch
fs-locksc-use-list_for_each_entry-instead-of-list_for_each.patch
kernel-exitc-use-list_for_each_entry_safe-instead-of-list_for_each_safe.patch
kernel-time-clocksourcec-use-list_for_each_entry-instead-of-list_for_each.patch
mm-oom_killc-use-list_for_each_entry-instead-of-list_for_each.patch
Merge
whitespace-fixes-time-syscalls.patch
whitespace-fixes-process-accounting.patch
whitespace-fixes-cpuset.patch
whitespace-fixes-relayfs.patch
whitespace-fixes-audit-filtering.patch
whitespace-fixes-dma-channel-allocator.patch
whitespace-fixes-fork.patch
whitespace-fixes-module-loading.patch
whitespace-fixes-panic-handling.patch
whitespace-fixes-capability-syscalls.patch
whitespace-fixes-syscall-auditing.patch
whitespace-fixes-compat-syscalls.patch
whitespace-fixes-system-auditing.patch
whitespace-fixes-execution-domains.patch
whitespace-fixes-interval-timers.patch
whitespace-fixes-system-timers.patch
whitespace-fixes-task-exit-handling.patch
Merge
pid-namespaces-rework-forget_original_parent.patch
pid-namespaces-move-exit_task_namespaces.patch
pid-namespaces-introduce-ms_kernmount-flag.patch
pid-namespaces-prepare-proc_flust_task-to-flush-entries-from-multiple-proc-trees.patch
pid-namespaces-introduce-struct-upid.patch
pid-namespaces-add-support-for-pid-namespaces-hierarchy.patch
pid-namespaces-make-alloc_pid-free_pid-and-put_pid-work-with-struct-upid.patch
pid-namespaces-helpers-to-obtain-pid-numbers.patch
pid-namespaces-helpers-to-find-the-task-by-its-numerical-ids.patch
pid-namespaces-helpers-to-find-the-task-by-its-numerical-ids-fix.patch
pid-namespaces-move-alloc_pid-lower-in-copy_process.patch
pid-namespaces-make-proc-have-multiple-superblocks-one-for-each-namespace.patch
pid-namespaces-miscelaneous-preparations-for-pid-namespaces.patch
pid-namespaces-allow-cloning-of-new-namespace.patch
pid-namespaces-allow-cloning-of-new-namespace-fix-check-for-return-value-of-create_pid_namespace.patch
pid-namespaces-make-proc_flush_task-actually-from-entries-from-multiple-namespaces.patch
pid-namespaces-initialize-the-namespaces-proc_mnt.patch
pid-namespaces-create-a-slab-cache-for-struct-pid_namespace.patch
pid-namespaces-allow-signalling-container-init.patch
pid-namespaces-destroy-pid-namespace-on-inits-death.patch
pid-namespaces-changes-to-show-virtual-ids-to-user.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-fix-the-return-value-of-sys_set_tid_address.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-use-find_task_by_pid_ns-in-places-that-operate-with-virtual.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-use-find_task_by_pid_ns-in-places-that-operate-with-virtual-fix.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-use-find_task_by_pid_ns-in-places-that-operate-with-virtual-fix-2.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-use-find_task_by_pid_ns-in-places-that-operate-with-virtual-fix-3.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-sys_getsid-sys_getpgid-return-wrong-id-for-task-from-another.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-fix-the-sys_setpgrp-to-work-between-namespaces.patch
uninline-find_task_by_xxx-set-of-functions.patch
pid-namespaces-changes-to-show-virtual-ids-to-user-fix.patch
pid-namespaces-remove-the-struct-pid-unneeded-fields.patch
isolate-some-explicit-usage-of-task-tgid.patch
uninline-find_pid-etc-set-of-functions.patch
uninline-the-task_xid_nr_ns-calls.patch
Merge
memory-controller-add-documentation.patch
memory-controller-resource-counters-v7.patch
memory-controller-resource-counters-v7-fix.patch
memory-controller-containers-setup-v7.patch
memory-controller-accounting-setup-v7.patch
memory-controller-memory-accounting-v7.patch
memory-controller-memory-accounting-v7-fix.patch
memory-controller-memory-accounting-v7-fix-swapoff-breakage-however.patch
memory-controller-task-migration-v7.patch
memory-controller-add-per-container-lru-and-reclaim-v7.patch
memory-controller-add-per-container-lru-and-reclaim-v7-fix.patch
memory-controller-add-per-container-lru-and-reclaim-v7-fix-2.patch
memory-controller-add-per-container-lru-and-reclaim-v7-cleanup.patch
memory-controller-improve-user-interface.patch
memory-controller-oom-handling-v7.patch
memory-controller-oom-handling-v7-vs-oom-killer-stuff.patch
memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7.patch
memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7-cleanup.patch
memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7-fix-2.patch
memory-controller-make-page_referenced-container-aware-v7.patch
memory-controller-make-charging-gfp-mask-aware.patch
memory-controller-make-charging-gfp-mask-aware-fix.patch
memory-controller-bug_on.patch
mem-controller-gfp-mask-fix.patch
memcontrol-move-mm_cgroup-to-header-file.patch
memcontrol-move-oom-task-exclusion-to-tasklist.patch
memcontrol-move-oom-task-exclusion-to-tasklist-fix.patch
oom-add-sysctl-to-enable-task-memory-dump.patch
kswapd-should-only-wait-on-io-if-there-is-io.patch
Hold. This needs a serious going-over by page reclaim people.
the-next-round-of-scheduled-oss-code-removal.patch
char-moxa-fix-and-optimise-empty-timer.patch
char-cyclades-remove-bottom-half-processing.patch
char-cyclades-make-the-isr-code-readable.patch
char-cyclades-move-spin_lock-to-one-place.patch
char-cyclades-fix-some-w-warnings.patch
cyclades-avoid-label-defined-but-not-used-warning.patch
char-moxa-cleanup-prints.patch
char-moxa-function-names-cleanup.patch
char-moxa-remove-sleep_on.patch
add-missing-newlines-to-some-uses-of-dev_level-messages.patch
Merge
add-scaled-time-to-taskstats-based-process-accounting.patch
add-missing-newlines-to-some-uses-of-dev_level-messages-fix.patch
powerpc-add-scaled-time-accounting.patch
Merge
fs-select-remove-unused-macros.patch
remove-asm-bitopsh-includes.patch
forbid-asm-bitopsh-direct-inclusion.patch
cyber2000fb-rename-bit-macro.patch
cyber2000fb-checkpatch-fixes.patch
i2c-pxa-rename-bit-macro-to-pxa_bit.patch
s2io-rename-bit-macro.patch
amba-pl011-rename-bit-macro.patch
define-first-set-of-bit-macros.patch
get-rid-of-input-bit-duplicate-defines.patch
define-global-bit-macro.patch
flashpoint-use-bit-instead-of-bitw.patch
remove-bits_to_type-macro.patch
remove-bits_to_type-macro-fix.patch
Merge
proc-export-a-processes-resource-limits-via-proc-pid.patch
fix-tsk-exit_state-usage-resend.patch
isolate-the-explicit-usage-of-signal-pgrp.patch
use-helpers-to-obtain-task-pid-in-printks.patch
use-helpers-to-obtain-task-pid-in-printks-drm-fix.patch
use-helpers-to-obtain-task-pid-in-printks-arch-code.patch
remove-unused-variables-from-fs-proc-basec.patch
use-task_pid_nr-in-ip_vs_syncc.patch
Merge
redefine-unregister_hotcpu_notifier-hotplug_cpu-stubs.patch
x86-msr-driver-misc-cpuinit-annotations.patch
i386-cpuid-misc-cpuinit-annotations.patch
Send to Andi
hotplug-cpu-migrate-a-task-within-its-cpuset.patch
hotplug-cpu-migrate-a-task-within-its-cpuset-fix.patch
hotplug-cpu-migrate-a-task-within-its-cpuset-doc.patch
cpu-hotplug-avoid-hotadd-when-proper-possible_map-isnt-specified.patch
cpu-hotplug-avoid-hotadd-when-proper-possible_map-isnt-specified-checkpatch-fixes.patch
Merge
bitops-introduce-lock-ops.patch
alpha-fix-bitops.patch
alpha-lock-bitops.patch
alpha-lock-bitops-fix.patch
ia64-lock-bitops.patch
mips-fix-bitops.patch
mips-lock-bitops.patch
powerpc-lock-bitops.patch
powerpc-lock-bitops-fix.patch
bit_spin_lock-use-lock-bitops.patch
Merge
fs-cramfs-inodec-remove-unused-variable.patch
fs-cramfs-inodec-replace-hardcoded-value-with-preprocessor-constant.patch
Merge
ipc-store-ipcs-into-idrs.patch
ipc-unify-the-syscalls-code.patch
ipc-remove-the-ipc_get-routine.patch
ipc-integrate-ipc_checkid-into-ipc_lock.patch
ipc-integrate-ipc_checkid-into-ipc_lock-fix.patch
ipc-integrate-ipc_checkid-into-ipc_lock-fix-2.patch
ipc-integrate-ipc_checkid-into-ipc_lock-fix-3.patch
storing-ipcs-into-idrs.patch
ipc-introduce-the-ipcid_to_idx-macro.patch
ipc-inline-ipc_buildid.patch
ipc_fix_wrong_comments.patch
fix-idr_find-locking.patch
ipc-remove-unneeded-parameters.patch
Merge
extended-crashkernel-command-line.patch
extended-crashkernel-command-line-update.patch
extended-crashkernel-command-line-comment-fix.patch
extended-crashkernel-command-line-improve-error-handling-in-parse_crashkernel_mem.patch
use-extended-crashkernel-command-line-on-i386.patch
use-extended-crashkernel-command-line-on-i386-update.patch
use-extended-crashkernel-command-line-on-x86_64.patch
use-extended-crashkernel-command-line-on-x86_64-update.patch
use-extended-crashkernel-command-line-on-ia64.patch
use-extended-crashkernel-command-line-on-ia64-fix.patch
use-extended-crashkernel-command-line-on-ia64-update.patch
use-extended-crashkernel-command-line-on-ppc64.patch
use-extended-crashkernel-command-line-on-ppc64-update.patch
use-extended-crashkernel-command-line-on-sh.patch
use-extended-crashkernel-command-line-on-sh-update.patch
add-documentation-for-extended-crashkernel-syntax.patch
add-documentation-for-extended-crashkernel-syntax-add-extended-crashkernel-syntax-to-kernel-parameterstxt.patch
Merge
cleanup-macros-for-distinguishing-mandatory-locks.patch
gfs2-cleanup-explicit-check-for-mandatory-locks.patch
9pfs-cleanup-explicit-check-for-mandatory-locks.patch
afs-cleanup-explicit-check-for-mandatory-locks.patch
nfs-cleanup-explicit-check-for-mandatory-locks.patch
rework-proc-locks-via-seq_files-and-seq_list-helpers.patch
rework-proc-locks-via-seq_files-and-seq_list-helpers-fix.patch
rework-proc-locks-via-seq_files-and-seq_list-helpers-fix-2.patch
Will either merge or will send to bfields (part of my cunning plan to make
him the locks.c maintainer)
exportfs-add-fid-type.patch
exportfs-add-new-methods.patch
ext2-new-export-ops.patch
ext3-new-export-ops.patch
ext4-new-export-ops.patch
efs-new-export-ops.patch
jfs-new-export-ops.patch
ntfs-new-export-ops.patch
xfs-new-export-ops.patch
fat-new-export-ops.patch
isofs-new-export-ops.patch
shmem-new-export-ops.patch
reiserfs-new-export-ops.patch
gfs2-new-export-ops.patch
ocfs2-new-export-ops.patch
exportfs-remove-old-methods.patch
exportfs-make-struct-export_operations-const.patch
exportfs-update-documentation.patch
Merge
usb_serial-stop-passing-null-to-functions-that-expect-data.patch
ark3116-update-termios-handling.patch
usb-serial-kill-another-case-we-pass-null-and-shouldnt.patch
ch341-fix-termios-handling.patch
digi_acceleport-fix-termios-and-also-readability-a-bit.patch
empeg-clean-up-and-handle-speeds.patch
funsoft-fix-termios.patch
ir_usb-termios-handling.patch
keyspan-termios-tidy.patch
kobil_sct-termios-encoding-fixups.patch
option-termios-handling.patch
sierra-termios.patch
usb-serial-handle-null-termios-methods-as-no-hardware-changing-support.patch
visor-termios-bits.patch
These depend on
tty-expose-new-methods-needed-for-drivers-to-get-termios.patch. Once that
is in mainline, these patches go to Greg for the USB tree.
hook-up-group-scheduler-with-control-groups.patch
hook-up-group-scheduler-with-control-groups-fix.patch
Merge
combine-instrumentation-menus-in-kernel-kconfiginstrumentation.patch
linux-kernel-markers.patch
linux-kernel-markers-checkpatch-fixes.patch
add-samples-subdir.patch
linux-kernel-markers-samples.patch
linux-kernel-markers-samples-checkpatch-fixes.patch
linux-kernel-markers-documentation.patch
Merge
smack-simplified-mandatory-access-control-kernel.patch
Still needs some fixups but it's looking like a merge.
reiser4.patch
Hold.
make-sure-nobodys-leaking-resources.patch
journal_add_journal_head-debug.patch
page-owner-tracking-leak-detector.patch
releasing-resources-with-children.patch
nr_blockdev_pages-in_interrupt-warning.patch
detect-atomic-counter-underflows.patch
device-suspend-debug.patch
#slab-cache-shrinker-statistics.patch
mm-debug-dump-pageframes-on-bad_page.patch
make-frame_pointer-default=y.patch
mutex-subsystem-synchro-test-module.patch
slab-leaks3-default-y.patch
profile-likely-unlikely-macros.patch
profile-likely-unlikely-macros-fix.patch
put_bh-debug.patch
lockdep-show-held-locks-when-showing-a-stackdump.patch
add-debugging-aid-for-memory-initialisation-problems.patch
kmap_atomic-debugging.patch
shrink_slab-handle-bad-shrinkers.patch
keep-track-of-network-interface-renaming.patch
workaround-for-a-pci-restoring-bug.patch
prio_tree-debugging-patch.patch
check_dirty_inode_list.patch
single_open-seq_release-leak-diagnostics.patch
add-a-refcount-check-in-dput.patch
w1-build-fix.patch
These are -mm-only patches.
^ permalink raw reply [flat|nested] 112+ messages in thread
* wibbling over the cpuset shed domain connnection
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
@ 2007-10-01 21:34 ` Paul Jackson
2007-10-02 12:36 ` Nick Piggin
2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh
` (12 subsequent siblings)
13 siblings, 1 reply; 112+ messages in thread
From: Paul Jackson @ 2007-10-01 21:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Nick Piggin

In -mm merge plans for 2.6.24, Andrew wrote:
> cpuset-remove-sched-domain-hooks-from-cpusets.patch
>
> Paul continues to wibble over this. Hold, I guess.

Oh dear ... after looking at the following to figure out what
a wibble is, I wonder which one Andrew had in mind:

http://www.urbandictionary.com/define.php?term=wibble

The insanity, the rubbish, being overwhelmed, ... ?

<grin>

If one of Nick or I can knock some sense into the others head,
then this saga should come to a close soon.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: wibbling over the cpuset shed domain connnection
2007-10-01 21:34 ` wibbling over the cpuset shed domain connnection Paul Jackson
@ 2007-10-02 12:36 ` Nick Piggin
2007-10-03 5:21 ` Paul Jackson
0 siblings, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2007-10-02 12:36 UTC (permalink / raw)
To: Paul Jackson; +Cc: Andrew Morton, linux-kernel

On Tuesday 02 October 2007 07:34, Paul Jackson wrote:
> In -mm merge plans for 2.6.24, Andrew wrote:
> > cpuset-remove-sched-domain-hooks-from-cpusets.patch
> >
> > Paul continues to wibble over this. Hold, I guess.
>
> Oh dear ... after looking at the following to figure out what
> a wibble is, I wonder which one Andrew had in mind:
>
> http://www.urbandictionary.com/define.php?term=wibble
>
> The insanity, the rubbish, being overwhelmed, ... ?
>
> <grin>
>
> If one of Nick or I can knock some sense into the others head,
> then this saga should come to a close soon.

In the meantime, that patch should be merged though, shouldn't it?
cpusets is currently telling the scheduler to do the wrong thing
WRT the user interface definition of cpusets, right?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: wibbling over the cpuset shed domain connnection
2007-10-02 12:36 ` Nick Piggin
@ 2007-10-03 5:21 ` Paul Jackson
2007-10-02 13:12 ` Nick Piggin
0 siblings, 1 reply; 112+ messages in thread
From: Paul Jackson @ 2007-10-03 5:21 UTC (permalink / raw)
To: Nick Piggin; +Cc: akpm, linux-kernel

> In the meantime, that patch should be merged though, shouldn't it?

Which patch do you refer to:
1) the year old patch to disconnect cpusets and sched domains:
   cpuset-remove-sched-domain-hooks-from-cpusets.patch
2) my patch of a few days ago to add a 'sched_load_balance' flag:
   cpuset and sched domains: sched_load_balance flag

I can't push one without the other, because some real time folks are
depending on the sched domain hooks that (1) would remove, so need some
alternative, such as in (2). Even though (1) is rather broken, as you
note, it still provides a way that the real time folks can disable load
balancing at runtime on selected CPUs, so is essential to their work.

I can't delay any more resolving this, because the cgroup (aka
container) code is tangled up with (1), and Andrew needs a clear path
to send cgroups to Linus real soon now.

In my last message to you, a couple of days ago, I asked what I thought
were a couple of key and simple questions -- can sched domains overlap,
and what does it mean for user space if they overlap? A further
question comes to mind now -- if sched domains can overlap, does this
provide some capability to user space that is important to provide?

Could you take a minute, Nick, to consider these questions? Thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: wibbling over the cpuset shed domain connnection
2007-10-03 5:21 ` Paul Jackson
@ 2007-10-02 13:12 ` Nick Piggin
2007-10-03 7:00 ` Paul Jackson
0 siblings, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2007-10-02 13:12 UTC (permalink / raw)
To: Paul Jackson; +Cc: akpm, linux-kernel

On Wednesday 03 October 2007 15:21, Paul Jackson wrote:
> > In the meantime, that patch should be merged though, shouldn't it?
>
> Which patch do you refer to:
> 1) the year old patch to disconnect cpusets and sched domains:
>    cpuset-remove-sched-domain-hooks-from-cpusets.patch
> 2) my patch of a few days ago to add a 'sched_load_balance' flag:
>    cpuset and sched domains: sched_load_balance flag

The one quoted, of course.

> I can't push one without the other, because some real time folks are
> depending on the sched domain hooks that (1) would remove, so need some
> alternative, such as in (2). Even though (1) is rather broken, as you
> note, it still provides a way that the real time folks can disable load
> balancing at runtime on selected CPUs, so is essential to their work.

OK.

> I can't delay any more resolving this, because the cgroup (aka
> container) code is tangled up with (1), and Andrew needs a clear path
> to send cgroups to Linus real soon now.

If code isn't ready to go, it doesn't need to rush, it can just be untangled
or fixed properly etc.

> In my last message to you, a couple of days ago, I asked what I thought
> were a couple of key and simple questions -- can sched domains overlap,
> and what does it mean for user space if they overlap? A further
> question comes to mind now -- if sched domains can overlap, does this
> provide some capability to user space that is important to provide?
>
> Could you take a minute, Nick, to consider these questions? Thanks.

Yeah, it arrived after I had a 24 hour flight. I just see it now.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: wibbling over the cpuset shed domain connnection
2007-10-02 13:12 ` Nick Piggin
@ 2007-10-03 7:00 ` Paul Jackson
2007-10-03 10:57 ` Andrew Morton
0 siblings, 1 reply; 112+ messages in thread
From: Paul Jackson @ 2007-10-03 7:00 UTC (permalink / raw)
To: Nick Piggin; +Cc: akpm, linux-kernel

Nick wrote:
> If code isn't ready to go, it doesn't need to rush, it can just be untangled
> or fixed properly etc.

True ... though we seem to be going in circles now. I doubt
taking longer will help much; we should strive to resolve this
now, if we can.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: wibbling over the cpuset shed domain connnection
2007-10-03 7:00 ` Paul Jackson
@ 2007-10-03 10:57 ` Andrew Morton
0 siblings, 0 replies; 112+ messages in thread
From: Andrew Morton @ 2007-10-03 10:57 UTC (permalink / raw)
To: Paul Jackson; +Cc: Nick Piggin, linux-kernel

On Wed, 3 Oct 2007 00:00:58 -0700 Paul Jackson <pj@sgi.com> wrote:

> Nick wrote:
> > If code isn't ready to go, it doesn't need to rush, it can just be untangled
> > or fixed properly etc.

It's close enough for an rc1.

> True ... though we seem to be going in circles now. I doubt
> taking longer will help much; we should strive to resolve this
> now, if we can.
>

Please, work out what you want to do from a design perspective, then cook
up a patch against rc8-mm2.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Memory controller merge (was Re: -mm merge plans for 2.6.24)
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
2007-10-01 21:34 ` wibbling over the cpuset shed domain connnection Paul Jackson
@ 2007-10-02 4:21 ` Balbir Singh
2007-10-02 15:46 ` Hugh Dickins
2007-10-10 21:07 ` Rik van Riel
2007-10-02 6:18 ` x86 patches was Re: -mm merge plans for 2.6.24 Andi Kleen
` (11 subsequent siblings)
13 siblings, 2 replies; 112+ messages in thread
From: Balbir Singh @ 2007-10-02 4:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Linux Memory Management List

Andrew Morton wrote:
> memory-controller-add-documentation.patch
> memory-controller-resource-counters-v7.patch
> memory-controller-resource-counters-v7-fix.patch
> memory-controller-containers-setup-v7.patch
> memory-controller-accounting-setup-v7.patch
> memory-controller-memory-accounting-v7.patch
> memory-controller-memory-accounting-v7-fix.patch
> memory-controller-memory-accounting-v7-fix-swapoff-breakage-however.patch
> memory-controller-task-migration-v7.patch
> memory-controller-add-per-container-lru-and-reclaim-v7.patch
> memory-controller-add-per-container-lru-and-reclaim-v7-fix.patch
> memory-controller-add-per-container-lru-and-reclaim-v7-fix-2.patch
> memory-controller-add-per-container-lru-and-reclaim-v7-cleanup.patch
> memory-controller-improve-user-interface.patch
> memory-controller-oom-handling-v7.patch
> memory-controller-oom-handling-v7-vs-oom-killer-stuff.patch
> memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7.patch
> memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7-cleanup.patch
> memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7-fix-2.patch
> memory-controller-make-page_referenced-container-aware-v7.patch
> memory-controller-make-charging-gfp-mask-aware.patch
> memory-controller-make-charging-gfp-mask-aware-fix.patch
> memory-controller-bug_on.patch
> mem-controller-gfp-mask-fix.patch
> memcontrol-move-mm_cgroup-to-header-file.patch
> memcontrol-move-oom-task-exclusion-to-tasklist.patch
> memcontrol-move-oom-task-exclusion-to-tasklist-fix.patch
> oom-add-sysctl-to-enable-task-memory-dump.patch
> kswapd-should-only-wait-on-io-if-there-is-io.patch
>
> Hold. This needs a serious going-over by page reclaim people.
>

Hi, Andrew,

I mostly agree with your decision. I am a little concerned however
that as we develop and add more features (a.k.a better statistics/
forced reclaim), which are very important; the code base gets larger,
the review takes longer :)

I was hopeful of getting the bare minimal infrastructure for memory
control in mainline, so that review is easy and additional changes
can be well reviewed as well.

Here are the pros and cons of merging the memory controller

Pros
1. Smaller size, easy to review and merge
2. Incremental development, makes it easier to maintain the
   code

Cons
1. Needs more review like you said
2. Although the UI is stable, it's a good chance to review
   it once more before merging the code into mainline

Having said that, I'll continue testing the patches and make the
solution more complete and usable.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh @ 2007-10-02 15:46 ` Hugh Dickins 2007-10-03 8:13 ` Balbir Singh 2007-10-04 16:10 ` Paul Menage 2007-10-10 21:07 ` Rik van Riel 1 sibling, 2 replies; 112+ messages in thread From: Hugh Dickins @ 2007-10-02 15:46 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm On Tue, 2 Oct 2007, Balbir Singh wrote: > Andrew Morton wrote: > > memory-controller-add-documentation.patch > > ... > > kswapd-should-only-wait-on-io-if-there-is-io.patch > > > > Hold. This needs a serious going-over by page reclaim people. > > I mostly agree with your decision. I am a little concerned however > that as we develop and add more features (a.k.a better statistics/ > forced reclaim), which are very important; the code base gets larger, > the review takes longer :) I agree with putting the memory controller stuff on hold from 2.6.24. Sorry, Balbir, I've failed to get back to you, still attending to priorities. Let me briefly summarize my issue with the mem controller: you've not yet given enough attention to swap. I accept that full swap control is something you're intending to add incrementally later; but the current state doesn't make sense to me. The problems are swapoff and swapin readahead. These pull pages into the swap cache, which are assigned to the cgroup (or the whatever-we- call-the-remainder-outside-all-the-cgroups) which is running swapoff or faulting in its own page; yet they very clearly don't (in general) belong to that cgroup, but to other cgroups which will be discovered later. I did try removing the cgroup mods to mm/swap_state.c, so swap pages get assigned to a cgroup only once it's really known; but that's not enough by itself, because cgroup RSS reclaim doesn't touch those pages, so the cgroup can easily OOM much too soon. 
I was thinking that you need a "limbo" cgroup for these pages, which can be attacked for reclaim along with any cgroup being reclaimed, but from which pages are readily migrated to their real cgroup once that's known. But I had to switch over to other work before trying that out: perhaps the idea doesn't really fly at all. And it might well be no longer needed once full mem+swap control is there. So in the current memory controller, that unuse_pte mem charge I was originally worried about failing (I hadn't at that point delved in to see how it tries to reclaim) actually never fails (and never does anything): the page is already assigned to some cgroup-or- whatever and is never charged to vma->vm_mm at that point. And small point: once that is sorted out and the page is properly assigned in unuse_pte, you'll be needing to pte_unmap_unlock and pte_offset_map_lock around the mem_cgroup_charge call there - you're right to call it with GFP_KERNEL, but cannot do so while holding the page table locked and mapped. (But because the page lock is held, there shouldn't be any raciness to dropping and retaking the ptl.) Hugh ^ permalink raw reply [flat|nested] 112+ messages in thread
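The "limbo" cgroup suggested in the message above can be pictured with a small toy model. This is only a sketch of the idea as described in the mail, not kernel code: the class names, method names and page labels are all hypothetical, and the choice to scan limbo before the cgroup's own pages is an assumption made for illustration.

```python
# Toy model of the "limbo" pseudo-cgroup idea. Not kernel code: every
# class, method and page name below is hypothetical.

class ToyCgroup:
    def __init__(self, name):
        self.name = name
        self.lru = []            # charged pages, oldest first


class ToyController:
    def __init__(self):
        self.limbo = ToyCgroup("limbo")   # parks pages with unknown owner

    def add_to_swap_cache(self, page):
        """Readahead or swapoff brought 'page' in; owner not known yet."""
        self.limbo.lru.append(page)

    def fault_in(self, cgroup, page):
        """The real owner touches the page: migrate the charge to it."""
        if page in self.limbo.lru:
            self.limbo.lru.remove(page)
        cgroup.lru.append(page)

    def reclaim(self, cgroup, nr_pages):
        """Reclaim up to nr_pages on behalf of 'cgroup'.

        Limbo pages are candidates alongside the cgroup's own pages;
        scanning limbo first is an assumption of this sketch.
        """
        victims = []
        for lru in (self.limbo.lru, cgroup.lru):
            while lru and len(victims) < nr_pages:
                victims.append(lru.pop(0))
        return victims


ctl = ToyController()
a = ToyCgroup("A")

for page in ("a1", "a2", "a3"):      # readahead cluster, owners unknown
    ctl.add_to_swap_cache(page)

ctl.fault_in(a, "a1")                # A touches one page: charge moves to A
freed = ctl.reclaim(a, nr_pages=2)   # A under pressure: limbo is attacked too
```

In this model the two pages nobody has claimed yet are the ones reclaimed, and the page A actually faulted in stays charged to A.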
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)
2007-10-02 15:46 ` Hugh Dickins
@ 2007-10-03 8:13 ` Balbir Singh
2007-10-03 18:47 ` Hugh Dickins
2007-10-04 16:10 ` Paul Menage
1 sibling, 1 reply; 112+ messages in thread
From: Balbir Singh @ 2007-10-03 8:13 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Tue, 2 Oct 2007, Balbir Singh wrote:
>> Andrew Morton wrote:
>>> memory-controller-add-documentation.patch
>>> ...
>>> kswapd-should-only-wait-on-io-if-there-is-io.patch
>>>
>>> Hold. This needs a serious going-over by page reclaim people.
>> I mostly agree with your decision. I am a little concerned however
>> that as we develop and add more features (a.k.a better statistics/
>> forced reclaim), which are very important; the code base gets larger,
>> the review takes longer :)
>
> I agree with putting the memory controller stuff on hold from 2.6.24.
>
> Sorry, Balbir, I've failed to get back to you, still attending to
> priorities. Let me briefly summarize my issue with the mem controller:
> you've not yet given enough attention to swap.
>

I am open to suggestions and ways and means of making swap control
complete and more usable.

> I accept that full swap control is something you're intending to add
> incrementally later; but the current state doesn't make sense to me.
>
> The problems are swapoff and swapin readahead. These pull pages into
> the swap cache, which are assigned to the cgroup (or the whatever-we-
> call-the-remainder-outside-all-the-cgroups) which is running swapoff
> or faulting in its own page; yet they very clearly don't (in general)
> belong to that cgroup, but to other cgroups which will be discovered
> later.
>

I understand what you're trying to say, but with several approaches that
we tried in the past, we found caches the hardest to account
accurately.
IIRC, with readahead, we don't even know if all the pages readahead will be used, that's why we charge everything to the cgroup that added the page to the cache. > I did try removing the cgroup mods to mm/swap_state.c, so swap pages > get assigned to a cgroup only once it's really known; but that's not > enough by itself, because cgroup RSS reclaim doesn't touch those > pages, so the cgroup can easily OOM much too soon. I was thinking > that you need a "limbo" cgroup for these pages, which can be attacked > for reclaim along with any cgroup being reclaimed, but from which > pages are readily migrated to their real cgroup once that's known. > Is migrating the charge to the real cgroup really required? > But I had to switch over to other work before trying that out: > perhaps the idea doesn't really fly at all. And it might well > be no longer needed once full mem+swap control is there. > > So in the current memory controller, that unuse_pte mem charge I was > originally worried about failing (I hadn't at that point delved in > to see how it tries to reclaim) actually never fails (and never > does anything): the page is already assigned to some cgroup-or- > whatever and is never charged to vma->vm_mm at that point. > Excellent! > And small point: once that is sorted out and the page is properly > assigned in unuse_pte, you'll be needing to pte_unmap_unlock and > pte_offset_map_lock around the mem_cgroup_charge call there - > you're right to call it with GFP_KERNEL, but cannot do so while > holding the page table locked and mapped. (But because the page > lock is held, there shouldn't be any raciness to dropping and > retaking the ptl.) > Good catch! I'll fix that. > Hugh -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-03 8:13 ` Balbir Singh @ 2007-10-03 18:47 ` Hugh Dickins 2007-10-04 4:16 ` Balbir Singh 0 siblings, 1 reply; 112+ messages in thread From: Hugh Dickins @ 2007-10-03 18:47 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm On Wed, 3 Oct 2007, Balbir Singh wrote: > Hugh Dickins wrote: > > > > Sorry, Balbir, I've failed to get back to you, still attending to > > priorities. Let me briefly summarize my issue with the mem controller: > > you've not yet given enough attention to swap. > > I am open to suggestions and ways and means of making swap control > complete and more usable. Well, swap control is another subject. I guess for that you'll need to track which cgroup each swap page belongs to (rather more expensive than the current swap_map of unsigned shorts). And I doubt it'll be swap control as such that's required, but control of rss+swap. But here I'm just worrying about how the existence of swap makes something of a nonsense of your rss control. > > I accept that full swap control is something you're intending to add > > incrementally later; but the current state doesn't make sense to me. > > > > The problems are swapoff and swapin readahead. These pull pages into > > the swap cache, which are assigned to the cgroup (or the whatever-we- > > call-the-remainder-outside-all-the-cgroups) which is running swapoff ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I'd appreciate it if you'd teach me the right name for that! > > or faulting in its own page; yet they very clearly don't (in general) > > belong to that cgroup, but to other cgroups which will be discovered > > later. > > I understand what your trying to say, but with several approaches that > we tried in the past, we found caches the hardest to most accurately > account. 
IIRC, with readahead, we don't even know if all the pages > readahead will be used, that's why we charge everything to the cgroup > that added the page to the cache. Yes, readahead is anyway problematic. My guess is that in the file cache case, you'll tend not to go too far wrong by charging to the one that added - though we're all aware that's fairly unsatisfactory. My point is that in the swap cache case, it's badly wrong: there's no page more obviously owned by a cgroup than its anonymous pages (forgetting for a moment that minority shared between cgroups until copy-on-write), so it's very wrong for swapin readahead or swapoff to go charging those to another or to no cgroup. Imagine a cgroup at its rss limit, with more out on swap. Then another cgroup does some swap readahead, bringing pages private to the first into cache. Or runs swapoff which actually plugs them into the rss of the first cgroup, so it goes over limit. Those are pages we'd want to swap out when the first cgroup faults to go further over its limit; but they're now not even identified as belonging to the right cgroup, so won't be found. > > I did try removing the cgroup mods to mm/swap_state.c, so swap pages > > get assigned to a cgroup only once it's really known; but that's not > > enough by itself, because cgroup RSS reclaim doesn't touch those > > pages, so the cgroup can easily OOM much too soon. I was thinking > > that you need a "limbo" cgroup for these pages, which can be attacked > > for reclaim along with any cgroup being reclaimed, but from which > > pages are readily migrated to their real cgroup once that's known. > > > > Is migrating the charge to the real cgroup really required? My answer is definitely yes. I'm not suggesting that you need general migration between cgroups at this stage (something for later quite likely); but I am suggesting you need one pseudo-cgroup to hold these cases temporarily, and that you cannot properly track rss without it (if there is any swap). 
> > But I had to switch over to other work before trying that out: > > perhaps the idea doesn't really fly at all. And it might well > > be no longer needed once full mem+swap control is there. > > > > So in the current memory controller, that unuse_pte mem charge I was > > originally worried about failing (I hadn't at that point delved in > > to see how it tries to reclaim) actually never fails (and never > > does anything): the page is already assigned to some cgroup-or- > > whatever and is never charged to vma->vm_mm at that point. > > > > Excellent! Umm, please explain what's excellent about that. Hugh ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)
2007-10-03 18:47 ` Hugh Dickins
@ 2007-10-04 4:16 ` Balbir Singh
2007-10-04 13:16 ` Hugh Dickins
0 siblings, 1 reply; 112+ messages in thread
From: Balbir Singh @ 2007-10-04 4:16 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Wed, 3 Oct 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> Sorry, Balbir, I've failed to get back to you, still attending to
>>> priorities. Let me briefly summarize my issue with the mem controller:
>>> you've not yet given enough attention to swap.
>> I am open to suggestions and ways and means of making swap control
>> complete and more usable.
>
> Well, swap control is another subject. I guess for that you'll need
> to track which cgroup each swap page belongs to (rather more expensive
> than the current swap_map of unsigned shorts). And I doubt it'll be
> swap control as such that's required, but control of rss+swap.
>

I see what you mean now; other people have recommended a per-cgroup
swap file/device.

> But here I'm just worrying about how the existence of swap makes
> something of a nonsense of your rss control.
>

Ideally, pages would not reside for too long in swap cache (unless
I've misunderstood swap cache or there are special cases for tmpfs/
ramfs). Once pages have been swapped back in, they get assigned
back to their respective cgroups in do_swap_page() (where we charge
them back to the cgroup).

The swap cache pages will be the first ones to go, once the cgroup
exceeds its limit.

There might be gaps in my understanding or I might be missing a use
case scenario, where things work differently.

>>> I accept that full swap control is something you're intending to add
>>> incrementally later; but the current state doesn't make sense to me.
>>>
>>> The problems are swapoff and swapin readahead.
These pull pages into >>> the swap cache, which are assigned to the cgroup (or the whatever-we- >>> call-the-remainder-outside-all-the-cgroups) which is running swapoff > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > I'd appreciate it if you'd teach me the right name for that! > In the past people have used names like default cgroup, we could use the root cgroup as the default cgroup. >>> or faulting in its own page; yet they very clearly don't (in general) >>> belong to that cgroup, but to other cgroups which will be discovered >>> later. >> I understand what your trying to say, but with several approaches that >> we tried in the past, we found caches the hardest to most accurately >> account. IIRC, with readahead, we don't even know if all the pages >> readahead will be used, that's why we charge everything to the cgroup >> that added the page to the cache. > > Yes, readahead is anyway problematic. My guess is that in the file > cache case, you'll tend not to go too far wrong by charging to the > one that added - though we're all aware that's fairly unsatisfactory. > > My point is that in the swap cache case, it's badly wrong: there's > no page more obviously owned by a cgroup than its anonymous pages > (forgetting for a moment that minority shared between cgroups > until copy-on-write), so it's very wrong for swapin readahead > or swapoff to go charging those to another or to no cgroup. > > Imagine a cgroup at its rss limit, with more out on swap. Then > another cgroup does some swap readahead, bringing pages private > to the first into cache. Or runs swapoff which actually plugs > them into the rss of the first cgroup, so it goes over limit. > > Those are pages we'd want to swap out when the first cgroup > faults to go further over its limit; but they're now not even > identified as belonging to the right cgroup, so won't be found. > Won't the right cgroup assignment happen as discussed above? 
>>> I did try removing the cgroup mods to mm/swap_state.c, so swap pages
>>> get assigned to a cgroup only once it's really known; but that's not
>>> enough by itself, because cgroup RSS reclaim doesn't touch those
>>> pages, so the cgroup can easily OOM much too soon. I was thinking
>>> that you need a "limbo" cgroup for these pages, which can be attacked
>>> for reclaim along with any cgroup being reclaimed, but from which
>>> pages are readily migrated to their real cgroup once that's known.
>>>
>> Is migrating the charge to the real cgroup really required?
>
> My answer is definitely yes. I'm not suggesting that you need
> general migration between cgroups at this stage (something for
> later quite likely); but I am suggesting you need one pseudo-cgroup
> to hold these cases temporarily, and that you cannot properly track
> rss without it (if there is any swap).
>

If what I understood and discussed earlier is correct, then we don't
need to go this route. But I think the idea of having a pseudo cgroup
is interesting (needs more thought).

>>> So in the current memory controller, that unuse_pte mem charge I was
>>> originally worried about failing (I hadn't at that point delved in
>>> to see how it tries to reclaim) actually never fails (and never
>>> does anything): the page is already assigned to some cgroup-or-
>>> whatever and is never charged to vma->vm_mm at that point.
>>>
>> Excellent!
>
> Umm, please explain what's excellent about that.
>

Nothing really, I was glad that we don't fail, even though we might
assign pages to some other cgroup. Not really exciting, but not failing
was a relief :-) In summary, there's nothing excellent about it.

> Hugh

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-04 4:16 ` Balbir Singh @ 2007-10-04 13:16 ` Hugh Dickins 2007-10-05 3:07 ` Balbir Singh 0 siblings, 1 reply; 112+ messages in thread From: Hugh Dickins @ 2007-10-04 13:16 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm On Thu, 4 Oct 2007, Balbir Singh wrote: > Hugh Dickins wrote: > > Well, swap control is another subject. I guess for that you'll need > > to track which cgroup each swap page belongs to (rather more expensive > > than the current swap_map of unsigned shorts). And I doubt it'll be > > swap control as such that's required, but control of rss+swap. > > I see what you mean now, other people have recommending a per cgroup > swap file/device. Sounds too inflexible, and too many swap areas to me. Perhaps the right answer will fall in between: assign clusters of swap pages to different cgroups as needed. But worry about that some other time. > > > But here I'm just worrying about how the existence of swap makes > > something of a nonsense of your rss control. > > > > Ideally, pages would not reside for too long in swap cache (unless Thinking particularly of those brought in by swapoff or swap readahead: some will get attached to mms once accessed, others will simply get freed when tasks exit or munmap, others will hang around until they reach the bottom of the LRU and are reclaimed again by memory pressure. But as your code stands, that'll be total memory pressure: in-cgroup memory pressure will tend to miss them, since typically they're assigned to the wrong cgroup; until then their presence is liable to cause other pages to be reclaimed which ideally should not be. > I've misunderstood swap cache or there are special cases for tmpfs/ > ramfs). ramfs pages are always in RAM, never go out to swap, no need to worry about them in this regard. 
But tmpfs pages can indeed go out to swap, so whatever we come up with needs to make sense with them too, yes. I don't think its swapoff/readahead issues are any harder to handle than the anonymous mapped page case, but it will need its own code to handle them. > Once pages have been swapped back in, they get assigned > back to their respective cgroup's in do_swap_page() (where we charge > them back to the cgroup). > That's where it should happen, yes; but my point is that it very often does not. Because the swap cache page (read in as part of the readaround cluster of some other cgroup, or in swapoff by some other cgroup) is already assigned to that other cgroup (by the mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The page_cgroup exists and the page has already been accounted" route when mem_cgroup_charge is called from do_swap_page. Doesn't it? Are we misunderstanding each other, because I'm assuming MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED? though I can't see that _MAPPED and _CACHED are actually supported, there being no reference to them outside the enum that defines them. Or are you deceived by that ifdef NUMA code in swapin_readahead, which propagates the fantasy that swap allocation follows vma layout? That nonsense has been around too long, I'll soon be sending a patch to remove it. > The swap cache pages will be the first ones to go, once the cgroup > exceeds its limit. No, because they're (in general) booked to the wrong cgroup. > > There might be gaps in my understanding or I might be missing a use > case scenario, where things work differently. > > >>> I accept that full swap control is something you're intending to add > >>> incrementally later; but the current state doesn't make sense to me. > >>> > >>> The problems are swapoff and swapin readahead. 
These pull pages into > >>> the swap cache, which are assigned to the cgroup (or the whatever-we- > >>> call-the-remainder-outside-all-the-cgroups) which is running swapoff > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > I'd appreciate it if you'd teach me the right name for that! > > > > In the past people have used names like default cgroup, we could use > the root cgroup as the default cgroup. Okay, thanks. Hugh ^ permalink raw reply [flat|nested] 112+ messages in thread
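The "already accounted" route described in the message above can be illustrated with a toy model. This is a sketch under stated assumptions rather than the memory controller's code: `ToyMemCgroup`, `charge` and `page_owner` are hypothetical names, with `page_owner` standing in for the per-page cgroup pointer and `charge` for the behaviour attributed to mem_cgroup_charge and mem_cgroup_cache_charge in the thread.

```python
# Toy model of "first charger owns the page"; names are illustrative only.

class ToyMemCgroup:
    def __init__(self, name):
        self.name = name
        self.charged = set()   # pages currently accounted to this group


page_owner = {}  # page -> ToyMemCgroup, stands in for the page_cgroup pointer


def charge(page, cgroup):
    """Charge 'page' to 'cgroup' unless it has already been accounted."""
    if page in page_owner:
        # "The page_cgroup exists and the page has already been accounted"
        return page_owner[page]
    page_owner[page] = cgroup
    cgroup.charged.add(page)
    return cgroup


first = ToyMemCgroup("first")    # owns the anonymous page, now out on swap
other = ToyMemCgroup("other")    # happens to run swapoff or swapin readahead

# Readahead by 'other' adds the page to the swap cache and charges it there.
charge("anon-page-of-first", other)

# Later 'first' faults on its own page (the do_swap_page step): a no-op.
owner = charge("anon-page-of-first", first)

print(owner.name)            # other
print(len(first.charged))    # 0
```

In the model, reclaim driven by `first` reaching its limit would never look at this page, because the page sits in the accounting of `other`.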
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-04 13:16 ` Hugh Dickins @ 2007-10-05 3:07 ` Balbir Singh 2007-10-07 17:41 ` Hugh Dickins 0 siblings, 1 reply; 112+ messages in thread From: Balbir Singh @ 2007-10-05 3:07 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm Hugh Dickins wrote: > On Thu, 4 Oct 2007, Balbir Singh wrote: >> Hugh Dickins wrote: >>> Well, swap control is another subject. I guess for that you'll need >>> to track which cgroup each swap page belongs to (rather more expensive >>> than the current swap_map of unsigned shorts). And I doubt it'll be >>> swap control as such that's required, but control of rss+swap. >> I see what you mean now, other people have recommending a per cgroup >> swap file/device. > > Sounds too inflexible, and too many swap areas to me. Perhaps the > right answer will fall in between: assign clusters of swap pages to > different cgroups as needed. But worry about that some other time. > Yes, depending on the number of cgroups, we'll need to share swap areas between them. It requires more work and thought process. >>> But here I'm just worrying about how the existence of swap makes >>> something of a nonsense of your rss control. >>> >> Ideally, pages would not reside for too long in swap cache (unless > > Thinking particularly of those brought in by swapoff or swap readahead: > some will get attached to mms once accessed, others will simply get > freed when tasks exit or munmap, others will hang around until they > reach the bottom of the LRU and are reclaimed again by memory pressure. > > But as your code stands, that'll be total memory pressure: in-cgroup > memory pressure will tend to miss them, since typically they're > assigned to the wrong cgroup; until then their presence is liable > to cause other pages to be reclaimed which ideally should not be. > in-cgroup pressure will not affect them, since they are in different cgroups. 
If there is pressure in the cgroup to which they are wrongly assigned, they would get reclaimed first. >> I've misunderstood swap cache or there are special cases for tmpfs/ >> ramfs). > > ramfs pages are always in RAM, never go out to swap, no need to > worry about them in this regard. But tmpfs pages can indeed go > out to swap, so whatever we come up with needs to make sense > with them too, yes. I don't think its swapoff/readahead issues > are any harder to handle than the anonymous mapped page case, > but it will need its own code to handle them. > >> Once pages have been swapped back in, they get assigned >> back to their respective cgroup's in do_swap_page() (where we charge >> them back to the cgroup). >> > > That's where it should happen, yes; but my point is that it very > often does not. Because the swap cache page (read in as part of > the readaround cluster of some other cgroup, or in swapoff by some > other cgroup) is already assigned to that other cgroup (by the > mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The > page_cgroup exists and the page has already been accounted" route > when mem_cgroup_charge is called from do_swap_page. Doesn't it? > You are right, at this point I am beginning to wonder if I should account for the swap cache at all? We account for the pages in RSS and when the page comes back into the page table(s) via do_swap_page. If we believe that the swap cache is transitional and the current expected working behaviour does not seem right or hard to fix, it might be easy to ignore unuse_pte() and add/remove_from_swap_cache() for accounting and control. The expected working behaviour of the memory controller is that currently, as you point out several pages get accounted to the cgroup that initiates swapin readahead or swapoff. On cgroup pressure (the one that initiated swapin or swapoff), the cgroup would discard these pages first. These pages are discarded from the cgroup, but still live on the global LRU. 
When the original cgroup is under pressure, these pages might not be
affected as they belong to a different cgroup, which might not be under
any sort of pressure.

> Are we misunderstanding each other, because I'm assuming
> MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED?
> though I can't see that _MAPPED and _CACHED are actually supported,
> there being no reference to them outside the enum that defines them.
>

I am also assuming MEM_CGROUP_TYPE_ALL for the purpose of our
discussion. The accounting is split into mem_cgroup_charge() and
mem_cgroup_cache_charge(). While charging the caches is when we
check for the control_type.

> Or are you deceived by that ifdef NUMA code in swapin_readahead,
> which propagates the fantasy that swap allocation follows vma layout?
> That nonsense has been around too long, I'll soon be sending a patch
> to remove it.
>

The swapin readahead code under #ifdef NUMA is very confusing. I also
noticed another confusing thing during my test, swap cache does not
drop to 0, even though I've disabled all swap using swapoff. Maybe
those are tmpfs pages. The other interesting thing I tried was running
swapoff after a cgroup went over its limit, the swapoff succeeded,
but I see strange numbers for free swap. I'll start another thread
after investigating a bit more.

>> The swap cache pages will be the first ones to go, once the cgroup
>> exceeds its limit.
>
> No, because they're (in general) booked to the wrong cgroup.
>

I meant for the wrong cgroup: in the wrong cgroup, these will be the
first set of pages to be reclaimed.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-05 3:07 ` Balbir Singh @ 2007-10-07 17:41 ` Hugh Dickins 2007-10-08 2:54 ` Balbir Singh 0 siblings, 1 reply; 112+ messages in thread From: Hugh Dickins @ 2007-10-07 17:41 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm On Fri, 5 Oct 2007, Balbir Singh wrote: > Hugh Dickins wrote: > > > > That's where it should happen, yes; but my point is that it very > > often does not. Because the swap cache page (read in as part of > > the readaround cluster of some other cgroup, or in swapoff by some > > other cgroup) is already assigned to that other cgroup (by the > > mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The > > page_cgroup exists and the page has already been accounted" route > > when mem_cgroup_charge is called from do_swap_page. Doesn't it? > > > > You are right, at this point I am beginning to wonder if I should > account for the swap cache at all? We account for the pages in RSS > and when the page comes back into the page table(s) via do_swap_page. > If we believe that the swap cache is transitional and the current > expected working behaviour does not seem right or hard to fix, > it might be easy to ignore unuse_pte() and add/remove_from_swap_cache() > for accounting and control. It would be wrong to ignore the unuse_pte() case: what it's intending to do is correct, it's just being prevented by the swapcache issue from doing what it intends at present. (Though I'm not thrilled with the idea of it causing an admin's swapoff to fail because of a cgroup reaching mem limit there, I do agree with your earlier argument that that's the right thing to happen, and it's up to the admin to fix things up - my original objection came from not realizing that normally the cgroup will reclaim from itself to free its mem. Hmm, would the charge fail or the mm get OOM'ed?) 
Ignoring add_to/remove_from swap cache is what I've tried before,
and again today. It's not enough: if you try running a memhog
(something that allocates and touches more memory than the cgroup
is allowed, relying on pushing out to swap to complete), then that
works well with the present accounting in add_to/remove_from swap
cache, but it OOMs once I remove the memcontrol mods from
mm/swap_state.c. I keep going back to investigate why, keep on
thinking I understand it, then later realize I don't. Please
give it a try, I hope you've got better mental models than I have.

And I don't think it will be enough to handle shmem/tmpfs either;
but won't worry about that until we've properly understood why
exempting swapcache leads to those OOMs, and fixed that up.

> > Are we misunderstanding each other, because I'm assuming
> > MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED?
> > though I can't see that _MAPPED and _CACHED are actually supported,
> > there being no reference to them outside the enum that defines them.
>
> I am also assuming MEM_CGROUP_TYPE_ALL for the purpose of our
> discussion. The accounting is split into mem_cgroup_charge() and
> mem_cgroup_cache_charge(). While charging the caches is when we
> check for the control_type.

It checks MEM_CGROUP_TYPE_ALL there, yes; but I can't find anything
checking for either MEM_CGROUP_TYPE_MAPPED or MEM_CGROUP_TYPE_CACHED.
(Or is it hidden in one of those preprocessor ## things which frustrate
both my greps and me!?)

> > Or are you deceived by that ifdef NUMA code in swapin_readahead,
> > which propagates the fantasy that swap allocation follows vma layout?
> > That nonsense has been around too long, I'll soon be sending a patch
> > to remove it.
>
> The swapin readahead code under #ifdef NUMA is very confusing.

I sent a patch to linux-mm last night, to remove that confusion.

> I also
> noticed another confusing thing during my test, swap cache does not
> drop to 0, even though I've disabled all swap using swapoff. Maybe
> those are tmpfs pages. The other interesting thing I tried was running
> swapoff after a cgroup went over its limit, the swapoff succeeded,
> but I see strange numbers for free swap. I'll start another thread
> after investigating a bit more.

Those indeed are strange behaviours (if the swapoff really has
succeeded, rather than lying), I've not seen such and don't have an
explanation. tmpfs doesn't add any weirdness there: when there's
no swap, there can be no swap cache. Or is the swapoff still in
progress? While it's busy, we keep /proc/meminfo looking sensible,
but <Alt><SysRq>m can show negative free swap (IIRC).

I'll be interested to hear what your investigation shows.

Hugh
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-07 17:41 ` Hugh Dickins @ 2007-10-08 2:54 ` Balbir Singh 0 siblings, 0 replies; 112+ messages in thread From: Balbir Singh @ 2007-10-08 2:54 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm Hugh Dickins wrote: > On Fri, 5 Oct 2007, Balbir Singh wrote: >> Hugh Dickins wrote: >>> That's where it should happen, yes; but my point is that it very >>> often does not. Because the swap cache page (read in as part of >>> the readaround cluster of some other cgroup, or in swapoff by some >>> other cgroup) is already assigned to that other cgroup (by the >>> mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The >>> page_cgroup exists and the page has already been accounted" route >>> when mem_cgroup_charge is called from do_swap_page. Doesn't it? >>> >> You are right, at this point I am beginning to wonder if I should >> account for the swap cache at all? We account for the pages in RSS >> and when the page comes back into the page table(s) via do_swap_page. >> If we believe that the swap cache is transitional and the current >> expected working behaviour does not seem right or hard to fix, >> it might be easy to ignore unuse_pte() and add/remove_from_swap_cache() >> for accounting and control. > > It would be wrong to ignore the unuse_pte() case: what it's intending > to do is correct, it's just being prevented by the swapcache issue > from doing what it intends at present. > OK > (Though I'm not thrilled with the idea of it causing an admin's > swapoff to fail because of a cgroup reaching mem limit there, I do > agree with your earlier argument that that's the right thing to happen, > and it's up to the admin to fix things up - my original objection came > from not realizing that normally the cgroup will reclaim from itself > to free its mem. I'm glad we have that sorted out. Hmm, would the charge fail or the mm get OOM'ed?) 
> Right now, we OOM if charging and reclaim fails. > Ignoring add_to/remove_from swap cache is what I've tried before, > and again today. It's not enough: if you trying run a memhog > (something that allocates and touches more memory than the cgroup > is allowed, relying on pushing out to swap to complete), then that > works well with the present accounting in add_to/remove_from swap > cache, but it OOMs once I remove the memcontrol mods from > mm/swap_state.c. I keep going back to investigate why, keep on > thinking I understand it, then later realize I don't. Please > give it a try, I hope you've got better mental models than I have. > I will try it. Another way to try it, is to set memory.control_type to 1, that removes charging of cache pages (both swap cache and page cache). I just did a quick small test on the memory controller with swap cache changes disabled and it worked fine for me on my UML image (without OOMing). I'll try the same test on a bigger box. Disabling swap does usually cause an OOM for workloads using anonymous pages if the cgroup goes over it's limit (since the cgroup cannot pushout memory). > And I don't think it will be enough to handle shmem/tmpfs either; > but won't worry about that until we've properly understood why > exempting swapcache leads to those OOMs, and fixed that up. > Sure. >>> Are we misunderstanding each other, because I'm assuming >>> MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED? >>> though I can't see that _MAPPED and _CACHED are actually supported, >>> there being no reference to them outside the enum that defines them. >> I am also assuming MEM_CGROUP_TYPE_ALL for the purpose of our >> discussion. The accounting is split into mem_cgroup_charge() and >> mem_cgroup_cache_charge(). While charging the caches is when we >> check for the control_type. > > It checks MEM_CGROUP_TYPE_ALL there, yes; but I can't find anything > checking for either MEM_CGROUP_TYPE_MAPPED or MEM_CGROUP_TYPE_CACHED. 
> (Or is it hidden in one of those preprocesor ## things which frustrate > both my greps and me!?) > MEM_CGROUP_TYPE_ALL is defined to be (MEM_CGROUP_TYPE_CACHED | MEM_CGROUP_TYPE_MAPPED). I'll make that more explicit with a patch. When the type is not MEM_CGROUP_TYPE_ALL, cached pages are ignored. >>> Or are you deceived by that ifdef NUMA code in swapin_readahead, >>> which propagates the fantasy that swap allocation follows vma layout? >>> That nonsense has been around too long, I'll soon be sending a patch >>> to remove it. >> The swapin readahead code under #ifdef NUMA is very confusing. > > I sent a patch to linux-mm last night, to remove that confusion. > Thanks, I saw that. >> I also >> noticed another confusing thing during my test, swap cache does not >> drop to 0, even though I've disabled all swap using swapoff. May be >> those are tmpfs pages. The other interesting thing I tried was running >> swapoff after a cgroup went over it's limit, the swapoff succeeded, >> but I see strange numbers for free swap. I'll start another thread >> after investigating a bit more. > > Those indeed are strange behaviours (if the swapoff really has > succeeded, rather than lying), I not seen such and don't have an > explanation. tmpfs doesn't add any weirdness there: when there's > no swap, there can be no swap cache. Or is the swapoff still in > progress? While it's busy, we keep /proc/meminfo looking sensible, > but <Alt><SysRq>m can show negative free swap (IIRC). > > I'll be interested to hear what your investigation shows. > With the new OOM killer changes, I see negative swap. When I run swapoff with a memory hogger workload, I see (after swapoff succeeds) .... Swap cache: add 473215, delete 473214, find 31744/36688, race 0+0 Free swap = 18446744073709105092kB Total swap = 0kB Free swap: -446524kB ... > Hugh -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 112+ messages in thread
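The two "Free swap" figures in the output above appear to be the same 64-bit value printed once as unsigned and once as signed: 2^64 - 446524 = 18446744073709105092. The following is a minimal sketch of that arithmetic only. The helper names are hypothetical and the "total minus used" model is an assumption made for illustration, not a description of how the kernel actually maintains its swap counters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not kernel code): free = total - used,
 * computed in unsigned 64-bit arithmetic. When used_kb > total_kb the
 * result wraps modulo 2^64 instead of going negative.
 */
static uint64_t free_swap_unsigned_kb(uint64_t total_kb, uint64_t used_kb)
{
    return total_kb - used_kb;
}

/*
 * Reinterpret an unsigned 64-bit value as two's-complement signed,
 * written so that it does not rely on implementation-defined casts.
 */
static int64_t as_signed_kb(uint64_t kb)
{
    if (kb <= (uint64_t)INT64_MAX)
        return (int64_t)kb;
    /* kb is in the upper half: its signed value is kb - 2^64. */
    return -(int64_t)(UINT64_MAX - kb) - 1;
}

/* Checks the two figures quoted in the report against each other. */
static void self_check(void)
{
    uint64_t wrapped = free_swap_unsigned_kb(0, 446524);

    assert(wrapped == 18446744073709105092ULL);
    assert(as_signed_kb(wrapped) == -446524);
}
```

Calling self_check() from a test program confirms that "Free swap = 18446744073709105092kB" and "Free swap: -446524kB" are consistent with one underlying counter that has gone negative while "Total swap" is already 0.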
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)
2007-10-02 15:46 ` Hugh Dickins
2007-10-03 8:13 ` Balbir Singh
@ 2007-10-04 16:10 ` Paul Menage
1 sibling, 0 replies; 112+ messages in thread
From: Paul Menage @ 2007-10-04 16:10 UTC (permalink / raw)
To: Hugh Dickins
Cc: Balbir Singh, Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

On 10/2/07, Hugh Dickins <hugh@veritas.com> wrote:
>
> I accept that full swap control is something you're intending to add
> incrementally later; but the current state doesn't make sense to me.

One comment on swap - ideally it should be a separate subsystem from
the memory controller. That way people who are using cpusets to provide
memory isolation (rather than using the page-based memory controller)
can also get swap isolation.

Paul
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)
2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh
2007-10-02 15:46 ` Hugh Dickins
@ 2007-10-10 21:07 ` Rik van Riel
2007-10-11 6:33 ` Balbir Singh
1 sibling, 1 reply; 112+ messages in thread
From: Rik van Riel @ 2007-10-10 21:07 UTC (permalink / raw)
To: balbir; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List

On Tue, 02 Oct 2007 09:51:11 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> I was hopeful of getting the bare minimal infrastructure for memory
> control in mainline, so that review is easy and additional changes
> can be well reviewed as well.

I am not yet convinced that the way the memory controller code and
lumpy reclaim have been merged is correct. I am combing through the
code now and will send in a patch when I figure out if/what is wrong.

I ran into this because I'm trying to merge the split VM code up to
the latest -mm...

--
All Rights Reversed
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)
2007-10-10 21:07 ` Rik van Riel
@ 2007-10-11 6:33 ` Balbir Singh
0 siblings, 0 replies; 112+ messages in thread
From: Balbir Singh @ 2007-10-11 6:33 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List

Rik van Riel wrote:
> On Tue, 02 Oct 2007 09:51:11 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
>> I was hopeful of getting the bare minimal infrastructure for memory
>> control in mainline, so that review is easy and additional changes
>> can be well reviewed as well.
>
> I am not yet convinced that the way the memory controller code and
> lumpy reclaim have been merged is correct. I am combing through the
> code now and will send in a patch when I figure out if/what is wrong.
>

Hi, Rik,

Do you mean the way the memory controller and lumpy reclaim work
together? The reclaim in memory controller (on hitting the limit) is
not lumpy. Would you like to see that change? Please do share your
findings in the form of comments or patches.

> I ran into this because I'm trying to merge the split VM code up to
> the latest -mm...
>

Interesting, I'll see if I can find some spare test cycles to help
test this code.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
^ permalink raw reply [flat|nested] 112+ messages in thread
* x86 patches was Re: -mm merge plans for 2.6.24 2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton 2007-10-01 21:34 ` wibbling over the cpuset shed domain connnection Paul Jackson 2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh @ 2007-10-02 6:18 ` Andi Kleen 2007-10-02 6:32 ` Andrew Morton 2007-10-02 7:59 ` v4l-stk11xx* [Was: -mm merge plans for 2.6.24] Jiri Slaby ` (10 subsequent siblings) 13 siblings, 1 reply; 112+ messages in thread From: Andi Kleen @ 2007-10-02 6:18 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, mpm Andrew Morton <akpm@linux-foundation.org> writes: > > revert-x86_64-mm-cpa-einval.patch > fix-x86_64-mm-sched-clock-share.patch > agp-fix-race-condition-between-unmapping-and-freeing-pages.patch > x86_64-mce-poll-at-idle_start-and-printk-fix.patch > fix-x86_64-mm-unwinder.patch > geode-mfgpt-support-for-geode-class-machines.patch > geode-mfgpt-clock-event-device-support.patch > x86_64-add-acpi-reboot-option.patch > i386-convert-mm_context_t-semaphore-to-a-mutex.patch > dma-use-dev_to_node-to-get-node-for-device-in-dma_alloc_pages.patch > x86-make-io-apic-not-connected-pin-print-complete.patch > intel_cacheinfo-misc-section-annotation-fixes.patch > intel_cacheinfo-call-cache_add_dev-from-cache_sysfs_init.patch > x86-use-num_online_nodes-to-get-physical-cpus-numbers-for.patch > i386-stop-bogus-nmi-softlockup-warnings-in-show_mem.patch > voyager-include-asm-smph-to-fix-compile-error.patch > x86-64-disable-local-apic-timer-use-on-amd-systems-with-c1e.patch > clockevents-remove-unused-inline-function.patch > clockevents-allow-build-without-runtime-use.patch > x86_64-consolidate-tsc-calibration.patch > i386-prepare-sharing-hpet-code.patch > i386-hpet-add-x8664-hpet-bits.patch > i386-prepare-sharing-pit-code.patch > x86_64-use-i386-i8253-h.patch > x86_64-preparatory-apic-set-lvtt.patch > x86_64-apic-remove-bogus-pit-synchronization.patch > x86_64-apic-shuffle-calibration-around.patch > 
x86_64-apic-calibration-remove-divisor.patch > x86_64-apic-change-setup-calling-convention.patch > x86_64-apic-remove-nested-irq-disable.patch > x86_64-prep-idle-loop-for-dynticks.patch > x86_64-apic-add-clockevents-functions.patch > x86_64-convert-to-clockevents.patch > x86_64-remove-unused-code.patch > x86_64-cleanup-apic-c.patch > x86_64-cleanup-apic-c-fix.patch > x86_64-cleanup-apic-c-fix-2.patch > jiffies-remove-unused-macros.patch > acpi-remove-the-useless-ifdef-code.patch > i386-pit-remove-the-useless-ifdefs.patch > i386-hpet-sharing-optimize.patch > ich-force-hpet-make-generic-time-capable-of-switching-broadcast-timer.patch > ich-force-hpet-restructure-hpet-generic-clock-code.patch > ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable.patch > ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable-fix.patch > ich-force-hpet-late-initialization-of-hpet-after-quirk.patch > ich-force-hpet-ich5-quirk-to-force-detect-enable.patch > ich-force-hpet-ich5-quirk-to-force-detect-enable-fix.patch > ich-force-hpet-ich5-fix-a-bug-with-suspend-resume.patch > ich-force-hpet-add-ich7_0-pciid-to-quirk-list.patch > x86-fix-cpu_to_node-references.patch > x86-convert-cpu_core_map-to-be-a-per-cpu-variable.patch > convert-cpu_sibling_map-to-be-a-per-cpu-variable.patch > convert-cpu_sibling_map-to-a-per_cpu-data-array-ia64.patch > # convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch: busted > convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch > convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix.patch > convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix-2.patch > convert-cpu_sibling_map-to-a-per_cpu-data-array-sparc64.patch These are fine to me, but should not all go through my tree because most changes are in other architectures. 
> x86-convert-x86_cpu_to_apicid-to-be-a-per-cpu-variable.patch > x86-convert-cpu_llc_id-to-be-a-per-cpu-variable.patch > x86-acpi-use-cpu_physical_id.patch > i386-visws-extern-inline-static-inline.patch > i386-cleanup-struct-irqaction-initializers.patch > x86_64-cleanup-struct-irqaction-initializers.patch > asm-i386-ioh-fix-constness.patch > optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft.patch > optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft-fix.patch > hpet-force-enable-on-vt8235-37-chipsets.patch > x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron.patch > x86_64-check-and-enable-mmconfig-for-amd-family-10h-opteron.patch > x86_64-check-and-enable-mmconfig-for-amd-family-10h-opteron-fix.patch > x86_64-set-cfg_size-for-amd-family-10h-in-case-mmconfig-is.patch > x86_64-set-cfg_size-for-amd-family-10h-in-case-mmconfig-is-fix.patch > voyager-dont-try-to-support-unprocessor-builds.patch > x86_64-nx-bit-handling-in-change_page_attr.patch > x86-64-calgary-fix-calgary=disable=busnum-for-calioc2.patch > x86-64-calgary-get-rid-of-translate_phb.patch > x86_64-vdso-linker-script-cleanup.patch > x86_64-vdso-put-vars-in-rodata.patch > x86-convert-cpuinfo_x86-array-to-a-per_cpu-array.patch > x86_64-nmi_watchdog-fix-to-be-more-like-i386.patch > x86_64-nmi_watchdog-fix-to-be-more-like-i386-fix.patch > pci-use-pci=bfsort-for-hp-dl385-g2-dl585-g2.patch > > Send to Andi Did you resend it? I have nothing pending currently. I rejected also quite a few of these. The clockevents patches are not included in this; but given the recent trouble i'm not 100% sure they are even ready yet. 
> sparsemem-clean-up-spelling-error-in-comments.patch > sparsemem-record-when-a-section-has-a-valid-mem_map.patch > sparsemem-record-when-a-section-has-a-valid-mem_map-fix.patch > generic-virtual-memmap-support-for-sparsemem.patch > generic-virtual-memmap-support-for-sparsemem-fix.patch > generic-virtual-memmap-support-for-sparsemem-remove-excess-debugging.patch > generic-virtual-memmap-support-for-sparsemem-simplify-initialisation-code-and-reduce-duplication.patch > generic-virtual-memmap-support-for-sparsemem-pull-out-the-vmemmap-code-into-its-own-file.patch > generic-virtual-memmap-support-vmemmap-generify-initialisation-via-helpers.patch > x86_64-sparsemem_vmemmap-2m-page-size-support.patch > x86_64-sparsemem_vmemmap-2m-page-size-support-ensure-end-of-section-memmap-is-initialised.patch > x86_64-sparsemem_vmemmap-vmemmap-x86_64-convert-to-new-helper-based-initialisation.patch > ia64-sparsemem_vmemmap-16k-page-size-support.patch > ia64-sparsemem_vmemmap-16k-page-size-support-convert-to-new-helper-based-initialisation.patch > sparc64-sparsemem_vmemmap-support.patch > sparc64-sparsemem_vmemmap-support-vmemmap-convert-to-new-config-options.patch > ppc64-sparsemem_vmemmap-support.patch > ppc64-sparsemem_vmemmap-support-vmemmap-ppc64-convert-vmm_-macros-to-a-real-function.patch > ppc64-sparsemem_vmemmap-support-vmemmap-ppc64-convert-vmm_-macros-to-a-real-function-fix.patch > ppc64-sparsemem_vmemmap-support-convert-to-new-config-options.patch > > virtual memmap: merge Hmm, need to recheck the x86_64 bits I think. 
> memoryless-nodes-generic-management-of-nodemasks-for-various-purposes.patch > memoryless-nodes-generic-management-of-nodemasks-for-various-purposes-fix.patch > memoryless-nodes-introduce-mask-of-nodes-with-memory.patch > memoryless-nodes-introduce-mask-of-nodes-with-memory-fix.patch > # update-n_high_memory-node-state-for-memory-hotadd.patch: fold > update-n_high_memory-node-state-for-memory-hotadd.patch > update-n_high_memory-node-state-for-memory-hotadd-fix.patch > memoryless-nodes-fix-interleave-behavior-for-memoryless-nodes.patch > memoryless-nodes-oom-use-n_high_memory-map-instead-of-constructing-one-on-the-fly.patch > memoryless-nodes-no-need-for-kswapd.patch > memoryless-nodes-slab-support.patch > memoryless-nodes-slub-support.patch > memoryless-nodes-uncached-allocator-updates.patch > memoryless-nodes-allow-profiling-data-to-fall-back-to-other-nodes.patch > memoryless-nodes-update-memory-policy-and-page-migration.patch > memoryless-nodes-add-n_cpu-node-state.patch > memoryless-nodes-add-n_cpu-node-state-move-setup-of-n_cpu-node-state-mask.patch > memoryless-nodes-drop-one-memoryless-node-boot-warning.patch > memoryless-nodes-fix-gfp_thisnode-behavior.patch > memoryless-nodes-use-n_high_memory-for-cpusets.patch > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix.patch > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2.patch > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2-3.patch > fix-panic-of-cpu-online-with-memory-less-node.patch > > Merge At least I still believe the whole concept of memoryless node is dubious. 
> maps2-add-proc-pid-pagemap-interface.patch

+ * For each page in the address space, this file contains one long
+ * representing the corresponding physical page frame number (PFN) or

The interface is clearly not compat clean at all

> x86_64-efi-boot-support-efi-frame-buffer-driver.patch
> x86_64-efi-boot-support-efi-boot-document.patch

This required changes from review I think. And the previous patch is useless
without a boot protocol.

-Andi
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 6:18 ` x86 patches was Re: -mm merge plans for 2.6.24 Andi Kleen @ 2007-10-02 6:32 ` Andrew Morton 2007-10-02 7:01 ` Andi Kleen 2007-10-02 7:37 ` Ingo Molnar 0 siblings, 2 replies; 112+ messages in thread From: Andrew Morton @ 2007-10-02 6:32 UTC (permalink / raw) To: Andi Kleen Cc: linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter On 02 Oct 2007 08:18:17 +0200 Andi Kleen <andi@firstfloor.org> wrote: > Andrew Morton <akpm@linux-foundation.org> writes: > > > > revert-x86_64-mm-cpa-einval.patch > > fix-x86_64-mm-sched-clock-share.patch > > agp-fix-race-condition-between-unmapping-and-freeing-pages.patch > > x86_64-mce-poll-at-idle_start-and-printk-fix.patch > > fix-x86_64-mm-unwinder.patch > > geode-mfgpt-support-for-geode-class-machines.patch > > geode-mfgpt-clock-event-device-support.patch > > x86_64-add-acpi-reboot-option.patch > > i386-convert-mm_context_t-semaphore-to-a-mutex.patch > > dma-use-dev_to_node-to-get-node-for-device-in-dma_alloc_pages.patch > > x86-make-io-apic-not-connected-pin-print-complete.patch > > intel_cacheinfo-misc-section-annotation-fixes.patch > > intel_cacheinfo-call-cache_add_dev-from-cache_sysfs_init.patch > > x86-use-num_online_nodes-to-get-physical-cpus-numbers-for.patch > > i386-stop-bogus-nmi-softlockup-warnings-in-show_mem.patch > > voyager-include-asm-smph-to-fix-compile-error.patch > > x86-64-disable-local-apic-timer-use-on-amd-systems-with-c1e.patch > > clockevents-remove-unused-inline-function.patch > > clockevents-allow-build-without-runtime-use.patch > > x86_64-consolidate-tsc-calibration.patch > > i386-prepare-sharing-hpet-code.patch > > i386-hpet-add-x8664-hpet-bits.patch > > i386-prepare-sharing-pit-code.patch > > x86_64-use-i386-i8253-h.patch > > x86_64-preparatory-apic-set-lvtt.patch > > x86_64-apic-remove-bogus-pit-synchronization.patch > > x86_64-apic-shuffle-calibration-around.patch > > x86_64-apic-calibration-remove-divisor.patch > > 
x86_64-apic-change-setup-calling-convention.patch > > x86_64-apic-remove-nested-irq-disable.patch > > x86_64-prep-idle-loop-for-dynticks.patch > > x86_64-apic-add-clockevents-functions.patch > > x86_64-convert-to-clockevents.patch > > x86_64-remove-unused-code.patch > > x86_64-cleanup-apic-c.patch > > x86_64-cleanup-apic-c-fix.patch > > x86_64-cleanup-apic-c-fix-2.patch > > jiffies-remove-unused-macros.patch > > acpi-remove-the-useless-ifdef-code.patch > > i386-pit-remove-the-useless-ifdefs.patch > > i386-hpet-sharing-optimize.patch > > ich-force-hpet-make-generic-time-capable-of-switching-broadcast-timer.patch > > ich-force-hpet-restructure-hpet-generic-clock-code.patch > > ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable.patch > > ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable-fix.patch > > ich-force-hpet-late-initialization-of-hpet-after-quirk.patch > > ich-force-hpet-ich5-quirk-to-force-detect-enable.patch > > ich-force-hpet-ich5-quirk-to-force-detect-enable-fix.patch > > ich-force-hpet-ich5-fix-a-bug-with-suspend-resume.patch > > ich-force-hpet-add-ich7_0-pciid-to-quirk-list.patch > > x86-fix-cpu_to_node-references.patch > > x86-convert-cpu_core_map-to-be-a-per-cpu-variable.patch > > convert-cpu_sibling_map-to-be-a-per-cpu-variable.patch > > convert-cpu_sibling_map-to-a-per_cpu-data-array-ia64.patch > > # convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch: busted > > convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64.patch > > convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix.patch > > convert-cpu_sibling_map-to-a-per_cpu-data-array-ppc64-fix-2.patch > > convert-cpu_sibling_map-to-a-per_cpu-data-array-sparc64.patch > > These are fine to me, but should not all go through my tree > because most changes are in other architectures. I assume you're referring to just convert-cpu_sibling_map-to-be-a-per-cpu-variable* here. 
> > x86-convert-x86_cpu_to_apicid-to-be-a-per-cpu-variable.patch > > x86-convert-cpu_llc_id-to-be-a-per-cpu-variable.patch > > x86-acpi-use-cpu_physical_id.patch > > i386-visws-extern-inline-static-inline.patch > > i386-cleanup-struct-irqaction-initializers.patch > > x86_64-cleanup-struct-irqaction-initializers.patch > > asm-i386-ioh-fix-constness.patch > > optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft.patch > > optimize-x86-page-faults-like-all-other-achitectures-and-kill-notifier-cruft-fix.patch > > hpet-force-enable-on-vt8235-37-chipsets.patch > > x86_64-check-msr-to-get-mmconfig-for-amd-family-10h-opteron.patch > > x86_64-check-and-enable-mmconfig-for-amd-family-10h-opteron.patch > > x86_64-check-and-enable-mmconfig-for-amd-family-10h-opteron-fix.patch > > x86_64-set-cfg_size-for-amd-family-10h-in-case-mmconfig-is.patch > > x86_64-set-cfg_size-for-amd-family-10h-in-case-mmconfig-is-fix.patch > > voyager-dont-try-to-support-unprocessor-builds.patch > > x86_64-nx-bit-handling-in-change_page_attr.patch > > x86-64-calgary-fix-calgary=disable=busnum-for-calioc2.patch > > x86-64-calgary-get-rid-of-translate_phb.patch > > x86_64-vdso-linker-script-cleanup.patch > > x86_64-vdso-put-vars-in-rodata.patch > > x86-convert-cpuinfo_x86-array-to-a-per_cpu-array.patch > > x86_64-nmi_watchdog-fix-to-be-more-like-i386.patch > > x86_64-nmi_watchdog-fix-to-be-more-like-i386-fix.patch > > pci-use-pci=bfsort-for-hp-dl385-g2-dl585-g2.patch > > > > Send to Andi > > Did you resend it? "Send", not "Sent". > I have nothing pending currently. I rejected > also quite a few of these. You did? I'd have dropped them if you had. Oh well, I was planning on a maintainer patch-bombing tomorrow - let's go through them again. > The clockevents patches are not included in this; but given the recent > trouble i'm not 100% sure they are even ready yet. hm, well, I hope you and Thomas are on the same page regarding precisely what the remaining issues are. 
> > sparsemem-clean-up-spelling-error-in-comments.patch > > sparsemem-record-when-a-section-has-a-valid-mem_map.patch > > sparsemem-record-when-a-section-has-a-valid-mem_map-fix.patch > > generic-virtual-memmap-support-for-sparsemem.patch > > generic-virtual-memmap-support-for-sparsemem-fix.patch > > generic-virtual-memmap-support-for-sparsemem-remove-excess-debugging.patch > > generic-virtual-memmap-support-for-sparsemem-simplify-initialisation-code-and-reduce-duplication.patch > > generic-virtual-memmap-support-for-sparsemem-pull-out-the-vmemmap-code-into-its-own-file.patch > > generic-virtual-memmap-support-vmemmap-generify-initialisation-via-helpers.patch > > x86_64-sparsemem_vmemmap-2m-page-size-support.patch > > x86_64-sparsemem_vmemmap-2m-page-size-support-ensure-end-of-section-memmap-is-initialised.patch > > x86_64-sparsemem_vmemmap-vmemmap-x86_64-convert-to-new-helper-based-initialisation.patch > > ia64-sparsemem_vmemmap-16k-page-size-support.patch > > ia64-sparsemem_vmemmap-16k-page-size-support-convert-to-new-helper-based-initialisation.patch > > sparc64-sparsemem_vmemmap-support.patch > > sparc64-sparsemem_vmemmap-support-vmemmap-convert-to-new-config-options.patch > > ppc64-sparsemem_vmemmap-support.patch > > ppc64-sparsemem_vmemmap-support-vmemmap-ppc64-convert-vmm_-macros-to-a-real-function.patch > > ppc64-sparsemem_vmemmap-support-vmemmap-ppc64-convert-vmm_-macros-to-a-real-function-fix.patch > > ppc64-sparsemem_vmemmap-support-convert-to-new-config-options.patch > > > > virtual memmap: merge > > Hmm, need to recheck the x86_64 bits I think. Thanks. 
> > memoryless-nodes-generic-management-of-nodemasks-for-various-purposes.patch > > memoryless-nodes-generic-management-of-nodemasks-for-various-purposes-fix.patch > > memoryless-nodes-introduce-mask-of-nodes-with-memory.patch > > memoryless-nodes-introduce-mask-of-nodes-with-memory-fix.patch > > # update-n_high_memory-node-state-for-memory-hotadd.patch: fold > > update-n_high_memory-node-state-for-memory-hotadd.patch > > update-n_high_memory-node-state-for-memory-hotadd-fix.patch > > memoryless-nodes-fix-interleave-behavior-for-memoryless-nodes.patch > > memoryless-nodes-oom-use-n_high_memory-map-instead-of-constructing-one-on-the-fly.patch > > memoryless-nodes-no-need-for-kswapd.patch > > memoryless-nodes-slab-support.patch > > memoryless-nodes-slub-support.patch > > memoryless-nodes-uncached-allocator-updates.patch > > memoryless-nodes-allow-profiling-data-to-fall-back-to-other-nodes.patch > > memoryless-nodes-update-memory-policy-and-page-migration.patch > > memoryless-nodes-add-n_cpu-node-state.patch > > memoryless-nodes-add-n_cpu-node-state-move-setup-of-n_cpu-node-state-mask.patch > > memoryless-nodes-drop-one-memoryless-node-boot-warning.patch > > memoryless-nodes-fix-gfp_thisnode-behavior.patch > > memoryless-nodes-use-n_high_memory-for-cpusets.patch > > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code.patch > > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix.patch > > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2.patch > > memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2-3.patch > > fix-panic-of-cpu-online-with-memory-less-node.patch > > > > Merge > > At least I still believe the whole concept of memoryless node is dubious. > How come? Memoryless node can and do occur in real-world machines. Kernel should support that? 
>
> > maps2-add-proc-pid-pagemap-interface.patch
>
> + * For each page in the address space, this file contains one long
> + * representing the corresponding physical page frame number (PFN) or
>
> The interface is clearly not compat clean at all

Well that would be bad. What's the issue here?

Both 32-bit and 64-bit userspace will see 64-bit data.

So the problem is that 32-bit applications on 32-bit kernels will see
32-bit data but 32-bit applications on 64-bit kernels will see 64-bit
data?

If so, that might be OK - the app just needs a reliable way of working out
whether it's on a 32- or 64-bit kernel?

> > x86_64-efi-boot-support-efi-frame-buffer-driver.patch
> > x86_64-efi-boot-support-efi-boot-document.patch
>
> This required changes from review I think. And the previous patch is useless
> without a boot protocol.

So should I drop them?
^ permalink raw reply [flat|nested] 112+ messages in thread
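The compat concern in this exchange comes from the quoted comment saying the file holds "one long" per page: the size of a long differs between 32-bit and 64-bit ABIs, so the byte offset of a given page's entry depends on who is doing the arithmetic. The following is a minimal sketch of that offset calculation only. The function name and the entry sizes passed to it are assumptions for illustration; it does not describe the format the patch finally adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of the entry for virtual page number
 * vpn in a per-page table whose entries are entry_size bytes each.
 */
static uint64_t pagemap_offset(uint64_t vpn, size_t entry_size)
{
    return vpn * (uint64_t)entry_size;
}

/*
 * A 32-bit application that assumes "one long" means 4 bytes will seek
 * to a different offset than a 64-bit kernel writing 8-byte entries.
 * Defining the entry as a fixed-width type gives every reader the same
 * answer regardless of its own ABI.
 */
static void self_check(void)
{
    uint64_t vpn = 10;

    assert(pagemap_offset(vpn, 4) == 40);                /* ILP32 view of "long" */
    assert(pagemap_offset(vpn, 8) == 80);                /* LP64 view of "long"  */
    assert(pagemap_offset(vpn, sizeof(uint64_t)) == 80); /* fixed-width entry    */
}
```

With a fixed-width entry the application no longer needs the "reliable way of working out whether it's on a 32- or 64-bit kernel" that is mentioned above as the alternative.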
* Re: x86 patches was Re: -mm merge plans for 2.6.24
2007-10-02 6:32 ` Andrew Morton
@ 2007-10-02 7:01 ` Andi Kleen
2007-10-02 7:18 ` Andrew Morton
2007-10-02 9:26 ` Andy Whitcroft
2007-10-02 7:37 ` Ingo Molnar
1 sibling, 2 replies; 112+ messages in thread
From: Andi Kleen @ 2007-10-02 7:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw

> > These are fine to me, but should not all go through my tree
> > because most changes are in other architectures.
>
> I assume you're referring to just
> convert-cpu_sibling_map-to-be-a-per-cpu-variable* here.

All the *-to-*per-cpu* patches from Mike yes

>
> > I have nothing pending currently. I rejected
> > also quite a few of these.
>
> You did? I'd have dropped them if you had.
>
> Oh well, I was planning on a maintainer patch-bombing tomorrow - let's go
> through them again.

I'll send you a detailed list after the patch bomb.

> >
> > Hmm, need to recheck the x86_64 bits I think.
>
> Thanks.

Done now (adding ccs)

x86_64-sparsemem_vmemmap-2m-page-size-support.patch
x86_64-sparsemem_vmemmap-vmemmap-x86_64-convert-to-new-helper-based-initialisation.patch
Look like these two should be merged together

Also I'm concerned about a third variant of memmappery. Can we agree
to only merge that when the old sparsemem support is removed from x86-64?

Otherwise it looks good to me.

> How come? Memoryless node can and do occur in real-world machines. Kernel
> should support that?

But a node is just defined by its memory?

> If so, that might be OK - the app just needs a reliable way of working out
> whether it's on a 32- or 64-bit kernel?

That would be ugly and a little error prone (would this case really be
tested in user space normally?) but might work.

>
> > > x86_64-efi-boot-support-efi-frame-buffer-driver.patch
> > > x86_64-efi-boot-support-efi-boot-document.patch
> >
> > This required changes from review I think. And the previous patch is useless
> > without a boot protocol.
>
> So should I drop them?

Yes for now please.

e.g. we at least need a patch to actually check the version number
of the boot protocol.

-Andi
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24
2007-10-02 7:01 ` Andi Kleen
@ 2007-10-02 7:18 ` Andrew Morton
2007-10-02 7:36 ` KAMEZAWA Hiroyuki
2007-10-02 7:55 ` Matt Mackall
2007-10-02 9:26 ` Andy Whitcroft
1 sibling, 2 replies; 112+ messages in thread
From: Andrew Morton @ 2007-10-02 7:18 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, KAMEZAWA Hiroyuki

On Tue, 2 Oct 2007 09:01:10 +0200 Andi Kleen <andi@firstfloor.org> wrote:

> > > These are fine to me, but should not all go through my tree
> > > because most changes are in other architectures.
> >
> > I assume you're referring to just
> > convert-cpu_sibling_map-to-be-a-per-cpu-variable* here.
>
> All the *-to-*per-cpu* patches from Mike yes

OK, I'll merge those directly.

> >
> > > I have nothing pending currently. I rejected
> > > also quite a few of these.
> >
> > You did? I'd have dropped them if you had.
> >
> > Oh well, I was planning on a maintainer patch-bombing tomorrow - let's go
> > through them again.
>
> I'll send you a detailed list after the patch bomb.

Thanks

> > >
> > > Hmm, need to recheck the x86_64 bits I think.
> >
> > Thanks.
>
> Done now (adding ccs)
>
> x86_64-sparsemem_vmemmap-2m-page-size-support.patch
> x86_64-sparsemem_vmemmap-vmemmap-x86_64-convert-to-new-helper-based-initialisation.patch
> Look like these two should be merged together

Shall do.

> Also I'm concerned about a third variant of memmappery. Can we agree
> to only merge that when the old sparsemem support is removed from x86-64?

How much work would that be?

> Otherwise it looks good to me.
>
> > How come? Memoryless node can and do occur in real-world machines. Kernel
> > should support that?
>
> But a node is just defined by its memory?

Don't think so. A node is a lump of circuitry which can have zero or more
CPUs, IO and memory.

It may initially have been conceived as a memory-only concept in the Linux
kernel, but that doesn't fully map onto reality (does it?)

There was a real-world need for this, I think from the Fujitsu guys. That
should be spelled out in the changelog but isn't.

> > If so, that might be OK - the app just needs a reliable way of working out
> > whether it's on a 32- or 64-bit kernel?
>
> That would be ugly and a little error prone (would this case really be
> tested in user space normally?) but might work.

I guess it wouldn't be too hard for a 64-bit kernel to fake up 32-bit data
for 32-bit userspace. For each architecture :(

But let's see what Matt thinks.

> >
> > > > x86_64-efi-boot-support-efi-frame-buffer-driver.patch
> > > > x86_64-efi-boot-support-efi-boot-document.patch
> > >
> > > This required changes from review I think. And the previous patch is useless
> > > without a boot protocol.
> >
> > So should I drop them?
>
> Yes for now please.

Done.

> e.g. we at least need a patch to actually check the version number
> of the boot protocol.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24
2007-10-02 7:18 ` Andrew Morton
@ 2007-10-02 7:36 ` KAMEZAWA Hiroyuki
2007-10-02 7:43 ` Andrew Morton
` (3 more replies)
2007-10-02 7:55 ` Matt Mackall
1 sibling, 4 replies; 112+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-10-02 7:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, Lee.Schermerhorn@hp.com

On Tue, 2 Oct 2007 00:18:09 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > How come? Memoryless node can and do occur in real-world machines. Kernel
> > > should support that?
> >
> > But a node is just defined by its memory?
>
> Don't think so. A node is a lump of circuitry which can have zero or more
> CPUs, IO and memory.
>
> It may initially have been conceived as a memory-only concept in the Linux
> kernel, but that doesn't fully map onto reality (does it?)
>
> There was a real-world need for this, I think from the Fujitsu guys. That
> should be spelled out in the changelog but isn't.

Yes, Fujitsu and HP guys really need this memory-less-node support.

Thanks,
-Kame
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:36 ` KAMEZAWA Hiroyuki @ 2007-10-02 7:43 ` Andrew Morton 2007-10-02 8:16 ` KAMEZAWA Hiroyuki 2007-10-02 17:25 ` Lee Schermerhorn 2007-10-02 16:40 ` Nish Aravamudan ` (2 subsequent siblings) 3 siblings, 2 replies; 112+ messages in thread From: Andrew Morton @ 2007-10-02 7:43 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, Lee.Schermerhorn@hp.com On Tue, 2 Oct 2007 16:36:24 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Tue, 2 Oct 2007 00:18:09 -0700 > Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > How come? Memoryless node can and do occur in real-world machines. Kernel > > > > should support that? > > > > > > But a node is just defined by its memory? > > > > Don't think so. A node is a lump of circuitry which can have zero or more > > CPUs, IO and memory. > > > > It may initially have been conceived as a memory-only concept in the Linux > > kernel, but that doesn't fully map onto reality (does it?) > > > > There was a real-world need for this, I think from the Fujitsu guys. That > > should be spelled out in the changelog but isn't. > > Yes, Fujitsu and HP guys really need this memory-less-node support. > For what reason, please? ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24
2007-10-02 7:43 ` Andrew Morton
@ 2007-10-02 8:16 ` KAMEZAWA Hiroyuki
2007-10-02 10:48 ` Yasunori Goto
2007-10-02 18:18 ` Christoph Lameter
2007-10-02 17:25 ` Lee Schermerhorn
1 sibling, 2 replies; 112+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-10-02 8:16 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, Lee.Schermerhorn@hp.com

On Tue, 2 Oct 2007 00:43:24 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 2 Oct 2007 16:36:24 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > Don't think so. A node is a lump of circuitry which can have zero or more
> > > CPUs, IO and memory.
> > >
> > > It may initially have been conceived as a memory-only concept in the Linux
> > > kernel, but that doesn't fully map onto reality (does it?)
> > >
> > > There was a real-world need for this, I think from the Fujitsu guys. That
> > > should be spelled out in the changelog but isn't.
> >
> > Yes, Fujitsu and HP guys really need this memory-less-node support.
> >
>
> For what reason, please?
>

For fujitsu, problem is called "empty" node.

When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
creates nodes, which includes no memory, no cpu.

I tried to remove empty-node in past, but that was denied.
It was because we can hot-add cpu to the empty node.
(node-hotplug triggered by cpu is not implemented now. and it will be ugly.)

For HP, (Lee can comment on this later), they have memory-less-node.
As far as I hear, HP's machine can have following configuration.

(example)
Node0: CPU0 memory AAA MB
Node1: CPU1 memory AAA MB
Node2: CPU2 memory AAA MB
Node3: CPU3 memory AAA MB
Node4: Memory XXX GB

AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap.
After boot, only Node 4 has valid memory (but have no cpu.)

Maybe this is memory-interleave by firmware config.

Thanks,
-Kame
^ permalink raw reply [flat|nested] 112+ messages in thread
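A minimal sketch of the HP layout Kame describes above, written in Python rather than kernel code. It assumes a flat 16 MB cutoff (taken from "below 16MB" in the message) and stands in 8 MB for "AAA" and 64 GB for "XXX". The names Node, MIN_NODE_MEM_MB and classify are hypothetical and do not correspond to anything in the ia64 bootstrap.

```python
from dataclasses import dataclass

# Assumed cutoff, taken from "below 16MB" in the message; the real ia64
# bootstrap logic is not reproduced here.
MIN_NODE_MEM_MB = 16


@dataclass
class Node:
    nid: int      # node id
    cpus: int     # number of CPUs on the node
    mem_mb: int   # memory reported by firmware, in MB


def classify(nodes):
    """Treat memory below the cutoff as absent, then label each node."""
    result = {}
    for n in nodes:
        usable = n.mem_mb if n.mem_mb >= MIN_NODE_MEM_MB else 0
        if usable == 0 and n.cpus == 0:
            kind = "empty"        # the Fujitsu "possible node" case
        elif usable == 0:
            kind = "memoryless"   # CPUs but no usable memory
        elif n.cpus == 0:
            kind = "cpuless"      # memory but no CPUs
        else:
            kind = "normal"
        result[n.nid] = kind
    return result


# Hypothetical sizes: 8 MB stands in for "AAA", 64 GB for "XXX".
nodes = [Node(i, 1, 8) for i in range(4)] + [Node(4, 0, 64 * 1024)]
layout = classify(nodes)
```

With these made-up sizes, nodes 0-3 come out as memoryless and node 4 as cpuless, which is the post-boot picture the message describes. A node with neither CPUs nor memory falls into the "empty" case raised for Fujitsu.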
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 8:16 ` KAMEZAWA Hiroyuki @ 2007-10-02 10:48 ` Yasunori Goto 2007-10-02 18:18 ` Christoph Lameter 1 sibling, 0 replies; 112+ messages in thread From: Yasunori Goto @ 2007-10-02 10:48 UTC (permalink / raw) To: Andrew Morton, Andi Kleen Cc: linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, Lee.Schermerhorn@hp.com, KAMEZAWA Hiroyuki > On Tue, 2 Oct 2007 00:43:24 -0700 > Andrew Morton <akpm@linux-foundation.org> wrote: > > > On Tue, 2 Oct 2007 16:36:24 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> > > > > Don't think so. A node is a lump of circuitry which can have zero or more > > > > CPUs, IO and memory. > > > > > > > > It may initially have been conceived as a memory-only concept in the Linux > > > > kernel, but that doesn't fully map onto reality (does it?) > > > > > > > > There was a real-world need for this, I think from the Fujitsu guys. That > > > > should be spelled out in the changelog but isn't. > > > > > > Yes, Fujitsu and HP guys really need this memory-less-node support. > > > > > > > For what reason, please? > > > > For fujitsu, problem is called "empty" node. > > When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init) > creates nodes, which includes no memory, no cpu. > > I tried to remove empty-node in past, but that was denied. > It was because we can hot-add cpu to the empty node. > (node-hotplug triggered by cpu is not implemented now. and it will be ugly.) > > > For HP, (Lee can comment on this later), they have memory-less-node. > As far as I hear, HP's machine can have following configration. > > (example) > Node0: CPU0 memory AAA MB > Node1: CPU1 memory AAA MB > Node2: CPU2 memory AAA MB > Node3: CPU3 memory AAA MB > Node4: Memory XXX GB > > AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap. > After boot, only Node 4 has valid memory (but have no cpu.) 
>
> Maybe this is memory-interleave by firmware config.

From memory-hotplug view, memory-less node is very helpful.
It can define and arrange some "halfway conditions" of node hot-plug.
I guess that node unplugging code will be simpler by it.

Bye.

--
Yasunori Goto
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 8:16 ` KAMEZAWA Hiroyuki 2007-10-02 10:48 ` Yasunori Goto @ 2007-10-02 18:18 ` Christoph Lameter 1 sibling, 0 replies; 112+ messages in thread From: Christoph Lameter @ 2007-10-02 18:18 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, apw, Lee.Schermerhorn@hp.com On Tue, 2 Oct 2007, KAMEZAWA Hiroyuki wrote: > For fujitsu, problem is called "empty" node. Future SGI platforms (actually also current one can have but nothing like that is deployed to my knowledge) have nodes with only cpus. Current SGI platforms have nodes with just I/O that we so far cannot manage in the core. So the arch code maps them to the nearest memory node. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:43 ` Andrew Morton 2007-10-02 8:16 ` KAMEZAWA Hiroyuki @ 2007-10-02 17:25 ` Lee Schermerhorn 1 sibling, 0 replies; 112+ messages in thread From: Lee Schermerhorn @ 2007-10-02 17:25 UTC (permalink / raw) To: Andrew Morton Cc: KAMEZAWA Hiroyuki, Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw On Tue, 2007-10-02 at 00:43 -0700, Andrew Morton wrote: > On Tue, 2 Oct 2007 16:36:24 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > > > On Tue, 2 Oct 2007 00:18:09 -0700 > > Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > How come? Memoryless node can and do occur in real-world machines. Kernel > > > > > should support that? > > > > > > > > But a node is just defined by its memory? > > > > > > Don't think so. A node is a lump of circuitry which can have zero or more > > > CPUs, IO and memory. > > > > > > It may initially have been conceived as a memory-only concept in the Linux > > > kernel, but that doesn't fully map onto reality (does it?) > > > > > > There was a real-world need for this, I think from the Fujitsu guys. That > > > should be spelled out in the changelog but isn't. > > > > Yes, Fujitsu and HP guys really need this memory-less-node support. > > > > For what reason, please? For the HP platforms, we can configure each cell with from 0% to 100% "cell local memory". When we configure with <100% CLM, the "missing percentages" are interleaved by hardware on a cache-line granularity to improve bandwidth at the expense of latency for numa-challenged applications [and OSes, but not our problem ;-)]. When we boot Linux on such a config, all of the real nodes have no memory--it all resides in a single interleaved pseudo-node. When we boot Linux on a 100% CLM configuration [== NUMA], we still have the interleaved pseudo-node. It contains a few hundred MB stolen from the real nodes to contain the DMA zone. 
[Interleaved memory resides at phys addr 0]. The memoryless-nodes patches, along with the zoneorder patches, support this config as well. Also, when we boot a NUMA config with the "mem=" command line, specifying less memory than actually exists, Linux takes the excluded memory "off the top" rather than distributing it across the nodes. This can result in memoryless nodes, as well. Lee ^ permalink raw reply [flat|nested] 112+ messages in thread
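Lee's last point, that "mem=" removes memory "off the top", can be illustrated with a small Python sketch. It assumes "mem=" behaves as a simple cap on the highest usable physical address and that each node owns one contiguous range. The 4-node, 4 GB-per-node layout and the helper name apply_mem_limit are made up for illustration and are not kernel code.

```python
def apply_mem_limit(ranges, limit):
    """ranges: {nid: (start_mb, end_mb)}; limit: cap in MB.

    Returns usable MB per node when everything at or above `limit`
    is discarded, i.e. memory is taken "off the top".
    """
    usable = {}
    for nid, (start, end) in ranges.items():
        kept_end = min(end, limit)
        usable[nid] = max(0, kept_end - start)
    return usable


# Hypothetical machine: 4 nodes, 4 GB each, laid out back to back.
ranges = {nid: (nid * 4096, (nid + 1) * 4096) for nid in range(4)}

# As if booted with mem=6G.
usable = apply_mem_limit(ranges, limit=6144)
memoryless = [nid for nid, mb in usable.items() if mb == 0]
```

Under these assumptions node 0 keeps all 4 GB, node 1 keeps 2 GB, and nodes 2 and 3 end up with no memory at all, even though their CPUs are still present. Distributing the reduction evenly across nodes would avoid this, but that is not what Lee reports happening.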
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:36 ` KAMEZAWA Hiroyuki 2007-10-02 7:43 ` Andrew Morton @ 2007-10-02 16:40 ` Nish Aravamudan 2007-10-02 17:17 ` Lee Schermerhorn 2007-10-02 18:16 ` Christoph Lameter 3 siblings, 0 replies; 112+ messages in thread From: Nish Aravamudan @ 2007-10-02 16:40 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, Lee.Schermerhorn@hp.com On 10/2/07, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote: > On Tue, 2 Oct 2007 00:18:09 -0700 > Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > How come? Memoryless node can and do occur in real-world machines. Kernel > > > > should support that? > > > > > > But a node is just defined by its memory? > > > > Don't think so. A node is a lump of circuitry which can have zero or more > > CPUs, IO and memory. > > > > It may initially have been conceived as a memory-only concept in the Linux > > kernel, but that doesn't fully map onto reality (does it?) > > > > There was a real-world need for this, I think from the Fujitsu guys. That > > should be spelled out in the changelog but isn't. > > Yes, Fujitsu and HP guys really need this memory-less-node support. Anton's post (http://marc.info/?l=linux-mm&m=118133042025995&w=2) (and my subsequent reposts) may have helped prompt this full series, which then was picked up and shown to be useful to other folks. NUMA systems with memoryless nodes exist in the wild and Linux did not do the right thing there. Admittedly, Anton's case is hugetlb-specific, but the fix I've been proposing (and hope to repost soon) depends on Christoph's patches, especially the THISNODE fix. Thanks, Nish ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:36 ` KAMEZAWA Hiroyuki 2007-10-02 7:43 ` Andrew Morton 2007-10-02 16:40 ` Nish Aravamudan @ 2007-10-02 17:17 ` Lee Schermerhorn 2007-10-02 18:16 ` Christoph Lameter 3 siblings, 0 replies; 112+ messages in thread From: Lee Schermerhorn @ 2007-10-02 17:17 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw On Tue, 2007-10-02 at 16:36 +0900, KAMEZAWA Hiroyuki wrote: > On Tue, 2 Oct 2007 00:18:09 -0700 > Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > How come? Memoryless node can and do occur in real-world machines. Kernel > > > > should support that? > > > > > > But a node is just defined by its memory? > > > > Don't think so. A node is a lump of circuitry which can have zero or more > > CPUs, IO and memory. > > > > It may initially have been conceived as a memory-only concept in the Linux > > kernel, but that doesn't fully map onto reality (does it?) > > > > There was a real-world need for this, I think from the Fujitsu guys. That > > should be spelled out in the changelog but isn't. > > Yes, Fujitsu and HP guys really need this memory-less-node support. Agreed! Lee ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:36 ` KAMEZAWA Hiroyuki ` (2 preceding siblings ...) 2007-10-02 17:17 ` Lee Schermerhorn @ 2007-10-02 18:16 ` Christoph Lameter 3 siblings, 0 replies; 112+ messages in thread From: Christoph Lameter @ 2007-10-02 18:16 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: Andrew Morton, Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, apw, Lee.Schermerhorn@hp.com On Tue, 2 Oct 2007, KAMEZAWA Hiroyuki wrote: > > There was a real-world need for this, I think from the Fujitsu guys. That > > should be spelled out in the changelog but isn't. > > Yes, Fujitsu and HP guys really need this memory-less-node support. The SGI guys also need this support in the future. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24
2007-10-02 7:18 ` Andrew Morton
2007-10-02 7:36 ` KAMEZAWA Hiroyuki
@ 2007-10-02 7:55 ` Matt Mackall
2007-10-02 7:59 ` Andi Kleen
1 sibling, 1 reply; 112+ messages in thread
From: Matt Mackall @ 2007-10-02 7:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Andi Kleen, linux-kernel, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, KAMEZAWA Hiroyuki

On Tue, Oct 02, 2007 at 12:18:09AM -0700, Andrew Morton wrote:
> > > If so, that might be OK - the app just needs a reliable way of working out
> > > whether it's on a 32- or 64-bit kernel?
> >
> > That would be ugly and a little error prone (would this case really be
> > tested in user space normally?) but might work.
>
> I guess it wouldn't be too hard for a 64-bit kernel to fake up 32-bit data
> for 32-bit userspace. For each architecture :( But let's see what Matt
> thinks.

Grumble. The options are:

a) export it in the kernel's native size and have userspace figure it
   out
b) add a header
c) lie to 32-bit apps on 64-bit kernels
d) always export 32 bits
e) always export 64 bits

I started with (a), switched to (b), and then Alan and Dave convinced
me to switch back to (a). I don't think (c) is desirable, especially
as it means having two code paths. (d) would work until memory got
large enough that PFNs didn't fit in 32 bits. (e) would be ok all
around, except for the extra overhead. Ho hum.

--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:55 ` Matt Mackall @ 2007-10-02 7:59 ` Andi Kleen 0 siblings, 0 replies; 112+ messages in thread From: Andi Kleen @ 2007-10-02 7:59 UTC (permalink / raw) To: Matt Mackall Cc: Andrew Morton, Andi Kleen, linux-kernel, Huang, Ying, Thomas Gleixner, Christoph Lameter, apw, KAMEZAWA Hiroyuki > Grumble. The options are: > > a) export it in the kernel's native size and have userspace figure it > out > b) add a header > c) lie to 32-bit apps on 64-bit kernels > d) always export 32 bits > e) always export 64 bits > > I started with (a), switched to (b), and then Alan and Dave convinced > me to switch back to (a). I don't think (c) is desireable, especially > as it means having two code paths. (d) would work until memory got > large enough that PFNs didn't fit in 32 bits. (e) would be ok all > around, except for the extra overhead. Ho hum. If the overhead of (e) is ok for 64bit it should be ok for 32bit too. -Andi ^ permalink raw reply [flat|nested] 112+ messages in thread
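The trade-off between Matt's options (a) and (e) can be made concrete with a short Python sketch of the userspace side. It assumes the exported data is a flat little-endian array of PFN-sized entries, which is an assumption for illustration only; the thread does not spell out the exact layout. The function names decode_native and decode_fixed64 and the sample PFNs are hypothetical.

```python
import struct


def decode_native(buf, kernel_bits):
    """Option (a): the reader must already know the kernel's word size."""
    fmt = "<I" if kernel_bits == 32 else "<Q"
    return [v for (v,) in struct.iter_unpack(fmt, buf)]


def decode_fixed64(buf):
    """Option (e): every entry is 64 bits, whatever the kernel is."""
    return [v for (v,) in struct.iter_unpack("<Q", buf)]


pfns = [0x1234, 0x100000]              # made-up page frame numbers
native32 = struct.pack("<2I", *pfns)   # what a 32-bit kernel would emit under (a)
fixed64 = struct.pack("<2Q", *pfns)    # what any kernel would emit under (e)

# Guessing the wrong word size under (a) silently fuses two entries into one.
misread = decode_native(native32, kernel_bits=64)
```

Under (a), a reader that guesses wrong gets plausible-looking garbage rather than an error, which is the "error prone" case Andi raises earlier in the thread. Under (e) the same decoder works everywhere, at the cost of doubling the data volume for a 32-bit kernel (16 bytes instead of 8 in this example).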
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:01 ` Andi Kleen 2007-10-02 7:18 ` Andrew Morton @ 2007-10-02 9:26 ` Andy Whitcroft 1 sibling, 0 replies; 112+ messages in thread From: Andy Whitcroft @ 2007-10-02 9:26 UTC (permalink / raw) To: Andi Kleen Cc: Andrew Morton, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter On Tue, Oct 02, 2007 at 09:01:10AM +0200, Andi Kleen wrote: > x86_64-sparsemem_vmemmap-2m-page-size-support.patch > x86_64-sparsemem_vmemmap-vmemmap-x86_64-convert-to-new-helper-based-initialisation.patch > Look like these two should be merged together > > Also I'm concerned about a third variant of memmappery. Can we agree > to only merge that when the old sparsemem support is removed from x86-64? > > Otherwise it looks good to me. sparsemem vmemmap is a sparsemem variant. By that I mean that it uses all the same infrastructure as sparsemem. That sparsemem code is generic code and shared with the other architectures. There essentially is no code to remove which is not generic and currently in use by other architectures. The patches as they stand select the vmemmap variant unconditionally when sparsemem is selected, we are not adding a new option for x86_64 overall -- in that sense classic sparsemem is already removed for x86_64 by these patches. The longer plan is to pull out the other memory models where they are no longer beneficial with a view to ending up with only one. A good example is the private virtual memory map implemented on ia64, which is an early target. As discussed at VM summit, we are also looking at removing discontigmem for x86. That review will continue. > > How come? Memoryless node can and do occur in real-world machines. Kernel > > should support that? > > But a node is just defined by its memory? I thought that a node was a unit of numa locality. 
Certainly some machines seem to express themselves as memory only nodes and cpu only nodes; in the past I am sure we have also heard of IO only nodes representing "io drawers" and the like.

-apw
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 6:32 ` Andrew Morton 2007-10-02 7:01 ` Andi Kleen @ 2007-10-02 7:37 ` Ingo Molnar 2007-10-02 7:46 ` Andi Kleen 1 sibling, 1 reply; 112+ messages in thread From: Ingo Molnar @ 2007-10-02 7:37 UTC (permalink / raw) To: Andrew Morton Cc: Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, Arjan van de Ven * Andrew Morton <akpm@linux-foundation.org> wrote: > On 02 Oct 2007 08:18:17 +0200 Andi Kleen <andi@firstfloor.org> wrote: > > > The clockevents patches are not included in this; but given the > > recent trouble i'm not 100% sure they are even ready yet. i'm curious, which "recent trouble" do you refer to? (The NMI watchdog bug [which is off by default] was fixed quickly. The C1E bug was found and fixed quickly. Anything else i missed?) > hm, well, I hope you and Thomas are on the same page regarding > precisely what the remaining issues are. i'd like to see the 64-bit clockevents (& dynticks) patches merged. Demand from users and distros is high: the 64-bit CE patches are merged into the Fedora 8 and Ubuntu kernels already, it's been in -mm and in -rt too for a long time and powertop users demand it as well. It also makes it obviously easier to unify the 64-bit and 32-bit code. So there's multiple good reasons to do it. Ingo ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24
2007-10-02 7:37 ` Ingo Molnar
@ 2007-10-02 7:46 ` Andi Kleen
2007-10-02 7:58 ` Thomas Gleixner
0 siblings, 1 reply; 112+ messages in thread
From: Andi Kleen @ 2007-10-02 7:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Andi Kleen, linux-kernel, mpm, Huang, Ying, Thomas Gleixner, Christoph Lameter, Arjan van de Ven

On Tue, Oct 02, 2007 at 09:37:03AM +0200, Ingo Molnar wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On 02 Oct 2007 08:18:17 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> >
> > > The clockevents patches are not included in this; but given the
> > > recent trouble i'm not 100% sure they are even ready yet.
>
> i'm curious, which "recent trouble" do you refer to? (The NMI watchdog
> bug [which is off by default] was fixed quickly. The C1E bug was found
> and fixed quickly. Anything else i missed?)

C1e and now the misrouted irq 0s Thomas reported.

Also i'm a little worried about the missing C1e check; it looks
like it needs a re-review to make sure no other infrastructure was
missing.

>
> > hm, well, I hope you and Thomas are on the same page regarding
> > precisely what the remaining issues are.
>
> i'd like to see the 64-bit clockevents (& dynticks) patches merged.
> Demand from users and distros is high.

I'm aware of that. No argument on that it will eventually go in.

-Andi
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: x86 patches was Re: -mm merge plans for 2.6.24 2007-10-02 7:46 ` Andi Kleen @ 2007-10-02 7:58 ` Thomas Gleixner 0 siblings, 0 replies; 112+ messages in thread From: Thomas Gleixner @ 2007-10-02 7:58 UTC (permalink / raw) To: Andi Kleen Cc: Ingo Molnar, Andrew Morton, linux-kernel, mpm, Huang, Ying, Christoph Lameter, Arjan van de Ven On Tue, 2 Oct 2007, Andi Kleen wrote: > On Tue, Oct 02, 2007 at 09:37:03AM +0200, Ingo Molnar wrote: > > > > * Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > On 02 Oct 2007 08:18:17 +0200 Andi Kleen <andi@firstfloor.org> wrote: > > > > > > > The clockevents patches are not included in this; but given the > > > > recent trouble i'm not 100% sure they are even ready yet. > > > > i'm curious, which "recent trouble" do you refer to? (The NMI watchdog > > bug [which is off by default] was fixed quickly. The C1E bug was found > > and fixed quickly. Anything else i missed?) > > C1e and now the misrouted irq 0s Thomas reported. > > Also i'm a little worried about the missing C1e check; it looks > like it needs a re-review to make sure not other infrastructure was > missing. I had completely forgotten about the C1E problem, which we debugged half a year ago on 32bit. I went through the other pitfalls we had in 32bit carefully again and they are all covered on 64 bit too. C1E was the only one I missed. The irq0 problem is not a real one. The clock events code has no irq0 bound to cpuX assumption at all. The only affected part is nmi_watchdog and I have a fix ready to handle this even for the irq#0 not on cpu#0 case. tglx ^ permalink raw reply [flat|nested] 112+ messages in thread
* v4l-stk11xx* [Was: -mm merge plans for 2.6.24]
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
` (2 preceding siblings ...)
2007-10-02 6:18 ` x86 patches was Re: -mm merge plans for 2.6.24 Andi Kleen
@ 2007-10-02 7:59 ` Jiri Slaby
[not found] ` <4701FC79.3060608@gmail.com>
` (9 subsequent siblings)
13 siblings, 0 replies; 112+ messages in thread
From: Jiri Slaby @ 2007-10-02 7:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Mauro Carvalho Chehab, Linux and Kernel Video

On 10/01/2007 11:22 PM, Andrew Morton wrote:
> v4l-stk11xx-add-a-new-webcam-driver.patch
> v4l-stk11xx-use-array_size-in-another-2-cases.patch
> v4l-stk11xx-use-retval-from-stk11xx_check_device.patch
> v4l-stk11xx-add-static-to-tables.patch

No, hold it please, until the v4l extension is developed and the bayer->rgb
conversion is removed from the driver.

> bw-qcam-use-data_reverse-instead-of-manually-poking-the-control-register-fix.patch
> git-dvb-build-fix.patch
>
> Send to mchehab

thanks,
--
Jiri Slaby (jirislaby@gmail.com)
Faculty of Informatics, Masaryk University
^ permalink raw reply [flat|nested] 112+ messages in thread
[parent not found: <4701FC79.3060608@gmail.com>]
* Re: Wireless damage [Was: -mm merge plans for 2.6.24] [not found] ` <4701FC79.3060608@gmail.com> @ 2007-10-02 8:10 ` Jiri Slaby 0 siblings, 0 replies; 112+ messages in thread From: Jiri Slaby @ 2007-10-02 8:10 UTC (permalink / raw) To: Jiri Slaby Cc: Andrew Morton, linux-wireless, Luis R. Rodriguez, Nick Kossifidis, John W. Linville, Jeff Garzik, Linux Kernel Mailing List Huh, Cc back lkml. On 10/02/2007 10:08 AM, Jiri Slaby wrote: > On 10/01/2007 11:22 PM, Andrew Morton wrote: >> git-net-vs-git-wireless.patch > > Hum, latest wireless-2.6?h=everything already had a proper fix when this patch > was merged. But pull/push is stuck somewhere :(. > >> git-wireless-vs-gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct.patch >> net-add-ath5k-wireless-driver-fix.patch > > let it live in -mm (wireless-2.6) so far. > >> Wireless damage control. Will see what happens. ^ permalink raw reply [flat|nested] 112+ messages in thread
* per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
` (4 preceding siblings ...)
[not found] ` <4701FC79.3060608@gmail.com>
@ 2007-10-02 8:17 ` Peter Zijlstra
[not found] ` <20071002082831.GA19954@mail.ustc.edu.cn>
` (2 more replies)
[not found] ` <20071002083922.GA28892@mail.ustc.edu.cn>
` (7 subsequent siblings)
13 siblings, 3 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-02 8:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Jens Axboe, Fengguang Wu

On Mon, 2007-10-01 at 14:22 -0700, Andrew Morton wrote:

> nfs-remove-congestion_end.patch
> lib-percpu_counter_add.patch
> lib-percpu_counter_sub.patch
> lib-percpu_counter-variable-batch.patch
> lib-make-percpu_counter_add-take-s64.patch
> lib-percpu_counter_set.patch
> lib-percpu_counter_sum_positive.patch
> lib-percpu_count_sum.patch
> lib-percpu_counter_init-error-handling.patch
> lib-percpu_counter_init_irq.patch
> mm-bdi-init-hooks.patch
> mm-scalable-bdi-statistics-counters.patch
> mm-count-reclaimable-pages-per-bdi.patch
> mm-count-writeback-pages-per-bdi.patch

This one:
> mm-expose-bdi-statistics-in-sysfs.patch

> lib-floating-proportions.patch
> mm-per-device-dirty-threshold.patch
> mm-per-device-dirty-threshold-warning-fix.patch
> mm-per-device-dirty-threshold-fix.patch
> mm-dirty-balancing-for-tasks.patch
> mm-dirty-balancing-for-tasks-warning-fix.patch

And, this one:
> debug-sysfs-files-for-the-current-ratio-size-total.patch

I'm not sure polluting /sys/block/<foo>/queue/ like that is The Right
Thing. These patches sure were handy when debugging this, but not sure
they want to move to mainline.

Maybe we want /sys/bdi/<foo>/ or maybe /debug/bdi/<foo>/

Opinions?
^ permalink raw reply [flat|nested] 112+ messages in thread
[parent not found: <20071002082831.GA19954@mail.ustc.edu.cn>]
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) [not found] ` <20071002082831.GA19954@mail.ustc.edu.cn> @ 2007-10-02 8:28 ` Fengguang Wu 0 siblings, 0 replies; 112+ messages in thread From: Fengguang Wu @ 2007-10-02 8:28 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu On Tue, Oct 02, 2007 at 10:17:05AM +0200, Peter Zijlstra wrote: > On Mon, 2007-10-01 at 14:22 -0700, Andrew Morton wrote: > > > nfs-remove-congestion_end.patch > > lib-percpu_counter_add.patch > > lib-percpu_counter_sub.patch > > lib-percpu_counter-variable-batch.patch > > lib-make-percpu_counter_add-take-s64.patch > > lib-percpu_counter_set.patch > > lib-percpu_counter_sum_positive.patch > > lib-percpu_count_sum.patch > > lib-percpu_counter_init-error-handling.patch > > lib-percpu_counter_init_irq.patch > > mm-bdi-init-hooks.patch > > mm-scalable-bdi-statistics-counters.patch > > mm-count-reclaimable-pages-per-bdi.patch > > mm-count-writeback-pages-per-bdi.patch > > This one: > > mm-expose-bdi-statistics-in-sysfs.patch > > > lib-floating-proportions.patch > > mm-per-device-dirty-threshold.patch > > mm-per-device-dirty-threshold-warning-fix.patch > > mm-per-device-dirty-threshold-fix.patch > > mm-dirty-balancing-for-tasks.patch > > mm-dirty-balancing-for-tasks-warning-fix.patch > > And, this one: > > debug-sysfs-files-for-the-current-ratio-size-total.patch > > > I'm not sure polluting /sys/block/<foo>/queue/ like that is The Right > Thing. These patches sure were handy when debugging this, but not sure > they want to move to maineline. > Maybe we want /sys/bdi/<foo>/ or maybe /debug/bdi/<foo>/ I'd vote for /sys/bdi/. It will be more than debug variables. It's good to expose per-bdi tunable parameters and allow one to view bdi states. It would also allow one to tune things like NFS readahead :-) ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-02 8:17 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Peter Zijlstra [not found] ` <20071002082831.GA19954@mail.ustc.edu.cn> @ 2007-10-02 8:31 ` Andrew Morton 2007-10-02 8:48 ` Peter Zijlstra 2007-10-03 11:00 ` Martin Knoblauch 2 siblings, 1 reply; 112+ messages in thread From: Andrew Morton @ 2007-10-02 8:31 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Jens Axboe, Fengguang Wu On Tue, 02 Oct 2007 10:17:05 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2007-10-01 at 14:22 -0700, Andrew Morton wrote: > > > nfs-remove-congestion_end.patch > > lib-percpu_counter_add.patch > > lib-percpu_counter_sub.patch > > lib-percpu_counter-variable-batch.patch > > lib-make-percpu_counter_add-take-s64.patch > > lib-percpu_counter_set.patch > > lib-percpu_counter_sum_positive.patch > > lib-percpu_count_sum.patch > > lib-percpu_counter_init-error-handling.patch > > lib-percpu_counter_init_irq.patch > > mm-bdi-init-hooks.patch > > mm-scalable-bdi-statistics-counters.patch > > mm-count-reclaimable-pages-per-bdi.patch > > mm-count-writeback-pages-per-bdi.patch > > This one: > > mm-expose-bdi-statistics-in-sysfs.patch > > > lib-floating-proportions.patch > > mm-per-device-dirty-threshold.patch > > mm-per-device-dirty-threshold-warning-fix.patch > > mm-per-device-dirty-threshold-fix.patch > > mm-dirty-balancing-for-tasks.patch > > mm-dirty-balancing-for-tasks-warning-fix.patch > > And, this one: > > debug-sysfs-files-for-the-current-ratio-size-total.patch > > > I'm not sure polluting /sys/block/<foo>/queue/ like that is The Right > Thing. hm, I suppose not. It leaves nowhere for nfs, for a start. > These patches sure were handy when debugging this, but not sure > they want to move to maineline. 
> > Maybe we want /sys/bdi/<foo>/ or maybe /debug/bdi/<foo>/ > If you think we'll need this stuff for support/debug during 2.6.24-rcX then sure - we can always take it out prior to 2.6.24-final. otoh, if we're going to take that approach we might as well leave things in /sys/block/<foo>/queue. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-02 8:31 ` Andrew Morton @ 2007-10-02 8:48 ` Peter Zijlstra 2007-10-02 10:31 ` Kay Sievers 0 siblings, 1 reply; 112+ messages in thread From: Peter Zijlstra @ 2007-10-02 8:48 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Jens Axboe, Fengguang Wu On Tue, 2007-10-02 at 01:31 -0700, Andrew Morton wrote: > On Tue, 02 Oct 2007 10:17:05 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > > > On Mon, 2007-10-01 at 14:22 -0700, Andrew Morton wrote: > > > > > nfs-remove-congestion_end.patch > > > lib-percpu_counter_add.patch > > > lib-percpu_counter_sub.patch > > > lib-percpu_counter-variable-batch.patch > > > lib-make-percpu_counter_add-take-s64.patch > > > lib-percpu_counter_set.patch > > > lib-percpu_counter_sum_positive.patch > > > lib-percpu_count_sum.patch > > > lib-percpu_counter_init-error-handling.patch > > > lib-percpu_counter_init_irq.patch > > > mm-bdi-init-hooks.patch > > > mm-scalable-bdi-statistics-counters.patch > > > mm-count-reclaimable-pages-per-bdi.patch > > > mm-count-writeback-pages-per-bdi.patch > > > > This one: > > > mm-expose-bdi-statistics-in-sysfs.patch > > > > > lib-floating-proportions.patch > > > mm-per-device-dirty-threshold.patch > > > mm-per-device-dirty-threshold-warning-fix.patch > > > mm-per-device-dirty-threshold-fix.patch > > > mm-dirty-balancing-for-tasks.patch > > > mm-dirty-balancing-for-tasks-warning-fix.patch > > > > And, this one: > > > debug-sysfs-files-for-the-current-ratio-size-total.patch > > > > > > I'm not sure polluting /sys/block/<foo>/queue/ like that is The Right > > Thing. > > hm, I suppose not. It leaves nowhere for nfs, for a start. > > > These patches sure were handy when debugging this, but not sure > > they want to move to maineline. 
> > > > Maybe we want /sys/bdi/<foo>/ or maybe /debug/bdi/<foo>/ > > > > If you think we'll need this stuff for support/debug during 2.6.24-rcX then > sure - we can always take it out prior to 2.6.24-final. > > otoh, if we're going to take that approach we might as well leave things in > /sys/block/<foo>/queue. People seem to have tested the code in various demanding scenarios (both -mm and the back-port to .22), so I'm fairly confident that this part works well (famous last words,.. I know I'll regret having said this). Also the NFS issue bothers me to no end. I'll try and come up with a /sys/bdi/<foo> thing. SLUB should have some sysfs code I can copy from - last time I looked at doing sysfs I just gave up. God awful mess that is. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 8:48 ` Peter Zijlstra
@ 2007-10-02 10:31 ` Kay Sievers
2007-10-02 10:44 ` Peter Zijlstra
0 siblings, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-10-02 10:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On 10/2/07, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2007-10-02 at 01:31 -0700, Andrew Morton wrote:
> > On Tue, 02 Oct 2007 10:17:05 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > On Mon, 2007-10-01 at 14:22 -0700, Andrew Morton wrote:
> > >
> > > > nfs-remove-congestion_end.patch
> > > > lib-percpu_counter_add.patch
> > > > lib-percpu_counter_sub.patch
> > > > lib-percpu_counter-variable-batch.patch
> > > > lib-make-percpu_counter_add-take-s64.patch
> > > > lib-percpu_counter_set.patch
> > > > lib-percpu_counter_sum_positive.patch
> > > > lib-percpu_count_sum.patch
> > > > lib-percpu_counter_init-error-handling.patch
> > > > lib-percpu_counter_init_irq.patch
> > > > mm-bdi-init-hooks.patch
> > > > mm-scalable-bdi-statistics-counters.patch
> > > > mm-count-reclaimable-pages-per-bdi.patch
> > > > mm-count-writeback-pages-per-bdi.patch
> > >
> > > This one:
> > > > mm-expose-bdi-statistics-in-sysfs.patch
> > >
> > > > lib-floating-proportions.patch
> > > > mm-per-device-dirty-threshold.patch
> > > > mm-per-device-dirty-threshold-warning-fix.patch
> > > > mm-per-device-dirty-threshold-fix.patch
> > > > mm-dirty-balancing-for-tasks.patch
> > > > mm-dirty-balancing-for-tasks-warning-fix.patch
> > >
> > > And, this one:
> > > > debug-sysfs-files-for-the-current-ratio-size-total.patch
> > >
> > >
> > > I'm not sure polluting /sys/block/<foo>/queue/ like that is The Right
> > > Thing.
> >
> > hm, I suppose not. It leaves nowhere for nfs, for a start.
> >
> > > These patches sure were handy when debugging this, but not sure
> > > they want to move to mainline.
> > >
> > > Maybe we want /sys/bdi/<foo>/ or maybe /debug/bdi/<foo>/
> > >
> >
> > If you think we'll need this stuff for support/debug during 2.6.24-rcX then
> > sure - we can always take it out prior to 2.6.24-final.
> >
> > otoh, if we're going to take that approach we might as well leave things in
> > /sys/block/<foo>/queue.
>
> People seem to have tested the code in various demanding scenarios (both
> -mm and the back-port to .22), so I'm fairly confident that this part
> works well (famous last words,.. I know I'll regret having said this).
>
> Also the NFS issue bothers me to no end.
>
> I'll try and come up with a /sys/bdi/<foo> thing. SLUB should have some
> sysfs code I can copy from - last time I looked at doing sysfs I just
> gave up. God awful mess that is.

What would be the point in another top-level tree for device
information? All devices you are exporting information for, are
already in the sysfs tree, right?

Thanks,
Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 10:31 ` Kay Sievers
@ 2007-10-02 10:44 ` Peter Zijlstra
[not found] ` <20071002104734.GA9410@mail.ustc.edu.cn>
2007-10-02 11:21 ` Kay Sievers
0 siblings, 2 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-02 10:44 UTC (permalink / raw)
To: Kay Sievers; +Cc: Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote:

> What would be the point in another top-level tree for device
> information? All devices you are exporting information for, are
> already in the sysfs tree, right?

Never did find NFS mounts/servers/superblocks or whatever constitutes a
BDI for NFS in there. Same goes for all other networked filesystems for
that matter.

^ permalink raw reply [flat|nested] 112+ messages in thread
[parent not found: <20071002104734.GA9410@mail.ustc.edu.cn>]
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
[not found] ` <20071002104734.GA9410@mail.ustc.edu.cn>
@ 2007-10-02 10:47 ` Fengguang Wu
2007-10-02 11:22 ` Kay Sievers
0 siblings, 1 reply; 112+ messages in thread
From: Fengguang Wu @ 2007-10-02 10:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kay Sievers, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Tue, Oct 02, 2007 at 12:44:21PM +0200, Peter Zijlstra wrote:
> On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote:
>
> > What would be the point in another top-level tree for device
> > information? All devices you are exporting information for, are
> > already in the sysfs tree, right?
>
> Never did find NFS mounts/servers/superblocks or whatever constitutes a
> BDI for NFS in there. Same goes for all other networked filesystems for
> that matter.

And loop/md/dm devices...

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 10:47 ` Fengguang Wu
@ 2007-10-02 11:22 ` Kay Sievers
[not found] ` <20071002112802.GA12607@mail.ustc.edu.cn>
0 siblings, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-10-02 11:22 UTC (permalink / raw)
To: Fengguang Wu
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On 10/2/07, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> On Tue, Oct 02, 2007 at 12:44:21PM +0200, Peter Zijlstra wrote:
> > On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote:
> >
> > > What would be the point in another top-level tree for device
> > > information? All devices you are exporting information for, are
> > > already in the sysfs tree, right?
> >
> > Never did find NFS mounts/servers/superblocks or whatever constitutes a
> > BDI for NFS in there. Same goes for all other networked filesystems for
> > that matter.
>
> And loop/md/dm devices...

Hmm, /sys/block/mdX, /sys/block/loopX, /sys/block/dm-X are all there today.

Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
[parent not found: <20071002112802.GA12607@mail.ustc.edu.cn>]
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
[not found] ` <20071002112802.GA12607@mail.ustc.edu.cn>
@ 2007-10-02 11:28 ` Fengguang Wu
0 siblings, 0 replies; 112+ messages in thread
From: Fengguang Wu @ 2007-10-02 11:28 UTC (permalink / raw)
To: Kay Sievers
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Tue, Oct 02, 2007 at 01:22:32PM +0200, Kay Sievers wrote:
> On 10/2/07, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> > On Tue, Oct 02, 2007 at 12:44:21PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote:
> > >
> > > > What would be the point in another top-level tree for device
> > > > information? All devices you are exporting information for, are
> > > > already in the sysfs tree, right?
> > >
> > > Never did find NFS mounts/servers/superblocks or whatever constitutes a
> > > BDI for NFS in there. Same goes for all other networked filesystems for
> > > that matter.
> >
> > And loop/md/dm devices...
>
> Hmm, /sys/block/mdX, /sys/block/loopX, /sys/block/dm-X are all there today.

Yes, but they have no /queue/ subdir, which (for sda etc.) exports both
elevator and bdi variables. Now we want to access these bdi variables...

^ permalink raw reply [flat|nested] 112+ messages in thread
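The observation in the message above (a device can appear under /sys/block/ yet have no queue/ subdirectory) can be checked from userspace. The following is a minimal sketch and not part of the thread: `has_queue_dir()` is a hypothetical helper, and its `root` parameter is an assumption added so the check can be pointed at a test directory; on a live system `root` would be "/sys/block". Whether a given md, loop or dm device actually lacks queue/ depends on the kernel version.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (illustration only): return 1 if <root>/<dev>/queue
 * exists and is a directory, 0 otherwise.
 */
int has_queue_dir(const char *root, const char *dev)
{
	char path[256];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/%s/queue", root, dev);

	/* A path that does not fit in the buffer is treated as absent. */
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Called as has_queue_dir("/sys/block", "sda") and has_queue_dir("/sys/block", "md0"), this would show which devices expose the queue-level attributes that the bdi variables were being attached to.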
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 10:44 ` Peter Zijlstra
[not found] ` <20071002104734.GA9410@mail.ustc.edu.cn>
@ 2007-10-02 11:21 ` Kay Sievers
2007-10-02 11:40 ` Peter Zijlstra
1 sibling, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-10-02 11:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On 10/2/07, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote:
>
> > What would be the point in another top-level tree for device
> > information? All devices you are exporting information for, are
> > already in the sysfs tree, right?
>
> Never did find NFS mounts/servers/superblocks or whatever constitutes a
> BDI for NFS in there. Same goes for all other networked filesystems for
> that matter.

How about adding this information to the tree then, instead of
creating a new top-level hack, just because something that you think
you need doesn't exist.

You called sysfs a mess, seems you work on that topic too. :)

Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 11:21 ` Kay Sievers
@ 2007-10-02 11:40 ` Peter Zijlstra
2007-10-02 12:05 ` Nick Piggin
2007-10-02 14:38 ` Kay Sievers
0 siblings, 2 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-02 11:40 UTC (permalink / raw)
To: Kay Sievers; +Cc: Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:
> On 10/2/07, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote:
> >
> > > What would be the point in another top-level tree for device
> > > information? All devices you are exporting information for, are
> > > already in the sysfs tree, right?
> >
> > Never did find NFS mounts/servers/superblocks or whatever constitutes a
> > BDI for NFS in there. Same goes for all other networked filesystems for
> > that matter.
>
> How about adding this information to the tree then, instead of
> creating a new top-level hack, just because something that you think
> you need doesn't exist.

So you suggest adding all the various network filesystems in there
(where?), and adding the concept of a BDI, and ensuring all are properly
linked together - somehow. Feel free to do so.

> You called sysfs a mess, seems you work on that topic too. :)

I called the in-kernel API to create sysfs files a mess. Not that I have
another opinion on the content of /sys though, always takes too damn long
to find anything in there.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 11:40 ` Peter Zijlstra
@ 2007-10-02 12:05 ` Nick Piggin
2007-10-03 10:15 ` Kay Sievers
2007-10-02 14:38 ` Kay Sievers
1 sibling, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2007-10-02 12:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kay Sievers, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote:
> On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:

> > How about adding this information to the tree then, instead of
> > creating a new top-level hack, just because something that you think
> > you need doesn't exist.
>
> So you suggest adding all the various network filesystems in there
> (where?), and adding the concept of a BDI, and ensuring all are properly
> linked together - somehow. Feel free to do so.

Would something fit better under /sys/fs/? At least filesystems are
already an existing concept to userspace.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-02 12:05 ` Nick Piggin
@ 2007-10-03 10:15 ` Kay Sievers
2007-10-03 10:37 ` Peter Zijlstra
0 siblings, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-10-03 10:15 UTC (permalink / raw)
To: Nick Piggin
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Tue, 2007-10-02 at 22:05 +1000, Nick Piggin wrote:
> On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote:
> > On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:
>
> > > How about adding this information to the tree then, instead of
> > > creating a new top-level hack, just because something that you think
> > > you need doesn't exist.
> >
> > So you suggest adding all the various network filesystems in there
> > (where?), and adding the concept of a BDI, and ensuring all are properly
> > linked together - somehow. Feel free to do so.
>
> Would something fit better under /sys/fs/? At least filesystems are
> already an existing concept to userspace.

Sounds at least less messy than a new top-level directory.

But again, if it's "device" related, like the name suggests, it should
be reachable from the device tree.
Which userspace tool is supposed to set these values, and at what time?
An init-script, something at device discovery/setup? If that is ever
going to be used in a hotplug setup, you really don't want to go look
for directories with magic device names in another disconnected tree.

Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-03 10:15 ` Kay Sievers
@ 2007-10-03 10:37 ` Peter Zijlstra
2007-10-03 13:35 ` Kay Sievers
0 siblings, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-03 10:37 UTC (permalink / raw)
To: Kay Sievers
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Wed, 2007-10-03 at 12:15 +0200, Kay Sievers wrote:
> On Tue, 2007-10-02 at 22:05 +1000, Nick Piggin wrote:
> > On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote:
> > > On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:
> >
> > > > How about adding this information to the tree then, instead of
> > > > creating a new top-level hack, just because something that you think
> > > > you need doesn't exist.
> > >
> > > So you suggest adding all the various network filesystems in there
> > > (where?), and adding the concept of a BDI, and ensuring all are properly
> > > linked together - somehow. Feel free to do so.
> >
> > Would something fit better under /sys/fs/? At least filesystems are
> > already an existing concept to userspace.
>
> Sounds at least less messy than a new top-level directory.
>
> But again, if it's "device" related, like the name suggests, it should
> be reachable from the device tree.
> Which userspace tool is supposed to set these values, and at what time?
> An init-script, something at device discovery/setup? If that is ever
> going to be used in a hotplug setup, you really don't want to go look
> for directories with magic device names in another disconnected tree.

Filesystems don't really map to BDIs either. One can have multiple FSs
per BDI.

'Normally' a BDI relates to a block device, but networked (and other
non-block device) filesystems have to create a BDI too. So these need to
be represented some place as well.

The typical usage would indeed be init scripts. The typical example
would be setting the read-ahead window. Currently that cannot be done
for NFS mounts.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-03 10:37 ` Peter Zijlstra
@ 2007-10-03 13:35 ` Kay Sievers
2007-10-03 13:58 ` Peter Zijlstra
2007-10-26 14:48 ` Peter Zijlstra
0 siblings, 2 replies; 112+ messages in thread
From: Kay Sievers @ 2007-10-03 13:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Wed, 2007-10-03 at 12:37 +0200, Peter Zijlstra wrote:
> On Wed, 2007-10-03 at 12:15 +0200, Kay Sievers wrote:
> > On Tue, 2007-10-02 at 22:05 +1000, Nick Piggin wrote:
> > > On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote:
> > > > On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:
> > >
> > > > > How about adding this information to the tree then, instead of
> > > > > creating a new top-level hack, just because something that you think
> > > > > you need doesn't exist.
> > > >
> > > > So you suggest adding all the various network filesystems in there
> > > > (where?), and adding the concept of a BDI, and ensuring all are properly
> > > > linked together - somehow. Feel free to do so.
> > >
> > > Would something fit better under /sys/fs/? At least filesystems are
> > > already an existing concept to userspace.
> >
> > Sounds at least less messy than a new top-level directory.
> >
> > But again, if it's "device" related, like the name suggests, it should
> > be reachable from the device tree.
> > Which userspace tool is supposed to set these values, and at what time?
> > An init-script, something at device discovery/setup? If that is ever
> > going to be used in a hotplug setup, you really don't want to go look
> > for directories with magic device names in another disconnected tree.
>
> Filesystems don't really map to BDIs either. One can have multiple FSs
> per BDI.
>
> 'Normally' a BDI relates to a block device, but networked (and other
> non-block device) filesystems have to create a BDI too. So these need to
> be represented some place as well.
>
> The typical usage would indeed be init scripts. The typical example
> would be setting the read-ahead window. Currently that cannot be done
> for NFS mounts.

What kind of context for a non-block based fs will get the bdi controls
added? Is there a generic place, or does every non-block based
filesystem need to be adapted individually to use it?

Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-03 13:35 ` Kay Sievers
@ 2007-10-03 13:58 ` Peter Zijlstra
2007-10-26 14:48 ` Peter Zijlstra
1 sibling, 0 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-03 13:58 UTC (permalink / raw)
To: Kay Sievers
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg

On Wed, 2007-10-03 at 15:35 +0200, Kay Sievers wrote:
> On Wed, 2007-10-03 at 12:37 +0200, Peter Zijlstra wrote:
> > On Wed, 2007-10-03 at 12:15 +0200, Kay Sievers wrote:
> > > On Tue, 2007-10-02 at 22:05 +1000, Nick Piggin wrote:
> > > > On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote:
> > > > > On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:
> > > >
> > > > > > How about adding this information to the tree then, instead of
> > > > > > creating a new top-level hack, just because something that you think
> > > > > > you need doesn't exist.
> > > > >
> > > > > So you suggest adding all the various network filesystems in there
> > > > > (where?), and adding the concept of a BDI, and ensuring all are properly
> > > > > linked together - somehow. Feel free to do so.
> > > >
> > > > Would something fit better under /sys/fs/? At least filesystems are
> > > > already an existing concept to userspace.
> > >
> > > Sounds at least less messy than a new top-level directory.
> > >
> > > But again, if it's "device" related, like the name suggests, it should
> > > be reachable from the device tree.
> > > Which userspace tool is supposed to set these values, and at what time?
> > > An init-script, something at device discovery/setup? If that is ever
> > > going to be used in a hotplug setup, you really don't want to go look
> > > for directories with magic device names in another disconnected tree.
> >
> > Filesystems don't really map to BDIs either. One can have multiple FSs
> > per BDI.
> >
> > 'Normally' a BDI relates to a block device, but networked (and other
> > non-block device) filesystems have to create a BDI too. So these need to
> > be represented some place as well.
> >
> > The typical usage would indeed be init scripts. The typical example
> > would be setting the read-ahead window. Currently that cannot be done
> > for NFS mounts.
>
> What kind of context for a non-block based fs will get the bdi controls
> added? Is there a generic place, or does every non-block based
> filesystem need to be adapted individually to use it?

Not sure what the other non-block FSs do, but NFS puts it in its
superblock. So that would roughly be per mount.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-03 13:35 ` Kay Sievers
2007-10-03 13:58 ` Peter Zijlstra
@ 2007-10-26 14:48 ` Peter Zijlstra
2007-10-26 15:06 ` Miklos Szeredi
` (2 more replies)
1 sibling, 3 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-26 14:48 UTC (permalink / raw)
To: Kay Sievers
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi

On Wed, 2007-10-03 at 15:35 +0200, Kay Sievers wrote:
> On Wed, 2007-10-03 at 12:37 +0200, Peter Zijlstra wrote:
> > On Wed, 2007-10-03 at 12:15 +0200, Kay Sievers wrote:
> > > On Tue, 2007-10-02 at 22:05 +1000, Nick Piggin wrote:
> > > > On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote:
> > > > > On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote:
> > > >
> > > > > > How about adding this information to the tree then, instead of
> > > > > > creating a new top-level hack, just because something that you think
> > > > > > you need doesn't exist.
> > > > >
> > > > > So you suggest adding all the various network filesystems in there
> > > > > (where?), and adding the concept of a BDI, and ensuring all are properly
> > > > > linked together - somehow. Feel free to do so.
> > > >
> > > > Would something fit better under /sys/fs/? At least filesystems are
> > > > already an existing concept to userspace.
> > >
> > > Sounds at least less messy than a new top-level directory.
> > >
> > > But again, if it's "device" related, like the name suggests, it should
> > > be reachable from the device tree.
> > > Which userspace tool is supposed to set these values, and at what time?
> > > An init-script, something at device discovery/setup? If that is ever
> > > going to be used in a hotplug setup, you really don't want to go look
> > > for directories with magic device names in another disconnected tree.
> >
> > Filesystems don't really map to BDIs either. One can have multiple FSs
> > per BDI.
> >
> > 'Normally' a BDI relates to a block device, but networked (and other
> > non-block device) filesystems have to create a BDI too. So these need to
> > be represented some place as well.
> >
> > The typical usage would indeed be init scripts. The typical example
> > would be setting the read-ahead window. Currently that cannot be done
> > for NFS mounts.
>
> What kind of context for a non-block based fs will get the bdi controls
> added? Is there a generic place, or does every non-block based
> filesystem need to be adapted individually to use it?

---
Subject: bdi: debugfs interface

Expose the BDI stats (and readahead window) in /debug/bdi/

I'm still thinking it should go into /sys somewhere, however I just noticed
not all block devices that have a queue have a /queue directory. Notably
those that use make_request_fn() as opposed to request_fn(). And then of
course there are the non-block/non-queue BDIs.

A BDI is basically the object that represents the 'thing' you dirty pages
against. For block devices that is related to the block device (and is
typically embedded in the queue object), for NFS mounts it's the remote server
object of the client. For FUSE, yet again something else.

I appreciate the sysfs people's opinion that /sys/bdi/ might not be the
best from their POV, however I'm not seeing where to hook the BDI object from
so that it all makes sense, a few of the things are currently not exposed in
sysfs at all, like the NFS and FUSE things.

So, for now, I've exposed the thing in debugfs. Please suggest a better
alternative.

Miklos, Trond: could you suggest a better fmt for the bdi_init_fmt() for your
respective filesystems?

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---
 block/genhd.c               |    2
 block/ll_rw_blk.c           |    1
 drivers/block/loop.c        |    7 ++
 drivers/md/dm.c             |    2
 drivers/md/md.c             |    2
 fs/fuse/inode.c             |    2
 fs/nfs/client.c             |    2
 include/linux/backing-dev.h |   15 ++++
 include/linux/debugfs.h     |   11 +++
 include/linux/writeback.h   |    3
 mm/backing-dev.c            |  153 ++++++++++++++++++++++++++++++++++++++++++++
 mm/page-writeback.c         |    2
 12 files changed, 199 insertions(+), 3 deletions(-)

Index: linux-2.6-2/fs/fuse/inode.c
===================================================================
--- linux-2.6-2.orig/fs/fuse/inode.c
+++ linux-2.6-2/fs/fuse/inode.c
@@ -467,7 +467,7 @@ static struct fuse_conn *new_conn(void)
 		atomic_set(&fc->num_waiting, 0);
 		fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 		fc->bdi.unplug_io_fn = default_unplug_io_fn;
-		err = bdi_init(&fc->bdi);
+		err = bdi_init_fmt(&fc->bdi, "fuse-%p", fc);
 		if (err) {
 			kfree(fc);
 			fc = NULL;
Index: linux-2.6-2/fs/nfs/client.c
===================================================================
--- linux-2.6-2.orig/fs/nfs/client.c
+++ linux-2.6-2/fs/nfs/client.c
@@ -678,7 +678,7 @@ static int nfs_probe_fsinfo(struct nfs_s
 		goto out_error;

 	nfs_server_set_fsinfo(server, &fsinfo);
-	error = bdi_init(&server->backing_dev_info);
+	error = bdi_init_fmt(&server->backing_dev_info, "nfs-%s-%p", clp->cl_hostname, server);
 	if (error)
 		goto out_error;

Index: linux-2.6-2/include/linux/backing-dev.h
===================================================================
--- linux-2.6-2.orig/include/linux/backing-dev.h
+++ linux-2.6-2/include/linux/backing-dev.h
@@ -11,6 +11,7 @@
 #include <linux/percpu_counter.h>
 #include <linux/log2.h>
 #include <linux/proportions.h>
+#include <linux/kernel.h>
 #include <asm/atomic.h>

 struct page;
@@ -48,11 +49,25 @@ struct backing_dev_info {

 	struct prop_local_percpu completions;
 	int dirty_exceeded;
+
+#ifdef CONFIG_DEBUG_FS
+	char *name;
+
+	struct dentry *debugfs_dir;
+	struct dentry *debugfs_ra;
+	struct dentry *debugfs_stat[NR_BDI_STAT_ITEMS];
+	struct dentry *debugfs_dirty;
+	struct dentry *debugfs_bdi_dirty;
+#endif
 };

 int bdi_init(struct backing_dev_info *bdi);
+int bdi_init_fmt(struct backing_dev_info *bdi, const char *fmt, ...);
 void bdi_destroy(struct backing_dev_info *bdi);

+int bdi_register(struct backing_dev_info *bdi, char *name);
+void bdi_unregister(struct backing_dev_info *bdi);
+
 static inline void __add_bdi_stat(struct backing_dev_info *bdi,
 		enum bdi_stat_item item, s64 amount)
 {
Index: linux-2.6-2/include/linux/debugfs.h
===================================================================
--- linux-2.6-2.orig/include/linux/debugfs.h
+++ linux-2.6-2/include/linux/debugfs.h
@@ -165,4 +165,15 @@ static inline struct dentry *debugfs_cre

 #endif

+static inline struct dentry *debugfs_create_long(const char *name, mode_t mode,
+						 struct dentry *parent,
+						 unsigned long *value)
+{
+#if BITS_PER_LONG == 32
+	return debugfs_create_u32(name,mode, parent, (u32*)value);
+#else
+	return debugfs_create_u64(name,mode, parent, (u64*)value);
+#endif
+}
+
 #endif
Index: linux-2.6-2/include/linux/writeback.h
===================================================================
--- linux-2.6-2.orig/include/linux/writeback.h
+++ linux-2.6-2/include/linux/writeback.h
@@ -113,6 +113,9 @@ struct file;
 int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *,
 				      void __user *, size_t *, loff_t *);

+void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
+		      struct backing_dev_info *bdi);
+
 void page_writeback_init(void);
 void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
 					unsigned long nr_pages_dirtied);
Index: linux-2.6-2/mm/backing-dev.c
===================================================================
--- linux-2.6-2.orig/mm/backing-dev.c
+++ linux-2.6-2/mm/backing-dev.c
@@ -4,12 +4,158 @@
 #include <linux/fs.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/debugfs.h>
+#include <linux/writeback.h>
+
+#ifdef CONFIG_DEBUG_FS
+
+static struct dentry *debugfs_dir;
+
+static __init int bdifs_init(void)
+{
+	debugfs_dir = debugfs_create_dir("bdi", NULL);
+	return 0;
+}
+
+__initcall(bdifs_init);
+
+static const char *stat_name[NR_BDI_STAT_ITEMS] = {
+	"reclaimable_pages",
+	"writeback_pages",
+};
+
+static u64 stat_get(void *data)
+{
+	return percpu_counter_read_positive((struct percpu_counter *)data);
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(stat_ops, stat_get, NULL, "%llu\n");
+
+static u64 dirty_get(void *data)
+{
+	struct backing_dev_info *bdi = data;
+	long background, dirty, bdi_dirty;
+
+	get_dirty_limits(&background, &dirty, &bdi_dirty, bdi);
+
+	return dirty;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(dirty_ops, dirty_get, NULL, "%llu\n");
+
+static u64 bdi_dirty_get(void *data)
+{
+	struct backing_dev_info *bdi = data;
+	long background, dirty, bdi_dirty;
+
+	get_dirty_limits(&background, &dirty, &bdi_dirty, bdi);
+
+	return bdi_dirty;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(bdi_dirty_ops, bdi_dirty_get, NULL, "%llu\n");
+
+int bdi_register(struct backing_dev_info *bdi, char *name)
+{
+	int i;
+
+	if (bdi->debugfs_dir)
+		return -EEXIST;
+
+	bdi->name = kstrdup(name, GFP_KERNEL);
+	if (!bdi->name)
+		return -ENOMEM;
+
+	bdi->debugfs_dir = debugfs_create_dir(bdi->name, debugfs_dir);
+	if (bdi->debugfs_dir) {
+		bdi->debugfs_ra = debugfs_create_long("readahead_pages", 0644,
+				bdi->debugfs_dir, &bdi->ra_pages);
+
+		for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
+			bdi->debugfs_stat[i] =
+				debugfs_create_file(stat_name[i],
+						0444, bdi->debugfs_dir,
+						&bdi->bdi_stat[i], &stat_ops);
+		}
+
+		bdi->debugfs_dirty =
+			debugfs_create_file("dirty_pages",
+					0444, bdi->debugfs_dir,
+					bdi, &dirty_ops);
+
+		bdi->debugfs_bdi_dirty =
+			debugfs_create_file("bdi_dirty_pages",
+					0444, bdi->debugfs_dir,
+					bdi, &bdi_dirty_ops);
+	} else
+		return -ENOMEM;
+
+	return 0;
+}
+
+void bdi_unregister(struct backing_dev_info *bdi)
+{
+	int i;
+
+	debugfs_remove(bdi->debugfs_bdi_dirty);
+	debugfs_remove(bdi->debugfs_dirty);
+	for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
+		debugfs_remove(bdi->debugfs_stat[i]);
+	debugfs_remove(bdi->debugfs_ra);
+	debugfs_remove(bdi->debugfs_dir);
+
+	kfree(bdi->name);
+
+	bdi->debugfs_dir = NULL;
+}
+
+int bdi_init_fmt(struct backing_dev_info *bdi, const char *fmt, ...)
+{
+	int ret;
+	va_list args;
+	char buf[64];
+
+	va_start(args, fmt);
+	vsnprintf(buf, sizeof(buf), fmt, args);
+	va_end(args);
+
+	ret = bdi_init(bdi);
+	if (!ret) {
+		ret = bdi_register(bdi, buf);
+		if (ret)
+			bdi_destroy(bdi);
+	}
+
+	return ret;
+}
+
+#else
+
+int bdi_register(struct backing_dev_info *bdi, char *name)
+{
+	return 0;
+}
+
+inline void bdi_unregister(struct backing_dev_info *bdi)
+{
+}
+
+int bdi_init_fmt(struct backing_dev_info *bdi, const char *fmt, ...)
+{
+	return bdi_init(bdi);
+}
+
+#endif
+
+EXPORT_SYMBOL(bdi_init_fmt);

 int bdi_init(struct backing_dev_info *bdi)
 {
 	int i, j;
 	int err;

+	memset(bdi, 0, sizeof(*bdi));
+
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
 		err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0);
 		if (err)
@@ -33,6 +181,8 @@ void bdi_destroy(struct backing_dev_info
 {
 	int i;

+	bdi_unregister(bdi);
+
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
 		percpu_counter_destroy(&bdi->bdi_stat[i]);

Index: linux-2.6-2/mm/page-writeback.c
===================================================================
--- linux-2.6-2.orig/mm/page-writeback.c
+++ linux-2.6-2/mm/page-writeback.c
@@ -291,7 +291,7 @@ static unsigned long determine_dirtyable
 	return x + 1;	/* Ensure that we never return 0 */
 }

-static void
+void
 get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
 		 struct backing_dev_info *bdi)
 {
Index: linux-2.6-2/block/genhd.c
===================================================================
--- linux-2.6-2.orig/block/genhd.c
+++ linux-2.6-2/block/genhd.c
@@ -182,6 +182,7 @@ void add_disk(struct gendisk *disk)
 			    disk->minors, NULL, exact_match, exact_lock, disk);
 	register_disk(disk);
 	blk_register_queue(disk);
+	bdi_register(&disk->queue->backing_dev_info, disk->disk_name);
 }

 EXPORT_SYMBOL(add_disk);
@@ -190,6 +191,7 @@ EXPORT_SYMBOL(del_gendisk);	/* in partit

 void unlink_gendisk(struct gendisk *disk)
 {
 	blk_unregister_queue(disk);
+	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_region(MKDEV(disk->major, disk->first_minor),
 			      disk->minors);
 }

^ permalink raw reply [flat|nested] 112+ messages in thread
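The FUSE hunk in the patch above sizes the default read-ahead window as (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE, that is, a size in kilobytes converted to a page count, and the new readahead_pages file exposes that page count. As a worked illustration of the arithmetic only, here is a small userspace sketch. The function names are hypothetical, and the values used in the example (128 KB for the window, 4096-byte pages) are assumptions, not something the patch states.

```c
/*
 * Hypothetical helpers (illustration only): convert a read-ahead window
 * between kilobytes and pages for a given page size in bytes.
 */
unsigned long ra_kb_to_pages(unsigned long kb, unsigned long page_size)
{
	/* Same shape as (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE. */
	return (kb * 1024UL) / page_size;
}

unsigned long ra_pages_to_kb(unsigned long pages, unsigned long page_size)
{
	return (pages * page_size) / 1024UL;
}
```

Under those assumed values, a 128 KB window corresponds to 32 pages, so writing 64 to readahead_pages would double the window to 256 KB.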
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 14:48 ` Peter Zijlstra
@ 2007-10-26 15:06 ` Miklos Szeredi
2007-10-26 15:10 ` Kay Sievers
2007-10-26 16:37 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Trond Myklebust
2 siblings, 0 replies; 112+ messages in thread
From: Miklos Szeredi @ 2007-10-26 15:06 UTC (permalink / raw)
To: peterz
Cc: kay.sievers, nickpiggin, akpm, linux-kernel, jens.axboe, fengguang.wu, greg, trond.myklebust, miklos

> Subject: bdi: debugfs interface
>
> Expose the BDI stats (and readahead window) in /debug/bdi/
>
> I'm still thinking it should go into /sys somewhere, however I just noticed
> not all block devices that have a queue have a /queue directory. Notably
> those that use make_request_fn() as opposed to request_fn(). And then of
> course there are the non-block/non-queue BDIs.
>
> A BDI is basically the object that represents the 'thing' you dirty pages
> against. For block devices that is related to the block device (and is
> typically embedded in the queue object), for NFS mounts it's the remote server
> object of the client. For FUSE, yet again something else.
>
> I appreciate the sysfs people's opinion that /sys/bdi/ might not be the
> best from their POV, however I'm not seeing where to hook the BDI object from
> so that it all makes sense, a few of the things are currently not exposed in
> sysfs at all, like the NFS and FUSE things.
>
> So, for now, I've exposed the thing in debugfs. Please suggest a better
> alternative.
>
> Miklos, Trond: could you suggest a better fmt for the bdi_init_fmt() for your
> respective filesystems?

For fuse:

	err = bdi_init_fmt(&fc->bdi, "fuse-%llu", (unsigned long long) fc->id);

This would match the connection ID in /sys/fs/fuse/connections/*

Miklos

^ permalink raw reply [flat|nested] 112+ messages in thread
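The posted bdi_init_fmt() formats the directory name into a 64-byte stack buffer with vsnprintf() and does not look at the return value, so a long name (for example an NFS name built from a long hostname) would be cut off silently. The following userspace sketch mirrors that formatting step and reports truncation. `bdi_format_name()` and `BDI_NAME_MAX` are hypothetical names introduced here for illustration; only the 64-byte size and the format strings come from the thread.

```c
#include <stdarg.h>
#include <stdio.h>

#define BDI_NAME_MAX 64	/* mirrors the 64-byte buf[] in the posted patch */

/*
 * Hypothetical helper (illustration only): format a BDI name into buf.
 * Returns 0 if the name fit, 1 if it was truncated to BDI_NAME_MAX - 1
 * characters, and -1 on a formatting error.
 */
int bdi_format_name(char buf[BDI_NAME_MAX], const char *fmt, ...)
{
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(buf, BDI_NAME_MAX, fmt, args);
	va_end(args);

	if (n < 0)
		return -1;

	/* vsnprintf() returns the length the full string would have had. */
	return n >= BDI_NAME_MAX;
}
```

With the format suggested above, bdi_format_name(buf, "fuse-%llu", (unsigned long long)42) leaves "fuse-42" in buf. A caller that cares about unique names could treat a return value of 1 as an error instead of registering the shortened name.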
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 14:48 ` Peter Zijlstra
2007-10-26 15:06 ` Miklos Szeredi
@ 2007-10-26 15:10 ` Kay Sievers
2007-10-26 15:22 ` Peter Zijlstra
2007-10-26 16:37 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Trond Myklebust
2 siblings, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-10-26 15:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi

On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote:

> I appreciate the sysfs people their opinion that /sys/bdi/ might not be the
> best from their POV, however I'm not seeing where to hook the BDI object from
> so that it all makes sense, a few of the things are currently not exposed in
> sysfs at all, like the NFS and FUSE things.

What happened to the idea to create a "bdi" class, and have the
existing devices as parents, and for stuff that is not (not now, or
never) in sysfs, no parent is set.

Thanks,
Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 15:10 ` Kay Sievers
@ 2007-10-26 15:22 ` Peter Zijlstra
2007-10-26 15:33 ` Kay Sievers
0 siblings, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-26 15:22 UTC (permalink / raw)
To: Kay Sievers
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi

On Fri, 2007-10-26 at 17:10 +0200, Kay Sievers wrote:
> On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote:
>
> > I appreciate the sysfs people their opinion that /sys/bdi/ might not be the
> > best from their POV, however I'm not seeing where to hook the BDI object from
> > so that it all makes sense, a few of the things are currently not exposed in
> > sysfs at all, like the NFS and FUSE things.
>
> What happended to the idea to create a "bdi" class, and have the
> existing devices as parents, and for stuff that is not (not now, or
> never) in sysfs, no parent is set.

Must have forgotten about that, mainly because I'm not sure I fully
understand it.

So we create a class, create these objects, which are all called bdi
and have children with these attributes in it.

Now, I suppose there is a directory that lists all unparented thingies,
how do I locate the one that matches my nfs mount?

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 15:22 ` Peter Zijlstra
@ 2007-10-26 15:33 ` Kay Sievers
2007-10-26 15:33 ` Peter Zijlstra
0 siblings, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-10-26 15:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi

On Fri, 2007-10-26 at 17:22 +0200, Peter Zijlstra wrote:
> On Fri, 2007-10-26 at 17:10 +0200, Kay Sievers wrote:
> > On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote:
> >
> > > I appreciate the sysfs people their opinion that /sys/bdi/ might not be the
> > > best from their POV, however I'm not seeing where to hook the BDI object from
> > > so that it all makes sense, a few of the things are currently not exposed in
> > > sysfs at all, like the NFS and FUSE things.
> >
> > What happended to the idea to create a "bdi" class, and have the
> > existing devices as parents, and for stuff that is not (not now, or
> > never) in sysfs, no parent is set.
>
> Must have forgotten about that, mainly because I'm not sure I fully
> understand it.
>
> So we create a class,

Yes.

> create these objects,

Yes, "struct device" objects, assigned to the "bdi" class. (Don't use
class_device, that will be removed soon.)

> which are all called bdi

Probably not. You can name it how you want, you can inherit the name of
the parent, or prefix it with whatever fits, they just need to be
unique. Things like the "fuse-%llu" name would work just fine. I guess
you already solved that problem in the debugfs directory.

> and have children with these attributes in it.

The attributes would just be files in the device object.

> Now, I supposed there is a directory that lists all unparented thingies,
> how do I locate the one that matches my nfs mount?

You look for the name (prefix), try: "ls /sys/class/sound/", it's the
same model all over the place.

Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-26 15:33 ` Kay Sievers @ 2007-10-26 15:33 ` Peter Zijlstra 2007-10-26 15:55 ` Kay Sievers 0 siblings, 1 reply; 112+ messages in thread From: Peter Zijlstra @ 2007-10-26 15:33 UTC (permalink / raw) To: Kay Sievers Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi On Fri, 2007-10-26 at 17:33 +0200, Kay Sievers wrote: > On Fri, 2007-10-26 at 17:22 +0200, Peter Zijlstra wrote: > > On Fri, 2007-10-26 at 17:10 +0200, Kay Sievers wrote: > > > On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote: > > > > > > > I appreciate the sysfs people their opinion that /sys/bdi/ might not be the > > > > best from their POV, however I'm not seeing where to hook the BDI object from > > > > so that it all makes sense, a few of the things are currently not exposed in > > > > sysfs at all, like the NFS and FUSE things. > > > > > > What happended to the idea to create a "bdi" class, and have the > > > existing devices as parents, and for stuff that is not (not now, or > > > never) in sysfs, no parent is set. > > > > Must have forgotten about that, mainly because I'm not sure I fully > > understand it. > > > > So we create a class, > > Yes. > > > create these objects, > > Yes, "struct device" objects, assigned to the "bdi" class. (Don't use > class_device, that will be removed soon.) > > > which are all called bdi > > Probably not. You can name it how you want, you can inherit the name of > the parent, or prefix it with whatever fits, they just need to be > unique. Things like the "fuse-%llu" name would work just fine. I guess > you already solved that problem in the debugfs directory. > > > and have children with these attributes in it. > > The attributes would just be files in the device object. > > > Now, I supposed there is a directory that lists all unparented thingies, > > how do I locate the one that matches my nfs mount? 
> > You look for the name (prefix), try: "ls /sys/class/sound/", it's the > same model all over the place. Ok, will try that. Is there a 'simple uncluttered' example I could look at to copy from? ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-26 15:33 ` Peter Zijlstra @ 2007-10-26 15:55 ` Kay Sievers 2007-10-26 20:04 ` Peter Zijlstra 0 siblings, 1 reply; 112+ messages in thread From: Kay Sievers @ 2007-10-26 15:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi On Fri, 2007-10-26 at 17:33 +0200, Peter Zijlstra wrote: > On Fri, 2007-10-26 at 17:33 +0200, Kay Sievers wrote: > > On Fri, 2007-10-26 at 17:22 +0200, Peter Zijlstra wrote: > > > On Fri, 2007-10-26 at 17:10 +0200, Kay Sievers wrote: > > > > On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote: > > > > > > > > > I appreciate the sysfs people their opinion that /sys/bdi/ might not be the > > > > > best from their POV, however I'm not seeing where to hook the BDI object from > > > > > so that it all makes sense, a few of the things are currently not exposed in > > > > > sysfs at all, like the NFS and FUSE things. > > > > > > > > What happended to the idea to create a "bdi" class, and have the > > > > existing devices as parents, and for stuff that is not (not now, or > > > > never) in sysfs, no parent is set. > > > > > > Must have forgotten about that, mainly because I'm not sure I fully > > > understand it. > > > > > > So we create a class, > > > > Yes. > > > > > create these objects, > > > > Yes, "struct device" objects, assigned to the "bdi" class. (Don't use > > class_device, that will be removed soon.) > > > > > which are all called bdi > > > > Probably not. You can name it how you want, you can inherit the name of > > the parent, or prefix it with whatever fits, they just need to be > > unique. Things like the "fuse-%llu" name would work just fine. I guess > > you already solved that problem in the debugfs directory. > > > > > and have children with these attributes in it. > > > > The attributes would just be files in the device object. 
> > > > > Now, I supposed there is a directory that lists all unparented thingies, > > > how do I locate the one that matches my nfs mount? > > > > You look for the name (prefix), try: "ls /sys/class/sound/", it's the > > same model all over the place. > > Ok, will try that. Is there a 'simple uncluttered' example I could look > at to copy from? drivers/firmware/dmi-id.c It has only a single device created in the init routine, but it shows what to do with the class. Until the block subsystem is converted from using raw kobjects to devices, you need to set the parent kobject of the bdi device to the blockdev: bdidev->dev.kobj.parent = &disk->kobj Kay ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-26 15:55 ` Kay Sievers @ 2007-10-26 20:04 ` Peter Zijlstra 2007-10-27 1:18 ` Peter Zijlstra 0 siblings, 1 reply; 112+ messages in thread From: Peter Zijlstra @ 2007-10-26 20:04 UTC (permalink / raw) To: Kay Sievers Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi This crashes and burns on bootup, but I'm too tired to figure out what I did wrong... will give it another try tomorrow.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- block/genhd.c | 2 fs/fuse/inode.c | 2 fs/nfs/client.c | 2 include/linux/backing-dev.h | 33 ++++++++++++ include/linux/writeback.h | 3 + mm/backing-dev.c | 121 ++++++++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 2 7 files changed, 162 insertions(+), 3 deletions(-) Index: linux-2.6-2/fs/fuse/inode.c =================================================================== --- linux-2.6-2.orig/fs/fuse/inode.c +++ linux-2.6-2/fs/fuse/inode.c @@ -467,7 +467,7 @@ static struct fuse_conn *new_conn(void) atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; - err = bdi_init(&fc->bdi); + err = bdi_init_fmt(&fc->bdi, "fuse-%llu", (unsigned long long)fc->id); if (err) { kfree(fc); fc = NULL; Index: linux-2.6-2/fs/nfs/client.c =================================================================== --- linux-2.6-2.orig/fs/nfs/client.c +++ linux-2.6-2/fs/nfs/client.c @@ -678,7 +678,7 @@ static int nfs_probe_fsinfo(struct nfs_s goto out_error; nfs_server_set_fsinfo(server, &fsinfo); - error = bdi_init(&server->backing_dev_info); + error = bdi_init_fmt(&server->backing_dev_info, "nfs-%s-%p", clp->cl_hostname, server); if (error) goto out_error; Index: linux-2.6-2/include/linux/backing-dev.h =================================================================== --- linux-2.6-2.orig/include/linux/backing-dev.h +++ 
linux-2.6-2/include/linux/backing-dev.h @@ -11,6 +11,8 @@ #include <linux/percpu_counter.h> #include <linux/log2.h> #include <linux/proportions.h> +#include <linux/kernel.h> +#include <linux/device.h> #include <asm/atomic.h> struct page; @@ -48,11 +50,42 @@ struct backing_dev_info { struct prop_local_percpu completions; int dirty_exceeded; + +#ifdef CONFIG_SYSFS + struct device kdev; +#endif }; int bdi_init(struct backing_dev_info *bdi); void bdi_destroy(struct backing_dev_info *bdi); +int __bdi_register(struct backing_dev_info *bdi); +void bdi_unregister(struct backing_dev_info *bdi); + +#ifdef CONFIG_SYSFS +#define bdi_init_fmt(bdi, fmt...) \ + ({ \ + int ret; \ + kobject_set_name(&(bdi)->kdev.kobj, ##fmt); \ + ret = bdi_init(bdi); \ + if (!ret) { \ + ret = __bdi_register(bdi); \ + if (ret) \ + bdi_destroy(bdi); \ + } \ + ret; \ + }) + +#define bdi_register(bdi, fmt...) \ + ({ \ + kobject_set_name(&(bdi)->kdev.kobj, ##fmt); \ + __bdi_register(bdi); \ + }) +#else +#define bdi_init_fmt(bdi, fmt...) bdi_init(bdi) +#define bdi_register(bdi, fmt...) 
__bdi_register(bdi) +#endif + static inline void __add_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item, s64 amount) { Index: linux-2.6-2/include/linux/writeback.h =================================================================== --- linux-2.6-2.orig/include/linux/writeback.h +++ linux-2.6-2/include/linux/writeback.h @@ -113,6 +113,9 @@ struct file; int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, + struct backing_dev_info *bdi); + void page_writeback_init(void); void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, unsigned long nr_pages_dirtied); Index: linux-2.6-2/mm/backing-dev.c =================================================================== --- linux-2.6-2.orig/mm/backing-dev.c +++ linux-2.6-2/mm/backing-dev.c @@ -4,12 +4,130 @@ #include <linux/fs.h> #include <linux/sched.h> #include <linux/module.h> +#include <linux/writeback.h> +#include <linux/device.h> + +#ifdef CONFIG_SYSFS + +static void bdi_release(struct device *dev) +{ +} + +static int bdi_uevent(struct device *dev, struct kobj_uevent_env *env) +{ + return 0; +} + +static struct class bdi_class = { + .name = "bdi", + .dev_release = bdi_release, + .dev_uevent = bdi_uevent, +}; + +static ssize_t readahead_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = + container_of(dev, struct backing_dev_info, kdev); + char *end; + + bdi->ra_pages = simple_strtoul(buf, &end, 10); + + return end - buf; +} + +#define BDI_SHOW(name, expr) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, char *page) \ +{ \ + struct backing_dev_info *bdi = \ + container_of(dev, struct backing_dev_info, kdev); \ + \ + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ +} + +BDI_SHOW(readahead, bdi->ra_pages) + +BDI_SHOW(reclaimable, 
bdi_stat(bdi, BDI_RECLAIMABLE)) +BDI_SHOW(writeback, bdi_stat(bdi, BDI_WRITEBACK)) + +static inline unsigned long get_dirty(struct backing_dev_info *bdi, int i) +{ + unsigned long thresh[3]; + + get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi); + + return thresh[i]; +} + +BDI_SHOW(dirty, get_dirty(bdi, 1)) +BDI_SHOW(bdi_dirty, get_dirty(bdi, 2)) + +static struct device_attribute bdi_dev_attrs[] = { + __ATTR(readahead, 0644, readahead_show, readahead_store), + __ATTR_RO(reclaimable), + __ATTR_RO(writeback), + __ATTR_RO(dirty), + __ATTR_RO(bdi_dirty), + __ATTR_NULL, +}; + +static struct attribute *bdi_attributes[ARRAY_SIZE(bdi_dev_attrs) + 1]; + +static struct attribute_group bdi_attribute_group = { + .attrs = bdi_attributes, +}; + +static struct attribute_group *bdi_attribute_groups[] = { + &bdi_attribute_group, + NULL, +}; + +static __init int bdi_class_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) + bdi_attributes[i] = &bdi_dev_attrs[i].attr; + + return class_register(&bdi_class); +} + +__initcall(bdi_class_init); + +int __bdi_register(struct backing_dev_info *bdi) +{ + bdi->kdev.class = &bdi_class; + bdi->kdev.groups = bdi_attribute_groups; + strcpy(bdi->kdev.bus_id, "bdi"); + return device_register(&bdi->kdev); +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ + device_unregister(&bdi->kdev); +} +#else +int __bdi_register(struct backing_dev_info *bdi) +{ + return 0; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ +} +#endif + +EXPORT_SYMBOL(__bdi_register); int bdi_init(struct backing_dev_info *bdi) { int i, j; int err; + memset(bdi, 0, sizeof(*bdi)); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); if (err) @@ -33,6 +151,8 @@ void bdi_destroy(struct backing_dev_info { int i; + bdi_unregister(bdi); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); @@ -90,3 +210,4 @@ long congestion_wait(int rw, long timeou } 
EXPORT_SYMBOL(congestion_wait); + Index: linux-2.6-2/mm/page-writeback.c =================================================================== --- linux-2.6-2.orig/mm/page-writeback.c +++ linux-2.6-2/mm/page-writeback.c @@ -291,7 +291,7 @@ static unsigned long determine_dirtyable return x + 1; /* Ensure that we never return 0 */ } -static void +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, struct backing_dev_info *bdi) { Index: linux-2.6-2/block/genhd.c =================================================================== --- linux-2.6-2.orig/block/genhd.c +++ linux-2.6-2/block/genhd.c @@ -182,6 +182,7 @@ void add_disk(struct gendisk *disk) disk->minors, NULL, exact_match, exact_lock, disk); register_disk(disk); blk_register_queue(disk); + bdi_register(&disk->queue->backing_dev_info, "%s", disk->disk_name); } EXPORT_SYMBOL(add_disk); @@ -190,6 +191,7 @@ EXPORT_SYMBOL(del_gendisk); /* in partit void unlink_gendisk(struct gendisk *disk) { blk_unregister_queue(disk); + bdi_unregister(&disk->queue->backing_dev_info); blk_unregister_region(MKDEV(disk->major, disk->first_minor), disk->minors); } ^ permalink raw reply [flat|nested] 112+ messages in thread
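[Editorial sketch, not part of the thread.] The attribute handlers in the patch above recover the backing_dev_info from the struct device embedded in it by way of container_of(). A minimal userspace illustration of that pointer arithmetic follows. The two structs are heavily trimmed, hypothetical stand-ins for the kernel types, the macro is a simplified version of the kernel one, and dev_to_bdi() is a name invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(): given a
 * pointer to a member, subtract the member's offset to reach the structure
 * that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct device {
	char bus_id[20];
};

struct backing_dev_info {
	unsigned long ra_pages;
	struct device kdev;	/* embedded by value, as in the patch */
};

/* Same first step as the show/store handlers in the patch: map the
 * struct device passed to the handler back to the backing_dev_info
 * that contains it. dev_to_bdi() is an illustrative name only. */
static struct backing_dev_info *dev_to_bdi(struct device *dev)
{
	assert(dev != NULL);
	return container_of(dev, struct backing_dev_info, kdev);
}
```

This only works because kdev is embedded by value. If the structure held a pointer to a separately allocated device instead, the back-reference would have to be stored some other way.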
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 20:04 ` Peter Zijlstra
@ 2007-10-27 1:18 ` Peter Zijlstra
2007-10-27 2:40 ` Greg KH
0 siblings, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-27 1:18 UTC (permalink / raw)
To: Kay Sievers
Cc: Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Trond Myklebust, Miklos Szeredi

On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote:
> This crashes and burns on bootup, but I'm too tired to figure out what I
> did wrong... will give it another try tomorrow..

Ok, can't sleep.. took a look. I have several problems here.

The thing that makes it go *boom* is the __ATTR_NULL. Removing that
makes it boot. Albeit it then warns me of multiple duplicate sysfs
objects, all named "bdi".

For some obscure reason this device interface insists on using the
bus_id as name (?!), and further reduces usability by limiting that to
20 odd characters.

This makes it quite useless. I tried fudging around that limit by using
device_rename and kobject_rename, but to no avail.

Really, it should not be this hard to use, trying to expose a handful
of simple integers to userspace should not take 8h+ and still not work.

Peter, who thinks sysfs is a contorted mess beyond his skill. I'll stick
to VM and scheduler code, that actually makes sense.

^ permalink raw reply [flat|nested] 112+ messages in thread
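[Editorial sketch, not part of the thread.] One plausible reading of the __ATTR_NULL crash, which nobody in the thread confirms, is that the first patch's init loop walks ARRAY_SIZE(bdi_dev_attrs) entries and so also copies the address of the sentinel entry. That address is not NULL, so a consumer that stops at the first NULL pointer would treat the sentinel as a real attribute with a NULL name. The userspace model below uses hypothetical trimmed types and invented helper names purely to show that effect.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, trimmed stand-ins for the kernel types. */
struct attribute { const char *name; };
struct device_attribute { struct attribute attr; };

/* Table ended by an all-NULL sentinel entry, in the style of __ATTR_NULL. */
static struct device_attribute dev_attrs[] = {
	{ .attr = { .name = "readahead" } },
	{ .attr = { .name = "writeback" } },
	{ .attr = { .name = NULL } },	/* sentinel entry */
};

/* Pointer list with one spare slot that stays NULL as the terminator. */
static struct attribute *attrs[ARRAY_SIZE(dev_attrs) + 1];

/* Copy one pointer per table slot, sentinel included, the way the init
 * loop in the first patch does. */
static void fill_including_sentinel(void)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(dev_attrs); i++)
		attrs[i] = &dev_attrs[i].attr;
}

/* Count entries the way a NULL-pointer-terminated consumer would. */
static size_t count_entries(struct attribute **list)
{
	size_t n = 0;

	assert(list != NULL);
	while (list[n])
		n++;
	return n;
}
```

In this model the consumer sees three entries rather than two, and the third has no name. Dropping the sentinel from the table, as the second version of the patch does, leaves only the named entries in the list.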
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-27 1:18 ` Peter Zijlstra
@ 2007-10-27 2:40 ` Greg KH
2007-10-27 8:39 ` Peter Zijlstra
0 siblings, 1 reply; 112+ messages in thread
From: Greg KH @ 2007-10-27 2:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote:
>
> On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote:
> > This crashes and burns on bootup, but I'm too tired to figure out what I
> > did wrong... will give it another try tomorrow..
>
> Ok, can't sleep.. took a look. I have several problems here.
>
> The thing that makes it go *boom* is the __ATTR_NULL. Removing that
> makes it boot. Albeit it then warns me of multiple duplicate sysfs
> objects, all named "bdi".
>
> For some obscure reason this device interface insists on using the
> bus_id as name (?!), and further reduces usability by limiting that to
> 20 odd characters.
>
> This makes it quite useless. I tried fudging around that limit by using
> device_rename and kobject_rename, but to no avail.
>
> Really, it should not be this hard to use, trying to expose a handfull
> of simple integers to userspace should not take 8h+ and still not work.
>
> Peter, who thinks sysfs is contorted mess beyond his skill. I'll stick
> to VM and scheduler code, that actually makes sense.

Heh, that's funny :)

I'll look at this and see what I can come up with. Would you just like
a whole new patch, or one against this one?

thanks,

greg k-h

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-27 2:40 ` Greg KH @ 2007-10-27 8:39 ` Peter Zijlstra 2007-10-27 16:02 ` Greg KH 0 siblings, 1 reply; 112+ messages in thread From: Peter Zijlstra @ 2007-10-27 8:39 UTC (permalink / raw) To: Greg KH Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi On Fri, 2007-10-26 at 19:40 -0700, Greg KH wrote: > On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote: > > > > On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote: > > > This crashes and burns on bootup, but I'm too tired to figure out what I > > > did wrong... will give it another try tomorrow.. > > > > Ok, can't sleep.. took a look. I have several problems here. > > > > The thing that makes it go *boom* is the __ATTR_NULL. Removing that > > makes it boot. Albeit it then warns me of multiple duplicate sysfs > > objects, all named "bdi". > > > > For some obscure reason this device interface insists on using the > > bus_id as name (?!), and further reduces usability by limiting that to > > 20 odd characters. > > > > This makes it quite useless. I tried fudging around that limit by using > > device_rename and kobject_rename, but to no avail. > > > > Really, it should not be this hard to use, trying to expose a handfull > > of simple integers to userspace should not take 8h+ and still not work. > > > > Peter, who thinks sysfs is contorted mess beyond his skill. I'll stick > > to VM and scheduler code, that actually makes sense. > > Heh, that's funny :) > > I'll look at this and see what I can come up with. Would you just like > a whole new patch, or one against this one? Sorry for the grumpy note, I get that way at 3.30 am. Maybe I ought not have mailed :-/ This is the code I had at that time. 
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- block/genhd.c | 2 fs/fuse/inode.c | 2 fs/nfs/client.c | 2 include/linux/backing-dev.h | 21 ++++++ include/linux/string.h | 4 + include/linux/writeback.h | 3 mm/backing-dev.c | 144 ++++++++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 2 mm/util.c | 42 ++++++++++++ 9 files changed, 219 insertions(+), 3 deletions(-) Index: linux-2.6-2/fs/fuse/inode.c =================================================================== --- linux-2.6-2.orig/fs/fuse/inode.c +++ linux-2.6-2/fs/fuse/inode.c @@ -467,7 +467,7 @@ static struct fuse_conn *new_conn(void) atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; - err = bdi_init(&fc->bdi); + err = bdi_init_fmt(&fc->bdi, "bdi-fuse-%llu", (unsigned long long)fc->id); if (err) { kfree(fc); fc = NULL; Index: linux-2.6-2/fs/nfs/client.c =================================================================== --- linux-2.6-2.orig/fs/nfs/client.c +++ linux-2.6-2/fs/nfs/client.c @@ -678,7 +678,7 @@ static int nfs_probe_fsinfo(struct nfs_s goto out_error; nfs_server_set_fsinfo(server, &fsinfo); - error = bdi_init(&server->backing_dev_info); + error = bdi_init_fmt(&server->backing_dev_info, "bdi-nfs-%s-%p", clp->cl_hostname, server); if (error) goto out_error; Index: linux-2.6-2/include/linux/backing-dev.h =================================================================== --- linux-2.6-2.orig/include/linux/backing-dev.h +++ linux-2.6-2/include/linux/backing-dev.h @@ -11,6 +11,8 @@ #include <linux/percpu_counter.h> #include <linux/log2.h> #include <linux/proportions.h> +#include <linux/kernel.h> +#include <linux/device.h> #include <asm/atomic.h> struct page; @@ -48,11 +50,30 @@ struct backing_dev_info { struct prop_local_percpu completions; int dirty_exceeded; + +#ifdef CONFIG_SYSFS + struct device kdev; +#endif }; int bdi_init(struct backing_dev_info *bdi); void bdi_destroy(struct 
backing_dev_info *bdi); +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...); +void bdi_unregister(struct backing_dev_info *bdi); + +#define bdi_init_fmt(bdi, fmt...) \ + ({ \ + int ret; \ + ret = bdi_init(bdi); \ + if (!ret) { \ + ret = bdi_register(bdi, ##fmt); \ + if (ret) \ + bdi_destroy(bdi); \ + } \ + ret; \ + }) + static inline void __add_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item, s64 amount) { Index: linux-2.6-2/include/linux/writeback.h =================================================================== --- linux-2.6-2.orig/include/linux/writeback.h +++ linux-2.6-2/include/linux/writeback.h @@ -113,6 +113,9 @@ struct file; int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, + struct backing_dev_info *bdi); + void page_writeback_init(void); void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, unsigned long nr_pages_dirtied); Index: linux-2.6-2/mm/backing-dev.c =================================================================== --- linux-2.6-2.orig/mm/backing-dev.c +++ linux-2.6-2/mm/backing-dev.c @@ -4,12 +4,153 @@ #include <linux/fs.h> #include <linux/sched.h> #include <linux/module.h> +#include <linux/writeback.h> +#include <linux/device.h> + +#ifdef CONFIG_SYSFS + +static void bdi_release(struct device *dev) +{ +} + +static int bdi_uevent(struct device *dev, struct kobj_uevent_env *env) +{ + return 0; +} + +static struct class bdi_class = { + .name = "bdi", + .dev_release = bdi_release, + .dev_uevent = bdi_uevent, +}; + +static ssize_t readahead_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = + container_of(dev, struct backing_dev_info, kdev); + char *end; + + bdi->ra_pages = simple_strtoul(buf, &end, 10); + + return end - buf; +} + +#define BDI_SHOW(name, expr) \ +static ssize_t 
name##_show(struct device *dev, \ + struct device_attribute *attr, char *page) \ +{ \ + struct backing_dev_info *bdi = \ + container_of(dev, struct backing_dev_info, kdev); \ + \ + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ +} + +BDI_SHOW(readahead, bdi->ra_pages) + +BDI_SHOW(reclaimable, bdi_stat(bdi, BDI_RECLAIMABLE)) +BDI_SHOW(writeback, bdi_stat(bdi, BDI_WRITEBACK)) + +static inline unsigned long get_dirty(struct backing_dev_info *bdi, int i) +{ + unsigned long thresh[3]; + + get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi); + + return thresh[i]; +} + +BDI_SHOW(dirty, get_dirty(bdi, 1)) +BDI_SHOW(bdi_dirty, get_dirty(bdi, 2)) + +static struct device_attribute bdi_dev_attrs[] = { + __ATTR(readahead, 0644, readahead_show, readahead_store), + __ATTR_RO(reclaimable), + __ATTR_RO(writeback), + __ATTR_RO(dirty), + __ATTR_RO(bdi_dirty), +}; + +static struct attribute *bdi_attributes[ARRAY_SIZE(bdi_dev_attrs) + 1]; + +static struct attribute_group bdi_attribute_group = { + .attrs = bdi_attributes, +}; + +static struct attribute_group *bdi_attribute_groups[] = { + &bdi_attribute_group, + NULL, +}; + +static __init int bdi_class_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) + bdi_attributes[i] = &bdi_dev_attrs[i].attr; + + return class_register(&bdi_class); +} + +__initcall(bdi_class_init); + +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...) 
+{ + int ret; + + bdi->kdev.class = &bdi_class; + bdi->kdev.groups = bdi_attribute_groups; + strcpy(bdi->kdev.bus_id, "bdi"); + ret = device_register(&bdi->kdev); + if (!ret) { + char *name; + va_list args; + + va_start(args, fmt); + name = kvprintf(fmt, args); + va_end(args); + + if (!name) { + device_unregister(&bdi->kdev); + return -ENOMEM; + } + + ret = device_rename(&bdi->kdev, name); + kfree(name); + + if (!ret) + device_unregister(&bdi->kdev); + } + + return ret; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ + device_unregister(&bdi->kdev); +} +#else +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...) +{ + return 0; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ +} +#endif + +EXPORT_SYMBOL(bdi_register); +EXPORT_SYMBOL(bdi_unregister); int bdi_init(struct backing_dev_info *bdi) { int i, j; int err; + memset(bdi, 0, sizeof(*bdi)); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); if (err) @@ -33,6 +174,8 @@ void bdi_destroy(struct backing_dev_info { int i; + bdi_unregister(bdi); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); @@ -90,3 +233,4 @@ long congestion_wait(int rw, long timeou } EXPORT_SYMBOL(congestion_wait); + Index: linux-2.6-2/mm/page-writeback.c =================================================================== --- linux-2.6-2.orig/mm/page-writeback.c +++ linux-2.6-2/mm/page-writeback.c @@ -291,7 +291,7 @@ static unsigned long determine_dirtyable return x + 1; /* Ensure that we never return 0 */ } -static void +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, struct backing_dev_info *bdi) { Index: linux-2.6-2/block/genhd.c =================================================================== --- linux-2.6-2.orig/block/genhd.c +++ linux-2.6-2/block/genhd.c @@ -182,6 +182,7 @@ void add_disk(struct gendisk *disk) disk->minors, NULL, exact_match, exact_lock, disk); register_disk(disk); 
blk_register_queue(disk); + bdi_register(&disk->queue->backing_dev_info, "bdi-%s", disk->disk_name); } EXPORT_SYMBOL(add_disk); @@ -190,6 +191,7 @@ EXPORT_SYMBOL(del_gendisk); /* in partit void unlink_gendisk(struct gendisk *disk) { blk_unregister_queue(disk); + bdi_unregister(&disk->queue->backing_dev_info); blk_unregister_region(MKDEV(disk->major, disk->first_minor), disk->minors); } Index: linux-2.6-2/include/linux/string.h =================================================================== --- linux-2.6-2.orig/include/linux/string.h +++ linux-2.6-2/include/linux/string.h @@ -8,6 +8,7 @@ #include <linux/compiler.h> /* for inline */ #include <linux/types.h> /* for size_t */ #include <linux/stddef.h> /* for NULL */ +#include <stdarg.h> #ifdef __cplusplus extern "C" { @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si extern char **argv_split(gfp_t gfp, const char *str, int *argcp); extern void argv_free(char **argv); +char *kvprintf(const char *fmt, va_list args); +char *kprintf(const char *fmt, ...); + #ifdef __cplusplus } #endif Index: linux-2.6-2/mm/util.c =================================================================== --- linux-2.6-2.orig/mm/util.c +++ linux-2.6-2/mm/util.c @@ -136,3 +136,45 @@ char *strndup_user(const char __user *s, return p; } EXPORT_SYMBOL(strndup_user); + +char *kvprintf(const char *fmt, va_list args) +{ + char c; + char *buf; + int need; + int limit; + va_list args1; + + va_copy(args1, args); + need = vsnprintf(&c, 1, fmt, args1); + va_end(args1); + + /* Allocate the new space and copy the string in */ + limit = need + 1; + buf = kmalloc(limit, GFP_KERNEL); + if (!buf) + return NULL; + need = vsnprintf(buf, limit, fmt, args); + + /* something wrong with the string we copied? */ + if (need >= limit) { + kfree(buf); + return NULL; + } + + return buf; +} +EXPORT_SYMBOL(kvprintf); + +char *kprintf(const char *fmt, ...) 
+{ + char *buf; + va_list args; + + va_start(args, fmt); + buf = kvprintf(fmt, args); + va_end(args); + + return buf; +} +EXPORT_SYMBOL(kprintf); ^ permalink raw reply [flat|nested] 112+ messages in thread
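[Editorial sketch, not part of the thread.] The kvprintf() helper in the patch above formats twice: once to measure the result and once into a buffer of exactly that size. The same two-pass idea is sketched in userspace C below. xvasprintf() and xasprintf() are names invented for this illustration, malloc() stands in for kmalloc(), and the measuring pass uses a NULL buffer instead of the patch's one-byte dummy. The kernel's own kasprintf()/kvasprintf() helpers work along similar lines.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer; the caller frees it.
 * Returns NULL on formatting or allocation failure. */
static char *xvasprintf(const char *fmt, va_list args)
{
	va_list probe;
	char *buf;
	int need;

	assert(fmt != NULL);

	/* Pass 1: measure on a copy, so 'args' stays usable for pass 2. */
	va_copy(probe, args);
	need = vsnprintf(NULL, 0, fmt, probe);
	va_end(probe);
	if (need < 0)
		return NULL;

	buf = malloc((size_t)need + 1);	/* +1 for the terminating NUL */
	if (!buf)
		return NULL;

	/* Pass 2: format for real; the length must match the measurement. */
	if (vsnprintf(buf, (size_t)need + 1, fmt, args) != need) {
		free(buf);
		return NULL;
	}
	return buf;
}

/* Variadic convenience wrapper around xvasprintf(). */
static char *xasprintf(const char *fmt, ...)
{
	va_list args;
	char *buf;

	va_start(args, fmt);
	buf = xvasprintf(fmt, args);
	va_end(args);
	return buf;
}
```

A call such as xasprintf("bdi-fuse-%llu", 42ULL) yields a heap string sized to fit, which sidesteps any fixed-size name buffer at the point where the name is built. Whether the final sysfs name can actually be that long is a separate matter, as the bus_id discussion in this thread shows.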
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-27 8:39 ` Peter Zijlstra @ 2007-10-27 16:02 ` Greg KH 2007-10-27 16:07 ` Peter Zijlstra 2007-10-27 21:08 ` Kay Sievers 0 siblings, 2 replies; 112+ messages in thread From: Greg KH @ 2007-10-27 16:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi On Sat, Oct 27, 2007 at 10:39:59AM +0200, Peter Zijlstra wrote: > On Fri, 2007-10-26 at 19:40 -0700, Greg KH wrote: > > On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote: > > > > > > On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote: > > > > This crashes and burns on bootup, but I'm too tired to figure out what I > > > > did wrong... will give it another try tomorrow.. > > > > > > Ok, can't sleep.. took a look. I have several problems here. > > > > > > The thing that makes it go *boom* is the __ATTR_NULL. Removing that > > > makes it boot. Albeit it then warns me of multiple duplicate sysfs > > > objects, all named "bdi". > > > > > > For some obscure reason this device interface insists on using the > > > bus_id as name (?!), and further reduces usability by limiting that to > > > 20 odd characters. > > > > > > This makes it quite useless. I tried fudging around that limit by using > > > device_rename and kobject_rename, but to no avail. > > > > > > Really, it should not be this hard to use, trying to expose a handfull > > > of simple integers to userspace should not take 8h+ and still not work. > > > > > > Peter, who thinks sysfs is contorted mess beyond his skill. I'll stick > > > to VM and scheduler code, that actually makes sense. > > > > Heh, that's funny :) > > > > I'll look at this and see what I can come up with. Would you just like > > a whole new patch, or one against this one? > > Sorry for the grumpy note, I get that way at 3.30 am. Maybe I ought not > have mailed :-/ > > This is the code I had at that time. 
Ah, I see a few problems. Here, try this version instead. It's compile-tested only, and should be a lot simpler. Note, we still are not setting the parent to the new bdi structure properly, so the devices will show up in /sys/devices/virtual/ instead of in their proper location. To do this, we need the parent of the device, which I'm not so sure what it should be (block device? block device controller?) Let me know if this works better, I'm off to a kids birthday party for the day, but will be around this evening... thanks, greg k-h --- block/genhd.c | 2 fs/fuse/inode.c | 2 fs/nfs/client.c | 2 include/linux/backing-dev.h | 19 +++++++ include/linux/string.h | 4 + include/linux/writeback.h | 3 + mm/backing-dev.c | 110 ++++++++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 2 mm/util.c | 42 ++++++++++++++++ 9 files changed, 183 insertions(+), 3 deletions(-) --- a/block/genhd.c +++ b/block/genhd.c @@ -182,6 +182,7 @@ void add_disk(struct gendisk *disk) disk->minors, NULL, exact_match, exact_lock, disk); register_disk(disk); blk_register_queue(disk); + bdi_register(&disk->queue->backing_dev_info, "bdi-%s", disk->disk_name); } EXPORT_SYMBOL(add_disk); @@ -190,6 +191,7 @@ EXPORT_SYMBOL(del_gendisk); /* in partit void unlink_gendisk(struct gendisk *disk) { blk_unregister_queue(disk); + bdi_unregister(&disk->queue->backing_dev_info); blk_unregister_region(MKDEV(disk->major, disk->first_minor), disk->minors); } --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -467,7 +467,7 @@ static struct fuse_conn *new_conn(void) atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; - err = bdi_init(&fc->bdi); + err = bdi_init_fmt(&fc->bdi, "bdi-fuse-%llu", (unsigned long long)fc->id); if (err) { kfree(fc); fc = NULL; --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -678,7 +678,7 @@ static int nfs_probe_fsinfo(struct nfs_s goto out_error; nfs_server_set_fsinfo(server, &fsinfo); - error = 
bdi_init(&server->backing_dev_info); + error = bdi_init_fmt(&server->backing_dev_info, "bdi-nfs-%s-%p", clp->cl_hostname, server); if (error) goto out_error; --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -11,6 +11,8 @@ #include <linux/percpu_counter.h> #include <linux/log2.h> #include <linux/proportions.h> +#include <linux/kernel.h> +#include <linux/device.h> #include <asm/atomic.h> struct page; @@ -48,11 +50,28 @@ struct backing_dev_info { struct prop_local_percpu completions; int dirty_exceeded; + + struct device *dev; }; int bdi_init(struct backing_dev_info *bdi); void bdi_destroy(struct backing_dev_info *bdi); +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...); +void bdi_unregister(struct backing_dev_info *bdi); + +#define bdi_init_fmt(bdi, fmt...) \ + ({ \ + int ret; \ + ret = bdi_init(bdi); \ + if (!ret) { \ + ret = bdi_register(bdi, ##fmt); \ + if (ret) \ + bdi_destroy(bdi); \ + } \ + ret; \ + }) + static inline void __add_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item, s64 amount) { --- a/include/linux/string.h +++ b/include/linux/string.h @@ -8,6 +8,7 @@ #include <linux/compiler.h> /* for inline */ #include <linux/types.h> /* for size_t */ #include <linux/stddef.h> /* for NULL */ +#include <stdarg.h> #ifdef __cplusplus extern "C" { @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si extern char **argv_split(gfp_t gfp, const char *str, int *argcp); extern void argv_free(char **argv); +char *kvprintf(const char *fmt, va_list args); +char *kprintf(const char *fmt, ...); + #ifdef __cplusplus } #endif --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -113,6 +113,9 @@ struct file; int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, + struct backing_dev_info *bdi); + void page_writeback_init(void); void balance_dirty_pages_ratelimited_nr(struct 
address_space *mapping, unsigned long nr_pages_dirtied); --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -4,12 +4,119 @@ #include <linux/fs.h> #include <linux/sched.h> #include <linux/module.h> +#include <linux/writeback.h> +#include <linux/device.h> + + +static struct class *bdi_class; + +static ssize_t readahead_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + char *end; + + bdi->ra_pages = simple_strtoul(buf, &end, 10); + + return end - buf; +} + +#define BDI_SHOW(name, expr) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, char *page) \ +{ \ + struct backing_dev_info *bdi = dev_get_drvdata(dev); \ + \ + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ +} + +BDI_SHOW(readahead, bdi->ra_pages) + +BDI_SHOW(reclaimable, bdi_stat(bdi, BDI_RECLAIMABLE)) +BDI_SHOW(writeback, bdi_stat(bdi, BDI_WRITEBACK)) + +static inline unsigned long get_dirty(struct backing_dev_info *bdi, int i) +{ + unsigned long thresh[3]; + + get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi); + + return thresh[i]; +} + +BDI_SHOW(dirty, get_dirty(bdi, 1)) +BDI_SHOW(bdi_dirty, get_dirty(bdi, 2)) + +static struct device_attribute bdi_dev_attrs[] = { + __ATTR(readahead, 0644, readahead_show, readahead_store), + __ATTR_RO(reclaimable), + __ATTR_RO(writeback), + __ATTR_RO(dirty), + __ATTR_RO(bdi_dirty), +}; + +static __init int bdi_class_init(void) +{ + bdi_class = class_create(THIS_MODULE, "bdi"); + return 0; +} + +__initcall(bdi_class_init); + +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...) 
+{ + char *name; + va_list args; + int ret = -ENOMEM; + int i; + + va_start(args, fmt); + name = kvprintf(fmt, args); + va_end(args); + + if (!name) + return -ENOMEM; + + bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name); + if (IS_ERR(bdi->dev)) + goto exit; + + dev_set_drvdata(bdi->dev, bdi); + + for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) { + ret = device_create_file(bdi->dev, &bdi_dev_attrs[i]); + if (ret) + break; + } + if (ret) { + while (--i >= 0) + device_remove_file(bdi->dev, &bdi_dev_attrs[i]); + device_unregister(bdi->dev); + bdi->dev = NULL; + } + +exit: + kfree(name); + + return ret; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ + device_unregister(bdi->dev); +} + +EXPORT_SYMBOL(bdi_register); +EXPORT_SYMBOL(bdi_unregister); int bdi_init(struct backing_dev_info *bdi) { int i, j; int err; + memset(bdi, 0, sizeof(*bdi)); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); if (err) @@ -33,6 +140,8 @@ void bdi_destroy(struct backing_dev_info { int i; + bdi_unregister(bdi); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); @@ -90,3 +199,4 @@ long congestion_wait(int rw, long timeou } EXPORT_SYMBOL(congestion_wait); + --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -291,7 +291,7 @@ static unsigned long determine_dirtyable return x + 1; /* Ensure that we never return 0 */ } -static void +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, struct backing_dev_info *bdi) { --- a/mm/util.c +++ b/mm/util.c @@ -136,3 +136,45 @@ char *strndup_user(const char __user *s, return p; } EXPORT_SYMBOL(strndup_user); + +char *kvprintf(const char *fmt, va_list args) +{ + char c; + char *buf; + int need; + int limit; + va_list args1; + + va_copy(args1, args); + need = vsnprintf(&c, 1, fmt, args1); + va_end(args1); + + /* Allocate the new space and copy the string in */ + limit = need + 1; + buf = kmalloc(limit, GFP_KERNEL); + if (!buf) + return 
NULL; + need = vsnprintf(buf, limit, fmt, args); + + /* something wrong with the string we copied? */ + if (need >= limit) { + kfree(buf); + return NULL; + } + + return buf; +} +EXPORT_SYMBOL(kvprintf); + +char *kprintf(const char *fmt, ...) +{ + char *buf; + va_list args; + + va_start(args, fmt); + buf = kvprintf(fmt, args); + va_end(args); + + return buf; +} +EXPORT_SYMBOL(kprintf); ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-27 16:02 ` Greg KH
@ 2007-10-27 16:07 ` Peter Zijlstra
2007-10-27 21:08 ` Kay Sievers
1 sibling, 0 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-27 16:07 UTC (permalink / raw)
To: Greg KH
Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote:

> Ah, I see a few problems. Here, try this version instead. It's
> compile-tested only, and should be a lot simpler.
>
> Note, we still are not setting the parent to the new bdi structure
> properly, so the devices will show up in /sys/devices/virtual/ instead
> of in their proper location. To do this, we need the parent of the
> device, which I'm not so sure what it should be (block device? block
> device controller?)

The problem is that not every bdi has a sysfs represented parent, hence
the class suggestion. For block devices it is indeed the block device
itself, but for example the NFS client's server descriptor does not have
a sysfs representation.

> Let me know if this works better, I'm off to a kids birthday party for
> the day, but will be around this evening...

Hehe, do enjoy! Thanks.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-27 16:02 ` Greg KH
2007-10-27 16:07 ` Peter Zijlstra
@ 2007-10-27 21:08 ` Kay Sievers
2007-10-27 21:35 ` Peter Zijlstra
2007-11-02 13:15 ` Peter Zijlstra
1 sibling, 2 replies; 112+ messages in thread
From: Kay Sievers @ 2007-10-27 21:08 UTC (permalink / raw)
To: Greg KH
Cc: Peter Zijlstra, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote:
> On Sat, Oct 27, 2007 at 10:39:59AM +0200, Peter Zijlstra wrote:
> > On Fri, 2007-10-26 at 19:40 -0700, Greg KH wrote:
> > > On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote:
> > > > On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote:
> > > > > This crashes and burns on bootup, but I'm too tired to figure out what I
> > > > > did wrong... will give it another try tomorrow..
> > > >
> > > > Ok, can't sleep.. took a look. I have several problems here.
> > > >
> > > > The thing that makes it go *boom* is the __ATTR_NULL. Removing that
> > > > makes it boot. Albeit it then warns me of multiple duplicate sysfs
> > > > objects, all named "bdi".

> > > I'll look at this and see what I can come up with. Would you just like
> > > a whole new patch, or one against this one?
> >
> > Sorry for the grumpy note, I get that way at 3.30 am. Maybe I ought not
> > have mailed :-/
> >
> > This is the code I had at that time.
>
> Ah, I see a few problems. Here, try this version instead. It's
> compile-tested only, and should be a lot simpler.
>
> Note, we still are not setting the parent to the new bdi structure
> properly, so the devices will show up in /sys/devices/virtual/ instead
> of in their proper location. To do this, we need the parent of the
> device, which I'm not so sure what it should be (block device? block
> device controller?)

Assigning a parent device will only work with the upcoming conversion of
the raw kobjects in the block subsystem to "struct device".

A few comments on the patch:

> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -8,6 +8,7 @@
>  #include <linux/compiler.h>	/* for inline */
>  #include <linux/types.h>	/* for size_t */
>  #include <linux/stddef.h>	/* for NULL */
> +#include <stdarg.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si
>  extern char **argv_split(gfp_t gfp, const char *str, int *argcp);
>  extern void argv_free(char **argv);
>
> +char *kvprintf(const char *fmt, va_list args);
> +char *kprintf(const char *fmt, ...);

Why is that here? I don't think we need this when we use the existing:
  kvasprintf(GFP_KERNEL, fmt, args)

> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> +
> +static struct device_attribute bdi_dev_attrs[] = {
> +	__ATTR(readahead, 0644, readahead_show, readahead_store),
> +	__ATTR_RO(reclaimable),
> +	__ATTR_RO(writeback),
> +	__ATTR_RO(dirty),
> +	__ATTR_RO(bdi_dirty),
> +};

Default attributes will need the NULL termination back (see below).

> +static __init int bdi_class_init(void)
> +{
> +	bdi_class = class_create(THIS_MODULE, "bdi");
> +	return 0;
> +}
> +
> +__initcall(bdi_class_init);
> +
> +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...)

This function should accept a "struct device *parent", and all callers
should just pass NULL until the block layer conversion gets merged.

> +{
> +	char *name;
> +	va_list args;
> +	int ret = -ENOMEM;
> +	int i;
> +
> +	va_start(args, fmt);
> +	name = kvprintf(fmt, args);

  kvasprintf(GFP_KERNEL, fmt, args);

> +	va_end(args);
> +
> +	if (!name)
> +		return -ENOMEM;
> +
> +	bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name);

The parent should be passed here.

> +	for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) {
> +		ret = device_create_file(bdi->dev, &bdi_dev_attrs[i]);
> +		if (ret)
> +			break;
> +	}
> +	if (ret) {
> +		while (--i >= 0)
> +			device_remove_file(bdi->dev, &bdi_dev_attrs[i]);
> +		device_unregister(bdi->dev);
> +		bdi->dev = NULL;
> +	}

All this open-coded attribute stuff should go away and be replaced by:
  bdi_class->dev_attrs = bdi_dev_attrs;
Otherwise the attributes are not yet created at event time, and stuff
hooking into the events will not be able to set values. Also, the core
will then do proper add/remove and error handling.

Thanks,
Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
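For readers following the kvprintf()/kvasprintf() point in the review above: both do a "measure, allocate, format" two-pass over the format string. The following is only a userspace sketch of that idea, not the kernel implementation. The names sketch_vasprintf() and sketch_asprintf() are hypothetical, and malloc() stands in for kmalloc() (so there is no gfp argument).

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the two-pass formatting idea:
 * pass 1 measures the output, pass 2 writes into an exact-size buffer.
 */
char *sketch_vasprintf(const char *fmt, va_list ap)
{
	va_list aq;
	int len;
	char *buf;

	/* A va_list can only be walked once, so measure on a copy. */
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);
	va_end(aq);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	vsnprintf(buf, (size_t)len + 1, fmt, ap);
	return buf;
}

/* Variadic wrapper; the caller owns the result and must free() it. */
char *sketch_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *buf;

	va_start(ap, fmt);
	buf = sketch_vasprintf(fmt, ap);
	va_end(ap);
	return buf;
}
```

A caller would use it as sketch_asprintf("bdi-%s", disk_name) and free the result when done, which mirrors how bdi_register() builds and then kfree()s its name.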
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-27 21:08 ` Kay Sievers
@ 2007-10-27 21:35 ` Peter Zijlstra
2007-10-28 7:10 ` Greg KH
2007-11-02 13:15 ` Peter Zijlstra
1 sibling, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-10-27 21:35 UTC (permalink / raw)
To: Kay Sievers
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Sat, 2007-10-27 at 23:08 +0200, Kay Sievers wrote:
> On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote:

> > Ah, I see a few problems. Here, try this version instead. It's
> > compile-tested only, and should be a lot simpler.
> >
> > Note, we still are not setting the parent to the new bdi structure
> > properly, so the devices will show up in /sys/devices/virtual/ instead
> > of in their proper location. To do this, we need the parent of the
> > device, which I'm not so sure what it should be (block device? block
> > device controller?)
>
> Assigning a parent device will only work with the upcoming conversion of
> the raw kobjects in the block subsystem to "struct device".
>
> A few comments to the patch:
>
> > --- a/include/linux/string.h
> > +++ b/include/linux/string.h
> > @@ -8,6 +8,7 @@
> >  #include <linux/compiler.h>	/* for inline */
> >  #include <linux/types.h>	/* for size_t */
> >  #include <linux/stddef.h>	/* for NULL */
> > +#include <stdarg.h>
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si
> >  extern char **argv_split(gfp_t gfp, const char *str, int *argcp);
> >  extern void argv_free(char **argv);
> >
> > +char *kvprintf(const char *fmt, va_list args);
> > +char *kprintf(const char *fmt, ...);
>
> Why is that here? I don't think we need this when we use the existing:
>   kvasprintf(GFP_KERNEL, fmt, args)

Ignorance of the existence of said function. Thanks for pointing it out.

(kobject_set_name ought to use it too I guess)

> > --- a/mm/backing-dev.c
> > +++ b/mm/backing-dev.c
>
> > +
> > +static struct device_attribute bdi_dev_attrs[] = {
> > +	__ATTR(readahead, 0644, readahead_show, readahead_store),
> > +	__ATTR_RO(reclaimable),
> > +	__ATTR_RO(writeback),
> > +	__ATTR_RO(dirty),
> > +	__ATTR_RO(bdi_dirty),
> > +};
>
> Default attributes will need the NULL termination back (see below).
>
> > +static __init int bdi_class_init(void)
> > +{
> > +	bdi_class = class_create(THIS_MODULE, "bdi");
> > +	return 0;
> > +}
> > +
> > +__initcall(bdi_class_init);
> > +
> > +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...)
>
> This function should accept a: "struct device *parent" and all callers
> just pass NULL until the block layer conversion gets merged.

Yeah, you're right, but I wanted to just get something working before
bothering with the parent thing.

> > +{
> > +	char *name;
> > +	va_list args;
> > +	int ret = -ENOMEM;
> > +	int i;
> > +
> > +	va_start(args, fmt);
> > +	name = kvprintf(fmt, args);
>
>   kvasprintf(GFP_KERNEL, fmt, args);
>
> > +	va_end(args);
> > +
> > +	if (!name)
> > +		return -ENOMEM;
> > +
> > +	bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name);
>
> The parent should be passed here.
>
> > +	for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) {
> > +		ret = device_create_file(bdi->dev, &bdi_dev_attrs[i]);
> > +		if (ret)
> > +			break;
> > +	}
> > +	if (ret) {
> > +		while (--i >= 0)
> > +			device_remove_file(bdi->dev, &bdi_dev_attrs[i]);
> > +		device_unregister(bdi->dev);
> > +		bdi->dev = NULL;
> > +	}
>
> All this open-coded attribute stuff should go away and be replaced by:
>   bdi_class->dev_attrs = bdi_dev_attrs;
> Otherwise at event time the attributes are not created and stuff hooking
> into the events will not be able to set values. Also, the core will do
> proper add/remove and error handling then.

ok, that's good to know. someone ought to write a book on how to use all
this... really, even the functions are bare of documentation or
comments.

^ permalink raw reply [flat|nested] 112+ messages in thread
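On the "NULL termination" point quoted above: a table that is handed to the core as a bare pointer (as with bdi_class->dev_attrs) carries no length, so the consumer has to walk it until it finds a terminator entry. That is the role __ATTR_NULL plays, and it is also why the same entry is harmful inside an ARRAY_SIZE() loop, which would treat the terminator as a real attribute. The following is a simplified userspace model only; struct sketch_attr, sketch_attrs and sketch_count_attrs() are hypothetical names and not the driver core's types.

```c
#include <stddef.h>

/* Hypothetical, simplified model of one attribute table entry. */
struct sketch_attr {
	const char *name;	/* a NULL name marks the end of the table */
	unsigned int mode;
};

/* Same five attributes as in the patch, plus the terminator. */
const struct sketch_attr sketch_attrs[] = {
	{ "readahead",   0644 },
	{ "reclaimable", 0444 },
	{ "writeback",   0444 },
	{ "dirty",       0444 },
	{ "bdi_dirty",   0444 },
	{ NULL, 0 },	/* terminator entry, the part __ATTR_NULL provides */
};

/*
 * Walk the table the way a consumer that only receives a pointer must:
 * stop at the first entry without a name.
 */
size_t sketch_count_attrs(const struct sketch_attr *attrs)
{
	size_t n = 0;

	while (attrs[n].name != NULL)
		n++;
	return n;
}
```

With the terminator present the walk stops after five entries; without it the loop would run off the end of the array.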
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-27 21:35 ` Peter Zijlstra
@ 2007-10-28 7:10 ` Greg KH
0 siblings, 0 replies; 112+ messages in thread
From: Greg KH @ 2007-10-28 7:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Sat, Oct 27, 2007 at 11:35:45PM +0200, Peter Zijlstra wrote:
> On Sat, 2007-10-27 at 23:08 +0200, Kay Sievers wrote:
> > All this open-coded attribute stuff should go away and be replaced by:
> >   bdi_class->dev_attrs = bdi_dev_attrs;
> > Otherwise at event time the attributes are not created and stuff hooking
> > into the events will not be able to set values. Also, the core will do
> > proper add/remove and error handling then.
>
> ok, that's good to know. someone ought to write a book on how to use all
> this... really, even the functions are bare of documentation or
> comments.

Yes, I know, sorry :(

I'm working on cleaning up the apis a lot right now, so hopefully, in a
few months things will have settled down. I'm trying to document things
as I go, but right now I'm stuck at a much lower level (ksets and
friends...)

thanks,

greg k-h

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-27 21:08 ` Kay Sievers
2007-10-27 21:35 ` Peter Zijlstra
@ 2007-11-02 13:15 ` Peter Zijlstra
2007-11-02 13:50 ` Kay Sievers
1 sibling, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-11-02 13:15 UTC (permalink / raw)
To: Kay Sievers
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

Hi,

Thanks for the help so far, however we're still not quite there.

The below patch still has the funny 20 character name limit. Is there a
good reason it's a char array like this, and not just a char * to a kstr?
The code does kstrdup all over the place, I can't imagine why suddenly
limiting it to 20 chars seems like a good idea.

Mounting NFS filesystems: mount: 192.168.0.1:/mnt/ying failed, reason given by server: Permission denied
mount: 192.168.0.1:/mnt/yang failed, reason given by server: Permission denied
[ 52.705052] sysfs: duplicate filename 'bdi-nfs-192.168.0.1' can not be created
[ 52.712517] WARNING: at /mnt/md0/src/linux-2.6-2/fs/sysfs/dir.c:424 sysfs_add_one()
[ 52.720243]
[ 52.720244] Call Trace:
[ 52.724311] [<ffffffff80304f7c>] sysfs_add_one+0xac/0xe0
[ 52.729708] [<ffffffff80305613>] create_dir+0x63/0xc0
[ 52.734832] [<ffffffff803056a4>] sysfs_create_dir+0x34/0x50
[ 52.740489] [<ffffffff804d5710>] _spin_unlock+0x30/0x60
[ 52.745792] [<ffffffff8036958e>] kobject_add+0xbe/0x200
[ 52.751094] [<ffffffff803dfc40>] device_add+0xc0/0x680
[ 52.756305] [<ffffffff803e0219>] device_register+0x19/0x20
[ 52.761877] [<ffffffff803e0697>] device_create+0xe7/0x120
[ 52.767360] [<ffffffff8036e0bc>] vsnprintf+0x2bc/0x690
[ 52.772585] [<ffffffff80370380>] kvasprintf+0x70/0x90
[ 52.777724] [<ffffffff80295bbb>] bdi_register+0x9b/0xe0
[ 52.783037] [<ffffffff8037e039>] percpu_counter_init_irq+0x39/0x50
[ 52.789299] [<ffffffff8036abcc>] prop_local_init_percpu+0x3c/0x50
[ 52.795462] [<ffffffff80295a41>] bdi_init+0x61/0xb0
[ 52.800411] [<ffffffff80308069>] nfs_probe_fsinfo+0x4d9/0x640
[ 52.806226] [<ffffffff80308f9a>] nfs_create_server+0x1ea/0x560
[ 52.812123] [<ffffffff80263526>] lock_release_holdtime+0x66/0x80
[ 52.818217] [<ffffffff802ca936>] __d_lookup+0x106/0x1d0
[ 52.823529] [<ffffffff802b0fd5>] __kmalloc_track_caller+0xa5/0x100
[ 52.829789] [<ffffffff80266185>] trace_hardirqs_on+0xd5/0x170
[ 52.835618] [<ffffffff80312412>] nfs_get_sb+0x2b2/0x740
[ 52.840931] [<ffffffff80312459>] nfs_get_sb+0x2f9/0x740
[ 52.846245] [<ffffffff802d0000>] __put_mnt_ns+0x90/0xa0
[ 52.851556] [<ffffffff802b89db>] vfs_kern_mount+0xbb/0x150
[ 52.857127] [<ffffffff802b8ade>] do_kern_mount+0x4e/0x100
[ 52.862614] [<ffffffff802d122c>] do_mount+0x4dc/0x7a0
[ 52.867751] [<ffffffff80266185>] trace_hardirqs_on+0xd5/0x170
[ 52.873580] [<ffffffff804d5842>] _spin_unlock_irqrestore+0x42/0x80
[ 52.879841] [<ffffffff8036c505>] __up_read+0x45/0xb0
[ 52.884894] [<ffffffff80255aa5>] search_exception_tables+0x25/0x40
[ 52.891155] [<ffffffff8028d1d5>] get_page_from_freelist+0x4a5/0x760
[ 52.897487] [<ffffffff8028a58f>] bad_range+0x1f/0x80
[ 52.902540] [<ffffffff8028d0b7>] get_page_from_freelist+0x387/0x760
[ 52.908888] [<ffffffff8028d54e>] __alloc_pages+0x6e/0x3b0
[ 52.914374] [<ffffffff802a80ea>] alloc_pages_current+0x5a/0x90
[ 52.920290] [<ffffffff8028c97b>] __get_free_pages+0x1b/0x40
[ 52.925929] [<ffffffff802cf892>] copy_mount_options+0x52/0x170
[ 52.931827] [<ffffffff802d1584>] sys_mount+0x94/0xe0
[ 52.936881] [<ffffffff804d49cd>] trace_hardirqs_on_thunk+0x35/0x3a
[ 52.943141] [<ffffffff8020c43e>] system_call+0x7e/0x83
[ 52.948357]
[ 52.949850] kobject_add failed for bdi-nfs-192.168.0.1 with -EEXIST, don't try to register things with the same name in the same directory.
[ 52.962332] (just in case it wasn't obvious, the -%p part that was supposed to make it unique got truncated) --- block/genhd.c | 3 + fs/fuse/inode.c | 3 - fs/nfs/client.c | 3 - include/linux/backing-dev.h | 19 +++++++ include/linux/writeback.h | 3 + lib/percpu_counter.c | 1 mm/backing-dev.c | 107 ++++++++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 2 8 files changed, 138 insertions(+), 3 deletions(-) Index: linux-2.6-2/block/genhd.c =================================================================== --- linux-2.6-2.orig/block/genhd.c +++ linux-2.6-2/block/genhd.c @@ -182,6 +182,8 @@ void add_disk(struct gendisk *disk) disk->minors, NULL, exact_match, exact_lock, disk); register_disk(disk); blk_register_queue(disk); + bdi_register(&disk->queue->backing_dev_info, NULL, + "bdi-%s", disk->disk_name); } EXPORT_SYMBOL(add_disk); @@ -190,6 +192,7 @@ EXPORT_SYMBOL(del_gendisk); /* in partit void unlink_gendisk(struct gendisk *disk) { blk_unregister_queue(disk); + bdi_unregister(&disk->queue->backing_dev_info); blk_unregister_region(MKDEV(disk->major, disk->first_minor), disk->minors); } Index: linux-2.6-2/fs/fuse/inode.c =================================================================== --- linux-2.6-2.orig/fs/fuse/inode.c +++ linux-2.6-2/fs/fuse/inode.c @@ -467,7 +467,8 @@ static struct fuse_conn *new_conn(void) atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; - err = bdi_init(&fc->bdi); + err = bdi_init_fmt(&fc->bdi, NULL, + "bdi-fuse-%llu", (unsigned long long)fc->id); if (err) { kfree(fc); fc = NULL; Index: linux-2.6-2/fs/nfs/client.c =================================================================== --- linux-2.6-2.orig/fs/nfs/client.c +++ linux-2.6-2/fs/nfs/client.c @@ -678,7 +678,8 @@ static int nfs_probe_fsinfo(struct nfs_s goto out_error; nfs_server_set_fsinfo(server, &fsinfo); - error = bdi_init(&server->backing_dev_info); + error = 
bdi_init_fmt(&server->backing_dev_info, NULL, + "bdi-nfs-%s-%p", clp->cl_hostname, server); if (error) goto out_error; Index: linux-2.6-2/include/linux/backing-dev.h =================================================================== --- linux-2.6-2.orig/include/linux/backing-dev.h +++ linux-2.6-2/include/linux/backing-dev.h @@ -11,6 +11,8 @@ #include <linux/percpu_counter.h> #include <linux/log2.h> #include <linux/proportions.h> +#include <linux/kernel.h> +#include <linux/device.h> #include <asm/atomic.h> struct page; @@ -48,11 +50,28 @@ struct backing_dev_info { struct prop_local_percpu completions; int dirty_exceeded; + + struct device *dev; }; int bdi_init(struct backing_dev_info *bdi); void bdi_destroy(struct backing_dev_info *bdi); +int bdi_register(struct backing_dev_info *bdi, struct device *parent, + const char *fmt, ...); +void bdi_unregister(struct backing_dev_info *bdi); + +#define bdi_init_fmt(bdi, parent, fmt...) \ + ({ \ + int ret = bdi_init(bdi); \ + if (!ret) { \ + ret = bdi_register(bdi, parent, ##fmt); \ + if (ret) \ + bdi_destroy(bdi); \ + } \ + ret; \ + }) + static inline void __add_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item, s64 amount) { Index: linux-2.6-2/include/linux/writeback.h =================================================================== --- linux-2.6-2.orig/include/linux/writeback.h +++ linux-2.6-2/include/linux/writeback.h @@ -113,6 +113,9 @@ struct file; int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, + struct backing_dev_info *bdi); + void page_writeback_init(void); void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, unsigned long nr_pages_dirtied); Index: linux-2.6-2/mm/backing-dev.c =================================================================== --- linux-2.6-2.orig/mm/backing-dev.c +++ linux-2.6-2/mm/backing-dev.c @@ -4,12 +4,115 @@ #include 
<linux/fs.h> #include <linux/sched.h> #include <linux/module.h> +#include <linux/writeback.h> +#include <linux/device.h> + + +static struct class *bdi_class; + +static ssize_t readahead_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + char *end; + + bdi->ra_pages = simple_strtoul(buf, &end, 10); + + return end - buf; +} + +#define BDI_SHOW(name, expr) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, char *page) \ +{ \ + struct backing_dev_info *bdi = dev_get_drvdata(dev); \ + \ + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ +} + +BDI_SHOW(readahead, bdi->ra_pages) + +BDI_SHOW(reclaimable, bdi_stat(bdi, BDI_RECLAIMABLE)) +BDI_SHOW(writeback, bdi_stat(bdi, BDI_WRITEBACK)) + +static inline unsigned long get_dirty(struct backing_dev_info *bdi, int i) +{ + unsigned long thresh[3]; + + get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi); + + return thresh[i]; +} + +BDI_SHOW(dirty, get_dirty(bdi, 1)) +BDI_SHOW(bdi_dirty, get_dirty(bdi, 2)) + +static struct device_attribute bdi_dev_attrs[] = { + __ATTR(readahead, 0644, readahead_show, readahead_store), + __ATTR_RO(reclaimable), + __ATTR_RO(writeback), + __ATTR_RO(dirty), + __ATTR_RO(bdi_dirty), + __ATTR_NULL, +}; + +static __init int bdi_class_init(void) +{ + bdi_class = class_create(THIS_MODULE, "bdi"); + bdi_class->dev_attrs = bdi_dev_attrs; + return 0; +} + +__initcall(bdi_class_init); + +int bdi_register(struct backing_dev_info *bdi, struct device *parent, + const char *fmt, ...) 
+{ + char *name; + va_list args; + int ret = 0; + struct device *dev; + + va_start(args, fmt); + name = kvasprintf(GFP_KERNEL, fmt, args); + va_end(args); + + if (!name) + return -ENOMEM; + + dev = device_create(bdi_class, parent, MKDEV(0,0), name); + if (IS_ERR(dev)) { + ret = PTR_ERR(dev); + goto exit; + } + + bdi->dev = dev; + dev_set_drvdata(bdi->dev, bdi); + +exit: + kfree(name); + return ret; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ + if (bdi->dev) { + device_unregister(bdi->dev); + bdi->dev = NULL; + } +} + +EXPORT_SYMBOL(bdi_register); +EXPORT_SYMBOL(bdi_unregister); int bdi_init(struct backing_dev_info *bdi) { int i, j; int err; + bdi->dev = NULL; + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); if (err) @@ -33,6 +136,8 @@ void bdi_destroy(struct backing_dev_info { int i; + bdi_unregister(bdi); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); Index: linux-2.6-2/mm/page-writeback.c =================================================================== --- linux-2.6-2.orig/mm/page-writeback.c +++ linux-2.6-2/mm/page-writeback.c @@ -291,7 +291,7 @@ static unsigned long determine_dirtyable return x + 1; /* Ensure that we never return 0 */ } -static void +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, struct backing_dev_info *bdi) { Index: linux-2.6-2/lib/percpu_counter.c =================================================================== --- linux-2.6-2.orig/lib/percpu_counter.c +++ linux-2.6-2/lib/percpu_counter.c @@ -102,6 +102,7 @@ void percpu_counter_destroy(struct percp return; free_percpu(fbc->counters); + fbc->counters = NULL; #ifdef CONFIG_HOTPLUG_CPU mutex_lock(&percpu_counters_lock); list_del(&fbc->list); ^ permalink raw reply [flat|nested] 112+ messages in thread
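The duplicate-name failure in the message above can be reproduced in userspace. This is a sketch under one stated assumption: that the device name ends up in a fixed 20-byte buffer including the terminating NUL, which matches the "20 character name limit" mentioned in the thread and the 19-character name in the log. SKETCH_BUS_ID_SIZE and sketch_bus_id() are hypothetical names.

```c
#include <stdio.h>

/* Assumed buffer size, taken from the 20 character limit in the thread. */
#define SKETCH_BUS_ID_SIZE 20

/*
 * Format the NFS bdi name into a fixed-size buffer. snprintf() silently
 * truncates, so everything after the 19th character is lost.
 */
void sketch_bus_id(char out[SKETCH_BUS_ID_SIZE], const char *host,
		   const void *server)
{
	snprintf(out, SKETCH_BUS_ID_SIZE, "bdi-nfs-%s-%p", host, server);
}
```

"bdi-nfs-" is 8 characters and "192.168.0.1" is 11, so the buffer is full before the "-%p" part is reached. Two different server pointers on the same host therefore produce the same string, which is the -EEXIST case sysfs complains about.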
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-11-02 13:15 ` Peter Zijlstra
@ 2007-11-02 13:50 ` Kay Sievers
2007-11-02 13:54 ` Peter Zijlstra
2007-11-02 14:17 ` Peter Zijlstra
0 siblings, 2 replies; 112+ messages in thread
From: Kay Sievers @ 2007-11-02 13:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Nov 2, 2007 2:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Thanks for the help so far, however we're still not quite there.
>
> The below patch still has the funny 20 character name limit. Is there a
> good reason its a char array like this, and not just a char * to a kstr?
> The code does kstrdup all over the place, I can't imagine why suddenly
> limiting it to 20 chars seems like a good idea.

You are absolutely right, it doesn't make any sense. The 20 char limit
is bad, but really, having the name duplicated in the device structure,
while the name is already in the embedded kobject, is really bad.

Greg recently got rid of the 20 chars in the kobject; now we need to fix
the devices to completely get rid of the static bus_id string array, and
just set the kobject name directly.
It's all long overdue to fix things like this in the driver core - it's
such a mess. After the kset cleanup Greg and I are doing currently, we
will remove that silly limit.

Hmm, regardless of the limit, isn't there a better device name than a
memory address of a kernel structure? :)
If there is no better data, shouldn't we get something like an atomic
counter somewhere in the nfs code, which just increases with every
instance, and we could use that number as a connection id?

Kay

^ permalink raw reply [flat|nested] 112+ messages in thread
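The atomic-counter idea suggested above could look roughly like the following userspace sketch. It uses C11 atomics rather than the kernel's atomic_t, and everything in it is hypothetical: sketch_next_id, sketch_name_instance() and the "nfs-%u" naming scheme are illustrations of the suggestion, not what the NFS code does.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical global instance counter; static storage starts at zero. */
static atomic_uint sketch_next_id;

/*
 * Write "nfs-<n>" into out and return n. atomic_fetch_add() hands every
 * caller a distinct value even when callers race with each other.
 */
unsigned int sketch_name_instance(char *out, size_t len)
{
	unsigned int id = atomic_fetch_add(&sketch_next_id, 1);

	snprintf(out, len, "nfs-%u", id);
	return id;
}
```

A name built this way is short and stays unique for the lifetime of the counter, so it would not depend on where a kernel structure happens to be allocated and would fit within the 20 character limit discussed above.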
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-11-02 13:50 ` Kay Sievers
@ 2007-11-02 13:54 ` Peter Zijlstra
2007-11-02 14:17 ` Peter Zijlstra
1 sibling, 0 replies; 112+ messages in thread
From: Peter Zijlstra @ 2007-11-02 13:54 UTC (permalink / raw)
To: Kay Sievers
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi

On Fri, 2007-11-02 at 14:50 +0100, Kay Sievers wrote:
> On Nov 2, 2007 2:15 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Thanks for the help so far, however we're still not quite there.
> >
> > The below patch still has the funny 20 character name limit. Is there a
> > good reason its a char array like this, and not just a char * to a kstr?
> > The code does kstrdup all over the place, I can't imagine why suddenly
> > limiting it to 20 chars seems like a good idea.
>
> You are absolutely right, it doesn't make any sense. The 20 char limit
> is bad, but really,
> having the name duplicated in the device structure, while the name is
> already in the
> embedded kobject, is really bad.
>
> Greg recently got rid of the 20 chars in the kobject, now we need to fix the
> devices to completely get rid of the static bus_id string array, and just set
> the kobject name directly.
> It's all long overdue to fix things like this in the driver core -
> it's such a mess. After the
> kset cleanup Greg and I are doing currently, we will remove that silly limit.

Ok, great! Could I ask you to nudge me awake once those patches hit a
git tree somewhere?

> Hmm, regardless of the limit, isn't there a better device name than a memory
> address of a kernel structure. :)

Yes there is, Trond already suggested a proper replacement, however so
far I've been just trying to get it to work before trying to make it
pretty.

Will implement Trond's suggestion while you and Greg eradicate this 20
byte thing :-)

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-11-02 13:50 ` Kay Sievers
2007-11-02 13:54 ` Peter Zijlstra
@ 2007-11-02 14:17 ` Peter Zijlstra
2007-11-02 14:32 ` Kay Sievers
1 sibling, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-11-02 14:17 UTC (permalink / raw)
To: Kay Sievers
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi
One more question,
I currently prefix the names with "bdi-", is that needed?
That is, if I give the bdi object a parent, how will it look?
Would a bdi device with name "sda" with a block device called "sda" as parent look like: /sys/block/sda/sda? Or would it be called /sys/block/sda/bdi:sda or just /sys/block/sda/bdi?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-11-02 14:17 ` Peter Zijlstra
@ 2007-11-02 14:32 ` Kay Sievers
2007-11-02 14:59 ` [PATCH] mm: sysfs: expose the BDI object in sysfs Peter Zijlstra
0 siblings, 1 reply; 112+ messages in thread
From: Kay Sievers @ 2007-11-02 14:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi
On Fri, 2007-11-02 at 15:17 +0100, Peter Zijlstra wrote:
> One more question,
>
> I currently prefix the names with "bdi-", is that needed?
Not really.
> That is, if I give the bdi object a parent, how will it look?
> Would a bdi device with name "sda" with a block device called "sda" as
> parent look like: /sys/block/sda/sda? Or would it be
> called /sys/block/sda/bdi:sda or just /sys/block/sda/bdi?
Class devices that are children get their own subdirectory at the parent. So it would be /sys/block/sda/bdi/sda
It's currently only implemented for class devices which get a bus device as a parent, but that will be done for all parents, so that we prevent clashing names if devices from multiple subsystems get the same parent.
See here, for netdevs there is a "net" "glue directory":
$ ls -l /sys/class/net/eth0
/sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:1c.0/0000:02:00.0/net/eth0
We needed this, because people complained that they can't name their netif "irq", because there is already an attribute at the parent with that name. :)
When we get the "block as devices" patch merged, we can do the proper parent logic for bdi, and I'll add the same logic as we have for bus device parents today.
Kay
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH] mm: sysfs: expose the BDI object in sysfs
2007-11-02 14:32 ` Kay Sievers
@ 2007-11-02 14:59 ` Peter Zijlstra
2007-11-02 15:13 ` Kay Sievers
0 siblings, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-11-02 14:59 UTC (permalink / raw)
To: Kay Sievers
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi
On Fri, 2007-11-02 at 15:32 +0100, Kay Sievers wrote:
> On Fri, 2007-11-02 at 15:17 +0100, Peter Zijlstra wrote:
> > One more question,
> >
> > I currently prefix the names with "bdi-", is that needed?
>
> Not really.
Thanks.
Here is the 'pretty' patch :-)
Since it relies on the removal of the device name length limit, this should not yet be applied.
---
Subject: mm: sysfs: expose the BDI object in sysfs
Provide a place in sysfs for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.
In particular this properly exposes the read-ahead window for all relevant users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.
With patient help from Kay Sievers and Greg KH Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- block/genhd.c | 3 + fs/fuse/inode.c | 3 - fs/nfs/client.c | 24 +++++---- fs/nfs/internal.h | 10 ++-- fs/nfs/super.c | 10 ++-- include/linux/backing-dev.h | 19 +++++++ include/linux/writeback.h | 3 + lib/percpu_counter.c | 1 mm/backing-dev.c | 109 ++++++++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 2 10 files changed, 163 insertions(+), 21 deletions(-) Index: linux-2.6-2/block/genhd.c =================================================================== --- linux-2.6-2.orig/block/genhd.c +++ linux-2.6-2/block/genhd.c @@ -182,6 +182,8 @@ void add_disk(struct gendisk *disk) disk->minors, NULL, exact_match, exact_lock, disk); register_disk(disk); blk_register_queue(disk); + bdi_register(&disk->queue->backing_dev_info, NULL, + "%s", disk->disk_name); } EXPORT_SYMBOL(add_disk); @@ -190,6 +192,7 @@ EXPORT_SYMBOL(del_gendisk); /* in partit void unlink_gendisk(struct gendisk *disk) { blk_unregister_queue(disk); + bdi_unregister(&disk->queue->backing_dev_info); blk_unregister_region(MKDEV(disk->major, disk->first_minor), disk->minors); } Index: linux-2.6-2/fs/fuse/inode.c =================================================================== --- linux-2.6-2.orig/fs/fuse/inode.c +++ linux-2.6-2/fs/fuse/inode.c @@ -467,7 +467,8 @@ static struct fuse_conn *new_conn(void) atomic_set(&fc->num_waiting, 0); fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; fc->bdi.unplug_io_fn = default_unplug_io_fn; - err = bdi_init(&fc->bdi); + err = bdi_init_fmt(&fc->bdi, NULL, + "fuse-%llu", (unsigned long long)fc->id); if (err) { kfree(fc); fc = NULL; Index: linux-2.6-2/fs/nfs/client.c =================================================================== --- linux-2.6-2.orig/fs/nfs/client.c +++ linux-2.6-2/fs/nfs/client.c @@ -657,7 +657,8 @@ static void nfs_server_set_fsinfo(struct /* * Probe filesystem information, including the FSID on v2/v3 */ -static int 
nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, struct nfs_fattr *fattr) +static int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, + struct nfs_fattr *fattr, const char *dev_name) { struct nfs_fsinfo fsinfo; struct nfs_client *clp = server->nfs_client; @@ -678,7 +679,8 @@ static int nfs_probe_fsinfo(struct nfs_s goto out_error; nfs_server_set_fsinfo(server, &fsinfo); - error = bdi_init(&server->backing_dev_info); + error = bdi_init_fmt(&server->backing_dev_info, NULL, + "nfs-%s", dev_name); if (error) goto out_error; @@ -772,7 +774,7 @@ void nfs_free_server(struct nfs_server * * - keyed on server and FSID */ struct nfs_server *nfs_create_server(const struct nfs_parsed_mount_data *data, - struct nfs_fh *mntfh) + struct nfs_fh *mntfh, const char *dev_name) { struct nfs_server *server; struct nfs_fattr fattr; @@ -792,7 +794,7 @@ struct nfs_server *nfs_create_server(con BUG_ON(!server->nfs_client->rpc_ops->file_inode_ops); /* Probe the root fh to retrieve its FSID */ - error = nfs_probe_fsinfo(server, mntfh, &fattr); + error = nfs_probe_fsinfo(server, mntfh, &fattr, dev_name); if (error < 0) goto error; if (server->nfs_client->rpc_ops->version == 3) { @@ -949,7 +951,7 @@ static int nfs4_init_server(struct nfs_s * - keyed on server and FSID */ struct nfs_server *nfs4_create_server(const struct nfs_parsed_mount_data *data, - struct nfs_fh *mntfh) + struct nfs_fh *mntfh, const char *dev_name) { struct nfs_fattr fattr; struct nfs_server *server; @@ -991,7 +993,7 @@ struct nfs_server *nfs4_create_server(co (unsigned long long) server->fsid.minor); dprintk("Mount FH: %d\n", mntfh->size); - error = nfs_probe_fsinfo(server, mntfh, &fattr); + error = nfs_probe_fsinfo(server, mntfh, &fattr, dev_name); if (error < 0) goto error; @@ -1021,7 +1023,8 @@ error: * Create an NFS4 referral server record */ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data, - struct nfs_fh *mntfh) + struct nfs_fh *mntfh, + const char 
*dev_name) { struct nfs_client *parent_client; struct nfs_server *server, *parent_server; @@ -1066,7 +1069,7 @@ struct nfs_server *nfs4_create_referral_ goto error; /* probe the filesystem info for this server filesystem */ - error = nfs_probe_fsinfo(server, mntfh, &fattr); + error = nfs_probe_fsinfo(server, mntfh, &fattr, dev_name); if (error < 0) goto error; @@ -1100,7 +1103,8 @@ error: */ struct nfs_server *nfs_clone_server(struct nfs_server *source, struct nfs_fh *fh, - struct nfs_fattr *fattr) + struct nfs_fattr *fattr, + const char *dev_name) { struct nfs_server *server; struct nfs_fattr fattr_fsinfo; @@ -1128,7 +1132,7 @@ struct nfs_server *nfs_clone_server(stru nfs_init_server_aclclient(server); /* probe the filesystem info for this server filesystem */ - error = nfs_probe_fsinfo(server, fh, &fattr_fsinfo); + error = nfs_probe_fsinfo(server, fh, &fattr_fsinfo, dev_name); if (error < 0) goto out_free_server; Index: linux-2.6-2/include/linux/backing-dev.h =================================================================== --- linux-2.6-2.orig/include/linux/backing-dev.h +++ linux-2.6-2/include/linux/backing-dev.h @@ -11,6 +11,8 @@ #include <linux/percpu_counter.h> #include <linux/log2.h> #include <linux/proportions.h> +#include <linux/kernel.h> +#include <linux/device.h> #include <asm/atomic.h> struct page; @@ -48,11 +50,28 @@ struct backing_dev_info { struct prop_local_percpu completions; int dirty_exceeded; + + struct device *dev; }; int bdi_init(struct backing_dev_info *bdi); void bdi_destroy(struct backing_dev_info *bdi); +int bdi_register(struct backing_dev_info *bdi, struct device *parent, + const char *fmt, ...); +void bdi_unregister(struct backing_dev_info *bdi); + +#define bdi_init_fmt(bdi, parent, fmt...) 
\ + ({ \ + int ret = bdi_init(bdi); \ + if (!ret) { \ + ret = bdi_register(bdi, parent, ##fmt); \ + if (ret) \ + bdi_destroy(bdi); \ + } \ + ret; \ + }) + static inline void __add_bdi_stat(struct backing_dev_info *bdi, enum bdi_stat_item item, s64 amount) { Index: linux-2.6-2/include/linux/writeback.h =================================================================== --- linux-2.6-2.orig/include/linux/writeback.h +++ linux-2.6-2/include/linux/writeback.h @@ -113,6 +113,9 @@ struct file; int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *); +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, + struct backing_dev_info *bdi); + void page_writeback_init(void); void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, unsigned long nr_pages_dirtied); Index: linux-2.6-2/mm/backing-dev.c =================================================================== --- linux-2.6-2.orig/mm/backing-dev.c +++ linux-2.6-2/mm/backing-dev.c @@ -4,12 +4,119 @@ #include <linux/fs.h> #include <linux/sched.h> #include <linux/module.h> +#include <linux/writeback.h> +#include <linux/device.h> + + +static struct class *bdi_class; + +static ssize_t read_ahead_kb_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + char *end; + + bdi->ra_pages = simple_strtoul(buf, &end, 10) >> (PAGE_SHIFT - 10); + + return end - buf; +} + +#define K(pages) ((pages) << (PAGE_SHIFT - 10)) + +#define BDI_SHOW(name, expr) \ +static ssize_t name##_show(struct device *dev, \ + struct device_attribute *attr, char *page) \ +{ \ + struct backing_dev_info *bdi = dev_get_drvdata(dev); \ + \ + return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr); \ +} + +BDI_SHOW(read_ahead_kb, K(bdi->ra_pages)) + +BDI_SHOW(reclaimable_kb, K(bdi_stat(bdi, BDI_RECLAIMABLE))) +BDI_SHOW(writeback_kb, K(bdi_stat(bdi, BDI_WRITEBACK))) + +static 
inline unsigned long get_dirty(struct backing_dev_info *bdi, int i) +{ + unsigned long thresh[3]; + + get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi); + + return thresh[i]; +} + +BDI_SHOW(dirty_kb, K(get_dirty(bdi, 1))) +BDI_SHOW(bdi_dirty_kb, K(get_dirty(bdi, 2))) + +#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store) + +static struct device_attribute bdi_dev_attrs[] = { + __ATTR_RW(read_ahead_kb), + __ATTR_RO(reclaimable_kb), + __ATTR_RO(writeback_kb), + __ATTR_RO(dirty_kb), + __ATTR_RO(bdi_dirty_kb), + __ATTR_NULL, +}; + +static __init int bdi_class_init(void) +{ + bdi_class = class_create(THIS_MODULE, "bdi"); + bdi_class->dev_attrs = bdi_dev_attrs; + return 0; +} + +__initcall(bdi_class_init); + +int bdi_register(struct backing_dev_info *bdi, struct device *parent, + const char *fmt, ...) +{ + char *name; + va_list args; + int ret = 0; + struct device *dev; + + va_start(args, fmt); + name = kvasprintf(GFP_KERNEL, fmt, args); + va_end(args); + + if (!name) + return -ENOMEM; + + dev = device_create(bdi_class, parent, MKDEV(0,0), name); + if (IS_ERR(dev)) { + ret = PTR_ERR(dev); + goto exit; + } + + bdi->dev = dev; + dev_set_drvdata(bdi->dev, bdi); + +exit: + kfree(name); + return ret; +} + +void bdi_unregister(struct backing_dev_info *bdi) +{ + if (bdi->dev) { + device_unregister(bdi->dev); + bdi->dev = NULL; + } +} + +EXPORT_SYMBOL(bdi_register); +EXPORT_SYMBOL(bdi_unregister); int bdi_init(struct backing_dev_info *bdi) { int i, j; int err; + bdi->dev = NULL; + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) { err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0); if (err) @@ -33,6 +140,8 @@ void bdi_destroy(struct backing_dev_info { int i; + bdi_unregister(bdi); + for (i = 0; i < NR_BDI_STAT_ITEMS; i++) percpu_counter_destroy(&bdi->bdi_stat[i]); Index: linux-2.6-2/mm/page-writeback.c =================================================================== --- linux-2.6-2.orig/mm/page-writeback.c +++ linux-2.6-2/mm/page-writeback.c @@ -291,7 +291,7 
@@ static unsigned long determine_dirtyable return x + 1; /* Ensure that we never return 0 */ } -static void +void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty, struct backing_dev_info *bdi) { Index: linux-2.6-2/lib/percpu_counter.c =================================================================== --- linux-2.6-2.orig/lib/percpu_counter.c +++ linux-2.6-2/lib/percpu_counter.c @@ -102,6 +102,7 @@ void percpu_counter_destroy(struct percp return; free_percpu(fbc->counters); + fbc->counters = NULL; #ifdef CONFIG_HOTPLUG_CPU mutex_lock(&percpu_counters_lock); list_del(&fbc->list); Index: linux-2.6-2/fs/nfs/internal.h =================================================================== --- linux-2.6-2.orig/fs/nfs/internal.h +++ linux-2.6-2/fs/nfs/internal.h @@ -65,16 +65,18 @@ extern void nfs_put_client(struct nfs_cl extern struct nfs_client *nfs_find_client(const struct sockaddr_in *, int); extern struct nfs_server *nfs_create_server( const struct nfs_parsed_mount_data *, - struct nfs_fh *); + struct nfs_fh *, const char *); extern struct nfs_server *nfs4_create_server( const struct nfs_parsed_mount_data *, - struct nfs_fh *); + struct nfs_fh *, const char *); extern struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *, - struct nfs_fh *); + struct nfs_fh *, + const char *); extern void nfs_free_server(struct nfs_server *server); extern struct nfs_server *nfs_clone_server(struct nfs_server *, struct nfs_fh *, - struct nfs_fattr *); + struct nfs_fattr *, + const char *); #ifdef CONFIG_PROC_FS extern int __init nfs_fs_proc_init(void); extern void nfs_fs_proc_exit(void); Index: linux-2.6-2/fs/nfs/super.c =================================================================== --- linux-2.6-2.orig/fs/nfs/super.c +++ linux-2.6-2/fs/nfs/super.c @@ -1359,7 +1359,7 @@ static int nfs_get_sb(struct file_system goto out; /* Get a volume representation */ - server = nfs_create_server(&data, &mntfh); + server = nfs_create_server(&data, &mntfh, 
dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out; @@ -1442,7 +1442,7 @@ static int nfs_xdev_get_sb(struct file_s dprintk("--> nfs_xdev_get_sb()\n"); /* create a new volume representation */ - server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr); + server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out_err_noserver; @@ -1702,7 +1702,7 @@ static int nfs4_get_sb(struct file_syste goto out; /* Get a volume representation */ - server = nfs4_create_server(&data, &mntfh); + server = nfs4_create_server(&data, &mntfh, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out; @@ -1787,7 +1787,7 @@ static int nfs4_xdev_get_sb(struct file_ dprintk("--> nfs4_xdev_get_sb()\n"); /* create a new volume representation */ - server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr); + server = nfs_clone_server(NFS_SB(data->sb), data->fh, data->fattr, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out_err_noserver; @@ -1861,7 +1861,7 @@ static int nfs4_referral_get_sb(struct f dprintk("--> nfs4_referral_get_sb()\n"); /* create a new volume representation */ - server = nfs4_create_referral_server(data, &mntfh); + server = nfs4_create_referral_server(data, &mntfh, dev_name); if (IS_ERR(server)) { error = PTR_ERR(server); goto out_err_noserver; ^ permalink raw reply [flat|nested] 112+ messages in thread
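The unit handling behind the read_ahead_kb attribute in the patch above (the K() macro on the show side, the right shift in read_ahead_kb_store() on the store side) can be tried out in userspace. This is a minimal sketch, assuming 4 KiB pages (PAGE_SHIFT of 12); pages_to_kb(), kb_to_pages() and parse_read_ahead_kb() are hypothetical stand-ins, and strtoul() takes the place of the kernel's simple_strtoul().

```c
#include <stdlib.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* pages -> KiB, the same shift the patch's K() macro applies */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages << (PAGE_SHIFT - 10);
}

/* KiB -> pages, as in read_ahead_kb_store(); rounds down */
static unsigned long kb_to_pages(unsigned long kb)
{
	return kb >> (PAGE_SHIFT - 10);
}

/*
 * Parse a decimal KiB value from buf and store it as pages.
 * Returns the number of characters consumed, like the store
 * method in the patch (a trailing newline is not consumed).
 */
static long parse_read_ahead_kb(const char *buf, unsigned long *ra_pages)
{
	char *end;

	*ra_pages = kb_to_pages(strtoul(buf, &end, 10));
	return (long)(end - buf);
}
```

Because the store side rounds down to whole pages, a value written and then read back may come out smaller than what was written: with 4 KiB pages, writing 130 reads back as 128.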
* Re: [PATCH] mm: sysfs: expose the BDI object in sysfs
2007-11-02 14:59 ` [PATCH] mm: sysfs: expose the BDI object in sysfs Peter Zijlstra
@ 2007-11-02 15:13 ` Kay Sievers
0 siblings, 0 replies; 112+ messages in thread
From: Kay Sievers @ 2007-11-02 15:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, Trond Myklebust, Miklos Szeredi
On Fri, 2007-11-02 at 15:59 +0100, Peter Zijlstra wrote:
> On Fri, 2007-11-02 at 15:32 +0100, Kay Sievers wrote:
> > On Fri, 2007-11-02 at 15:17 +0100, Peter Zijlstra wrote:
> Here is the 'pretty' patch :-)
>
> Since it relies on the removal of the device name length limit, this
> should not yet be applied.
Ok, I'll take a look now, and see what's needed here. Would be good to get that into the next kernel, now that you are waiting for the crap to get fixed. :)
Thanks,
Kay
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 14:48 ` Peter Zijlstra
2007-10-26 15:06 ` Miklos Szeredi
2007-10-26 15:10 ` Kay Sievers
@ 2007-10-26 16:37 ` Trond Myklebust
2007-12-14 14:50 ` Peter Zijlstra
2 siblings, 1 reply; 112+ messages in thread
From: Trond Myklebust @ 2007-10-26 16:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Miklos Szeredi
On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote:
> Miklos, Trond: could you suggest a better fmt for the bdi_init_fmt() for your
> respective filesystems?
<snip>
> Index: linux-2.6-2/fs/nfs/client.c
> ===================================================================
> --- linux-2.6-2.orig/fs/nfs/client.c
> +++ linux-2.6-2/fs/nfs/client.c
> @@ -678,7 +678,7 @@ static int nfs_probe_fsinfo(struct nfs_s
>  		goto out_error;
>
>  	nfs_server_set_fsinfo(server, &fsinfo);
> -	error = bdi_init(&server->backing_dev_info);
> +	error = bdi_init_fmt(&server->backing_dev_info, "nfs-%s-%p", clp->cl_hostname, server);
In our debugging printks, we usually use the superblock->s_id, but that only gets initialised later.
I'd suggest passing the 'dev_name' from *_get_sb() into *_create_server().
Cheers
Trond
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-10-26 16:37 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Trond Myklebust
@ 2007-12-14 14:50 ` Peter Zijlstra
2007-12-14 15:14 ` Miklos Szeredi
0 siblings, 1 reply; 112+ messages in thread
From: Peter Zijlstra @ 2007-12-14 14:50 UTC (permalink / raw)
To: Trond Myklebust
Cc: Kay Sievers, Nick Piggin, Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg, Miklos Szeredi, Neil Brown
On Fri, 2007-10-26 at 12:37 -0400, Trond Myklebust wrote:
> On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote:
>
> > Miklos, Trond: could you suggest a better fmt for the bdi_init_fmt() for your
> > respective filesystems?
> <snip>
> > Index: linux-2.6-2/fs/nfs/client.c
> > ===================================================================
> > --- linux-2.6-2.orig/fs/nfs/client.c
> > +++ linux-2.6-2/fs/nfs/client.c
> > @@ -678,7 +678,7 @@ static int nfs_probe_fsinfo(struct nfs_s
> >  		goto out_error;
> >
> >  	nfs_server_set_fsinfo(server, &fsinfo);
> > -	error = bdi_init(&server->backing_dev_info);
> > +	error = bdi_init_fmt(&server->backing_dev_info, "nfs-%s-%p", clp->cl_hostname, server);
>
> In our debugging printks, we usually use the superblock->s_id, but that
> only gets initialised later.
>
> I'd suggest passing the 'dev_name' from *_get_sb() into
> *_create_server().
I just realised that such a name would contain '/' and I'm quite sure that makes filesystems unhappy.
Neil suggested using device numbers which would work, however I think those might not be human friendly. While it's easy to find the device number of a given path (eg.: stat -c %d /), it's rather hard to find the path belonging to a given device number.
^ permalink raw reply [flat|nested] 112+ messages in thread
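The path-to-device-number direction mentioned above (what stat -c %d / reports) is easy to reproduce from C, and the result contains no '/'. The following is a rough userspace sketch for Linux, not part of any posted patch; dev_id_for_path() is a hypothetical helper, and the "major:minor" output format is an assumption chosen for readability.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>	/* major(), minor() on Linux */

/*
 * Write the device number of the filesystem holding 'path' into buf
 * as "major:minor". Returns 0 on success, -1 if stat() fails or the
 * buffer is too small.
 */
static int dev_id_for_path(const char *path, char *buf, size_t len)
{
	struct stat st;
	int n;

	if (stat(path, &st) != 0)
		return -1;

	n = snprintf(buf, len, "%u:%u",
		     (unsigned int)major(st.st_dev),
		     (unsigned int)minor(st.st_dev));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The reverse lookup, from a device number back to a path, has no equally simple call, which is the asymmetry the message points out.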
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
2007-12-14 14:50 ` Peter Zijlstra
@ 2007-12-14 15:14 ` Miklos Szeredi
2007-12-14 15:54 ` Peter Zijlstra
0 siblings, 1 reply; 112+ messages in thread
From: Miklos Szeredi @ 2007-12-14 15:14 UTC (permalink / raw)
To: peterz
Cc: trond.myklebust, kay.sievers, nickpiggin, akpm, linux-kernel, jens.axboe, fengguang.wu, greg, miklos, neilb, linuxram
> Neil suggested using device numbers which would work, however I think
> those might not be human friendly. While it's easy to find the device
> number of a given path (eg.: stat -c %d /), it's rather hard to find the
> path belonging to a given device number.
Ram Pai had a patch which added the device number (among other things) to /proc/mounts:
Subject: [RFC2 PATCH 1/1] VFS: Augment /proc/mount with subroot and shared-subtree
From: Ram Pai <linuxram@us.ibm.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@ftp.linux.org.uk>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, linux-fsdevel@vger.kernel.org, util-linux-ng@vger.kernel.org
In-Reply-To: <20070625214640.GC29058@ram.us.ibm.com>
Content-Type: text/plain
Date: Mon, 16 Jul 2007 11:46:48 -0700
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
X-Mailing-List: linux-fsdevel@vger.kernel.org
/proc/mounts in its current state fails to disambiguate bind mounts, especially when the bind mount is subrooted. Also it does not capture the propagation state of the mounts (shared-subtree). The following patch addresses the problem.
The following additional fields to /proc/mounts are added.
propagation-type in the form of <propagation_flag>[:<mntid>][,...]
    note: 'shared' flag is followed by the mntid of its peer mount
          'slave' flag is followed by the mntid of its master mount
          'private' flag stands by itself
          'unbindable' flag stands by itself
mntid -- is a unique identifier of the mount
major:minor -- is the major minor number of the device hosting the filesystem
dir -- the subdir in the filesystem which forms the root of this mount
parent -- the id of the parent mount
Here is a sample cat /proc/mounts after executing the following commands:
mount --bind /mnt /mnt
mount --make-shared /mnt
mount --bind /mnt/1 /var
mount --make-slave /var
mount --make-shared /var
mount --bind /var/abc /tmp
mount --make-unbindable /proc
rootfs / rootfs rw 0 0 private 2 0:1 / 2
/dev/root / ext2 rw 0 0 private 16 98:0 / 2
/proc /proc proc rw 0 0 unbindable 17 0:3 / 16
devpts /dev/pts devpts rw 0 0 private 18 0:10 / 16
/dev/root /mnt ext2 rw 0 0 shared:19 19 98:0 /mnt 16
/dev/root /var ext2 rw 0 0 shared:21,slave:19 20 98:0 /mnt/1 16
/dev/root /tmp ext2 rw 0 0 shared:20,slave:19 21 98:0 /mnt/1/abc 16
For example, the last line indicates that:
1) The mount is a shared mount.
2) Its peer mount is the mount with id 20.
3) It is also a slave mount of the master-mount with the id 19.
4) The filesystem on device with major/minor number 98:0 and subdirectory mnt/1/abc makes the root directory of this mount.
5) And finally the mount with id 16 is its parent.
Testing: symlinked /etc/mtab to /proc/mounts and did some mount and df commands. They worked normally.
Signed-off-by: Ram Pai <linuxram@us.ibm.com> --- fs/dcache.c | 53 +++++++++++++++++++++++++++++++ fs/namespace.c | 35 +++++++++++++++++++- fs/pnode.c | 22 +++++++++++++ fs/pnode.h | 2 + fs/seq_file.c | 79 ++++++++++++++++++++++++++++++++++------------- include/linux/dcache.h | 2 + include/linux/mount.h | 1 include/linux/seq_file.h | 1 8 files changed, 172 insertions(+), 23 deletions(-) Index: linux-2.6.21.5/fs/dcache.c =================================================================== --- linux-2.6.21.5.orig/fs/dcache.c +++ linux-2.6.21.5/fs/dcache.c @@ -1835,6 +1835,59 @@ char * d_path(struct dentry *dentry, str return res; } +static inline int prepend(char **buffer, int *buflen, const char *str, + int namelen) +{ + if ((*buflen -= namelen) < 0) + return 1; + *buffer -= namelen; + memcpy(*buffer, str, namelen); + return 0; +} + +/* + * write full pathname into buffer and return start of pathname. + * If @vfsmnt is not specified return the path relative to the + * its filesystem's root. + */ +char * dentry_path(struct dentry *dentry, char *buf, int buflen) +{ + char * end = buf+buflen; + char * retval; + + spin_lock(&dcache_lock); + prepend(&end, &buflen, "\0", 1); + if (!IS_ROOT(dentry) && d_unhashed(dentry)) { + if (prepend(&end, &buflen, "//deleted", 10)) + goto Elong; + } + /* Get '/' right */ + retval = end-1; + *retval = '/'; + + for (;;) { + struct dentry * parent; + if (IS_ROOT(dentry)) + break; + + parent = dentry->d_parent; + prefetch(parent); + + if (prepend(&end, &buflen, dentry->d_name.name, + dentry->d_name.len) || + prepend(&end, &buflen, "/", 1)) + goto Elong; + + retval = end; + dentry = parent; + } + spin_unlock(&dcache_lock); + return retval; +Elong: + spin_unlock(&dcache_lock); + return ERR_PTR(-ENAMETOOLONG); +} + /* * NOTE! The user-level library version returns a * character pointer. 
The kernel system call just Index: linux-2.6.21.5/fs/namespace.c =================================================================== --- linux-2.6.21.5.orig/fs/namespace.c +++ linux-2.6.21.5/fs/namespace.c @@ -33,6 +33,8 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfsmount_lock); static int event; +static atomic_t mnt_counter; + static struct list_head *mount_hashtable __read_mostly; static int hash_mask __read_mostly, hash_bits __read_mostly; @@ -51,6 +53,7 @@ static inline unsigned long hash(struct return tmp & hash_mask; } + struct vfsmount *alloc_vfsmnt(const char *name) { struct vfsmount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL); @@ -64,6 +67,7 @@ struct vfsmount *alloc_vfsmnt(const char INIT_LIST_HEAD(&mnt->mnt_share); INIT_LIST_HEAD(&mnt->mnt_slave_list); INIT_LIST_HEAD(&mnt->mnt_slave); + mnt->mnt_id = atomic_inc_return(&mnt_counter); if (name) { int size = strlen(name) + 1; char *newname = kmalloc(size, GFP_KERNEL); @@ -386,9 +390,35 @@ static int show_vfsmnt(struct seq_file * if (mnt->mnt_flags & fs_infop->flag) seq_puts(m, fs_infop->str); } - if (mnt->mnt_sb->s_op->show_options) + seq_putc(m, ' '); + if (mnt->mnt_sb->s_op->show_options) { err = mnt->mnt_sb->s_op->show_options(m, mnt); - seq_puts(m, " 0 0\n"); + seq_putc(m, ' '); + } + seq_puts(m, "0 0 "); + if (IS_MNT_SHARED(mnt)) { + seq_printf(m, "shared:%lu", + get_peer_same_ns(mnt)->mnt_id); + if (IS_MNT_SLAVE(mnt)) { + seq_printf(m, ",slave:%lu ", + get_master_same_ns(mnt)->mnt_id); + } else { + seq_putc(m, ' '); + } + } else if (IS_MNT_SLAVE(mnt)) { + seq_printf(m, "slave:%lu ", + get_master_same_ns(mnt)->mnt_id); + } else if (IS_MNT_UNBINDABLE(mnt)) { + seq_printf(m, "unbindable "); + } else { + seq_printf(m, "private "); + } + seq_printf(m, "%lu %u:%u ", mnt->mnt_id, + MAJOR(mnt->mnt_sb->s_dev), + MINOR(mnt->mnt_sb->s_dev)); + seq_dentry(m, mnt->mnt_root, " \t\n\\"); + seq_putc(m, ' '); + seq_printf(m, "%lu \n", mnt->mnt_parent->mnt_id); return err; } @@ -1822,6 +1852,7 @@ void __init 
mnt_init(unsigned long mempa if (!mount_hashtable) panic("Failed to allocate mount hash table\n"); + atomic_set(&mnt_counter, 0); /* * Find the power-of-two list-heads that can fit into the allocation.. * We don't guarantee that "sizeof(struct list_head)" is necessarily Index: linux-2.6.21.5/fs/seq_file.c =================================================================== --- linux-2.6.21.5.orig/fs/seq_file.c +++ linux-2.6.21.5/fs/seq_file.c @@ -338,38 +338,75 @@ int seq_printf(struct seq_file *m, const } EXPORT_SYMBOL(seq_printf); -int seq_path(struct seq_file *m, - struct vfsmount *mnt, struct dentry *dentry, - char *esc) +static inline char *mangle_path(char *s, char *p, char *esc) { + while (s <= p) { + char c = *p++; + if (!c) { + return s; + } else if (!strchr(esc, c)) { + *s++ = c; + } else if (s + 4 > p) { + break; + } else { + *s++ = '\\'; + *s++ = '0' + ((c & 0300) >> 6); + *s++ = '0' + ((c & 070) >> 3); + *s++ = '0' + (c & 07); + } + } + return NULL; +} + +/* + * return the absolute path of 'dentry' residing in mount 'mnt'. + */ +int seq_path(struct seq_file *m, struct vfsmount *mnt, struct dentry *dentry, + char *esc) +{ + char *p = NULL; if (m->count < m->size) { char *s = m->buf + m->count; - char *p = d_path(dentry, mnt, s, m->size - m->count); + p = d_path(dentry, mnt, s, m->size - m->count); if (!IS_ERR(p)) { - while (s <= p) { - char c = *p++; - if (!c) { - p = m->buf + m->count; - m->count = s - m->buf; - return s - p; - } else if (!strchr(esc, c)) { - *s++ = c; - } else if (s + 4 > p) { - break; - } else { - *s++ = '\\'; - *s++ = '0' + ((c & 0300) >> 6); - *s++ = '0' + ((c & 070) >> 3); - *s++ = '0' + (c & 07); - } + s = mangle_path(s, p, esc); + if (s) { + p = m->buf + m->count; + m->count = s - m->buf; + return s - p; } } } m->count = m->size; - return -1; + return p == ERR_PTR(-ENAMETOOLONG) ? 0 : -1; } + EXPORT_SYMBOL(seq_path); +/* + * returns the path of the 'dentry' from the root of its filesystem. 
+ */ +int seq_dentry(struct seq_file *m, struct dentry *dentry, char *esc) +{ + char *p = NULL; + if (m->count < m->size) { + char *s = m->buf + m->count; + p = dentry_path(dentry, s, m->size - m->count); + if (!IS_ERR(p)) { + s = mangle_path(s, p, esc); + if (s) { + p = m->buf + m->count; + m->count = s - m->buf; + return s - p; + } + } + } + m->count = m->size; + return p == ERR_PTR(-ENAMETOOLONG) ? 0 : -1; +} + +EXPORT_SYMBOL(seq_dentry); + static void *single_start(struct seq_file *p, loff_t *pos) { return NULL + (*pos == 0); Index: linux-2.6.21.5/include/linux/dcache.h =================================================================== --- linux-2.6.21.5.orig/include/linux/dcache.h +++ linux-2.6.21.5/include/linux/dcache.h @@ -294,6 +294,8 @@ extern struct dentry * d_hash_and_lookup extern int d_validate(struct dentry *, struct dentry *); extern char * d_path(struct dentry *, struct vfsmount *, char *, int); +extern char * dentry_path(struct dentry *, char *, int); + /* Allocation counts.. 
*/ Index: linux-2.6.21.5/include/linux/seq_file.h =================================================================== --- linux-2.6.21.5.orig/include/linux/seq_file.h +++ linux-2.6.21.5/include/linux/seq_file.h @@ -43,6 +43,7 @@ int seq_printf(struct seq_file *, const __attribute__ ((format (printf,2,3))); int seq_path(struct seq_file *, struct vfsmount *, struct dentry *, char *); +int seq_dentry(struct seq_file *, struct dentry *, char *); int single_open(struct file *, int (*)(struct seq_file *, void *), void *); int single_release(struct inode *, struct file *); Index: linux-2.6.21.5/fs/pnode.c =================================================================== --- linux-2.6.21.5.orig/fs/pnode.c +++ linux-2.6.21.5/fs/pnode.c @@ -27,6 +27,28 @@ static inline struct vfsmount *next_slav return list_entry(p->mnt_slave.next, struct vfsmount, mnt_slave); } +/* return a peer in the same namespace */ +struct vfsmount *get_peer_same_ns(struct vfsmount *mnt) +{ + struct vfsmount *m = mnt; + do { + m = next_peer(m); + } while (mnt->mnt_ns != m->mnt_ns); + return m; +} + +/* return a peer in the same namespace */ +struct vfsmount *get_master_same_ns(struct vfsmount *mnt) +{ + struct vfsmount *m = mnt->mnt_master; + struct vfsmount *tmp = m; + if (!m) return m; + do { + m = next_peer(m); + } while (tmp != m && mnt->mnt_ns != m->mnt_ns); + return m; +} + static int do_make_slave(struct vfsmount *mnt) { struct vfsmount *peer_mnt = mnt, *master = mnt->mnt_master; Index: linux-2.6.21.5/fs/pnode.h =================================================================== --- linux-2.6.21.5.orig/fs/pnode.h +++ linux-2.6.21.5/fs/pnode.h @@ -34,4 +34,6 @@ int propagate_mnt(struct vfsmount *, str struct list_head *); int propagate_umount(struct list_head *); int propagate_mount_busy(struct vfsmount *, int); +struct vfsmount *get_master_same_ns(struct vfsmount *); +struct vfsmount *get_peer_same_ns(struct vfsmount *); #endif /* _LINUX_PNODE_H */ Index: linux-2.6.21.5/include/linux/mount.h 
=================================================================== --- linux-2.6.21.5.orig/include/linux/mount.h +++ linux-2.6.21.5/include/linux/mount.h @@ -53,6 +53,7 @@ struct vfsmount { struct list_head mnt_slave; /* slave list entry */ struct vfsmount *mnt_master; /* slave is on master->mnt_slave_list */ struct mnt_namespace *mnt_ns; /* containing namespace */ + unsigned long mnt_id; /* mount identifier */ /* * We put mnt_count & mnt_expiry_mark at the end of struct vfsmount * to let these frequently modified fields in a separate cache line ^ permalink raw reply [flat|nested] 112+ messages in thread
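The get_peer_same_ns() helper in the patch above walks the circular peer list until it reaches a mount in the caller's namespace. A minimal user-space sketch of that loop follows. The struct mount_sketch type and its fields are hypothetical stand-ins for struct vfsmount, mnt_ns and next_peer(), not kernel definitions. The walk always terminates because the starting mount is itself on the ring and trivially matches its own namespace.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct vfsmount: only what the walk needs. */
struct mount_sketch {
	int ns;                          /* stands in for mnt->mnt_ns */
	struct mount_sketch *next_peer;  /* circular peer list, never NULL */
};

/*
 * Return the next peer on the ring that lives in the same namespace as
 * mnt. If no other peer matches, the walk wraps around and returns mnt.
 */
static struct mount_sketch *peer_same_ns(struct mount_sketch *mnt)
{
	struct mount_sketch *m = mnt;

	do {
		m = m->next_peer;
	} while (m->ns != mnt->ns);
	return m;
}
```

With a three-entry ring whose namespaces are 1, 2, 1, the first mount's peer is the third, and the second mount (alone in namespace 2) gets itself back.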
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-12-14 15:14 ` Miklos Szeredi @ 2007-12-14 15:54 ` Peter Zijlstra 0 siblings, 0 replies; 112+ messages in thread From: Peter Zijlstra @ 2007-12-14 15:54 UTC (permalink / raw) To: Miklos Szeredi Cc: trond.myklebust, kay.sievers, nickpiggin, akpm, linux-kernel, jens.axboe, fengguang.wu, greg, neilb, linuxram On Fri, 2007-12-14 at 16:14 +0100, Miklos Szeredi wrote: > > Neil suggested using device numbers which would work, however I think > > those might not be human friendly. While its easy to find the device > > number of a given path (eg.: stat -c %d /), its rather hard to find the > > path belonging to a given device number. > > Ram Pai had a patch which added the device number (among other things) > to /proc/mounts: > > > > Subject: [RFC2 PATCH 1/1] VFS: Augment /proc/mount with subroot and > shared-subtree > From: Ram Pai <linuxram@us.ibm.com> > To: "H. Peter Anvin" <hpa@zytor.com> > Cc: Al Viro <viro@ftp.linux.org.uk>, > Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, > linux-fsdevel@vger.kernel.org, util-linux-ng@vger.kernel.org > In-Reply-To: <20070625214640.GC29058@ram.us.ibm.com> > Content-Type: text/plain > Date: Mon, 16 Jul 2007 11:46:48 -0700 > Content-Transfer-Encoding: 7bit > Sender: linux-fsdevel-owner@vger.kernel.org > X-Mailing-List: linux-fsdevel@vger.kernel.org > > /proc/mounts in its current state fail to disambiguate bind mounts, especially > when the bind mount is subrooted. Also it does not capture propagation state of > the mounts(shared-subtree). The following patch addresses the problem. > > The following additional fields to /proc/mounts are added. > > propagation-type in the form of <propagation_flag>[:<mntid>][,...] 
> note: 'shared' flag is followed by the mntid of its peer mount > 'slave' flag is followed by the mntid of its master mount > 'private' flag stands by itself > 'unbindable' flag stands by itself > > mntid -- is a unique identifier of the mount > major:minor -- is the major minor number of the device hosting the filesystem > dir -- the subdir in the filesystem which forms the root of this mount > parent -- the id of the parent mount > > > Here is a sample cat /proc/mounts after execution the following commands: > > mount --bind /mnt /mnt > mount --make-shared /mnt > mount --bind /mnt/1 /var > mount --make-slave /var > mount --make-shared /var > mount --bind /var/abc /tmp > mount --make-unbindable /proc > > rootfs / rootfs rw 0 0 private 2 0:1 / 2 > /dev/root / ext2 rw 0 0 private 16 98:0 / 2 > /proc /proc proc rw 0 0 unbindable 17 0:3 / 16 > devpts /dev/pts devpts rw 0 0 private 18 0:10 / 16 > /dev/root /mnt ext2 rw 0 0 shared:19 19 98:0 /mnt 16 > /dev/root /var ext2 rw 0 0 shared:21,slave:19 20 98:0 /mnt/1 16 > /dev/root /tmp ext2 rw 0 0 shared:20,slave:19 21 98:0 /mnt/1/abc 16 > > For example, the last line indicates that : > > 1) The mount is a shared mount. > 2) Its peer mount of mount with id 20 > 3) It is also a slave mount of the master-mount with the id 19 > 4) The filesystem on device with major/minor number 98:0 and subdirectory > mnt/1/abc makes the root directory of this mount. > 5) And finally the mount with id 16 is its parent. > > > Testing: symlinked /etc/mtab to /proc/mounts and did some mount and df commands. They worked normally. > > > > Signed-off-by: Ram Pai <linuxram@us.ibm.com> OK, I guess that would work. Ram Pai, what is the current status of that work? ^ permalink raw reply [flat|nested] 112+ messages in thread
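For readers who want to consume the proposed propagation-type field, here is a rough user-space sketch of a parser for the `<propagation_flag>[:<mntid>][,...]` form quoted above. It is only an illustration of the format as described in Ram Pai's changelog; struct propagation and parse_propagation() are hypothetical names, not part of the patch or of any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for one propagation-type field. */
struct propagation {
	int shared;       /* 'shared' flag seen */
	int slave;        /* 'slave' flag seen */
	int is_private;   /* 'private' flag seen */
	int unbindable;   /* 'unbindable' flag seen */
	long peer_id;     /* mntid after 'shared:', -1 if absent */
	long master_id;   /* mntid after 'slave:', -1 if absent */
};

/* Parse e.g. "shared:20,slave:19". Returns 0 on success, -1 on error. */
static int parse_propagation(const char *field, struct propagation *out)
{
	char buf[128];
	char *p;

	memset(out, 0, sizeof(*out));
	out->peer_id = -1;
	out->master_id = -1;

	if (strlen(field) >= sizeof(buf))
		return -1;
	strcpy(buf, field);

	p = buf;
	while (p && *p) {
		char *next = strchr(p, ',');   /* split off the next item */
		char *colon;
		long id = -1;

		if (next)
			*next++ = '\0';
		colon = strchr(p, ':');        /* optional ":<mntid>" suffix */
		if (colon) {
			*colon = '\0';
			id = strtol(colon + 1, NULL, 10);
		}

		if (strcmp(p, "shared") == 0) {
			out->shared = 1;
			out->peer_id = id;
		} else if (strcmp(p, "slave") == 0) {
			out->slave = 1;
			out->master_id = id;
		} else if (strcmp(p, "private") == 0) {
			out->is_private = 1;
		} else if (strcmp(p, "unbindable") == 0) {
			out->unbindable = 1;
		} else {
			return -1;                 /* unknown flag */
		}
		p = next;
	}
	return 0;
}
```

Fed the field from the last sample line, "shared:20,slave:19", this yields a shared mount with peer 20 that is also a slave of master 19, matching points 1) to 3) of the explanation in the changelog.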
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-02 11:40 ` Peter Zijlstra 2007-10-02 12:05 ` Nick Piggin @ 2007-10-02 14:38 ` Kay Sievers 1 sibling, 0 replies; 112+ messages in thread From: Kay Sievers @ 2007-10-02 14:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, linux-kernel, Jens Axboe, Fengguang Wu, greg On 10/2/07, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, 2007-10-02 at 13:21 +0200, Kay Sievers wrote: > > On 10/2/07, Peter Zijlstra <peterz@infradead.org> wrote: > > > On Tue, 2007-10-02 at 12:31 +0200, Kay Sievers wrote: > > > > > > > What would be the point in another top-level tree for device > > > > information? All devices you are exporting information for, are > > > > already in the sysfs tree, right? > > > > > > Never did find NFS mounts/servers/superblocks or whatever constitutes a > > > BDI for NFS in there. Same goes for all other networked filesystems for > > > that matter. > > > > How about adding this information to the tree then, instead of > > creating a new top-level hack, just because something that you think > > you need doesn't exist. > > So you suggest adding all the various network filesystems in there > (where?), and adding the concept of a BDI, and ensuring all are properly > linked together - somehow. Feel free to do so. No, I propose to add only sane new userspace interfaces. That you miss infrastructure to use, should never be the reason to add conceptually broken new interfaces. A new device related top-level directory in /sys is not going to happen. Better don't add new stuff, if you can't do it right. > > You called sysfs a mess, seems you work on that topic too. :) > > I called the in-kernel API to create sysfs files a mess. Not that I have > another opinion on the content of /sys though, always takes to damn long > to find anything in there. 
Sure, it's a mess, we are trying to clean that up, it's hard, and therefore we can not accept stuff like you propose, that makes it even harder. Use debugfs, if you can't add a sane interface. There are no rules, you can do whatever you want there. If over time the stuff you need gets added, you can always switch over. Or add an attribute group to the existing devices, and leave the ones out, which are not in sysfs right now, until they are added. Or use a device class and use the existing devices as parents. If you don't have a parent, they will show up in "virtual" until someone adds the right parent devices. (If you are going to do this, make sure nothing depends on a device being "virtual", it must be allowed to re-parent these device with any future kernel release.) Thanks, Kay ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24) 2007-10-02 8:17 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Peter Zijlstra [not found] ` <20071002082831.GA19954@mail.ustc.edu.cn> 2007-10-02 8:31 ` Andrew Morton @ 2007-10-03 11:00 ` Martin Knoblauch 2 siblings, 0 replies; 112+ messages in thread From: Martin Knoblauch @ 2007-10-03 11:00 UTC (permalink / raw) To: Peter Zijlstra, Andrew Morton; +Cc: linux-kernel, Jens Axboe, Fengguang Wu --- Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2007-10-01 at 14:22 -0700, Andrew Morton wrote: > > > nfs-remove-congestion_end.patch > > lib-percpu_counter_add.patch > > lib-percpu_counter_sub.patch > > lib-percpu_counter-variable-batch.patch > > lib-make-percpu_counter_add-take-s64.patch > > lib-percpu_counter_set.patch > > lib-percpu_counter_sum_positive.patch > > lib-percpu_count_sum.patch > > lib-percpu_counter_init-error-handling.patch > > lib-percpu_counter_init_irq.patch > > mm-bdi-init-hooks.patch > > mm-scalable-bdi-statistics-counters.patch > > mm-count-reclaimable-pages-per-bdi.patch > > mm-count-writeback-pages-per-bdi.patch > > This one: > > mm-expose-bdi-statistics-in-sysfs.patch > > > lib-floating-proportions.patch > > mm-per-device-dirty-threshold.patch > > mm-per-device-dirty-threshold-warning-fix.patch > > mm-per-device-dirty-threshold-fix.patch > > mm-dirty-balancing-for-tasks.patch > > mm-dirty-balancing-for-tasks-warning-fix.patch > > And, this one: > > debug-sysfs-files-for-the-current-ratio-size-total.patch > > > I'm not sure polluting /sys/block/<foo>/queue/ like that is The Right > Thing. These patches sure were handy when debugging this, but not > sure > they want to move to maineline. > > Maybe we want /sys/bdi/<foo>/ or maybe /debug/bdi/<foo>/ > > Opinions? > Hi Peter, my only opinion is that it is great to see that stuff moving into mainline. 
If it really goes in, there will be one more very interested rc-tester :-) Cheers Martin ------------------------------------------------------ Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de ^ permalink raw reply [flat|nested] 112+ messages in thread
[parent not found: <20071002083922.GA28892@mail.ustc.edu.cn>]
* writeback fixes [not found] ` <20071002083922.GA28892@mail.ustc.edu.cn> @ 2007-10-02 8:39 ` Fengguang Wu 0 siblings, 0 replies; 112+ messages in thread From: Fengguang Wu @ 2007-10-02 8:39 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Mon, Oct 01, 2007 at 02:22:22PM -0700, Andrew Morton wrote: [...] > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch > writeback-fix-comment-use-helper-function.patch > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch > writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch > writeback-fix-periodic-superblock-dirty-inode-flushing.patch I have 4 more patches on writeback, 3 of them fix *new* problems that was introduced by the above patch. I'd recommend to merge them as a whole - either for 2.6.24 or for 2.6.25. I'll post them right now. Fengguang > introduce-i_sync.patch > introduce-i_sync-fix.patch > writeback-remove-unnecessary-wait-in-throttle_vm_writeout.patch > > Merge ^ permalink raw reply [flat|nested] 112+ messages in thread
* kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton ` (6 preceding siblings ...) [not found] ` <20071002083922.GA28892@mail.ustc.edu.cn> @ 2007-10-02 16:06 ` Hugh Dickins 2007-10-02 9:10 ` Nick Piggin 2007-10-02 18:38 ` Mel Gorman 2007-10-02 16:12 ` -mm merge plans for 2.6.24 Pekka Enberg ` (5 subsequent siblings) 13 siblings, 2 replies; 112+ messages in thread From: Hugh Dickins @ 2007-10-02 16:06 UTC (permalink / raw) To: Andrew Morton; +Cc: Chritoph Lameter, Mel Gorman, linux-kernel, linux-mm On Mon, 1 Oct 2007, Andrew Morton wrote: > # > # slub && antifrag > # > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch > slub-exploit-page-mobility-to-increase-allocation-order.patch > slub-reduce-antifrag-max-order.patch > > I think this stuff is in the "mm stuff we don't want to merge" category. > If so, I really should have dropped it ages ago. I agree. I spent a while last week bisecting down to see why my heavily swapping loads take 30%-60% longer with -mm than mainline, and it was here that they went bad. Trying to keep higher orders free is costly. On the other hand, hasn't SLUB efficiency been built on the expectation that higher orders can be used? And it would be a twisted shame for high performance to be held back by some idiot's swapping load. Hugh ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24]
2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins
@ 2007-10-02 9:10 ` Nick Piggin
2007-10-02 18:38 ` Mel Gorman
1 sibling, 0 replies; 112+ messages in thread
From: Nick Piggin @ 2007-10-02 9:10 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Christoph Lameter, Mel Gorman, linux-kernel, linux-mm

On Wednesday 03 October 2007 02:06, Hugh Dickins wrote:
> On Mon, 1 Oct 2007, Andrew Morton wrote:
> > #
> > # slub && antifrag
> > #
> > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
> > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> > slub-exploit-page-mobility-to-increase-allocation-order.patch
> > slub-reduce-antifrag-max-order.patch
> >
> > I think this stuff is in the "mm stuff we don't want to merge"
> > category. If so, I really should have dropped it ages ago.
>
> I agree. I spent a while last week bisecting down to see why my heavily
> swapping loads take 30%-60% longer with -mm than mainline, and it was
> here that they went bad. Trying to keep higher orders free is costly.

Yeah, no, there's no way we'd merge that.

> On the other hand, hasn't SLUB efficiency been built on the expectation
> that higher orders can be used? And it would be a twisted shame for
> high performance to be held back by some idiot's swapping load.

IMO it's a bad idea to create all these dependencies like this.

If SLUB can get _more_ performance out of using higher order
allocations, then fine. If it is starting off at a disadvantage at the
same order, then that should be fixed first, right?

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins 2007-10-02 9:10 ` Nick Piggin @ 2007-10-02 18:38 ` Mel Gorman 2007-10-02 18:28 ` Christoph Lameter 1 sibling, 1 reply; 112+ messages in thread From: Mel Gorman @ 2007-10-02 18:38 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Chritoph Lameter, linux-kernel, linux-mm On Tue, 2007-10-02 at 17:06 +0100, Hugh Dickins wrote: > On Mon, 1 Oct 2007, Andrew Morton wrote: > > # > > # slub && antifrag > > # > > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch > > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch > > slub-exploit-page-mobility-to-increase-allocation-order.patch > > slub-reduce-antifrag-max-order.patch > > > > I think this stuff is in the "mm stuff we don't want to merge" category. > > If so, I really should have dropped it ages ago. > > I agree. I spent a while last week bisecting down to see why my heavily > swapping loads take 30%-60% longer with -mm than mainline, and it was > here that they went bad. Trying to keep higher orders free is costly. > Very interesting. I had agreed with these patches being pulled but it was simply on the grounds that there was no agreement it was the right thing to do. It was best to have mainline and -mm behave the same from a fragmentation perspective and revisit this idea from scratch. That it affects swapping loads is news so thanks for that. > On the other hand, hasn't SLUB efficiency been built on the expectation > that higher orders can be used? And it would be a twisted shame for > high performance to be held back by some idiot's swapping load. > My belief is that SLUB can still use the higher orders if configured to do so at boot-time. The loss of these patches means it won't try and do it automatically. Christoph will chime in I'm sure. 
-- Mel Gorman ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24]
2007-10-02 18:38 ` Mel Gorman
@ 2007-10-02 18:28 ` Christoph Lameter
2007-10-03 0:37 ` Christoph Lameter
0 siblings, 1 reply; 112+ messages in thread
From: Christoph Lameter @ 2007-10-02 18:28 UTC (permalink / raw)
To: Mel Gorman; +Cc: Hugh Dickins, Andrew Morton, linux-kernel, linux-mm

On Tue, 2 Oct 2007, Mel Gorman wrote:
> > I agree. I spent a while last week bisecting down to see why my heavily
> > swapping loads take 30%-60% longer with -mm than mainline, and it was
> > here that they went bad. Trying to keep higher orders free is costly.

The larger order allocations may cause excessive reclaim under certain
circumstances. Reclaim will continue to evict pages until a larger order
page can be coalesced. And it seems that this eviction is not that well
targeted at this point. So lots of pages may be needlessly evicted.

> > On the other hand, hasn't SLUB efficiency been built on the expectation
> > that higher orders can be used? And it would be a twisted shame for
> > high performance to be held back by some idiot's swapping load.
> >
>
> My belief is that SLUB can still use the higher orders if configured to
> do so at boot-time. The loss of these patches means it won't try and do
> it automatically. Christoph will chime in I'm sure.

You can still manually configure those at boot time via slub_max_order
etc.

I think Mel and I have to rethink how to do these efficiently. Mel has
some ideas and there is some talk about using the vmalloc fallback to
ensure that things always work. Probably we may have to tune things so
that fallback is chosen if reclaim cannot get us the larger order page
with reasonable effort.

The maximum order of allocation used by SLUB may have to depend on the
number of page structs in the system since small systems (128M was the
case that Peter found) can more easily get into trouble. SLAB has similar
measures to avoid order 1 allocations for small systems below 32M.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-02 18:28 ` Christoph Lameter @ 2007-10-03 0:37 ` Christoph Lameter 0 siblings, 0 replies; 112+ messages in thread From: Christoph Lameter @ 2007-10-03 0:37 UTC (permalink / raw) To: Mel Gorman; +Cc: Hugh Dickins, Andrew Morton, linux-kernel, linux-mm On Tue, 2 Oct 2007, Christoph Lameter wrote: > The maximum order of allocation used by SLUB may have to depend on the > number of page structs in the system since small systems (128M was the > case that Peter found) can easier get into trouble. SLAB has similar > measures to avoid order 1 allocations for small systems below 32M. A patch like this? This is based on the number of page structs on the system. Maybe it needs to be based on the number of MAX_ORDER blocks for antifrag? SLUB: Determine slub_max_order depending on the number of pages available Determine the maximum order to be used for slabs and the mininum desired number of objects in a slab from the amount of pages that a system has available (like SLAB does for the order 1/0 distinction). For systems with less than 128M only use order 0 allocations (SLAB does that for <32M only). The order 0 config is useful for small systems to minimize the memory used. Memory easily fragments since we have less than 32k pages to play with. Order 0 insures that higher order allocations are minimized (Larger orders must still be used for objects that do not fit into order 0 pages). Then step up to order 1 for systems < 256000 pages (1G) Order 2 limit to systems < 1000000 page structs (4G) Order 3 for systems larger than that. 
Signed-off-by: Christoph Lameter <clameter@sgi.com> --- mm/slub.c | 49 +++++++++++++++++++++++++------------------------ 1 file changed, 25 insertions(+), 24 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2007-10-02 09:26:16.000000000 -0700 +++ linux-2.6/mm/slub.c 2007-10-02 16:40:22.000000000 -0700 @@ -153,25 +153,6 @@ static inline void ClearSlabDebug(struct /* Enable to test recovery from slab corruption on boot */ #undef SLUB_RESILIENCY_TEST -#if PAGE_SHIFT <= 12 - -/* - * Small page size. Make sure that we do not fragment memory - */ -#define DEFAULT_MAX_ORDER 1 -#define DEFAULT_MIN_OBJECTS 4 - -#else - -/* - * Large page machines are customarily able to handle larger - * page orders. - */ -#define DEFAULT_MAX_ORDER 2 -#define DEFAULT_MIN_OBJECTS 8 - -#endif - /* * Mininum number of partial slabs. These will be left on the partial * lists even if they are empty. kmem_cache_shrink may reclaim them. @@ -1718,8 +1699,9 @@ static struct page *get_object_page(cons * take the list_lock. */ static int slub_min_order; -static int slub_max_order = DEFAULT_MAX_ORDER; -static int slub_min_objects = DEFAULT_MIN_OBJECTS; +static int slub_max_order; +static int slub_min_objects = 4; +static int manual; /* * Merge control. If this is set then no merging of slab caches will occur. 
@@ -2237,7 +2219,7 @@ static struct kmem_cache *kmalloc_caches static int __init setup_slub_min_order(char *str) { get_option (&str, &slub_min_order); - + manual = 1; return 1; } @@ -2246,7 +2228,7 @@ __setup("slub_min_order=", setup_slub_mi static int __init setup_slub_max_order(char *str) { get_option (&str, &slub_max_order); - + manual = 1; return 1; } @@ -2255,7 +2237,7 @@ __setup("slub_max_order=", setup_slub_ma static int __init setup_slub_min_objects(char *str) { get_option (&str, &slub_min_objects); - + manual = 1; return 1; } @@ -2566,6 +2548,16 @@ int kmem_cache_shrink(struct kmem_cache } EXPORT_SYMBOL(kmem_cache_shrink); +/* + * Table to autotune the maximum slab order based on the number of pages + * that the system has available. + */ +static unsigned long __initdata phys_pages_for_order[PAGE_ALLOC_COSTLY_ORDER] = { + 32768, /* >128M if using 4K pages, >512M (16k), >2G (64k) */ + 256000, /* >1G if using 4k pages, >4G (16k), >16G (64k) */ + 1000000 /* >4G if using 4k pages, >16G (16k), >64G (64k) */ +}; + /******************************************************************** * Basic setup of slabs *******************************************************************/ @@ -2575,6 +2567,15 @@ void __init kmem_cache_init(void) int i; int caches = 0; + if (!manual) { + /* No manual parameters. Autotune for system */ + for (i = 0; i < PAGE_ALLOC_COSTLY_ORDER; i++) + if (num_physpages > phys_pages_for_order[i]) { + slub_max_order++; + slub_min_objects <<= 1; + } + } + #ifdef CONFIG_NUMA /* * Must first have the slab cache available for the allocations of the ^ permalink raw reply [flat|nested] 112+ messages in thread
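The autotuning loop at the end of the patch above can be tried out in user space. The following is a minimal sketch, assuming the same three thresholds and the same starting values (slub_max_order = 0, slub_min_objects = 4) as the patch. The names slub_autotune(), struct slub_tuning and COSTLY_ORDER are made up for illustration and do not exist in mm/slub.c.

```c
#include <stddef.h>

/* Stand-in for PAGE_ALLOC_COSTLY_ORDER, which is 3 in the kernel. */
#define COSTLY_ORDER 3

/* Page-count thresholds copied from the patch's phys_pages_for_order[]. */
static const unsigned long pages_for_order[COSTLY_ORDER] = {
	32768,    /* >128M with 4K pages */
	256000,   /* >1G with 4K pages */
	1000000   /* >4G with 4K pages */
};

/* Hypothetical result type: the two values the patch tunes. */
struct slub_tuning {
	int max_order;
	int min_objects;
};

/*
 * Each threshold that num_pages exceeds bumps the maximum order by one
 * and doubles the minimum number of objects per slab.
 */
static struct slub_tuning slub_autotune(unsigned long num_pages)
{
	struct slub_tuning t = { 0, 4 };
	size_t i;

	for (i = 0; i < COSTLY_ORDER; i++) {
		if (num_pages > pages_for_order[i]) {
			t.max_order++;
			t.min_objects <<= 1;
		}
	}
	return t;
}
```

Note that the comparison is strict, so a machine with exactly 32768 pages (128M with 4K pages) still gets order 0, while 65536 pages (256M) gets order 1 with a minimum of 8 objects.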
* Re: -mm merge plans for 2.6.24 2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton ` (7 preceding siblings ...) 2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins @ 2007-10-02 16:12 ` Pekka Enberg 2007-10-02 16:21 ` new aops merge [was Re: -mm merge plans for 2.6.24] Hugh Dickins ` (4 subsequent siblings) 13 siblings, 0 replies; 112+ messages in thread From: Pekka Enberg @ 2007-10-02 16:12 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On 10/2/07, Andrew Morton <akpm@linux-foundation.org> wrote: > revoke-special-mmap-handling.patch > revoke-special-mmap-handling-vs-fault-vs-invalidate.patch > revoke-core-code.patch > slab-api-remove-useless-ctor-parameter-and-reorder-parameters-vs-revoke.patch > revoke-support-for-ext2-and-ext3.patch > revoke-add-documentation.patch > revoke-wire-up-i386-system-calls.patch > fs-introduce-write_begin-write_end-and-perform_write-aops-revoke.patch > fs-introduce-write_begin-write_end-and-perform_write-aops-revoke-fix.patch > revoke-vs-git-block.patch > > Not sure - opinions sought. Needs more work to fix problems raised by viro. I am cooking up a patch but it won't be ready for 2.6.24. ^ permalink raw reply [flat|nested] 112+ messages in thread
* new aops merge [was Re: -mm merge plans for 2.6.24] 2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton ` (8 preceding siblings ...) 2007-10-02 16:12 ` -mm merge plans for 2.6.24 Pekka Enberg @ 2007-10-02 16:21 ` Hugh Dickins 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin ` (3 subsequent siblings) 13 siblings, 0 replies; 112+ messages in thread From: Hugh Dickins @ 2007-10-02 16:21 UTC (permalink / raw) To: Andrew Morton; +Cc: Nick Piggin, linux-kernel, linux-mm On Mon, 1 Oct 2007, Andrew Morton wrote: > fs-introduce-write_begin-write_end-and-perform_write-aops.patch > introduce-write_begin-write_end-aops-important-fix.patch > introduce-write_begin-write_end-aops-fix2.patch > deny-partial-write-for-loop-dev-fd.patch > mm-restore-kernel_ds-optimisations.patch > implement-simple-fs-aops.patch > implement-simple-fs-aops-fix.patch > ... > fs-remove-some-aop_truncated_page.patch > > Merge Good, fine by me; but forces me to confess, with abject shame, that I still haven't sent you some shmem/tmpfs fixes/cleanups (currently intermingled with some other stuff in my tree, I'm still disentangling). Nothing so bad as to mess up a bisection, but my loop-over-tmpfs tests hang without passing gfp_mask down and down to add_to_swap_cache; and a few other bits. I'll get back on to it. Hugh ^ permalink raw reply [flat|nested] 112+ messages in thread
* remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton ` (9 preceding siblings ...) 2007-10-02 16:21 ` new aops merge [was Re: -mm merge plans for 2.6.24] Hugh Dickins @ 2007-10-02 17:45 ` Nick Piggin 2007-10-03 10:58 ` Andrew Morton 2007-10-03 15:21 ` Linus Torvalds 2007-10-03 19:50 ` A kernel Tracing interface " David Wilder ` (2 subsequent siblings) 13 siblings, 2 replies; 112+ messages in thread From: Nick Piggin @ 2007-10-02 17:45 UTC (permalink / raw) To: Andrew Morton, Torvalds, Linus, linux-mm; +Cc: linux-kernel On Tuesday 02 October 2007 07:22, Andrew Morton wrote: > remove-zero_page.patch > > Linus dislikes it. Probably drop it. I don't know if Linus actually disliked the patch itself, or disliked my (maybe confusingly worded) rationale? To clarify: it is not zero_page that fundamentally causes a problem, but it is a problem that was exposed when I rationalised the page refcounting in the kernel (and mapcounting in the mm). I see about 4 things we can do: 1. Nothing 2. Remove zero_page 3. Reintroduce some refcount special-casing for the zero page 4. zero_page per-node or per-cpu or whatever 1 and 2 kind of imply that nothing much sane should use the zero_page much (the former also implies that we don't care much about those who do, but in that case, why not go for code removal?). 3 and 4 are if we think there are valid heavy users of zero page, or we are worried about hurting badly written apps by removing it. If the former, I'd love to hear about them; if the latter, then it definitely is a valid concern and I have a patch to avoid refcounting (but if this is the case then I do hope that one day we can eventually remove it). 
> mm-use-pagevec-to-rotate-reclaimable-page.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix-2.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix-function-declaration.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix-bug-at-include-linux-mmh220.p >atch > mm-use-pagevec-to-rotate-reclaimable-page-kill-redundancy-in-rotate_reclaim >able_page.patch > mm-use-pagevec-to-rotate-reclaimable-page-move_tail_pages-into-lru_add_drai >n.patch > > I guess I'll merge this. Would be nice to have wider perfromance testing > but I guess it'll be easy enough to undo. Care to give it one more round through -mm? Is it easy enough to keep? I haven't had a chance to review it, which I'd like to do at some point (and I don't think it would hurt to have a bit more testing). ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin @ 2007-10-03 10:58 ` Andrew Morton 2007-10-03 15:21 ` Linus Torvalds 1 sibling, 0 replies; 112+ messages in thread From: Andrew Morton @ 2007-10-03 10:58 UTC (permalink / raw) To: Nick Piggin; +Cc: Torvalds, Linus, linux-mm, linux-kernel On Wed, 3 Oct 2007 03:45:09 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > mm-use-pagevec-to-rotate-reclaimable-page.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix-2.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix-function-declaration.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix-bug-at-include-linux-mmh220.p > >atch > > mm-use-pagevec-to-rotate-reclaimable-page-kill-redundancy-in-rotate_reclaim > >able_page.patch > > mm-use-pagevec-to-rotate-reclaimable-page-move_tail_pages-into-lru_add_drai > >n.patch > > > > I guess I'll merge this. Would be nice to have wider perfromance testing > > but I guess it'll be easy enough to undo. > > Care to give it one more round through -mm? Is it easy enough to > keep? Yup. Nobody has done much with that code in ages. > I haven't had a chance to review it, which I'd like to do at some > point (and I don't think it would hurt to have a bit more testing). Sure. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin 2007-10-03 10:58 ` Andrew Morton @ 2007-10-03 15:21 ` Linus Torvalds 2007-10-08 15:17 ` Nick Piggin 1 sibling, 1 reply; 112+ messages in thread From: Linus Torvalds @ 2007-10-03 15:21 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, linux-mm, linux-kernel On Wed, 3 Oct 2007, Nick Piggin wrote: > > I don't know if Linus actually disliked the patch itself, or disliked > my (maybe confusingly worded) rationale? Yes. I'd happily accept the patch, but I'd want it clarified and made obvious what the problem was - and it wasn't the zero page itself, it was a regression in the VM that made it less palatable. I also thought that there were potentially better solutions, namely to simply avoid the VM regression, but I also acknowledge that they may not be worth it - I just want them to be on the table. In short: the real cost of the zero page was the reference counting on the page that we do these days. For example, I really do believe that the problem could fairly easily be fixed by simply not considering zero_page to be a "vm_normal_page()". We already *do* have support for pages not getting ref-counted (since we need it for other things), and I think that zero_page very naturally falls into exactly that situation. So in many ways, I would think that turning zero-page into a nonrefcounted page (the same way we really do have to do for other things anyway) would be the much more *direct* solution, and in many ways the obvious one. HOWEVER - if people think that it's easier to remove zero_page, and want to do it for other reasons, *AND* can hopefully even back up the claim that it never matters with numbers (ie that the extra pagefaults just make the whole zero-page thing pointless), then I'd certainly accept the patch. 
I'd just want the patch *description* to then also be correct, and blame the right situation, instead of blaming zero-page itself. Linus ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-03 15:21 ` Linus Torvalds @ 2007-10-08 15:17 ` Nick Piggin 2007-10-09 13:00 ` Hugh Dickins 2007-10-09 14:52 ` Linus Torvalds 0 siblings, 2 replies; 112+ messages in thread From: Nick Piggin @ 2007-10-08 15:17 UTC (permalink / raw) To: Linus Torvalds, Hugh Dickins; +Cc: Andrew Morton, linux-mm, linux-kernel On Thursday 04 October 2007 01:21, Linus Torvalds wrote: > On Wed, 3 Oct 2007, Nick Piggin wrote: > > I don't know if Linus actually disliked the patch itself, or disliked > > my (maybe confusingly worded) rationale? > > Yes. I'd happily accept the patch, but I'd want it clarified and made > obvious what the problem was - and it wasn't the zero page itself, it was > a regression in the VM that made it less palatable. OK, revised changelog at the end of this mail... > I also thought that there were potentially better solutions, namely to > simply avoid the VM regression, but I also acknowledge that they may not > be worth it - I just want them to be on the table. > > In short: the real cost of the zero page was the reference counting on the > page that we do these days. For example, I really do believe that the > problem could fairly easily be fixed by simply not considering zero_page > to be a "vm_normal_page()". We already *do* have support for pages not > getting ref-counted (since we need it for other things), and I think that > zero_page very naturally falls into exactly that situation. > > So in many ways, I would think that turning zero-page into a nonrefcounted > page (the same way we really do have to do for other things anyway) would > be the much more *direct* solution, and in many ways the obvious one. That was my first approach. It isn't completely trivial, but vm_normal_page() does play a part (but we end up needing a vm_normal_page() variant -- IIRC vm_normal_or_zero_page()). 
But taken as a whole, non-refcounted zero_page is obviously a lot more work than no zero page at all :) > HOWEVER - if people think that it's easier to remove zero_page, and want > to do it for other reasons, *AND* can hopefully even back up the claim > that it never matters with numbers (ie that the extra pagefaults just make > the whole zero-page thing pointless), then I'd certainly accept the patch. I have done some tests which indicate a couple of very basic common tools don't do much zero-page activity (ie. kbuild). And also combined with some logical arguments to say that a "sane" app wouldn't be using zero_page much. (basically -- if the app cares about memory or cache footprint and is using many pages of zeroes, then it should have a more compressed representation of zeroes anyway). However there is a window for some "insane" code to regress without the zero_page. I'm not arguing that we don't care about those, however I have no way to guarantee they don't exist. I hope we wouldn't get a potentially useless complexity like this stuck in the VM forever just because we don't _know_ whether it's useful to anybody... How about something like this? --- From: Nick Piggin <npiggin@suse.de> The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). 
There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see. Why?

Inserting a ZERO_PAGE for anonymous read faults appears to be a false
optimisation: if an application is performance critical, it would not be
doing many read faults of new memory, or at least it could be expected to
write to that memory soon afterwards. If cache or memory use is critical,
it should not be working with a significant number of ZERO_PAGEs anyway
(a more compact representation of zeroes should be used).

As a sanity check -- measuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second, and 2,000 bounces of the ZERO_PAGE struct page
cacheline per second when running kbuild, while saving less than 1 page
clearing operation per second (even 1 page clear is far cheaper than a
thousand cacheline bounces between CPUs).

Of course, neither the logical argument nor these checks give anything like
a guarantee of no regressions. However, I think this is a reasonable
opportunity to remove the ZERO_PAGE from the pagefault path.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see
much use to them except complexity and useless benchmarks. All other users
of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity.
We can look at replacing them all and ripping out ZERO_PAGE completely if/when this patch gets in. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-08 15:17 ` Nick Piggin @ 2007-10-09 13:00 ` Hugh Dickins 2007-10-09 14:52 ` Linus Torvalds 1 sibling, 0 replies; 112+ messages in thread From: Hugh Dickins @ 2007-10-09 13:00 UTC (permalink / raw) To: Nick Piggin; +Cc: Linus Torvalds, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note > > A last caveat: the ZERO_PAGE is now refcounted and managed with rmap > (and thus mapcounted and count towards shared rss). These writes to > the struct page could cause excessive cacheline bouncing on big > systems. There are a number of ways this could be addressed if it is > an issue. > > And indeed this cacheline bouncing has shown up on large SGI systems. > There was a situation where an Altix system was essentially livelocked > tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. > This situation can be avoided in userspace, but it does highlight the > potential scalability problem with refcounting ZERO_PAGE, and corner > cases where it can really hurt (we don't want the system to livelock!). > > There are several broad ways to fix this problem: > 1. add back some special casing to avoid refcounting ZERO_PAGE > 2. per-node or per-cpu ZERO_PAGES > 3. remove the ZERO_PAGE completely > > I will argue for 3. The others should also fix the problem, but they > result in more complex code than does 3, with little or no real benefit > that I can see. Why? Sorry, I've no useful arguments to add (and my testing was too much like yours to add any value), but I do want to go on record as still a strong supporter of approach 3 and your patch. Hugh ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24)
2007-10-08 15:17 ` Nick Piggin
2007-10-09 13:00 ` Hugh Dickins
@ 2007-10-09 14:52 ` Linus Torvalds
2007-10-09 9:31 ` Nick Piggin
1 sibling, 1 reply; 112+ messages in thread
From: Linus Torvalds @ 2007-10-09 14:52 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Tue, 9 Oct 2007, Nick Piggin wrote:
>
> I have done some tests which indicate a couple of very basic common tools
> don't do much zero-page activity (ie. kbuild). And also combined with some
> logical arguments to say that a "sane" app wouldn't be using zero_page much.
> (basically -- if the app cares about memory or cache footprint and is using
> many pages of zeroes, then it should have a more compressed representation
> of zeroes anyway).

One of the things that zero-page has been used for is absolutely *huge*
(but sparse) arrays in Fortran programs.

At least in traditional fortran, it was very hard to do dynamic
allocations, so people would allocate the *maximum* array statically, and
then not necessarily use everything. I don't know if the pages ever even
got paged in, but this is the kind of usage which is *not* insane.

Linus

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24)
2007-10-09 14:52 ` Linus Torvalds
@ 2007-10-09 9:31 ` Nick Piggin
2007-10-10 2:22 ` Linus Torvalds
0 siblings, 1 reply; 112+ messages in thread
From: Nick Piggin @ 2007-10-09 9:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Wednesday 10 October 2007 00:52, Linus Torvalds wrote:
> On Tue, 9 Oct 2007, Nick Piggin wrote:
> > I have done some tests which indicate a couple of very basic common tools
> > don't do much zero-page activity (ie. kbuild). And also combined with
> > some logical arguments to say that a "sane" app wouldn't be using
> > zero_page much. (basically -- if the app cares about memory or cache
> > footprint and is using many pages of zeroes, then it should have a more
> > compressed representation of zeroes anyway).
>
> One of the things that zero-page has been used for is absolutely *huge*
> (but sparse) arrays in Fortran programs.
>
> At least in traditional fortran, it was very hard to do dynamic
> allocations, so people would allocate the *maximum* array statically, and
> then not necessarily use everything. I don't know if the pages ever even
> got paged in,

In which case, they would not be using the ZERO_PAGE?
If they were paging in (ie. reading) huge reams of zeroes,
then maybe their algorithms aren't so good anyway? (I don't
know).

> but this is the kind of usage which is *not* insane.

Yeah, that's why I use the double quotes... I wonder how to
find out, though. I guess I could ask SGI if they could ask
around -- but that still comes back to the problem of not being
able to ever conclusively show that there are no real users of
the ZERO_PAGE.

Where do you suggest I go from here? Is there any way I can
convince you to try it? Make it a config option? (just kidding)

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 9:31 ` Nick Piggin @ 2007-10-10 2:22 ` Linus Torvalds 2007-10-09 10:15 ` Nick Piggin 0 siblings, 1 reply; 112+ messages in thread From: Linus Torvalds @ 2007-10-10 2:22 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > Where do you suggest I go from here? Is there any way I can > convince you to try it? Make it a config option? (just kidding) No, I'll take the damn patch, but quite frankly, I think your arguments suck. I've told you so before, and asked for numbers, and all you do is handwave. And this is like the *third*time*, and you don't even seem to admit that you're handwaving. So let's do it, but dammit: - make sure there aren't any invalid statements like this in the final commit message. - if somebody shows that you were wrong, and points to a real load, please never *ever* make excuses for this again, ok? Is that a deal? I hope we'll never need to hear about this again, but I really object to the way you've tried to "sell" this thing, by basically starting out dishonest about what the problem was, and even now I've yet to see a *single* performance number even though I've asked for them (except for the problem case, which was introduced by *you*) Linus ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-10 2:22 ` Linus Torvalds @ 2007-10-09 10:15 ` Nick Piggin 2007-10-10 3:06 ` Linus Torvalds 2007-10-10 4:06 ` Hugh Dickins 0 siblings, 2 replies; 112+ messages in thread From: Nick Piggin @ 2007-10-09 10:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Wednesday 10 October 2007 12:22, Linus Torvalds wrote: > On Tue, 9 Oct 2007, Nick Piggin wrote: > > Where do you suggest I go from here? Is there any way I can > > convince you to try it? Make it a config option? (just kidding) > > No, I'll take the damn patch, but quite frankly, I think your arguments > suck. > > I've told you so before, and asked for numbers, and all you do is I gave 2 other numbers. After that, it really doesn't matter if I give you 2 numbers or 200, because it wouldn't change the fact that there are 3 programs using the ZERO_PAGE that we'll never know about. > handwave. And this is like the *third*time*, and you don't even seem to > admit that you're handwaving. I think I've always admitted I'm handwaving in my assertion that programs would not be using the zero page. My handwaving is an attempt to show that I have some vaguely reasonable reasons to think it will be OK to remove it. That's all. > So let's do it, but dammit: > - make sure there aren't any invalid statements like this in the final > commit message. Was the last one OK? > - if somebody shows that you were wrong, and points to a real load, > please never *ever* make excuses for this again, ok? > > Is that a deal? I hope we'll never need to hear about this again, but I > really object to the way you've tried to "sell" this thing, by basically > starting out dishonest about what the problem was, The dishonesty in the changelog is more of an oversight than an attempt to get it merged. 
It never even crossed my mind that you would be fooled
by it ;) To prove my point: the *first* approach I posted to fix this
problem was exactly a patch to special-case the zero_page refcounting
which was removed with my PageReserved patch. Neither Hugh nor yourself
liked it one bit!

So I have no particular bias against the zero page, nor any problem
admitting I introduced the issue. I do just think this could be a nice
opportunity to try getting rid of the zero page and simplify things.

> and even now I've yet
> to see a *single* performance number even though I've asked for them
> (except for the problem case, which was introduced by *you*)

Basically: I don't know what else to show you! I expect it would be
relatively difficult to find a measurable difference between no
zero-page and zero-page with no refcounting problem. Precisely
because I can't find anything that really makes use of it. Again:
what numbers can I get for you that would make you feel happier
about it?

Anyway, before you change your mind: it's a deal! If somebody
screams then I'll have a patch for you to reintroduce the zero
page minus refcounting the next day.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 10:15 ` Nick Piggin @ 2007-10-10 3:06 ` Linus Torvalds 2007-10-10 4:06 ` Hugh Dickins 1 sibling, 0 replies; 112+ messages in thread From: Linus Torvalds @ 2007-10-10 3:06 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > I gave 2 other numbers. After that, it really doesn't matter if I give > you 2 numbers or 200, because it wouldn't change the fact that there > are 3 programs using the ZERO_PAGE that we'll never know about. You gave me no timings what-so-ever. Yes, you said "1000 page faults", but no, I have yet to see a *single* actual performance number. Maybe I missed it? Or maybe you just never did them. Was it really so non-obvious that I actually wanted *performance* numbers, not just some random numbers about how many page faults you have? Or did you post them somewhere else? I don't have any memory of having seen any performance numbers what-so-ever, but I admittedly get too much email. Here's three numbers of my own: 8, 17 and 975. So I gave you "numbers", but what do they _mean_? So let me try one more time: - I don't want any excuses about how bad PAGE_ZERO is. You made it bad, it wasn't bad before. - I want numbers. I want the commit message to tell us *why* this is done. The numbers I want is performance numbers, not handwave numbers. Both for the bad case that it's supposed to fix, *and* for "normal load". - I want you to just say that if it turns out that there are people who use ZERO_PAGE, you stop calling them crazy, and promise to look at the alternatives. How much clearer can I be? I have said several times that I think this patch is kind of sad, and the reason I think it's sad is that you (and Hugh) convinced me to take the patch that made it sad in the first place. It didn't *use* to be bad. 
And I've used ZERO_PAGE myself for timing, I've had nice test-programs
that knew that they could ignore cache effects and get pure TLB effects
when they just allocated memory and didn't write to it.

That's why I don't like the lack of numbers. That's why I didn't like the
original commit message that tried to blame the wrong part. That's why I
didn't like this patch to begin with.

But I'm perfectly ready to take it, and see if anybody complains.
Hopefully nobody ever will.

But by now I absolutely *detest* this patch because of its history, and
how I *told* you guys what the reserved bit did, and how you totally
ignored it, and then tried to blame ZERO_PAGE for that. So yes, I want the
patch to be accompanied by an explanation, which includes the performance
side of why it is wanted/needed in the first place.

If this patch didn't have that kind of history, I wouldn't give a flying f
about it. As it is, this whole thing has a background.

Linus

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 10:15 ` Nick Piggin 2007-10-10 3:06 ` Linus Torvalds @ 2007-10-10 4:06 ` Hugh Dickins 2007-10-10 5:20 ` Linus Torvalds 1 sibling, 1 reply; 112+ messages in thread From: Hugh Dickins @ 2007-10-10 4:06 UTC (permalink / raw) To: Nick Piggin; +Cc: Linus Torvalds, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > by it ;) To prove my point: the *first* approach I posted to fix this > problem was exactly a patch to special-case the zero_page refcounting > which was removed with my PageReserved patch. Neither Hugh nor yourself > liked it one bit! True (speaking for me; I forget whether Linus ever got to see it). I apologize to you, Nick, for getting you into this position of fighting for something which wasn't your choice in the first place. If I thought we'd have a better kernel by dropping this patch and going back to one that just avoids the refcounting, I'd say do it. No, I still think it's worth trying this one first. But best have your avoid-the-refcounting patch ready and reviewed for emergency use if regression does show up somewhere. Thanks, Hugh [My mails out are at present getting randomly delayed by six hours or so, which makes it extra hard for me to engage usefully in any thread.] ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-10 4:06 ` Hugh Dickins @ 2007-10-10 5:20 ` Linus Torvalds 2007-10-09 14:30 ` Nick Piggin 0 siblings, 1 reply; 112+ messages in thread From: Linus Torvalds @ 2007-10-10 5:20 UTC (permalink / raw) To: Hugh Dickins; +Cc: Nick Piggin, Andrew Morton, linux-mm, linux-kernel On Wed, 10 Oct 2007, Hugh Dickins wrote: > On Tue, 9 Oct 2007, Nick Piggin wrote: > > by it ;) To prove my point: the *first* approach I posted to fix this > > problem was exactly a patch to special-case the zero_page refcounting > > which was removed with my PageReserved patch. Neither Hugh nor yourself > > liked it one bit! > > True (speaking for me; I forget whether Linus ever got to see it). The problem is, those first "remove ref-counting" patches were ugly *regardless* of ZERO_PAGE. We (yes, largely I) fixed up the mess since. The whole vm_normal_page() and the magic PFN_REMAP thing got rid of a lot of the problems. And I bet that we could do something very similar wrt the zero page too. Basically, the ZERO page could act pretty much exactly like a PFN_REMAP page: the VM would not touch it. No rmap, no page refcounting, no nothing. This following patch is not meant to be even half-way correct (it's not even _remotely_ tested), but is just meant to be a rough "grep for ZERO_PAGE in the VM, and see what happens if you don't ref-count it". Would something like the below work? I dunno. But I suspect it would. I doubt anybody has the energy to actually try to actually follow through on it, which is why I'm not pushing on it any more, and why I'll accept Nick's patch to just remove ZERO_PAGE, but I really *am* very unhappy about this. The "page refcounting cleanups" in the VM back when were really painful. And dammit, I felt like I was the one who had to clean them up after you guys. Which makes me really testy on this subject. 
And yes, I also admit that the vm_normal_page() and the PFN_REMAP thing ended up really improving the VM, and we're pretty much certainly better off now than we were before - but I also think that ZERO_PAGE etc could easily be handled with the same model. After all, if we can make "mmap(/dev/mem)" work with COW and everything, I'd argue that ZERO_PAGE really is just a very very small special case of that! Totally half-assed untested patch to follow, not meant for anything but a "I think this kind of approach should have worked too" comment. So I'm not pushing the patch below, I'm just fighting for people realizing that - the kernel has *always* (since pretty much day 1) done that ZERO_PAGE thing. This means that I would not be at all surprised if some application basically depends on it. I've written test-programs that depends on it - maybe people have written other code that basically has been written for and tested with a kernel that has basically always made read-only zero pages extra cheap. So while it may be true that removing ZERO_PAGE won't affect anybody, I don't think it's a given, and I also don't think it's sane calling people "crazy" for depending on something that has always been true under Linux for the last 15+ years. There are few behaviors that have been around for that long. - make sure the commit message is accurate as to need for this (ie not claim that the ZERO_PAGE itself was the problem, and give some actual performance numbers on what is going on) that's all. 
Linus

---
 mm/memory.c  |   17 ++++++++---------
 mm/migrate.c |    2 +-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f82b359..0a8cc88 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -386,6 +386,7 @@ static inline int is_cow_mapping(unsigned int flags)
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
+	struct page *page;
 
 	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
 		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
@@ -413,7 +414,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_
 	 * The PAGE_ZERO() pages and various VDSO mappings can
 	 * cause them to exist.
 	 */
-	return pfn_to_page(pfn);
+	page = pfn_to_page(pfn);
+	if (PageReserved(page))
+		page = NULL;
+
+	return page;
 }
 
 /*
@@ -968,7 +973,7 @@ no_page_table:
 	if (flags & FOLL_ANON) {
 		page = ZERO_PAGE(address);
 		if (flags & FOLL_GET)
-			get_page(page);
+			page = alloc_page(GFP_KERNEL | GFP_ZERO);
 		BUG_ON(flags & FOLL_WRITE);
 	}
 	return page;
@@ -1131,9 +1136,6 @@ static int zeromap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			pte++;
 			break;
 		}
-		page_cache_get(page);
-		page_add_file_rmap(page);
-		inc_mm_counter(mm, file_rss);
 		set_pte_at(mm, addr, pte, zero_pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -1717,7 +1719,7 @@ gotten:
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	if (old_page == ZERO_PAGE(address)) {
+	if (!old_page) {
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
 			goto oom;
@@ -2274,15 +2276,12 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else {
 		/* Map the ZERO_PAGE - vm_page_prot is readonly */
 		page = ZERO_PAGE(address);
-		page_cache_get(page);
 		entry = mk_pte(page, vma->vm_page_prot);
 
 		ptl = pte_lockptr(mm, pmd);
 		spin_lock(ptl);
 		if (!pte_none(*page_table))
 			goto release;
-		inc_mm_counter(mm, file_rss);
-		page_add_file_rmap(page);
 	}
 
 	set_pte_at(mm, address, page_table, entry);
diff --git a/mm/migrate.c b/mm/migrate.c
index e2fdbce..8d2e110 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -827,7 +827,7 @@ static int do_move_pages(struct mm_struct *mm, struct page_to_node *pm,
 			goto set_status;
 
 		if (PageReserved(page))		/* Check for zero page */
-			goto put_and_set;
+			goto set_status;
 
 		pp->page = page;
 		err = page_to_nid(page);

^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-10 5:20 ` Linus Torvalds @ 2007-10-09 14:30 ` Nick Piggin 2007-10-10 15:04 ` Linus Torvalds 0 siblings, 1 reply; 112+ messages in thread From: Nick Piggin @ 2007-10-09 14:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Wednesday 10 October 2007 15:20, Linus Torvalds wrote: > On Wed, 10 Oct 2007, Hugh Dickins wrote: > > On Tue, 9 Oct 2007, Nick Piggin wrote: > > > by it ;) To prove my point: the *first* approach I posted to fix this > > > problem was exactly a patch to special-case the zero_page refcounting > > > which was removed with my PageReserved patch. Neither Hugh nor yourself > > > liked it one bit! > > > > True (speaking for me; I forget whether Linus ever got to see it). > > The problem is, those first "remove ref-counting" patches were ugly > *regardless* of ZERO_PAGE. > > We (yes, largely I) fixed up the mess since. The whole vm_normal_page() > and the magic PFN_REMAP thing got rid of a lot of the problems. > > And I bet that we could do something very similar wrt the zero page too. > > Basically, the ZERO page could act pretty much exactly like a PFN_REMAP > page: the VM would not touch it. No rmap, no page refcounting, no nothing. > > This following patch is not meant to be even half-way correct (it's not > even _remotely_ tested), but is just meant to be a rough "grep for > ZERO_PAGE in the VM, and see what happens if you don't ref-count it". > > Would something like the below work? I dunno. But I suspect it would. I Sure it will work. It's not completely trivial like your patch, though. The VM has to know about ZERO_PAGE if you also want it to do the "optimised" wp (what you have won't work because it will break all other "not normal" pages which are non-zero I think). And your follow_page_page path is not going to do the right thing for ZERO_PAGE either I think. 
> doubt anybody has the energy to actually try to actually follow through on > it, which is why I'm not pushing on it any more, and why I'll accept Sure they have. http://marc.info/?l=linux-mm&m=117515508009729&w=2 OK, this patch was open coding the tests rather than putting them in vm_normal_page, but vm_normal_page doesn't magically make it a whole lot cleaner (a _little_ bit cleaner, I agree, but in my current patch I still need a vm_normal_or_zero_page() function). > Nick's patch to just remove ZERO_PAGE, but I really *am* very unhappy > about this. Well that's not very good... > The "page refcounting cleanups" in the VM back when were really painful. > And dammit, I felt like I was the one who had to clean them up after you > guys. Which makes me really testy on this subject. OK, but in this case we'll not have a big hard-to-revert set of changes that fundamentally alter assumptions throughout the vm. It will be more a case of "if somebody screams, put the zero page back", won't it? > Totally half-assed untested patch to follow, not meant for anything but a > "I think this kind of approach should have worked too" comment. > > So I'm not pushing the patch below, I'm just fighting for people realizing > that > > - the kernel has *always* (since pretty much day 1) done that ZERO_PAGE > thing. This means that I would not be at all surprised if some > application basically depends on it. I've written test-programs that > depends on it - maybe people have written other code that basically has > been written for and tested with a kernel that has basically always > made read-only zero pages extra cheap. > > So while it may be true that removing ZERO_PAGE won't affect anybody, I > don't think it's a given, and I also don't think it's sane calling > people "crazy" for depending on something that has always been true > under Linux for the last 15+ years. There are few behaviors that have > been around for that long. That's the main question. 
Maybe my wording was a little strong, but I simply personally couldn't think of sane uses of zero page. I'm not prepared to argue that none could possibly exist. It just seems like now might be a good time to just _try_ removing the zero page, because of this peripheral problem caused by my refcounting patch. If it doesn't work out, then at least we'll be wiser for it, we can document why the zero page is needed, and add it back with the refcounting exceptions. > - make sure the commit message is accurate as to need for this (ie not > claim that the ZERO_PAGE itself was the problem, and give some actual > performance numbers on what is going on) OK, maybe this is where we are not on the same page. There are 2 issues really. Firstly, performance problem of refcounting the zero-page -- we've established that it causes this livelock and that we should stop refcounting it, right? Second issue is the performance difference between removing the zero page completely, and de-refcounting it (it's obviously incorrect to argue for zero page removal for performance reasons if the performance improvement is simply coming from avoiding the refcounting). The problem with that is I simply don't know any tests that use the ZERO_PAGE significantly enough to measure a difference. The 1000 COW faults vs < 1 unmap per second thing was simply to show that, on the micro level, performance won't have regressed by removing the zero page. So I'm not arguing to remove the zero page because performance is so much better than having a de-refcounted zero page! I'm saying that we should remove the refcounting one way or the other. If you accept that, then I argue that we should try removing zero page completely rather than just de-refcounting it, because that allows nice simplifications and hopefully nobody will miss the zero page. Does that make sense? ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 14:30 ` Nick Piggin @ 2007-10-10 15:04 ` Linus Torvalds 0 siblings, 0 replies; 112+ messages in thread From: Linus Torvalds @ 2007-10-10 15:04 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Wed, 10 Oct 2007, Nick Piggin wrote: > > It just seems like now might be a good time to just _try_ removing > the zero page Yes. Let's do your patch immediately after the x86 merge, and just see if anybody screams. It might take a while, because I certainly agree that whoever would be affected by it is likely to be unusual. > OK, maybe this is where we are not on the same page. > There are 2 issues really. Firstly, performance problem of > refcounting the zero-page -- we've established that it causes > this livelock and that we should stop refcounting it, right? Yes, I do agree that refcounting is problematic. > Second issue is the performance difference between removing the > zero page completely, and de-refcounting it (it's obviously > incorrect to argue for zero page removal for performance reasons > if the performance improvement is simply coming from avoiding > the refcounting). Well, even if it's a "when you don't get into the bad behaviour, performance difference is not measurable", and give a before-and-after number for some random but interesting load. Even if it's just a kernel compile.. Linus ^ permalink raw reply [flat|nested] 112+ messages in thread
* A kernel Tracing interface (was Re: -mm merge plans for 2.6.24) 2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton ` (10 preceding siblings ...) 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin @ 2007-10-03 19:50 ` David Wilder 2007-10-09 9:19 ` r/o bind mounts, was Re: -mm merge plans for 2.6.24 Christoph Hellwig 2007-10-13 8:44 ` Borislav Petkov 13 siblings, 0 replies; 112+ messages in thread From: David Wilder @ 2007-10-03 19:50 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, SystemTAP Andrew- Could you please add the trace patches to the merge list? These patches have been very well reviewed on lkml. I believe they are ready to be merged. The patches can be found here: http://lkml.org/lkml/2007/10/2/236 http://lkml.org/lkml/2007/10/2/237 http://lkml.org/lkml/2007/10/2/238 http://lkml.org/lkml/2007/10/2/239 Dave Wilder ^ permalink raw reply [flat|nested] 112+ messages in thread
* r/o bind mounts, was Re: -mm merge plans for 2.6.24
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
` (11 preceding siblings ...)
2007-10-03 19:50 ` A kernel Tracing interface " David Wilder
@ 2007-10-09 9:19 ` Christoph Hellwig
2007-10-13 8:44 ` Borislav Petkov
13 siblings, 0 replies; 112+ messages in thread
From: Christoph Hellwig @ 2007-10-09 9:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Dave Hansen

> r-o-bind-mounts-filesystem-helpers-for-custom-struct-files.patch
> r-o-bind-mounts-rearrange-may_open-to-be-r-o-friendly.patch
> r-o-bind-mounts-give-permission-a-local-mnt-variable.patch
> r-o-bind-mounts-create-cleanup-helper-svc_msnfs.patch

<...>

> Doesn't seem ready yet

I guess we need some more time for this patchset to mature, yes. But
the four patches I've quoted above are just small preparatory cleanups
that should go into 2.6.24.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: -mm merge plans for 2.6.24
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
` (12 preceding siblings ...)
2007-10-09 9:19 ` r/o bind mounts, was Re: -mm merge plans for 2.6.24 Christoph Hellwig
@ 2007-10-13 8:44 ` Borislav Petkov
2007-10-13 8:52 ` Andrew Morton
13 siblings, 1 reply; 112+ messages in thread
From: Borislav Petkov @ 2007-10-13 8:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Eugene Teo
On Mon, Oct 01, 2007 at 02:22:22PM -0700, Andrew Morton wrote:
<snip>
> #increase-at_vector_size-to-terminate-saved_auxv-properly.patch: Tony wanted enhancements
> increase-at_vector_size-to-terminate-saved_auxv-properly.patch
> change-inotifyfs-magic-as-the-same-magic-is-used-for-futexfs-v2.patch
> delay-creation-of-khcvd-thread.patch
> hvc-console-is-also-used-by-iseries-so-add-that-to-hvc_driver-help.patch
> lockdep-give-each-filesystem-its-own-inode-lock-class.patch
> menuconfig-transform-nls-and-dlm-menus.patch
> menuconfig-transform-network-filesystems-menu.patch
> fs-udf-ballocc-mark-a-variable-as-uninitialized_var.patch
> jbd-config_jbd_debug-cannot-create-proc-entry.patch
> jbd-config_jbd_debug-cannot-create-proc-entry-fix.patch
> jbd-fix-commit-code-to-properly-abort-journal.patch
> jbd-fix-jbd-warnings-when-compiling-with-config_jbd_debug.patch
> dont-truncate-proc-pid-environ-at-4096-characters.patch
> fix-wrong-filename-reference-in-drivers-testingtxt.patch
> anon-inodes-use-open-coded-atomic_inc-for-the-shared-inode.patch
> ncr53c8xx-remove-deprecated-irq-flags-sa_.patch
> completely-remove-deprecated-irq-flags-sa_.patch
> compile-handle_percpu_irq-even-for-uniprocessor-kernels.patch
> fs-correct-sus-compliance-for-open-of-large-file-without.patch
> ext3-remove-ifdef-config_ext3_index.patch
> rename-signalfd_siginfo-fields.patch
> break-elf_platform-and-stack-pointer-randomization-dependency.patch
> spin_lock_unlocked-cleanups.patch
> binfmt_flat-minimum-support-for-the-blackfin-relocations.patch
> binfmt_flat-minimum-support-for-the-blackfin-relocations-checkpatch-fixes.patch
>
> The infamous misc. Will re-review and will merge basically all of them.
can you please add http://lkml.org/lkml/2007/7/30/98 also to the misc-queue for
the warning still persists and the patch is good to go as is (against current git
v2.6.23-2840-g752097c, for example) albeit with a little fuzziness.
--
Regards/Gruß,
Boris.
* Re: -mm merge plans for 2.6.24
2007-10-13 8:44 ` Borislav Petkov
@ 2007-10-13 8:52 ` Andrew Morton
2007-10-13 11:45 ` Borislav Petkov
0 siblings, 1 reply; 112+ messages in thread
From: Andrew Morton @ 2007-10-13 8:52 UTC (permalink / raw)
To: bbpetkov; +Cc: linux-kernel, Eugene Teo
On Sat, 13 Oct 2007 10:44:55 +0200 Borislav Petkov <bbpetkov@yahoo.de> wrote:
> can you please add http://lkml.org/lkml/2007/7/30/98 also to the misc-queue for
> the warning still persists and the patch is good to go as is (against current git
> v2.6.23-2840-g752097c, for example) albeit with a little fuzziness.
I got completely fed up with maintaining that patch against ongoing churn
frenzy in Greg's trees so I dropped it.
If/when things settle down in that area someone will need to redo the
patch.
* Re: -mm merge plans for 2.6.24
2007-10-13 8:52 ` Andrew Morton
@ 2007-10-13 11:45 ` Borislav Petkov
0 siblings, 0 replies; 112+ messages in thread
From: Borislav Petkov @ 2007-10-13 11:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Eugene Teo
On Sat, Oct 13, 2007 at 01:52:52AM -0700, Andrew Morton wrote:
> On Sat, 13 Oct 2007 10:44:55 +0200 Borislav Petkov <bbpetkov@yahoo.de> wrote:
>
> > can you please add http://lkml.org/lkml/2007/7/30/98 also to the misc-queue for
> > the warning still persists and the patch is good to go as is (against current git
> > v2.6.23-2840-g752097c, for example) albeit with a little fuzziness.
>
> I got completely fed up with maintaining that patch against ongoing churn
> frenzy in Greg's trees so I dropped it.
>
> If/when things settle down in that area someone will need to redo the
> patch.
/me wondering: what if you pass it on upstream to Linus, since:
1. it applies cleanly now
2. is pretty trivial
and forget about it forever :). It seems the place this patch touches is the
only place where kobject_* and sysfs_create_* etc. error codes are not being
handled in contrast to all those functions which have been declared __must_check?
--
Regards/Gruß,
Boris.
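For readers unfamiliar with the __must_check point raised in the message
above, here is a minimal userspace sketch, not the patch under discussion.
It assumes a GCC-compatible compiler, where the kernel's __must_check
corresponds to the warn_unused_result function attribute. The names
must_check, create_entry() and register_thing() are hypothetical stand-ins
for a sysfs_create_*()-style call and its caller; they are not kernel APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: userspace stand-in for the kernel's __must_check macro. */
#define must_check __attribute__((warn_unused_result))

/*
 * Hypothetical helper standing in for a sysfs_create_*()-style call.
 * Returns 0 on success or a negative errno value on failure.
 */
must_check int create_entry(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical caller. Ignoring the result of create_entry() here would
 * trigger a compiler warning; instead the error is reported and
 * propagated to the caller.
 */
int register_thing(const char *name)
{
	int err = create_entry(name);

	if (err) {
		fprintf(stderr, "create_entry failed: %d\n", err);
		return err;
	}
	return 0;
}
```

Calling register_thing("demo") returns 0, while register_thing("") returns
-EINVAL. A bare call such as create_entry("demo"); with the result
discarded is what the annotation is meant to flag at compile time.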
end of thread, other threads:[~2007-12-14 15:54 UTC | newest]
Thread overview: 112+ messages
2007-10-01 21:22 -mm merge plans for 2.6.24 Andrew Morton
2007-10-01 21:34 ` wibbling over the cpuset shed domain connnection Paul Jackson
2007-10-02 12:36 ` Nick Piggin
2007-10-03 5:21 ` Paul Jackson
2007-10-02 13:12 ` Nick Piggin
2007-10-03 7:00 ` Paul Jackson
2007-10-03 10:57 ` Andrew Morton
2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh
2007-10-02 15:46 ` Hugh Dickins
2007-10-03 8:13 ` Balbir Singh
2007-10-03 18:47 ` Hugh Dickins
2007-10-04 4:16 ` Balbir Singh
2007-10-04 13:16 ` Hugh Dickins
2007-10-05 3:07 ` Balbir Singh
2007-10-07 17:41 ` Hugh Dickins
2007-10-08 2:54 ` Balbir Singh
2007-10-04 16:10 ` Paul Menage
2007-10-10 21:07 ` Rik van Riel
2007-10-11 6:33 ` Balbir Singh
2007-10-02 6:18 ` x86 patches was Re: -mm merge plans for 2.6.24 Andi Kleen
2007-10-02 6:32 ` Andrew Morton
2007-10-02 7:01 ` Andi Kleen
2007-10-02 7:18 ` Andrew Morton
2007-10-02 7:36 ` KAMEZAWA Hiroyuki
2007-10-02 7:43 ` Andrew Morton
2007-10-02 8:16 ` KAMEZAWA Hiroyuki
2007-10-02 10:48 ` Yasunori Goto
2007-10-02 18:18 ` Christoph Lameter
2007-10-02 17:25 ` Lee Schermerhorn
2007-10-02 16:40 ` Nish Aravamudan
2007-10-02 17:17 ` Lee Schermerhorn
2007-10-02 18:16 ` Christoph Lameter
2007-10-02 7:55 ` Matt Mackall
2007-10-02 7:59 ` Andi Kleen
2007-10-02 9:26 ` Andy Whitcroft
2007-10-02 7:37 ` Ingo Molnar
2007-10-02 7:46 ` Andi Kleen
2007-10-02 7:58 ` Thomas Gleixner
2007-10-02 7:59 ` v4l-stk11xx* [Was: -mm merge plans for 2.6.24] Jiri Slaby
[not found] ` <4701FC79.3060608@gmail.com>
2007-10-02 8:10 ` Wireless damage " Jiri Slaby
2007-10-02 8:17 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Peter Zijlstra
[not found] ` <20071002082831.GA19954@mail.ustc.edu.cn>
2007-10-02 8:28 ` Fengguang Wu
2007-10-02 8:31 ` Andrew Morton
2007-10-02 8:48 ` Peter Zijlstra
2007-10-02 10:31 ` Kay Sievers
2007-10-02 10:44 ` Peter Zijlstra
[not found] ` <20071002104734.GA9410@mail.ustc.edu.cn>
2007-10-02 10:47 ` Fengguang Wu
2007-10-02 11:22 ` Kay Sievers
[not found] ` <20071002112802.GA12607@mail.ustc.edu.cn>
2007-10-02 11:28 ` Fengguang Wu
2007-10-02 11:21 ` Kay Sievers
2007-10-02 11:40 ` Peter Zijlstra
2007-10-02 12:05 ` Nick Piggin
2007-10-03 10:15 ` Kay Sievers
2007-10-03 10:37 ` Peter Zijlstra
2007-10-03 13:35 ` Kay Sievers
2007-10-03 13:58 ` Peter Zijlstra
2007-10-26 14:48 ` Peter Zijlstra
2007-10-26 15:06 ` Miklos Szeredi
2007-10-26 15:10 ` Kay Sievers
2007-10-26 15:22 ` Peter Zijlstra
2007-10-26 15:33 ` Kay Sievers
2007-10-26 15:33 ` Peter Zijlstra
2007-10-26 15:55 ` Kay Sievers
2007-10-26 20:04 ` Peter Zijlstra
2007-10-27 1:18 ` Peter Zijlstra
2007-10-27 2:40 ` Greg KH
2007-10-27 8:39 ` Peter Zijlstra
2007-10-27 16:02 ` Greg KH
2007-10-27 16:07 ` Peter Zijlstra
2007-10-27 21:08 ` Kay Sievers
2007-10-27 21:35 ` Peter Zijlstra
2007-10-28 7:10 ` Greg KH
2007-11-02 13:15 ` Peter Zijlstra
2007-11-02 13:50 ` Kay Sievers
2007-11-02 13:54 ` Peter Zijlstra
2007-11-02 14:17 ` Peter Zijlstra
2007-11-02 14:32 ` Kay Sievers
2007-11-02 14:59 ` [PATCH] mm: sysfs: expose the BDI object in sysfs Peter Zijlstra
2007-11-02 15:13 ` Kay Sievers
2007-10-26 16:37 ` per BDI dirty limit (was Re: -mm merge plans for 2.6.24) Trond Myklebust
2007-12-14 14:50 ` Peter Zijlstra
2007-12-14 15:14 ` Miklos Szeredi
2007-12-14 15:54 ` Peter Zijlstra
2007-10-02 14:38 ` Kay Sievers
2007-10-03 11:00 ` Martin Knoblauch
[not found] ` <20071002083922.GA28892@mail.ustc.edu.cn>
2007-10-02 8:39 ` writeback fixes Fengguang Wu
2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins
2007-10-02 9:10 ` Nick Piggin
2007-10-02 18:38 ` Mel Gorman
2007-10-02 18:28 ` Christoph Lameter
2007-10-03 0:37 ` Christoph Lameter
2007-10-02 16:12 ` -mm merge plans for 2.6.24 Pekka Enberg
2007-10-02 16:21 ` new aops merge [was Re: -mm merge plans for 2.6.24] Hugh Dickins
2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin
2007-10-03 10:58 ` Andrew Morton
2007-10-03 15:21 ` Linus Torvalds
2007-10-08 15:17 ` Nick Piggin
2007-10-09 13:00 ` Hugh Dickins
2007-10-09 14:52 ` Linus Torvalds
2007-10-09 9:31 ` Nick Piggin
2007-10-10 2:22 ` Linus Torvalds
2007-10-09 10:15 ` Nick Piggin
2007-10-10 3:06 ` Linus Torvalds
2007-10-10 4:06 ` Hugh Dickins
2007-10-10 5:20 ` Linus Torvalds
2007-10-09 14:30 ` Nick Piggin
2007-10-10 15:04 ` Linus Torvalds
2007-10-03 19:50 ` A kernel Tracing interface " David Wilder
2007-10-09 9:19 ` r/o bind mounts, was Re: -mm merge plans for 2.6.24 Christoph Hellwig
2007-10-13 8:44 ` Borislav Petkov
2007-10-13 8:52 ` Andrew Morton
2007-10-13 11:45 ` Borislav Petkov