* 2.6.18 -mm merge plans
@ 2006-06-04 20:50 Andrew Morton
From: Andrew Morton @ 2006-06-04 20:50 UTC
To: linux-kernel
It's time to take a look at the -mm queue for 2.6.18.

There is an unusually large amount of difficult material here. If you were
bcc'ed, please take the time to think about what we should do.

I have an Asia trip June 10-17 which will probably be during the 2.6.18
merge window. It'll take some time to get all this material sorted out in
a decent fashion so I might end up having to ask Linus to delay -rc1 by a
week or so. We'll see.

When replying to this email pleeeeeeze rewrite the Subject: to something
appropriate so we do not all go mad. Thanks.

The list:
git-hdrcleanup.patch
git-hdrinstall.patch
This is Dave Woodhouse's work cleaning up the kernel headers and adding a
`make headerinstall' target which automates the exporting of kernel
headers as a userspace-usable package.

All I can say about this is that it doesn't appear to break anything and
is ready to merge from that point of view. It's not an area in which I
have much interest or knowledge.

That being said, it's relatively costly to carry such extensive patches
in -mm for long periods, so I'd ask Linus and the distro people to work
out what we want to do here promptly, please.
git-klibc.patch
Similar. This all appears to work sufficiently well for a 2.6.18 merge.
But it's been so long since klibc was a hot topic that I've forgotten who
wanted it, and what for.

Can whoever has an interest in this work please pipe up and let's get our
direction sorted out quickly.
fix-hpet-operation-on-32-bit-nvidia-platforms.patch
fix-hpet-operation-on-32-bit-nvidia-platforms-build-fix.patch
fix-hpet-operation-on-64-bit-nvidia-platforms.patch
These are bugfixes and are a marginal call for 2.6.17. But they're
playing in fragile areas, they're quite new and I fixed a bug in here just
a couple of hours ago. So I'll hold these off until 2.6.18-rc1 and will
tag them for a 2.6.17.x backport.
acpi-update-asus_acpi-driver-registration-fix.patch
acpi-memory-hotplug-cannot-manage-_crs-with-plural-resoureces.patch
catch-notification-of-memory-add-event-of-acpi-via-container-driver-register-start-func-for-memory-device.patch
catch-notification-of-memory-add-event-of-acpi-via-container-driveravoid-redundant-call-add_memory.patch
kevent-add-new-uevent.patch
acpi-dock-driver.patch
acpiphp-use-new-dock-driver.patch
acpiphp-prevent-duplicate-slot-numbers-when-no-_sun.patch
asus_acpi-w3000-support.patch
acpi-atlas-acpi-driver.patch
acpi-atlas-acpi-driver-fix.patch
remove-acpi_os_create_lock-acpi_os_delete_lock.patch
asus_acpi-invert-read-of-wled-proc-file-to-show-correct.patch
2.6-sony_acpi4.patch
acpi-remove-__init-__exit-from-sony-add-remove-methods.patch
sony_apci-resume.patch
git-agpgart.patch
uninorth-agp-warning-fixes.patch
alpha-agp-warning-fix.patch
git-alsa.patch
fix-drivers-mfd-ucb1x00-corec-irq-probing-bug.patch
kauditd_thread-warning-fix.patch
blk_start_queue-must-be-called-with-irq-disabled-add-warning.patch
blktrace_apih-endian-annotations.patch
powernow-k8-crash-workaround.patch
dprintk-adjustments-to-cpufreq-nforce2.patch
dprintk-adjustments-to-cpufreq-speedstep-centrino.patch
cpufreq-dprintk-adjustments.patch
create-sys-hypervisor-when-needed.patch
trivial-videodev2h-patch.patch
scx200_acb-use-pci-i-o-resource-when-appropriate.patch
i2c-pca954x-i2c-mux-driver.patch
i2c-mpc-fix-up-error-handling.patch
opencores-i2c-bus-driver.patch
i2c-pca954x-fix-initial-access-to-first-mux-switch-port.patch
ieee1394-video1394-be-quiet.patch
ieee1394-ohci1394c-function-calls-without.patch
ieee1394-sbp2-make-tsb42aa9-workaround-specific.patch
ieee1394-semaphore-to-mutex-conversion.patch
ieee1394-raw1394-fix-whitespace-after-x86_64.patch
ieee1394-ieee1394-ohci1394-cycletoolong.patch
ieee1394-ieee1394-support-for-slow-links-or-slow.patch
ieee1394-ieee1394-save-ram-by-using-a-single.patch
ieee1394-sbp2-remove-manipulation-of-inquiry.patch
ieee1394-sbp2-log-number-of-supported-concurrent.patch
ieee1394-ieee1394-extend-lowlevel-api-for.patch
ieee1394-ohci1394-set-address-range-properties.patch
ieee1394-ohci1394-make-phys_dma-parameter.patch
ieee1394-sbp2-sbp2-remove-ohci1394-specific.patch
ieee1394-sbp2-fix-s800-transfers-if-phys_dma-is.patch
ieee1394-update-feature-removal-of-obsolete.patch
ieee1394-sbp2-provide-helptext-for.patch
ieee1394-sbp2-kconfig-fix.patch
ieee1394-sbp2-use-__attribute__packed-for.patch
ieee1394-speed-up-of-dma_region_sync_for_cpu.patch
ieee1394-sbp2-fix-deregistration-of-status-fifo-address-space.patch
ieee1394-add-preprocessor-constant-for-invalid-csr.patch
fix-broken-suspend-resume-in-ohci1394-was-acpi-suspend.patch
ieee1394_core-switch-to-kthread-api.patch
eth1394-endian-fixes.patch
input-keyboard_tasklet-dont-touch-leds-of-already-grabed-device.patch
remove-silly-messages-from-input-layer.patch
via-pmu-add-input-device.patch
input-powermac-cleanup-of-mac_hid-and-support-for-ctrlclick-and-commandclick.patch
mm-constify-drivers-char-keyboardc.patch
input-move-fixp-arithh-to-drivers-input.patch
input-fix-accuracy-of-fixp-arithh.patch
input-new-force-feedback-interface.patch
input-adapt-hid-force-feedback-drivers-for-the-new-interface.patch
input-adapt-uinput-for-the-new-force-feedback-interface.patch
input-adapt-iforce-driver-for-the-new-force-feedback-interface.patch
input-force-feedback-driver-for-pid-devices.patch
input-force-feedback-driver-for-zeroplus-devices.patch
input-update-documentation-of-force-feedback.patch
input-drop-the-remains-of-the-old-ff-interface.patch
input-drop-the-old-pid-driver.patch
input-use-enospc-instead-of-enomem-in-iforce-when-device-full.patch
add-dependency-on-kernelrelease-to-the-package-targets.patch
kconfig-improve-config-load-save-output.patch
kconfig-fix-config-dependencies.patch
kconfig-remove-symbol_yesmodno.patch
kconfig-allow-multiple-default-values-per-symbol.patch
kconfig-allow-loading-multiple-configurations.patch
kconfig-integrate-split-config-into-silentoldconfig.patch
kconfig-integrate-split-config-into-silentoldconfig-fix.patch
kconfig-move-kernelrelease.patch
kconfig-add-symbol-option-config-syntax.patch
kconfig-add-defconfig_list-module-option.patch
kconfig-add-search-option-for-xconfig.patch
kconfig-finer-customization-via-popup-menus.patch
kconfig-create-links-in-info-window.patch
kconfig-jump-to-linked-menu-prompt.patch
kconfig-warn-about-leading-whitespace-for-menu-prompts.patch
kconfig-remove-leading-whitespace-in-menu-prompts.patch
config-exit-if-no-beginning-filename.patch
make-kernelrelease-speedup.patch
kconfig-kconfig_overwriteconfig.patch
sane-menuconfig-colours.patch
kbuild-export-type-enhancement-to-modpostc.patch
kbuild-export-type-enhancement-to-modpostc-fix.patch
kbuild-prevent-building-modules-that-wont-load.patch
kbuild-export-symbol-usage-report-generator.patch
kbuild-obj-dirs-is-calculated-incorrectly-if-hostprogs-y-is-defined.patch
fix-make-rpm-for-powerpc.patch
revert-sata_sil24-sii3124-sata-driver-endian-problem.patch
libata-add-missing-data_xfer-for-pata_pdc2027x-and-pdc_adma.patch
libata-add-missing-data_xfer-for-pata_pdc2027x-and-pdc_adma-fix.patch
libata-reduce-timeouts.patch
libata-debug.patch
2.6.17-rc4-mm1-ich8-fix.patch
for_each_possible_cpu-mips.patch
sdhci-truncated-pointer-fix.patch
prevent-au1xmmcc-breakage-on-non-au1200-alchemy.patch
myri10ge-alpha-build-fix.patch
smc911x-Kconfig-fix.patch
tulip-natsemi-dp83840a-phy-fix.patch
natsemi-add-support-for-using-mii-port-with-no-phy.patch
pci-error-recovery-e1000-network-device-driver.patch
pci-error-recovery-e100-network-device-driver.patch
e1000-prevent-statistics-from-getting-garbled-during-reset.patch
e100-disable-interrupts-at-boot.patch
drivers-char-hw_randomc-remove-asserts.patch
forcedeth-config-ring-sizes.patch
forcedeth-config-flow-control.patch
forcedeth-config-phy.patch
forcedeth-config-wol.patch
forcedeth-config-csum.patch
forcedeth-config-statistics.patch
forcedeth-config-diagnostics.patch
forcedeth-config-module-parameters.patch
forcedeth-config-version.patch
forcedeth-new-device-ids.patch
forcedeth-typecast-cleanup.patch
add-a-pci-vendor-id-definition-for-aculab.patch
natsemi-add-quirks-for-aculab-e1-t1-pmxc-cpci-carrier-cards.patch
tulip-fix-for-64-bit-mips.patch
drivers-net-ns83820c-add-paramter-to-disable-auto.patch
fix-phy-id-for-lxt971a-lxt972a.patch
clean-up-initcall-warning-for-netconsole.patch
remove-dead-entry-in-net-wan-kconfig.patch
eliminate-unused-proc-sys-net-ethernet.patch
ppp_async-hang-fix.patch
selinux-add-security-class-for-appletalk-sockets.patch
neighbourc-pneigh_get_next-skips-published-entry.patch
secmark-add-new-flask-definitions-to-selinux.patch
secmark-add-selinux-exports.patch
secmark-add-secmark-support-to-core-networking.patch
secmark-add-xtables-secmark-target.patch
secmark-add-secmark-support-to-conntrack.patch
secmark-add-connsecmark-xtables-target.patch
secmark-add-new-packet-controls-to-selinux.patch
irda-missing-allocation-result-check-in-irlap_change_speed.patch
pppoe-missing-result-check-in-__pppoe_xmit.patch
lock-validator-netlinkc-netlink_table_grab-fix.patch
recent-match-fix-sleeping-function-called-from-invalid-context.patch
recent-match-missing-refcnt-initialization.patch
client-side-nfsacl-caching-fix.patch
nfs-really-return-status-from-decode_recall_args.patch
powerpc-kbuild-warning-fix.patch
serial-fix-uart_bug_txen-test.patch
revert-gregkh-pci-pci-test-that-drivers-properly-call-pci_set_master.patch
gregkh-pci-kconfigurable-resources-arch-dependent-changes-arm-fix.patch
gregkh-pci-pci-64-bit-resources-core-changes-mips-fix.patch
fix-pciehp-driver-on-non-acpi-systems.patch
gregkh-pci-acpiphp-configure-_prt-v3-cleanup.patch
kconfigurable-resources-mtd-fixes.patch
drivers-scsi-fix-proc_scsi_write-to-return-length-on.patch
drivers-scsi-sdc-fix-uninitialized-variable-in-handling-medium-errors.patch
drivers-scsi-aic7xxx-possible-cleanups.patch
drivers-scsi-small-cleanups.patch
drivers-scsi-megaraidc-add-a-dummy-mega_create_proc_entry-for-proc_fs=y.patch
drivers-scsi-qla2xxx-make-some-functions-static.patch
drivers-scsi-aic7xxx-aic79xx_corec-make-ahd_done_with_status-static.patch
small-whitespace-cleanup-for-qlogic-driver.patch
remove-drivers-scsi-constantscscsi_print_req_sense.patch
drivers-scsi-aic7xxx-aic79xx_corec-make-ahd_match_scb-static.patch
aic7xxx-deinline-large-functions-save-80k-of-text.patch
aic7xxx-s-__inline-inline.patch
drivers-scsi-aic7xxx-possible-cleanups-2.patch
scsi-remove-documentation-scsi-cpqfctxt.patch
mpt-fusion-driver-initialization-failure-fix.patch
drivers-scsi-use-array_size-macro.patch
lpfc-sparse-null-warnings.patch
mpt_interrupt-should-return-irq_none-when.patch
aic7-cleanup-module_parm_desc-strings.patch
random-remove-redundant-sa_sample_random-from-ninjascsi.patch
megaraid-gcc-41-warning-fix.patch
buslogic-gcc-41-warning-fixes.patch
add-scsi_add_host-failure-handling-for-nsp32.patch
qla1280-fix-section-mismatch-warnings.patch
bogus-disk-geometry-on-large-disks.patch
megaraid_sas-switch-fw_outstanding-to-an-atomic_t.patch
megaraid_sas-add-support-for-zcr-controller.patch
megaraid_sas-add-support-for-zcr-controller-fix.patch
gdth-add-execute-firmware-command-abstraction.patch
drivers-scsi-gdthc-make-__gdth_execute-static.patch
areca-raid-linux-scsi-driver.patch
scsi-clean-up-warnings-in-advansys-driver.patch
git-scsi-target-warning-fix.patch
touchkit-ps-2-touchscreen-driver.patch
fix-sco-on-some-bluetooth-adapters-2.patch
fall-back-to-old-style-call-trace-if-no-unwinding.patch
allow-unwinder-to-build-without-module-support.patch
x86_64-mm-moving-phys_proc_id-and-cpu_core_id-to-cpuinfo_x86-warning-fix.patch
add-abilty-to-enable-disable-nmi-watchdog-from-procfs.patch
x86_64-unexport-ia32_sys_call_table.patch
x86_64-msi-apic-build-fix.patch
x86_64-dont-warn-for-overflow-in-nommu-case-when-dma_mask-is-32bit-fix.patch
lock-validator-lockdep-small-xfs-init_rwsem-cleanup.patch
That's over 200 patches which need to be handled by subsystem
maintainers. I continue to have some difficulty getting this material
processed.

I'll try to make Thursdays be my unload-stuff-on-maintainers day.
Hopefully the boredom of seeing the same patches over and over will
motivate some merging, nacking and fixing.

I'm going to start sending the Areca driver to James, too. The vendor
has worked hard and the hardware is becoming more important - let's help
them get it in.

I'll henceforth include the HighPoint RocketRAID controller driver
(hptiop-highpoint-rocketraid-3xxx-controller-driver.patch) as well.
s390_hypfs-filesystem.patch
Will merge.
mm-vm_bug_on.patch
mm-thrash-detect-process-thrashing-against-itself.patch
zone-init-check-and-report-unaligned-zone-boundaries.patch
x86-align-highmem-zone-boundaries-with-numa.patch
zone-allow-unaligned-zone-boundaries.patch
zone-allow-unaligned-zone-boundaries-x86-add-zone-alignment-qualifier.patch
page-migration-make-do_swap_page-redo-the-fault.patch
slab-extract-cache_free_alien-from-__cache_free.patch
pg_uncached-is-ia64-only.patch
slab-page-mapping-cleanup.patch
migration-remove-unnecessary-pageswapcache-checks.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-change-name-of-wait_table_size.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-change-to-meminit-for-build_zonelist.patch
wait_table-and-zonelist-initializing-for-memory-hotaddadd-return-code-for-init_current_empty_zone.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-wait_table-initialization.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-update-zonelists.patch
squash-duplicate-page_to_pfn-and-pfn_to_page.patch
support-for-panic-at-oom.patch
mm-fix-typos-in-comments-in-mm-oom_killc.patch
reserve-space-for-swap-label.patch
tightening-hugetlb-strict-accounting.patch
slab-cleanup-kmem_getpages.patch
slab-stop-using-list_for_each.patch
swsusp-rework-memory-shrinker-rev-2.patch
unify-pxm_to_node-and-node_to_pxm.patch
pgdat-allocation-for-new-node-add-specify-node-id.patch
pgdat-allocation-for-new-node-add-get-node-id-by-acpi.patch
pgdat-allocation-for-new-node-add-generic-alloc-node_data.patch
pgdat-allocation-for-new-node-add-refresh-node_data.patch
pgdat-allocation-for-new-node-add-export-kswapd-start-func.patch
pgdat-allocation-for-new-node-add-call-pgdat-allocation.patch
register-hot-added-memory-to-iomem-resource.patch
catch-valid-mem-range-at-onlining-memory.patch
fix-compile-error-undefined-reference-for-sparc64.patch
register-sysfs-file-for-hotpluged-new-node.patch
pgdat-allocation-and-update-for-ia64-of-memory-hotplughold-pgdat-address-at-system-running.patch
pgdat-allocation-and-update-for-ia64-of-memory-hotplug-update-pgdat-address-array.patch
pgdat-allocation-and-update-for-ia64-of-memory-hotplugallocate-pgdat-and-per-node-data.patch
mm-introduce-remap_vmalloc_range.patch
change-gen_pool-allocator-to-not-touch-managed-memory.patch
radix-tree-direct-data.patch
radix-tree-small.patch
likely-cleanup-remove-unlikely-in-sys_mprotect.patch
slab-redzone-double-free-detection.patch
buglet-in-radix_tree_tag_set.patch
writeback-fix-range-handling.patch
page-migration-cleanup-rename-ignrefs-to-migration.patch
page-migration-cleanup-group-functions.patch
page-migration-cleanup-remove-useless-definitions.patch
page-migration-cleanup-drop-nr_refs-in-remove_references.patch
page-migration-cleanup-extract-try_to_unmap-from-migration-functions.patch
page-migration-cleanup-pass-mapping-to-migration-functions.patch
page-migration-cleanup-move-fallback-handling-into-special-function.patch
swapless-pm-add-r-w-migration-entries.patch
swapless-pm-add-r-w-migration-entries-fix-2.patch
swapless-page-migration-rip-out-swap-based-logic.patch
swapless-page-migration-modify-core-logic.patch
more-page-migration-do-not-inc-dec-rss-counters.patch
more-page-migration-use-migration-entries-for-file-pages.patch
page-migration-update-documentation.patch
aop_truncated_page-victims-in-read_pages-belong-in-the-lru.patch
flatmem-relax-requirement-for-memory-to-start-at-pfn-0.patch
slab-verify-pointers-before-free.patch
sparsemem-record-nid-during-memory-present.patch
mm-cleanup-swap-unused-warning.patch
node-hotplug-register-cpu-remove-node-struct.patch
node-hotplug-register-cpu-remove-node-struct-alpha-fix.patch
add-page_mkwrite-vm_operations-method.patch
mm-remove-vm_locked-before-remap_pfn_range-and-drop-vm_shm.patch
swapoff-atomic_inc_not_zero-on-mm_users.patch
remove-unused-o_flags-from-do_shmat.patch
fix-update_mmu_cache-in-fremapc.patch
fix-update_mmu_cache-in-fremapc-fix.patch
mm-slabc-fix-early-init-assumption.patch
Memory management. Will merge.
page-migration-simplify-migrate_pages.patch
page-migration-simplify-migrate_pages-tweaks.patch
page-migration-handle-freeing-of-pages-in-migrate_pages.patch
page-migration-use-allocator-function-for-migrate_pages.patch
page-migration-support-moving-of-individual-pages.patch
page-migration-detailed-status-for-moving-of-individual-pages.patch
page-migration-support-moving-of-individual-pages-fixes.patch
page-migration-support-moving-of-individual-pages-x86_64-support.patch
page-migration-support-moving-of-individual-pages-x86-support.patch
page-migration-support-moving-of-individual-pages-x86-support-fix.patch
page-migration-support-a-vma-migration-function.patch
allow-migration-of-mlocked-pages.patch
Post-2.6.18.
acx1xx-wireless-driver.patch
fix-tiacx-on-alpha.patch
tiacx-fix-attribute-packed-warnings.patch
tiacx-pci-build-fix.patch
tiacx-ia64-fix.patch
It is about time we did something with this large and presumably useful
wireless driver.
lsm-add-task_setioprio-hook.patch
selinux-add-hooks-for-key-subsystem.patch
au1550-1200-add-missing-psc-defines-make-oss-driver-use.patch
Will merge.
x86-cache-pollution-aware-__copy_from_user_ll.patch
x86-cpu_init-avoid-gfp_kernel-allocation-while-atomic.patch
arch-i386-kernel-apicc-make-modern_apic-static.patch
i386-apmc-optimization.patch
x86-dont-trigger-full-rebuild-via-config_mtrr.patch
fix-x86-microcode-driver-handling-of-multiple-matching.patch
i386-break-out-of-recursion-in-stackframe-walk.patch
dont-trigger-full-rebuild-via-config_x86_mce.patch
x86-increase-interrupt-vector-range.patch
x86-call-eisa_set_level_irq-in-pcibios_lookup_irq.patch
x86-kernel-irq-balancer-fix.patch
x86-kernel-irq-balancer-fix-tidy.patch
i386-let-usermode-execute-the-enter.patch
fix-broken-vm86-interrupt-signal-handling.patch
x86-re-enable-generic-numa.patch
x86-make-using_apic_timer-__read_mostly.patch
x86-cyrix-code-config_pci-fix--add-__initdata.patch
x86-constify-some-parts-of-arch-i386-kernel-cpu.patch
x86-make-i387-mxcsr_feature_mask-__read_mostly.patch
x86-make-acpi-errata-__read_mostly.patch
x86-constify-arch-i386-pci-irqc.patch
x86-use-proper-defines-for-i8259a-i-o.patch
i386-moving-phys_proc_id-and-cpu_core_id-to-cpuinfo_x86.patch
i386-moving-phys_proc_id-and-cpu_core_id-to-cpuinfo_x86-warning-fix.patch
i386-fix-get_segment_eip-with-vm86.patch
i386-dont-try-kprobes-for-v8086-mode.patch
x86 queue. Will mostly merge. I have a note here that Zach Amsden had
issues with x86-cpu_init-avoid-gfp_kernel-allocation-while-atomic.patch?

x86-cache-pollution-aware-__copy_from_user_ll.patch has been in -mm for a
very long time - it's never been clear that it's a net gain. Will
merge-and-see-what-happens I guess.
support-physical-cpu-hotplug-for-x86_64.patch
I think this got nacked. Will resend, see what happens.
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-tidy.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-arch_vma_name-fix.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-vs-x86_64-mm-reliable-stack-trace-support-i386.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-vs-x86_64-mm-reliable-stack-trace-support-i386-2.patch
Will merge.
powerpc-vdso-updates.patch
Will send to Paul.
remove-duplicate-symbol-exports-on-alpha.patch
alpha-generic-hweight-build-fix.patch
Will merge.
remove-empty-node-at-boot-time.patch
Will send to Tony when the prerequisites are merged.
swsusp-add-architecture-special-saveable-pages-support.patch
swsusp-i386-mark-special-saveable-unsaveable-pages.patch
swsusp-x86_64-mark-special-saveable-unsaveable-pages.patch
swsusp-take-lowmem-reserves-into-account.patch
kernel-power-snapshotc-cleanups.patch
swsusp-use-less-memory-during-resume.patch
dont-use-flush_tlb_all-in-suspend-time.patch
swsusp-documentation-updates.patch
Will merge.
m68k-completely-initialize-hw_regs_t-in-ide_setup_ports.patch
m68k-atyfb_base-compile-fix-for-config_pci=n.patch
m68k-cleanup-unistdh.patch
m68k-remove-some-unused-definitions-in-zorroh.patch
m68k-use-c99-initializer.patch
m68k-print-correct-stack-trace.patch
m68k-restore-amikbd-compatibility-with-24.patch
m68k-extra-delay.patch
m68k-use-proper-defines-for-zone-initialization.patch
m68k-adjust-to-changed-hardirq_mask.patch
m68k-m68k-mac-via2-fixes-and-cleanups.patch
Will merge.
uml-make-copy__user-atomic.patch
uml-fix-not_dead_yet-when-directory-is-in-bad-state.patch
uml-rename-and-improve-actually_do_remove.patch
These are marked "mm only". I'm not sure if that's permanent?
xtensa-remove-verify_area-macros.patch
xtensa-remove-verify_area-macros-fix.patch
Will merge.
remove-fs-jffs2-ioctlc.patch
Will re-re-re-spam maintainer.
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
kernel-kernel-cpuc-to-mutexes.patch
ug. We cannot convert the cpu.c semaphore into a mutex until we work out
why power4 goes titsup if you enable local interrupts during boot.
fix-a-race-condition-between-i_mapping-and-iput.patch
insert-identical-resources-above-existing-resources.patch
make-sure-nobodys-leaking-resources.patch
remove-steal_locks.patch
avoid-tasklist_lock-at-getrusage-for-multithreaded-case-too.patch
add-prctl-to-change-endian-of-a-task.patch
#writeback-fix-range-handling.patch
fix-dcache-race-during-umount.patch
prune_one_dentry-tweaks.patch
vgacon-make-vga_map_mem-take-size-remove-extra-use.patch
zlib_inflate-upgrade-library-code-to-a-recent-version.patch
zlib_inflate-upgrade-library-code-to-a-recent-version-fix.patch
initramfs-cpio-unpacking-fix.patch
fix-cdrom-being-confused-on-using-kdump.patch
read_mapping_page-for-address-space.patch
locks-dont-unnecessarily-fail-posix-lock-operations.patch
locks-dont-do-unnecessary-allocations.patch
locks-clean-up-locks_remove_posix.patch
vfs-add-lock-owner-argument-to-flush-operation.patch
fs-locksc-make-posix_locks_deadlock-static.patch
moduleh-updated-comments-with-a-new.patch
remove-config_parport_arc-drivers-parport-parport_arcc.patch
add-poisonh-and-patch-primary-users.patch
update-2-drivers-for-poisonh.patch
mmput-might-sleep.patch
fs-fat-miscc-unexport-fat_sync_bhs.patch
poll-cleanups-microoptimizations.patch
ptrace-document-the-locking-rules.patch
cleanup-default-value-of-sched_smt.patch
cleanup-default-value-of-syscall_debug.patch
cleanup-default-value-of-usb_isp116x_hcd-usb_sl811_hcd-and-usb_sl811_cs.patch
cleanup-default-value-of-ip_dccp_ackvec.patch
cleanup-default-value-of-dvb_cinergyt2_enable_rc_input_device.patch
dup-fd-error.patch
rtc-framework-driver-for-ds1307-and-similar-rtc-chips.patch
cond-resched-might-sleep-fix.patch
enhancing-accessibility-of-lxdialog.patch
the-scheduled-unexport-of-insert_resource.patch
jbd-fix-bug-in-journal_commit_transaction.patch
jbd-fix-bug-in-journal_commit_transaction-fix.patch
rename-swapper-to-idle.patch
oss-cs46xx-cleanup-and-tiny-bugfix.patch
i4l-memory-leak-fix-for-sc_ioctl.patch
isdn-unsafe-interaction-between-isdn_write-and-isdn_writebuf_stub.patch
isdn-unsafe-interaction-between-isdn_write-and-isdn_writebuf_stub-fix.patch
invert-irq-migrationc-brach-prediction.patch
x86-powerpc-make-hardirq_ctx-and-softirq_ctx-__read_mostly.patch
jbd-avoid-kfree-null.patch
ext3_clear_inode-avoid-kfree-null.patch
make-noirqdebug-irqfixup-__read_mostly-add-unlikely.patch
leds-amstrad-delta-led-support.patch
leds-amstrad-delta-led-support-tidy.patch
update-devicestxt.patch
binfmt_elf-codingstyle-cleanup-and-remove-some-pointless-casts.patch
binfnt_elf-remove-more-casts.patch
fix-incorrect-sa_onstack-behaviour-for-64-bit-processes.patch
percpu-counters-add-percpu_counter_exceeds.patch
percpu-counter-data-type-changes-to-suppport.patch
remove-unlikely-in-might_sleep_if.patch
process-events-header-cleanup.patch
process-events-license-change.patch
strstrip-api.patch
ipmi-strstrip-conversion.patch
connector-exports.patch
config_net=n-build-fix.patch
remove-softlockup-from-invalidate_mapping_pages.patch
add-doc-submitchecklist.patch
kernel-sysc-doesnt-need-inith.patch
make-rcu-api-inaccessible-to-non-gpl-linux-kernel-modules.patch
doc-add-audit-acct-to-docbook.patch
ip2-fix-sections.patch
sgi-ioc4-detect-io-card-variant.patch
two-additions-to-linux-documentation-ioctl-numbertxt.patch
list-introduce-list_replace-helper.patch
list-use-list_replace_init-instead-of-list_splice_init.patch
when-config_base_samll=1-the-kernel-261611-cascade-in-kernel-timerc-may-enter-the-infinite-loop.patch
when-config_base_samll=1-the-kernel-261611-cascade-in-kernel-timerc-may-enter-the-infinite-loop-use-list_replace_init.patch
codingstyle-add-typedefs-chapter.patch
fs-bufferc-possible-cleanups.patch
rtc-rtc-dev-uie-emulation.patch
drivers-md-raid6algosc-fix-a-null-dereference.patch
adjust-handle_irr_event-return-type.patch
sparse-fixes-for-synclink_cs.patch
jbd-split-checkpoint-lists.patch
add-__iowrite64_copy.patch
mark-address_space_operations-const.patch
more-bug_on-conversion.patch
make-kernel-ignore-bogus-partitions.patch
drivers-block-loopc-dont-return-garbage-if-loop_set_status-not-called.patch
docs-update-sparsetxt-with-check_endian.patch
drivers-acorn-char-pcf8583-vs-rtc-subsystem.patch
rewritten-backlight-infrastructure-for-portable-apple-computers.patch
rewritten-backlight-infrastructure-for-portable-apple-computers-fix.patch
ensure-null-deref-cant-possibly-happen-in-is_exported.patch
bluetooth-fix-potential-null-ptr-deref-in-dtl1_cscdtl1_hci_send_frame.patch
bloat-o-meter-gcc-4-fix.patch
random-remove-sa_sample_random-from-floppy-driver.patch
random-make-cciss-use-add_disk_randomness.patch
random-change-cpqarray-to-use-add_disk_randomness.patch
random-remove-bogus-sa_sample_random-from-at91-compact-flash-driver.patch
random-remove-redundant-sa_sample_random-from-touchscreen-drivers.patch
define-__raw_get_cpu_var-and-use-it.patch
allow-for-per-cpu-data-being-in-tdata-and-tbss-sections.patch
allow-for-per-cpu-data-being-in-tdata-and-tbss-sections-fix.patch
allow-for-per-cpu-data-being-in-tdata-and-tbss-sections-tidy.patch
deprecate-smbfs-in-favour-of-cifs.patch
allow-raw_notifier-callouts-to-unregister-themselves.patch
hptiop-highpoint-rocketraid-3xxx-controller-driver.patch
fix-kbuild-dependencies-for-synclink-drivers.patch
fs-freevxfs-cleanup-of-spelling-errors.patch
pnp-card_probe-fix-memory-leak.patch
ufs-ufs_trunc_indirect-infinite-cycle.patch
ufs-right-block-allocation.patch
ufs-change-block-number-on-the-fly.patch
ufs-directory-and-page-cache-install-aops.patch
ufs-directory-and-page-cache-from-blocks-to-pages.patch
ufs-wrong-type-cast.patch
ufs-not-usual-amounts-of-fragments-per-block.patch
ufs-unmark-config_ufs_fs_write-as-broken-mm-tree.patch
ufs-easy-debug.patch
ufs-little-directory-lookup-optimization.patch
ufs-i_blocks-wrong-count.patch
ufs-unlock_super-without-lock.patch
ufs-zero-metadata.patch
ufs-printk-warning-fixes.patch
oprofile-fix-unnecessary-cleverness.patch
msnd-section-fix.patch
oprofile-convert-from-semaphores-to-mutexes.patch
drivers-char-applicomc-proper-module_initexit.patch
remove-dead-entry-in-net-wan-makefile.patch
openpromfs-fix-missing-nul.patch
openpromfs-remove-unnecessary-casts.patch
openpromfs-factorize-out.patch
openpromfs-factorize-out-tidy.patch
idetape-gcc-41-warning-fix.patch
add-driver-for-arm-amba-pl031-rtc.patch
rtc-subsystem-fix-capability-checks-in-kernel-interface.patch
rtc-subsystem-add-capability-checks.patch
add-export_unused_symbol-and-export_unused_symbol_gpl.patch
add-export_unused_symbol-and-export_unused_symbol_gpl-default.patch
make-printk-work-for-really-early-debugging.patch
kernel-sysc-cleanups.patch
kernel-sysc-cleanups-fix.patch
nbd-kill-obsolete-changelog-add-gpl.patch
fix-listh-kernel-doc.patch
listh-doc-change-counter-to-control.patch
fix-magic-sysrq-on-strange-keyboards.patch
ide-cd-end-of-media-error-fix.patch
add-a-sysfs-file-to-determine-if-a-kexec-kernel-is-loaded.patch
cpqarray-section-fix.patch
pdflush-handle-resume-wakeups.patch
edd-isnt-experimental-anymore.patch
kernel-doc-drop-leading-space-in-sections.patch
kernel-doc-script-cleanups.patch
schedule_on_each_cpu-reduce-kmalloc-size.patch
avoid-disk-sector_t-overflow-for-2tb-ext3-filesystem.patch
cleanup-dead-code-from-ext2-mount-code.patch
fix-memory-leak-when-the-ext3s-journal-file-is-corrupted.patch
remove-inconsistent-space-before-exclamation-point-in-ext3s-mount-code.patch
moxa-remove-pointless-casts.patch
moxa-remove-pointless-check-of-tty-argument-vs-null.patch
moxa-partial-codingstyle-cleanup-spelling-fixes.patch
updated-kdump-documentation.patch
cpuset-remove-extra-cpuset_zone_allowed-check-in-__alloc_pages.patch
spin-rwlock-init-cleanups.patch
make-debug_mutex_on-__read_mostly.patch
constify-parts-of-kernel-power.patch
constify-libcrc32c-table.patch
apple-motion-sensor-driver.patch
prepare-for-__copy_from_user_inatomic-to-not-zero-missed-bytes.patch
make-copy_from_user_inatomic-not-zero-the-tail-on-i386.patch
remove-unecessary-null-check-in-kernel-acctc.patch
ax88796-parallel-port-driver.patch
ax88796-parallel-port-driver-build-fix.patch
wd7000-fix-section-mismatch-warnings.patch
megaraid_mbox-fix-section-mismatch-warnings.patch
keys-fix-race-between-two-instantiators-of-a-key.patch
keys-fix-race-between-two-instantiators-of-a-key-tidy.patch
ext3_fsblk_t-filesystem-group-blocks-and-bug-fixes.patch
ext3_fsblk_t-the-rest-of-in-kernel-filesystem-blocks.patch
list_del-debug.patch
inotify-split-kernel-api-from-userspace-support.patch
inotify-add-names-inode-to-event-handler.patch
inotify-add-interfaces-to-kernel-api.patch
inotify-allow-watch-removal-from-event-handler.patch
inotify-update-kernel-documentation.patch
kernel-doc-mm-readhead-fixup.patch
make-procfs-obligatory-except-under-config_embedded.patch
lock-validator-introduce-warn_on_oncecond.patch
lock-validator-introduce-warn_on_oncecond-speedup.patch
make-sysctl-obligatory-except-under-config_embedded.patch
for_each_cpu_mask-warning-fix.patch
emu10k1-mark-midi_spinlock-as-used.patch
add-max6902-rtc-support.patch
add-max6902-rtc-support-update.patch
add-max6902-rtc-support-tidy.patch
rtc-small-documentation-update.patch
#big-kernel-lock-contention-in-do_open-and-blkdev_put.patch
make-ext2_debug-work-again.patch
nbd-endian-annotations.patch
epoll-use-unlocked-wqueue-operations.patch
This is the misc-random-stuff-which-doesnt-have-a-subsystem-tree queue.
Will mostly merge, based upon re-review.
use-list_add_tail-instead-of-list_add.patch
arch-use-list_move.patch
core-use-list_move.patch
net-rxrpc-use-list_move.patch
drivers-use-list_move.patch
fs-use-list_move.patch
Will merge.
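For readers unfamiliar with the helper these patches switch to: list_move() folds the common "delete an entry from one list, then add it to another" pair into a single call. What follows is a minimal userspace sketch under stated assumptions, not the kernel's <linux/list.h>. The struct layout mirrors the kernel's, but the bodies are simplified (no pointer poisoning or debug checks), and init_list_head() and demo_move() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly-linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical helper: make an empty list whose head points at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever list it is on (no poisoning in this sketch). */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* The open-coded del + add pair expressed as one operation. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical demo: move one item from src to dst, return 1 on success. */
static int demo_move(void)
{
	struct list_head src, dst, item;

	init_list_head(&src);
	init_list_head(&dst);
	list_add(&item, &src);

	list_move(&item, &dst);

	assert(list_empty(&src));
	return dst.next == &item && item.next == &dst && item.prev == &dst;
}
```

The conversion is mechanical: a call site that does list_del(&item) followed by list_add(&item, &dst) becomes list_move(&item, &dst), with no change in behaviour intended.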
per-task-delay-accounting-setup.patch
per-task-delay-accounting-setup-fix-1.patch
per-task-delay-accounting-setup-fix-2.patch
per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
per-task-delay-accounting-utilities-for-genetlink-usage.patch
per-task-delay-accounting-taskstats-interface.patch
per-task-delay-accounting-taskstats-interface-fix-1.patch
per-task-delay-accounting-taskstats-interface-fix-2.patch
per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
per-task-delay-accounting-documentation.patch
per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
I just don't know. There are a number of groups who pop up with various
enhanced accounting requirements and patches (all quite different) but I
haven't heard a lot of enthusiasm from any of them over this work, which
attempts to provide an extensible framework for accumulation and querying
of per-task metrics.
But then again, we cannot just sit there and wait for everyone to be 100%
happy. So I'm 51% inclined to push this along.
Anyone else who has an interest in this sort of thing needs to be aware
that there will be an expectation that any future statistics submissions
should use these interfaces. So the time to pay attention is right now.
time-clocksource-infrastructure.patch
time-clocksource-infrastructure-dont-enable-irq-too-early.patch
time-use-clocksource-infrastructure-for-update_wall_time.patch
time-use-clocksource-infrastructure-for-update_wall_time-mark-few-functions-as-__init.patch
time-let-user-request-precision-from-current_tick_length.patch
time-use-clocksource-abstraction-for-ntp-adjustments.patch
time-use-clocksource-abstraction-for-ntp-adjustments-optimize-out-some-mults-since-gcc-cant-avoid-them.patch
time-introduce-arch-generic-time-accessors.patch
hangcheck-remove-monotomic_clock-on-x86.patch
time-i386-conversion-part-1-move-timer_pitc-to-i8253c.patch
time-i386-conversion-part-2-rework-tsc-support.patch
time-i386-conversion-part-3-enable-generic-timekeeping.patch
time-i386-conversion-part-4-remove-old-timer_opts-code.patch
time-i386-clocksource-drivers.patch
time-i386-clocksource-drivers-pm-timer-doesnt-use-workaround-if-chipset-is-not-buggy.patch
time-i386-clocksource-drivers-pm-timer-doesnt-use-workaround-if-chipset-is-not-buggy-acpi_pm-cleanup.patch
time-i386-clocksource-drivers-pm-timer-doesnt-use-workaround-if-chipset-is-not-buggy-acpi_pm-cleanup-fix-missing-to-rename-pmtmr_good-to-acpi_pm_good.patch
time-i386-clocksource-drivers-fix-spelling-typos.patch
time-rename-clocksource-functions.patch
make-pmtmr_ioport-__read_mostly.patch
generic-time-add-macro-to-simplify-hide-mask.patch
time-fix-time-going-backward-w-clock=pit.patch
John's x86 time clocksource patches. Will merge. At last.
kprobe-boost-2byte-opcodes-on-i386.patch
kprobemulti-kprobe-posthandler-for-booster.patch
kprobemulti-kprobe-posthandler-for-booster-kprobes-bugfix-of-kprobe-booster-reenable-kprobe-booster.patch
notify-page-fault-call-chain-for-x86_64.patch
notify-page-fault-call-chain-for-i386.patch
notify-page-fault-call-chain-for-ia64.patch
notify-page-fault-call-chain-for-powerpc.patch
notify-page-fault-call-chain-for-sparc64.patch
kprobes-registers-for-notify-page-fault.patch
notify-page-fault-call-chain.patch
Will merge.
kconfig-select-things-at-the-closest-tristate-instead-of-bool.patch
<wonders what this is>
sched-fix-smt-nice-lock-contention-and-optimization.patch
sched-fix-smt-nice-lock-contention-and-optimization-tidy.patch
Will merge.
sched-comment-bitmap-size-accounting.patch
sched-fix-interactive-ceiling-code.patch
unnecessary-long-index-i-in-sched.patch
sched-implement-smpnice.patch
sched-protect-calculation-of-max_pull-from-integer-wrap.patch
sched-store-weighted-load-on-up.patch
sched-add-discrete-weighted-cpu-load-function.patch
sched-prevent-high-load-weight-tasks-suppressing-balancing.patch
sched-improve-stability-of-smpnice-load-balancing.patch
sched-improve-smpnice-load-balancing-when-load-per-task.patch
smpnice-dont-consider-sched-groups-which-are-lightly-loaded-for-balancing.patch
smpnice-dont-consider-sched-groups-which-are-lightly-loaded-for-balancing-fix.patch
smpnice-dont-consider-sched-groups-which-are-lightly-loaded-for-balancing-fix-2patch.patch
sched-modify-move_tasks-to-improve-load-balancing-outcomes.patch
sched-avoid-unnecessarily-moving-highest-priority-task-move_tasks.patch
sched-avoid-unnecessarily-moving-highest-priority-task-move_tasks-fix-2.patch
sched_domain-handle-kmalloc-failure.patch
sched_domain-handle-kmalloc-failure-fix.patch
sched_domain-dont-use-gfp_atomic.patch
sched_domain-use-kmalloc_node.patch
sched_domain-allocate-sched_group-structures-dynamically.patch
sched2-sched-domain-sysctl.patch
It's all been quiet on the sched performance regressions front lately.
I'll ping the usual suspects and see if we can get smpnice merged this
time.
sched-add-above-background-load-function.patch
mm-implement-swap-prefetching.patch
mm-implement-swap-prefetching-fix.patch
mm-implement-swap-prefetching-sched-batch.patch
swap-prefetch-fix-lru_cache_add_tail.patch
swap-prefetch-fix-lru_cache_add_tail-tidy.patch
mm-swap-prefetch-fix-lowmem-reserve-calc.patch
Swap prefetch. I remain skeptical, but I have a lot of RAM. Multiple
people have sung its praises. I guess I'll re-review and tentatively plan
on sending them along for 2.6.18. Opinions are sought.
pi-futex-futex-code-cleanups.patch
pi-futex-robust-futex-docs-fix.patch
pi-futex-introduce-debug_check_no_locks_freed.patch
pi-futex-introduce-warn_on_smp.patch
pi-futex-add-plist-implementation.patch
pi-futex-scheduler-support-for-pi.patch
pi-futex-rt-mutex-core.patch
pi-futex-rt-mutex-docs.patch
pi-futex-rt-mutex-docs-update.patch
pi-futex-rt-mutex-debug.patch
pi-futex-rt-mutex-tester.patch
pi-futex-rt-mutex-futex-api.patch
pi-futex-futex_lock_pi-futex_unlock_pi-support.patch
#
futex_requeue-optimization.patch
Priority-inheriting futexes. I don't have a clue how this code works,
but it sure has a lot of trylocks for something which allegedly works.
Will merge.
proc-fix-the-inode-number-on-proc-pid-fd.patch
proc-remove-useless-bkl-in-proc_pid_readlink.patch
proc-remove-unnecessary-and-misleading-assignments.patch
proc-simplify-the-ownership-rules-for-proc.patch
proc-replace-proc_inodetype-with-proc_inodefd.patch
proc-remove-bogus-proc_task_permission.patch
proc-kill-proc_mem_inode_operations.patch
proc-properly-filter-out-files-that-are-not-visible.patch
proc-fix-the-link-count-for-proc-pid-task.patch
proc-move-proc_maps_operations-into-task_mmuc.patch
proc-rewrite-the-proc-dentry-flush-on-exit.patch
proc-close-the-race-of-a-process-dying-durning.patch
proc-refactor-reading-directories-of-tasks.patch
proc-remove-tasklist_lock-from-proc_pid_readdir.patch
proc-remove-tasklist_lock-from-proc_pid_lookup-and.patch
proc-remove-tasklist_lock-from-proc_pid_readdir-simply-fix-first_tgid.patch
proc-make-proc_numbuf-the-buffer-size-for-holding-a.patch
proc-dont-lock-task_structs-indefinitely.patch
proc-dont-lock-task_structs-indefinitely-task_mmu-small-fixes.patch
proc-use-struct-pid-not-struct-task_ref.patch
proc-optimize-proc_check_dentry_visible.patch
proc-use-sane-permission-checks-on-the-proc-pid-fd.patch
proc-cleanup-proc_fd_access_allowed.patch
proc-remove-tasklist_lock-from-proc_task_readdir.patch
simplify-fix-first_tid.patch
cleanup-next_tid.patch
/proc/pid revamp. Will merge.
de_thread-fix-lockless-do_each_thread.patch
coredump-optimize-mm-users-traversal.patch
coredump-speedup-sigkill-sending.patch
coredump-kill-ptrace-related-stuff.patch
coredump-kill-ptrace-related-stuff-fix.patch
coredump-dont-take-tasklist_lock.patch
coredump-some-code-relocations.patch
coredump-shutdown-current-process-first.patch
coredump-copy_process-dont-check-signal_group_exit.patch
Will merge. I have a note here that Roland had issues with
coredump-kill-ptrace-related-stuff.patch?
ecryptfs-fs-makefile-and-fs-kconfig.patch
ecryptfs-fs-makefile-and-fs-kconfig-remove-ecrypt_debug-from-fs-kconfig.patch
ecryptfs-documentation.patch
ecryptfs-makefile.patch
ecryptfs-main-module-functions.patch
ecryptfs-main-module-functions-uint16_t-u16.patch
ecryptfs-header-declarations.patch
ecryptfs-header-declarations-update.patch
ecryptfs-header-declarations-update-convert-signed-data-types-to-unsigned-data-types.patch
ecryptfs-header-declarations-remove-unnecessary-ifndefs.patch
ecryptfs-superblock-operations.patch
ecryptfs-dentry-operations.patch
ecryptfs-file-operations.patch
ecryptfs-file-operations-remove-null-==-syntax.patch
ecryptfs-file-operations-remove-extraneous-read-of-inode-size-from-header.patch
#ecryptfs-vs-streamline-generic_file_-interfaces-and-filemap.patch
#ecryptfs-vs-streamline-generic_file_-interfaces-and-filemap-fix.patch
ecryptfs-file-operations-fix.patch
ecryptfs-file-operations-fix-premature-release-of-file_info-memory.patch
ecryptfs-inode-operations.patch
ecryptfs-mmap-operations.patch
mark-address_space_operations-const-vs-ecryptfs-mmap-operations.patch
ecryptfs-keystore.patch
ecryptfs-crypto-functions.patch
ecryptfs-debug-functions.patch
ecryptfs-alpha-build-fix.patch
ecryptfs-convert-assert-to-bug_on.patch
ecryptfs-remove-unnecessary-null-checks.patch
ecryptfs-rewrite-ecryptfs_fsync.patch
ecryptfs-overhaul-file-locking.patch
Christoph has half-reviewed this and all the issues arising from that
have, I believe, been addressed. With the exception of the "we should
have a generic stacking layer" issue. Which is true. Michael's take is
"yes, but that's not my job". Which also is true.
Don't know.
proc-sysctl-add-_proc_do_string-helper.patch
namespaces-add-nsproxy.patch
namespaces-add-nsproxy-dont-include-compileh.patch
namespaces-incorporate-fs-namespace-into-nsproxy.patch
namespaces-utsname-introduce-temporary-helpers.patch
namespaces-utsname-switch-to-using-uts-namespaces.patch
namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch
namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch
namespaces-utsname-use-init_utsname-when-appropriate.patch
namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch
namespaces-utsname-implement-utsname-namespaces.patch
namespaces-utsname-implement-utsname-namespaces-export.patch
namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch
namespaces-utsname-sysctl-hack.patch
namespaces-utsname-sysctl-hack-cleanup.patch
namespaces-utsname-sysctl-hack-cleanup-2.patch
namespaces-utsname-sysctl-hack-cleanup-2-fix.patch
namespaces-utsname-remove-system_utsname.patch
namespaces-utsname-implement-clone_newuts-flag.patch
uts-copy-nsproxy-only-when-needed.patch
# needed if git-klibc isn't there:
#namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch
#namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch
#namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch
utsname virtualisation. This doesn't seem very pointful as a standalone
thing. That's a general problem with infrastructural work for a very
large new feature.
So probably I'll continue to babysit these patches, unless someone can
identify a decent reason why mainline needs this work.
I don't want to carry an ever-growing stream of OS-virtualisation
groundwork patches for ever and ever so if we're going to do this thing...
faster, please.
readahead-kconfig-options.patch
radixtree-introduce-radix_tree_scan_hole.patch
mm-introduce-probe_page.patch
mm-introduce-pg_readahead.patch
readahead-add-look-ahead-support-to-__do_page_cache_readahead.patch
readahead-delay-page-release-in-do_generic_mapping_read.patch
readahead-insert-cond_resched-calls.patch
readahead-minmax_ra_pages.patch
readahead-events-accounting.patch
readahead-rescue_pages.patch
readahead-sysctl-parameters.patch
readahead-sysctl-parameters-fix.patch
readahead-min-max-sizes.patch
readahead-state-based-method-aging-accounting.patch
readahead-state-based-method-routines.patch
readahead-state-based-method.patch
readahead-state-based-method-readahead-state-based-method-stand-alone-size-limit-code.patch
readahead-context-based-method.patch
readahead-context-based-method-apply-stream_shift-size-limits-to-contexta-method.patch
readahead-context-based-method-fix-remain-counting.patch
readahead-initial-method-guiding-sizes.patch
readahead-initial-method-thrashing-guard-size.patch
readahead-initial-method-expected-read-size.patch
readahead-initial-method-user-recommended-size.patch
readahead-initial-method.patch
readahead-backward-prefetching-method.patch
readahead-backward-prefetching-method-add-use-case-comment.patch
readahead-seeking-reads-method.patch
readahead-thrashing-recovery-method.patch
readahead-call-scheme.patch
readahead-laptop-mode.patch
readahead-loop-case.patch
readahead-nfsd-case.patch
readahead-turn-on-by-default.patch
readahead-debug-radix-tree-new-functions.patch
readahead-debug-traces-showing-accessed-file-names.patch
readahead-debug-traces-showing-read-patterns.patch
It's early days yet - needs heaps more performance testing. The results
from "Linux Portal" <linportal@gmail.com> were discouraging.
reiser4-export-handle_ra_miss.patch
reiser4-sb_sync_inodes.patch
reiser4-export-remove_from_page_cache.patch
reiser4-export-radix_tree_preload.patch
reiser4-export-find_get_pages.patch
make-copy_from_user_inatomic-not-zero-the-tail-on-i386-vs-reiser4.patch
reiser4.patch
reiser4-hardirq-include-fix.patch
reiser4-fix-trivial-tyops-which-were-hard-to-hit.patch
reiser4-run-truncate_inode_pages-in-reiser4_delete_inode.patch
We need to do something about this. It does need an intensive review and
there aren't many people who have the experience to do that right, and
there are fewer who have the time. Uptake by a vendor or two would be
good.
ide-pdc202xx_oldc-remove-unneeded-tuneproc-call.patch
ide-claim-extra-dma-ports-regardless-of-channel.patch
ide-remove-dma_base2-field-form-ide_hwif_t.patch
ide-always-release-dma-engine.patch
fix-ide-locking-error.patch
ide-error-handling-fixes.patch
ide-hpt3xxn-clocking-fixes.patch
ide-io-increase-timeout-value-to-allow-for-slave-wakeup.patch
ide-actually-honor-drives-minimum-pio-dma-cycle-times.patch
ide-fix-hpt37x-timing-tables.patch
ide-optimize-hpt37x-timing-tables.patch
ide-fix-hpt3xx-hotswap-support.patch
ide-fix-the-case-of-multiple-hpt3xx-chips-present.patch
ide-hpt3xx-fix-pci-clock-detection.patch
ide-hpt3xx-fix-pci-clock-detection-fix-2.patch
ide-pdc202xx_old-remove-the-obsolete-busproc.patch
piix-fix-82371mx-enablebits.patch
piix-remove-check-for-broken-mw-dma-mode-0.patch
piix-slc90e66-pio-mode-fallback-fix.patch
make-number-of-ide-interfaces-configurable.patch
ide_dma_speed-fixes.patch
ide_dma_speed-fixes-warning-fix.patch
ide_dma_speed-fixes-tidy.patch
hpt3xx-rework-rate-filtering.patch
hpt3xx-rework-rate-filtering-tidy.patch
hpt3xx-print-the-real-chip-name-at-startup.patch
hpt3xx-switch-to-using-pci_get_slot.patch
hpt3xx-cache-channels-mcr-address.patch
Will merge, subject to maintainer-poking.
radeonfb-powerdrain-issue-on-ibm-thinkpads-and-suspend-to-d2.patch
savagefb-allocate-space-for-current-and-saved-register.patch
savagefb-add-state-save-and_restore-hooks.patch
savagefb-add-state-save-and_restore-hooks-tidy.patch
savagefb-add-state-save-and_restore-hooks-fix.patch
backlight-locomo-backlight-driver-updates.patch
fbdev-cleanup-the-config_video_select-mess.patch
fbdev-remove-duplicate-includes.patch
fbdev-more-accurate-sync-range-extrapolation.patch
nvidiafb-revise-pci_device_id-table.patch
atyfb-fix-hardware-cursor-handling.patch
atyfb-remove-unneeded-calls-to-wait_for_idle.patch
atyfb-set-correct-acceleration-flags.patch
epson1355fb-update-platform-code.patch
vesafb-update-platform-code.patch
vfb-update-platform-code.patch
vga16fb-update-platform-code.patch
fbdev-static-pseudocolor-with-depth-less-than-4-does.patch
savagefb-whitespace-cleanup.patch
fbdev-firmware-edid-fixes.patch
fbdev-firmware-edid-fixes-fix.patch
nvidiafb-add-support-for-geforce-6100-and-related-chipsets.patch
fbdev-add-1366x768-wxga-mode-to-mode-database.patch
vesafb-fix-return-code-of-vesafb_setcolreg.patch
vesafb-prefer-vga-registers-over-pmi.patch
vt-delay-the-update-of-the-visible-console.patch
atyfb-fix-dead-code.patch
fbdev-coverity-bug-85.patch
fbdev-coverity-bug-90.patch
fbdev-remove-unused-exports.patch
s3c2410fb-fix-resume.patch
backlight-fix-kconfig-dependency.patch
au1100fb-add-power-management-support.patch
au1100fb-add-power-management-support-tidy.patch
skeletonfb-remove-duplicate-module-init-exit-license-lines.patch
neofb-fix-unblank-logic-interfering-with-lid-toggled-backlight.patch
Will merge.
dm-snapshot-unify-chunk_size.patch
lib-add-idr_replace.patch
lib-add-idr_replace-tidy.patch
dm-fix-idr-minor-allocation.patch
dm-move-idr_pre_get.patch
dm-change-minor_lock-to-spinlock.patch
dm-add-dmf_freeing.patch
dm-fix-mapped-device-ref-counting.patch
dm-add-module-ref-counting.patch
dm-fix-block-device-initialisation.patch
dm-mirror-sector-offset-fix.patch
dm-table-get_target-fix-last-index.patch
Will merge.
md-reformat-code-in-raid1_end_write_request-to-avoid-goto.patch
md-remove-arbitrary-limit-on-chunk-size.patch
md-remove-useless-ioctl-warning.patch
md-increase-the-delay-before-marking-metadata-clean-and-make-it-configurable.patch
md-merge-raid5-and-raid6-code.patch
md-remove-nuisance-message-at-shutdown.patch
md-allow-checkpoint-of-recovery-with-version-1-superblock.patch
md-allow-checkpoint-of-recovery-with-version-1-superblock-fix.patch
md-allow-a-linear-array-to-have-drives-added-while-active.patch
md-support-stripe-offset-mode-in-raid10.patch
md-make-md_print_devices-static.patch
md-split-reshape-portion-of-raid5-sync_request-into-a-separate-function.patch
#
md-bitmap-fix-online-removal-of-file-backed-bitmaps.patch
md-bitmap-remove-bitmap-writeback-daemon.patch
md-bitmap-cleaner-separation-of-page-attribute-handlers-in-md-bitmap.patch
md-bitmap-use-set_bit-etc-for-bitmap-page-attributes.patch
md-bitmap-remove-unnecessary-page-reference-manipulations-from-md-bitmap-code.patch
md-bitmap-remove-dead-code-from-md-bitmap.patch
md-bitmap-tidy-up-i_writecount-handling-in-md-bitmap.patch
md-bitmap-change-md-bitmap-file-handling-to-use-bmap-to-file-blocks.patch
md-change-md-bitmap-file-handling-to-use-bmap-to-file-blocks-fix.patch
md-calculate-correct-array-size-for-raid10-in-new-offset-mode.patch
#
md-md-kconfig-speeling-feex.patch
md-fix-kconfig-error.patch
md-fix-bug-that-stops-raid5-resync-from-happening.patch
md-allow-re-add-to-work-on-array-without-bitmaps.patch
md-dont-write-dirty-clean-update-to-spares-leave-them-alone.patch
md-set-get-state-of-array-via-sysfs.patch
md-allow-rdev-state-to-be-set-via-sysfs.patch
md-allow-raid-layout-to-be-read-and-set-via-sysfs.patch
md-allow-resync_start-to-be-set-and-queried-via-sysfs.patch
md-allow-the-write_mostly-flag-to-be-set-via-sysfs.patch
Will merge.
statistics-infrastructure-prerequisite-list.patch
statistics-infrastructure-prerequisite-parser.patch
statistics-infrastructure-prerequisite-timestamp.patch
statistics-infrastructure-prerequisite-timestamp-fix.patch
statistics-infrastructure-make-printk_clock-a-generic-kernel-wide-nsec-resolution.patch
statistics-infrastructure-documentation.patch
statistics-infrastructure.patch
statistics-infrastructure-update-1.patch
statistics-infrastructure-update-2.patch
statistics-infrastructure-update-3.patch
statistics-infrastructure-exploitation-zfcp.patch
Another tough one. It offers generic infrastructure for non-task-related
instrumentation and it would really be good if someone who has an interest
in this for something other than the zfcp driver could stand up and say
"this works for us".
genirq-rename-desc-handler-to-desc-chip.patch
genirq-rename-desc-handler-to-desc-chip-power-fix.patch
genirq-rename-desc-handler-to-desc-chip-ia64-fix.patch
genirq-rename-desc-handler-to-desc-chip-ia64-fix-2.patch
genirq-sem2mutex-probe_sem-probing_active.patch
genirq-cleanup-merge-irq_affinity-into-irq_desc.patch
genirq-cleanup-remove-irq_descp.patch
genirq-cleanup-remove-irq_descp-fix.patch
genirq-cleanup-remove-fastcall.patch
genirq-cleanup-misc-code-cleanups.patch
genirq-cleanup-reduce-irq_desc_t-use-mark-it-obsolete.patch
genirq-cleanup-include-linux-irqh.patch
genirq-cleanup-merge-irq_dir-smp_affinity_entry-into-irq_desc.patch
genirq-cleanup-merge-pending_irq_cpumask-into-irq_desc.patch
genirq-cleanup-turn-arch_has_irq_per_cpu-into-config_irq_per_cpu.patch
genirq-debug-better-debug-printout-in-enable_irq.patch
genirq-add-retrigger-irq-op-to-consolidate-hw_irq_resend.patch
genirq-doc-comment-include-linux-irqh-structures.patch
genirq-doc-handle_irq_event-and-__do_irq-comments.patch
genirq-cleanup-no_irq_type-cleanups.patch
genirq-doc-add-design-documentation.patch
genirq-add-genirq-sw-irq-retrigger.patch
genirq-add-irq_noprobe-support.patch
genirq-add-irq_norequest-support.patch
genirq-add-irq_noautoen-support.patch
genirq-update-copyrights.patch
genirq-core.patch
genirq-msi-fixes-2.patch
genirq-add-irq-chip-support.patch
genirq-add-irq-chip-support-fix.patch
genirq-add-handle_bad_irq.patch
genirq-add-irq-wake-power-management-support.patch
genirq-add-sa_trigger-support.patch
genirq-cleanup-no_irq_type-no_irq_chip-rename.patch
genirq-convert-the-x86_64-architecture-to-irq-chips.patch
genirq-convert-the-i386-architecture-to-irq-chips.patch
genirq-convert-the-i386-architecture-to-irq-chips-fix-2.patch
genirq-more-verbose-debugging-on-unexpected-irq-vectors.patch
genirq-add-chip-eoi-fastack-fasteoi.patch
genirq-add-chip-eoi-fastack-fasteoi-fix.patch
Still stabilising. It's looking more like 2.6.19 material. Needs more
review from arch maintainers too.
lock-validator-floppyc-irq-release-fix.patch
lock-validator-floppyc-irq-release-fix-fix.patch
lock-validator-forcedethc-fix.patch
lock-validator-mutex-section-binutils-workaround.patch
lock-validator-add-__module_address-method.patch
lock-validator-better-lock-debugging.patch
lock-validator-locking-api-self-tests.patch
lock-validator-locking-api-self-tests-self-test-fix.patch
lock-validator-locking-init-debugging-improvement.patch
lock-validator-beautify-x86_64-stacktraces.patch
lock-validator-beautify-x86_64-stacktraces-fix.patch
lock-validator-beautify-x86_64-stacktraces-fix-2.patch
lock-validator-beautify-x86_64-stacktraces-fix-3.patch
lock-validator-beautify-x86_64-stacktraces-fix-4.patch
lock-validator-x86_64-document-stack-frame-internals.patch
lock-validator-stacktrace.patch
lock-validator-stacktrace-build-fix.patch
lock-validator-stacktrace-warning-fix.patch
lock-validator-stacktrace-fix-on-x86_64.patch
lock-validator-fown-locking-workaround.patch
lock-validator-sk_callback_lock-workaround.patch
lock-validator-irqtrace-core.patch
lock-validator-irqtrace-core-powerpc-fix-1.patch
lock-validator-irqtrace-core-non-x86-fix.patch
lock-validator-irqtrace-core-non-x86-fix-2.patch
lock-validator-irqtrace-core-non-x86-fix-3.patch
lock-validator-irqtrace-entrys-fix.patch
lock-validator-irqtrace-core-remove-softirqc-warn_on.patch
lock-validator-irqtrace-cleanup-include-asm-i386-irqflagsh.patch
lock-validator-irqtrace-cleanup-include-asm-x86_64-irqflagsh.patch
lock-validator-x86_64-irqflags-trace-entrys-fix.patch
lock-validator-lockdep-add-local_irq_enable_in_hardirq-api.patch
lock-validator-add-per_cpu_offset.patch
lock-validator-add-per_cpu_offset-fix.patch
lock-validator-core.patch
lock-validator-core-early_boot_irqs_-build-fix.patch
lock-validator-core-fix-compiler-warning.patch
lock-validator-procfs.patch
lock-validator-core-multichar-fix.patch
lock-validator-core-count_matching_names-fix.patch
lock-validator-design-docs.patch
lock-validator-prove-rwsem-locking-correctness.patch
lock-validator-prove-rwsem-locking-correctness-fix.patch
lock-validator-prove-rwsem-locking-correctness-powerpc-fix.patch
lock-validator-prove-spinlock-rwlock-locking-correctness.patch
lock-validator-prove-mutex-locking-correctness.patch
lock-validator-prove-mutex-locking-correctness-fix-null-type-name-bug.patch
lock-validator-print-all-lock-types-on-sysrq-d.patch
lock-validator-x86_64-early-init.patch
lock-validator-smp-alternatives-workaround.patch
lock-validator-do-not-recurse-in-printk.patch
lock-validator-disable-nmi-watchdog-if-config_lockdep.patch
lock-validator-disable-nmi-watchdog-if-config_lockdep-i386.patch
lock-validator-disable-nmi-watchdog-if-config_lockdep-x86_64.patch
lock-validator-special-locking-bdev.patch
lock-validator-special-locking-direct-io.patch
lock-validator-special-locking-serial.patch
lock-validator-special-locking-serial-fix.patch
lock-validator-special-locking-dcache.patch
lock-validator-special-locking-i_mutex.patch
lock-validator-special-locking-s_lock.patch
lock-validator-special-locking-futex.patch
lock-validator-special-locking-genirq.patch
lock-validator-special-locking-completions.patch
lock-validator-special-locking-waitqueues.patch
lock-validator-special-locking-mm.patch
lock-validator-special-locking-serio.patch
lock-validator-special-locking-slab.patch
lock-validator-special-locking-skb_queue_head_init.patch
lock-validator-special-locking-net-ipv4-igmpcpatch.patch
lock-validator-special-locking-net-ipv4-igmpc-2.patch
lock-validator-special-locking-timerc.patch
lock-validator-special-locking-schedc.patch
lock-validator-special-locking-hrtimerc.patch
lock-validator-special-locking-sock_lock_init.patch
lock-validator-special-locking-af_unix.patch
lock-validator-special-locking-bh_lock_sock.patch
lock-validator-special-locking-mmap_sem.patch
lock-validator-special-locking-sb-s_umount.patch
lock-validator-special-locking-sb-s_umount-fix.patch
lock-validator-special-locking-sb-s_umount-2.patch
lock-validator-special-locking-sb-s_umount-2-fix.patch
lockdep-annotate-rpc_populate-for.patch
lock-validator-special-locking-jbd.patch
lock-validator-special-locking-posix-timers.patch
lock-validator-special-locking-sch_genericc.patch
lock-validator-special-locking-xfrm.patch
lockdep-add-i_mutex-ordering-annotations-to-the-sunrpc.patch
lockdep-add-parent-child-annotations-to-usbfs.patch
lock-validator-special-locking-sound-core-seq-seq_portsc.patch
lock-validator-special-locking-sound-core-seq-seq_devicec.patch
lock-validator-special-locking-sound-core-seq-seq_devicec-fix.patch
lock-validator-fix-rt_hash_lock_sz.patch
lock-validator-introduce-irq__lockdep.patch
locking-validator-special-rule-8390c-disable_irq.patch
locking-validator-special-rule-3c59xc-disable_irq.patch
lock-validator-enable-lock-validator-in-kconfig.patch
lock-validator-enable-lock-validator-in-kconfig-require-trace_irqflags_support.patch
lock-validator-enable-lock-validator-in-kconfig-not-yet.patch
lockdep-one-stacktrace-column-if-config_lockdep=y.patch
i386-remove-multi-entry-backtraces.patch
lockdep-further-improve-stacktrace-output.patch
lock-validator-irqtrace-support-non-x86-architectures.patch
lock-validator-disable-oprofile-if-lockdep=y.patch
lock-validator-select-kallsyms_all.patch
I'm not really sure that this has as good a bugfixes/effort ratio as would,
say, working on our ever-growing bugzilla list.
But given that it exists, and that it'll fix (or rather prevent) future
bugs at a constant-but-low rate for a long time, I guess it's something we
want.
I think it's more like 2.6.19 material. The number of
teach-lockdep-about-this-unusual-but-correct-locking-code patches
continues to grow and I don't think we fully have a handle on how it'll
all end up looking.
* 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans)
From: Bernhard Rosenkraenzer @ 2006-06-04 21:20 UTC
To: Andrew Morton; +Cc: linux-kernel
On Sunday, 4. June 2006 22:50, Andrew Morton wrote:
> git-hdrcleanup.patch
> git-hdrinstall.patch
>
> This is Dave Woodhouse's work cleaning up the kernel headers and adding a
> `make headerinstall' target which automates the exporting of kernel
> headers as a userspace-usable package.
>
> All I can say about this is that it doesn't appear to break anything and
> is ready to merge from that point of view. It's not an area in which I
> have much interest or knowledge.
I've played with it and rebuilt all of Ark Linux (around 5000 packages) with
glibc-kernheaders replaced with make headerinstall-ed headers, no problems
at all (except some stupid apps thinking BITS_PER_LONG is supposed to be
defined, but they were probably broken with the last couple of
glibc-kernheaders releases as well).
So from a user's perspective, it's ready.
* Re: header cleanup and install
From: David Woodhouse @ 2006-06-04 21:33 UTC
To: Andrew Morton; +Cc: linux-kernel
On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> git-hdrcleanup.patch
> git-hdrinstall.patch
>
> This is Dave Woodhouse's work cleaning up the kernel headers and adding a
> `make headerinstall' target which automates the exporting of kernel
> headers as a userspace-usable package.
More specifically:
git-hdrcleanup is simple and boring janitorial stuff in headers -- nothing
particularly new and exciting. Mostly it's just moving stuff that shouldn't
be user-visible inside existing instances of #ifdef __KERNEL__ -- it
doesn't even add many new ifdefs. A large chunk of it is just removing the
superfluous #include <linux/config.h> from every file.
The only bit that's even vaguely interesting, if you're _desperate_ to find
something exciting in it, is the fact that I hid the broken _syscallX
macros from asm-*/unistd.h inside #ifdef __KERNEL__. They're broken for
64-bit syscall arguments on architectures like MIPS, they were even broken
for PIC code on i386. Not only were they broken, but also the kernel
headers are _not_ a library of random crap for userspace to use. Glibc
doesn't use them, klibc doesn't use them, and dietlibc folks were working
on not using them last time I checked.
git-hdrinstall is just the 'make headers_install' thing, based on an
original implementation by Arnd Bergmann. It takes the set of headers which
are at all suitable for userspace and exports them with unifdef.
The idea is that distributions can have a _consistent_ set of headers to
build stuff like glibc and system tools against, rather than the horrid
mess we have now. Those files can also be diffed from one release to the
next, and we have a decent chance of actually _seeing_ what changed,
without all the noise. Having done that diff on my last few updates, it
does actually seem to work like that in practice.
> That being said, it's relatively costly to carry such extensive patches
> in -mm for long periods, so I'd ask Linus and the distro people to work
> out what we want to do here promptly, please.
The result of this is already shipping in Fedora rawhide, and it's a
godsend. I haven't heard much from the relevant package maintainers in
other distros recently, but they were generally in agreement last time I
heard. There's not a lot of 'working out' to be done -- we just need Linus
to take it.
Btw, no mention of the rbtree shrinkage. I plan to send that Linuswards
as soon as 2.6.17 is out too, OK? And the mtd tree too but that's just a
normal maintainer tree so I _expected_ you to omit that.
-- 
dwmw2
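The header split described in the message above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from git-hdrcleanup: example_dev.h, struct example_dev_info, example_dev_register() and kernel_parts_visible() are hypothetical names, and <stdint.h> types stand in for the kernel's own fixed-width types so that the fragment compiles on its own in userspace.

```c
#include <stdint.h>

/* ---- contents of a hypothetical header, example_dev.h ---- */
#ifndef EXAMPLE_DEV_H
#define EXAMPLE_DEV_H

/* Userspace-visible ABI: fixed-width types only, no kernel internals. */
struct example_dev_info {
	uint32_t version;
	uint32_t flags;
};

#define EXAMPLE_DEV_FLAG_READONLY 0x1u

#ifdef __KERNEL__
/* Kernel-only part: absent from the exported copy of the header. */
struct example_dev;
int example_dev_register(struct example_dev *dev);
#define EXAMPLE_DEV_KERNEL_PRIVATE 1
#endif /* __KERNEL__ */

#endif /* EXAMPLE_DEV_H */
/* ---- end of hypothetical header ---- */

/*
 * Hypothetical check: returns 1 if the kernel-only part leaked into this
 * build, 0 otherwise. A userspace build does not define __KERNEL__.
 */
static inline int kernel_parts_visible(void)
{
#ifdef EXAMPLE_DEV_KERNEL_PRIVATE
	return 1;
#else
	return 0;
#endif
}
```

In a normal userspace compile __KERNEL__ is undefined, so only the structure and the flag are visible. An export step that runs unifdef with __KERNEL__ treated as undefined would go further and delete the guarded block from the installed file altogether, which is what makes release-to-release diffs of the exported headers readable.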
* Re: header cleanup and install
From: Andrew Morton @ 2006-06-04 21:43 UTC
To: David Woodhouse; +Cc: linux-kernel
On Sun, 04 Jun 2006 22:33:13 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
> Btw, no mention of the rbtree shrinkage. I plan to send that Linuswards
> as soon as 2.6.17 is out too, OK?
yup.
* Re: header cleanup and install
From: Jens Axboe @ 2006-06-05 10:52 UTC
To: David Woodhouse; +Cc: Andrew Morton, linux-kernel
On Sun, Jun 04 2006, David Woodhouse wrote:
> Btw, no mention of the rbtree shrinkage. I plan to send that Linuswards
> as soon as 2.6.17 is out too, OK? And the mtd tree too but that's just a
> normal maintainer tree so I _expected_ you to omit that.
I guess the color -> colour transformation is clouding the inclusion :-)
-- 
Jens Axboe
* Re: header cleanup and install
2006-06-05 10:52 ` Jens Axboe
@ 2006-06-05 10:54 ` David Woodhouse
2006-06-05 10:59 ` Jens Axboe
0 siblings, 1 reply; 166+ messages in thread
From: David Woodhouse @ 2006-06-05 10:54 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel
On Mon, 2006-06-05 at 12:52 +0200, Jens Axboe wrote:
> I guess the color -> colour transformation is clouding the
> inclusion :-)
Heh. Well, mostly I've just _removed_ the references to colo{u,}r, since
callers shouldn't be poking at it anyway in general. We have
rb_set_black() and rb_set_red() now, but even those are only really for
the rbtree code itself to be using.
--
dwmw2
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-05 10:54 ` David Woodhouse @ 2006-06-05 10:59 ` Jens Axboe 2006-06-05 10:57 ` David Woodhouse 0 siblings, 1 reply; 166+ messages in thread From: Jens Axboe @ 2006-06-05 10:59 UTC (permalink / raw) To: David Woodhouse; +Cc: Andrew Morton, linux-kernel On Mon, Jun 05 2006, David Woodhouse wrote: > On Mon, 2006-06-05 at 12:52 +0200, Jens Axboe wrote: > > I guess the color -> colour transformation is clouding the > > inclusion :-) > > Heh. Well, mostly I've just _removed_ the references colo{u,}r, since > callers shouldn't be poking at it anyway in general. We have > rb_set_black() and rb_set_red() now, but even those are only really for > the rbtree code itself to be using. Yeah I'm just kidding, I just noticed that your British fingers could not leave the "color" alone! The patches are fine with me, rb usage is quite wide spread and shrinking the nodes is definitely a good thing. -- Jens Axboe ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-05 10:59 ` Jens Axboe @ 2006-06-05 10:57 ` David Woodhouse 2006-06-05 11:03 ` Jens Axboe 0 siblings, 1 reply; 166+ messages in thread From: David Woodhouse @ 2006-06-05 10:57 UTC (permalink / raw) To: Jens Axboe; +Cc: Andrew Morton, linux-kernel On Mon, 2006-06-05 at 12:59 +0200, Jens Axboe wrote: > Yeah I'm just kidding, I just noticed that your British fingers could > not leave the "color" alone! The patches are fine with me, rb usage is > quite wide spread and shrinking the nodes is definitely a good thing. Hey... I left rb_insert_color() as it was, didn't I? :) -- dwmw2 ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-05 10:57 ` David Woodhouse @ 2006-06-05 11:03 ` Jens Axboe 2006-06-05 18:09 ` Andrew Morton 0 siblings, 1 reply; 166+ messages in thread From: Jens Axboe @ 2006-06-05 11:03 UTC (permalink / raw) To: David Woodhouse; +Cc: Andrew Morton, linux-kernel On Mon, Jun 05 2006, David Woodhouse wrote: > On Mon, 2006-06-05 at 12:59 +0200, Jens Axboe wrote: > > Yeah I'm just kidding, I just noticed that your British fingers could > > not leave the "color" alone! The patches are fine with me, rb usage is > > quite wide spread and shrinking the nodes is definitely a good thing. > > Hey... I left rb_insert_color() as it was, didn't I? :) You did - and snuck in the renaming in the headers :-) But don't label me as a colour racist. -- Jens Axboe ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-05 11:03 ` Jens Axboe @ 2006-06-05 18:09 ` Andrew Morton 2006-06-05 19:19 ` David Woodhouse 0 siblings, 1 reply; 166+ messages in thread From: Andrew Morton @ 2006-06-05 18:09 UTC (permalink / raw) To: Jens Axboe; +Cc: dwmw2, linux-kernel On Mon, 5 Jun 2006 13:03:31 +0200 Jens Axboe <axboe@suse.de> wrote: > On Mon, Jun 05 2006, David Woodhouse wrote: > > On Mon, 2006-06-05 at 12:59 +0200, Jens Axboe wrote: > > > Yeah I'm just kidding, I just noticed that your British fingers could > > > not leave the "color" alone! The patches are fine with me, rb usage is > > > quite wide spread and shrinking the nodes is definitely a good thing. > > > > Hey... I left rb_insert_color() as it was, didn't I? :) > > You did - and snuck in the renaming in the headers :-) > But don't label me as a colour racist. > I'm not as shy. David, we now have a mixture of "color" and "colour" in the same piece of code. That's just dumb. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-05 18:09 ` Andrew Morton @ 2006-06-05 19:19 ` David Woodhouse 2006-06-17 20:35 ` Alistair John Strachan 0 siblings, 1 reply; 166+ messages in thread From: David Woodhouse @ 2006-06-05 19:19 UTC (permalink / raw) To: Andrew Morton; +Cc: Jens Axboe, linux-kernel On Mon, 2006-06-05 at 11:09 -0700, Andrew Morton wrote: > I'm not as shy. > > David, we now have a mixture of "color" and "colour" in the same piece of > code. That's just dumb. I blame them damn Frenchies. Fixed in the git tree. -- dwmw2 ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-05 19:19 ` David Woodhouse @ 2006-06-17 20:35 ` Alistair John Strachan 2006-06-17 21:20 ` David Woodhouse 0 siblings, 1 reply; 166+ messages in thread From: Alistair John Strachan @ 2006-06-17 20:35 UTC (permalink / raw) To: David Woodhouse; +Cc: Andrew Morton, Jens Axboe, linux-kernel On Monday 05 June 2006 20:19, David Woodhouse wrote: > On Mon, 2006-06-05 at 11:09 -0700, Andrew Morton wrote: > > I'm not as shy. > > > > David, we now have a mixture of "color" and "colour" in the same piece of > > code. That's just dumb. > > I blame them damn Frenchies. Fixed in the git tree. To colour, I assume. ;-) -- Cheers, Alistair. Third year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: header cleanup and install 2006-06-17 20:35 ` Alistair John Strachan @ 2006-06-17 21:20 ` David Woodhouse 0 siblings, 0 replies; 166+ messages in thread From: David Woodhouse @ 2006-06-17 21:20 UTC (permalink / raw) To: Alistair John Strachan; +Cc: Andrew Morton, Jens Axboe, linux-kernel On Sat, 2006-06-17 at 21:35 +0100, Alistair John Strachan wrote: > > > David, we now have a mixture of "color" and "colour" in the same piece of > > > code. That's just dumb. > > > > I blame them damn Frenchies. Fixed in the git tree. > > To colour, I assume. ;-) No, to 'color' since rb_insert_color() was the public API and hasn't changed, while 'rb_parent_colour' is a new, internal field (now renamed to 'rb_parent_color'). -- dwmw2 ^ permalink raw reply [flat|nested] 166+ messages in thread
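The rb_parent_color field David describes keeps the parent pointer and the
node colour in a single word, which is where the node shrinkage comes from.
What follows is only a minimal sketch of that packing, assuming nodes are at
least 4-byte aligned so the low bits of the parent address are free. The sk_
names are hypothetical and this is not the actual lib/rbtree.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_RED   0u
#define SK_BLACK 1u

/* Hypothetical node: the parent pointer and the colour share one word. */
struct sk_node {
	uintptr_t parent_color;	/* parent address | colour in bit 0 */
	struct sk_node *left;
	struct sk_node *right;
};

/* Mask off the low bits to recover the parent pointer. */
static inline struct sk_node *sk_parent(const struct sk_node *n)
{
	return (struct sk_node *)(n->parent_color & ~(uintptr_t)3);
}

/* The colour lives in bit 0. */
static inline unsigned sk_color(const struct sk_node *n)
{
	return (unsigned)(n->parent_color & 1);
}

static inline void sk_set_red(struct sk_node *n)
{
	n->parent_color &= ~(uintptr_t)1;
}

static inline void sk_set_black(struct sk_node *n)
{
	n->parent_color |= 1;
}

/* Replace the parent while preserving the colour bits. */
static inline void sk_set_parent(struct sk_node *n, struct sk_node *p)
{
	/* Assumption: nodes are 4-byte aligned, so the low two bits are zero. */
	assert(((uintptr_t)p & 3) == 0);
	n->parent_color = (n->parent_color & 3) | (uintptr_t)p;
}
```

With this layout a node is three words rather than four, and callers go
through the accessors instead of poking at the colour directly.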
* Re: 2.6.18 -mm merge plans
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
2006-06-04 21:20 ` 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans) Bernhard Rosenkraenzer
2006-06-04 21:33 ` header cleanup and install David Woodhouse
@ 2006-06-04 21:36 ` Alan Cox
2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
` (17 subsequent siblings)
20 siblings, 0 replies; 166+ messages in thread
From: Alan Cox @ 2006-06-04 21:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> All I can say about this is that it doesn't appear to break anything and
> is ready to merge from that point of view. It's not an area in which I
> have much interest or knowledge.
With my distro hat on I'd say it's essential work and it either needs
doing now or resolving at the kernel summit.
Alan
^ permalink raw reply [flat|nested] 166+ messages in thread
* kbuild, kconfig and hrdinstall stuff
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (2 preceding siblings ...)
2006-06-04 21:36 ` 2.6.18 -mm merge plans Alan Cox
@ 2006-06-04 21:41 ` Sam Ravnborg
2006-06-04 21:54 ` David Woodhouse
2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
` (16 subsequent siblings)
20 siblings, 1 reply; 166+ messages in thread
From: Sam Ravnborg @ 2006-06-04 21:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
> git-hdrcleanup.patch
> git-hdrinstall.patch
>
> This is Dave Woodhouse's work cleaning up the kernel headers and adding a
> `make headerinstall' target which automates the exporting of kernel
> headers as a userspace-usable package.
>
> All I can say about this is that it doesn't appear to break anything and
> is ready to merge from that point of view. It's not an area in which I
> have much interest or knowledge.
Dave Woodhouse asked me to review the hdrinstall part and I will do so.
At first glance only a few tidbits need fixing, and then I would like to
include unifdef in the kernel. It is rather unusual to have it installed
(Gentoo at least does not have it in Portage).
I just lack a bit of time. Work and my newcomer (2 months old now) take
a bit of time at the moment.
If you do not beat me to it, hdrinstall will be part of kbuild-tree soon,
whereas the hdrcleanup part will not.
The following will be in kbuild-tree soon too.
> add-dependency-on-kernelrelease-to-the-package-targets.patch
> kconfig-improve-config-load-save-output.patch
> kconfig-fix-config-dependencies.patch
> kconfig-remove-symbol_yesmodno.patch
> kconfig-allow-multiple-default-values-per-symbol.patch
> kconfig-allow-loading-multiple-configurations.patch
> kconfig-integrate-split-config-into-silentoldconfig.patch
> kconfig-integrate-split-config-into-silentoldconfig-fix.patch
> kconfig-move-kernelrelease.patch
> kconfig-add-symbol-option-config-syntax.patch
> kconfig-add-defconfig_list-module-option.patch
> kconfig-add-search-option-for-xconfig.patch
> kconfig-finer-customization-via-popup-menus.patch
> kconfig-create-links-in-info-window.patch
> kconfig-jump-to-linked-menu-prompt.patch
> kconfig-warn-about-leading-whitespace-for-menu-prompts.patch
> kconfig-remove-leading-whitespace-in-menu-prompts.patch
> config-exit-if-no-beginning-filename.patch
> make-kernelrelease-speedup.patch
> kconfig-kconfig_overwriteconfig.patch
Not this one >>> sane-menuconfig-colours.patch
Randy Dunlap has a patch so it is configurable - but I would like it to
be selectable in menuconfig - something I have not yet done.
> kbuild-export-type-enhancement-to-modpostc.patch
> kbuild-export-type-enhancement-to-modpostc-fix.patch
> kbuild-prevent-building-modules-that-wont-load.patch
> kbuild-export-symbol-usage-report-generator.patch
> kbuild-obj-dirs-is-calculated-incorrectly-if-hostprogs-y-is-defined.patch
> fix-make-rpm-for-powerpc.patch
If review is good => kbuild-tree.
> powerpc-kbuild-warning-fix.patch
I need to check up on this.
> kernel-doc-drop-leading-space-in-sections.patch
> kernel-doc-script-cleanups.patch
I thought we had a kernel-doc maintainer these days?
Sam
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: kbuild, kconfig and hrdinstall stuff 2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg @ 2006-06-04 21:54 ` David Woodhouse 0 siblings, 0 replies; 166+ messages in thread From: David Woodhouse @ 2006-06-04 21:54 UTC (permalink / raw) To: Sam Ravnborg; +Cc: Andrew Morton, linux-kernel On Sun, 2006-06-04 at 23:41 +0200, Sam Ravnborg wrote: > Dave Woodhouse asked me to review the hdrinstall part and I will do > so. > At first glance only a fiw tid-bits needs fixing and then I like to > include unifdef in the kernel. It is rather unusual to have installed > (gentoo at least does not have it in Portage). I'm happy enough to include unifdef -- I'll do it right now if there's consensus. I just didn't want to do it and then have its inclusion be a _problem_ -- I wanted the tree to be as simple as possible. -- dwmw2 ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans)
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (3 preceding siblings ...)
2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
@ 2006-06-04 23:04 ` H. Peter Anvin
2006-06-05 18:09 ` Roman Zippel
2006-06-06 15:20 ` Pavel Machek
2006-06-04 23:50 ` clocksource Roman Zippel
` (15 subsequent siblings)
20 siblings, 2 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-04 23:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew Morton wrote:
>
> git-klibc.patch
>
> Similar. This all appears to work sufficiently well for a 2.6.18 merge.
> But it's been so long since klibc was a hot topic that I've forgotten who
> wanted it, and what for.
>
> Can whoever has an interest in this work please pipe up and let's get our
> direction sorted out quickly.
>
klibc (early userspace) in its current form is fundamentally a cleanup.
What it does is unload code from the kernel which has no fundamental
reason to be kernel code (written under kernel rules, with all the
problems that entails.) The initial code to have removed is the
root-mounting code, with all the various ugly mutations of that (ramdisk
loading, NFS root, initrd...)
The original idea was due to Al Viro; obviously, the implementation is
mostly mine.
It is of course my hope that this will be used for more than just plain
initialization code, but that in itself is a significant step, and one
has to start somewhere. Part of the reason it has taken as long as it
has is just to try to make it as drop-in as at all possible.
-hpa
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans)
2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
@ 2006-06-05 18:09 ` Roman Zippel
2006-06-06 15:20 ` Pavel Machek
1 sibling, 0 replies; 166+ messages in thread
From: Roman Zippel @ 2006-06-05 18:09 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Andrew Morton, linux-kernel
Hi,
On Sun, 4 Jun 2006, H. Peter Anvin wrote:
> Andrew Morton wrote:
> >
> > git-klibc.patch
> >
> > Similar. This all appears to work sufficiently well for a 2.6.18 merge.
> > But it's been so long since klibc was a hot topic that I've forgotten who
> > wanted it, and what for.
> >
> > Can whoever has an interest in this work please pipe up and let's get our
> > direction sorted out quickly.
> >
>
> klibc (early userspace) in its current form is fundamentally a cleanup. What
> it does is unload code from the kernel which has no fundamental reason to be
> kernel code (written during kernel rules, with all the problems it entails.)
> The initial code to have removed is the root-mounting code, with all the
> various ugly mutations of that (ramdisk loading, NFS root, initrd...)
For a cleanup it adds quite a lot of code, where I'm not really sure it
should all be distributed with the kernel.
I'm really surprised there hasn't been any larger discussion about it,
or maybe I missed something?
It adds various utilities (dash, gzip, ...) to the kernel source, which
are not kernel specific at all. Why do we need this duplication? IMO code
duplication like this is not a desirable thing, as it increases the
maintenance overhead. Sometimes this is necessary, but IMO it should have
a good reason and should be temporary. Where does this duplication end
(e.g. udev, module tools), are we going to pull everything into the
kernel source, which might be needed for an initramfs?
I think the most questionable duplication is the libgcc copy. Why do you
even provide your own copy of this? This is a private library of the
compiler and the last thing I would duplicate.
How is this going to integrate with the rest of the system, especially on
the distribution side? They have their own mechanisms to produce an
initrd. What happens if NFS root requires a network module, which
requires a firmware file? How is this going to work?
How can e.g. embedded users control what's going into the initramfs?
They certainly don't want any duplication here (e.g. initrd on top of
initramfs).
I'm not really comfortable with merging the whole thing at once, it's a
huge patch (or thousands of little commits) with no real documentation,
which makes it very hard to review. IMO it would be preferable to
distribute the non-kernel specific parts separately and make this more
modular, so that the user can exchange any part he likes.
The patch also already removes a lot of the old setup stuff, which makes
it an all-or-nothing approach. It doesn't leave us with the possibility
to make the new setup optional and gradually convert the various boot
initialization to it and experiment a little with it at first (not just
kernel developers but also everyone else).
Considering the impact all these changes will have, I would be really
interested in more opinions on this, or does everyone just hope it will
somehow work out by itself?
bye, Roman
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans) 2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin 2006-06-05 18:09 ` Roman Zippel @ 2006-06-06 15:20 ` Pavel Machek 2006-06-06 20:56 ` Rafael J. Wysocki 1 sibling, 1 reply; 166+ messages in thread From: Pavel Machek @ 2006-06-06 15:20 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Andrew Morton, linux-kernel Hi! > >git-klibc.patch > > > > Similar. This all appears to work sufficiently well > > for a 2.6.18 merge. But it's been so long since klibc > > was a hot topic that I've forgotten who > > wanted it, and what for. > > > > Can whoever has an interest in this work please pipe > > up and let's get our > > direction sorted out quickly. > > > > klibc (early userspace) in its current form is > fundamentally a cleanup. What it does is unload code > from the kernel which has no fundamental reason to be > kernel code (written during kernel rules, with all the > problems it entails.) The initial code to have removed > is the root-mounting code, with all the various ugly > mutations of that (ramdisk loading, NFS root, initrd...) > > The original idea was due Al Viro; obviously, the > implementation is mostly mine. > > It is of course my hope that this will be used for more > than just plain initialization code, but that in itself > is a significant step, and one has to start somewhere. It allows me to unify swsusp & uswsusp into one in future, for example, reducing code duplication. klibc looks like good thing. -- Thanks for all the (sleeping) penguins. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans) 2006-06-06 15:20 ` Pavel Machek @ 2006-06-06 20:56 ` Rafael J. Wysocki 2006-06-07 3:37 ` H. Peter Anvin ` (2 more replies) 0 siblings, 3 replies; 166+ messages in thread From: Rafael J. Wysocki @ 2006-06-06 20:56 UTC (permalink / raw) To: Pavel Machek; +Cc: H. Peter Anvin, Andrew Morton, linux-kernel Hi, On Tuesday 06 June 2006 17:20, Pavel Machek wrote: > > >git-klibc.patch > > > > > > Similar. This all appears to work sufficiently well > > > for a 2.6.18 merge. But it's been so long since klibc > > > was a hot topic that I've forgotten who > > > wanted it, and what for. > > > > > > Can whoever has an interest in this work please pipe > > > up and let's get our > > > direction sorted out quickly. > > > > > > > klibc (early userspace) in its current form is > > fundamentally a cleanup. What it does is unload code > > from the kernel which has no fundamental reason to be > > kernel code (written during kernel rules, with all the > > problems it entails.) The initial code to have removed > > is the root-mounting code, with all the various ugly > > mutations of that (ramdisk loading, NFS root, initrd...) > > > > The original idea was due Al Viro; obviously, the > > implementation is mostly mine. > > > > It is of course my hope that this will be used for more > > than just plain initialization code, but that in itself > > is a significant step, and one has to start somewhere. > > It allows me to unify swsusp & uswsusp into one in future, for > example, reducing code duplication. [cough] How distant is the future you're referring to? Rafael ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans) 2006-06-06 20:56 ` Rafael J. Wysocki @ 2006-06-07 3:37 ` H. Peter Anvin 2006-06-07 4:00 ` Nigel Cunningham 2006-06-07 8:44 ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek 2 siblings, 0 replies; 166+ messages in thread From: H. Peter Anvin @ 2006-06-07 3:37 UTC (permalink / raw) To: linux-kernel Followup to: <200606062256.55472.rjw@sisk.pl> By author: "Rafael J. Wysocki" <rjw@sisk.pl> In newsgroup: linux.dev.kernel > > > > It allows me to unify swsusp & uswsusp into one in future, for > > example, reducing code duplication. > > [cough] How distant is the future you're referring to? > Shouldn't be far, since most of the code is already written. One major advantage of klibc is that it allows most of the initialization code to both be re-used as standalone programs as well as be tested in normal userspace. The former lets distributions stitch it together any way they want, and the latter should reduce bugs (especially since it's combined with what is a decent-sized subset of the POSIX programming model, as opposed to the much more difficult kernel programming model.) -hpa ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans) 2006-06-06 20:56 ` Rafael J. Wysocki 2006-06-07 3:37 ` H. Peter Anvin @ 2006-06-07 4:00 ` Nigel Cunningham 2006-06-07 4:10 ` H. Peter Anvin 2006-06-07 8:44 ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek 2 siblings, 1 reply; 166+ messages in thread From: Nigel Cunningham @ 2006-06-07 4:00 UTC (permalink / raw) To: linux-kernel Cc: Rafael J. Wysocki, Pavel Machek, H. Peter Anvin, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 823 bytes --] Hi. Sorry for coming in late. I've only just resubscribed after my move. Not sure who originally said this... > > > problems it entails.) The initial code to have removed > > > is the root-mounting code, with all the various ugly > > > mutations of that (ramdisk loading, NFS root, initrd...) Could I get more explanation of what this means and its implications? I'm thinking in particular about the implications for suspending to disk. Will it imply that everyone will _have_ to have an initramfs with some userspace program that sets up device nodes and so on, even if at the moment all you have is root=/dev/hda1 resume2=swap:/dev/hda2? Along similar lines, I had been considering eventually including support for putting an image in place of the initrd (for embedded). Regards, Nigel [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans)
2006-06-07 4:00 ` Nigel Cunningham
@ 2006-06-07 4:10 ` H. Peter Anvin
2006-06-07 4:25 ` Nigel Cunningham
0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-07 4:10 UTC (permalink / raw)
To: linux-kernel
Followup to: <200606071400.49980.ncunningham@linuxmail.org>
By author: Nigel Cunningham <ncunningham@linuxmail.org>
In newsgroup: linux.dev.kernel
>
> Hi.
>
> Sorry for coming in late. I've only just resubscribed after my move.
>
> Not sure who originally said this...
>
> > > > problems it entails.) The initial code to have removed
> > > > is the root-mounting code, with all the various ugly
> > > > mutations of that (ramdisk loading, NFS root, initrd...)
>
> Could I get more explanation of what this means and its implications? I'm
> thinking in particular about the implications for suspending to disk. Will it
> imply that everyone will _have_ to have an initramfs with some userspace
> program that sets up device nodes and so on, even if at the moment all you
> have is root=/dev/hda1 resume2=swap:/dev/hda2?
>
Yes. That initramfs is embedded in the kernel image.
> Along similar lines, I had been considering eventually including support for
> putting an image in place of the initrd (for embedded).
You can still override the default built-in initramfs. Then you get
the benefit of not carrying a bunch of code with you that can never be
executed.
-hpa
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans) 2006-06-07 4:10 ` H. Peter Anvin @ 2006-06-07 4:25 ` Nigel Cunningham 2006-06-07 4:26 ` klibc H. Peter Anvin 2006-06-07 6:51 ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson 0 siblings, 2 replies; 166+ messages in thread From: Nigel Cunningham @ 2006-06-07 4:25 UTC (permalink / raw) To: H. Peter Anvin; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1559 bytes --] Hi. On Wednesday 07 June 2006 14:10, H. Peter Anvin wrote: > Followup to: <200606071400.49980.ncunningham@linuxmail.org> > By author: Nigel Cunningham <ncunningham@linuxmail.org> > In newsgroup: linux.dev.kernel > > > Hi. > > > > Sorry for coming in late. I've only just resubscribed after my move. > > > > Not sure who originally said this... > > > > > > > problems it entails.) The initial code to have removed > > > > > is the root-mounting code, with all the various ugly > > > > > mutations of that (ramdisk loading, NFS root, initrd...) > > > > Could I get more explanation of what this means and its implications? I'm > > thinking in particular about the implications for suspending to disk. > > Will it imply that everyone will _have_ to have an initramfs with some > > userspace program that sets up device nodes and so on, even if at the > > moment all you have is root=/dev/hda1 resume2=swap:/dev/hda2? > > Yes. That initramfs is embedded in the kernel image. > > > Along similar lines, I had been considering eventually including support > > for putting an image in place of the initrd (for embedded). > > You can still override the default buildin initramfs. Then you get > the benefit of not carrying a bunch of code with you that can never be > executed. Ok. Ta. I guess I should put some time into learning this prior to 2.6.18 then, so I can help others through the transition. 
Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc 2006-06-07 4:25 ` Nigel Cunningham @ 2006-06-07 4:26 ` H. Peter Anvin 2006-06-07 6:22 ` klibc Nigel Cunningham 2006-06-07 6:51 ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson 1 sibling, 1 reply; 166+ messages in thread From: H. Peter Anvin @ 2006-06-07 4:26 UTC (permalink / raw) To: Nigel Cunningham; +Cc: linux-kernel Nigel Cunningham wrote: > > Ok. Ta. I guess I should put some time into learning this prior to 2.6.18 > then, so I can help others through the transition. > I've been meaning to write up some proper documentation, but obviously my #1 priority has been to fix problems as they crop up. If you would be willing to help in that area, I'd be more than willing to spend some time giving you a mind dump. -hpa ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc 2006-06-07 4:26 ` klibc H. Peter Anvin @ 2006-06-07 6:22 ` Nigel Cunningham 2006-06-07 6:38 ` klibc H. Peter Anvin 0 siblings, 1 reply; 166+ messages in thread From: Nigel Cunningham @ 2006-06-07 6:22 UTC (permalink / raw) To: H. Peter Anvin; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 718 bytes --] Hi. On Wednesday 07 June 2006 14:26, H. Peter Anvin wrote: > Nigel Cunningham wrote: > > Ok. Ta. I guess I should put some time into learning this prior to 2.6.18 > > then, so I can help others through the transition. > > I've been meaning to write up some proper documentation, but obviously > my #1 priority has been to fix problems as they crop up. If you would > be willing to help in that area, I'd be more than willing to spend some > time giving you a mind dump. I'm sorry, but I probably wouldn't have time to help. I'm not even making progress on Suspend2 at the moment. Regards, Nigel -- Nigel, Michelle and Alisdair Cunningham 5 Mitchell Street Cobden 3266 Victoria, Australia [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc 2006-06-07 6:22 ` klibc Nigel Cunningham @ 2006-06-07 6:38 ` H. Peter Anvin 0 siblings, 0 replies; 166+ messages in thread From: H. Peter Anvin @ 2006-06-07 6:38 UTC (permalink / raw) To: Nigel Cunningham; +Cc: linux-kernel Nigel Cunningham wrote: > Hi. > > On Wednesday 07 June 2006 14:26, H. Peter Anvin wrote: >> Nigel Cunningham wrote: >>> Ok. Ta. I guess I should put some time into learning this prior to 2.6.18 >>> then, so I can help others through the transition. >> I've been meaning to write up some proper documentation, but obviously >> my #1 priority has been to fix problems as they crop up. If you would >> be willing to help in that area, I'd be more than willing to spend some >> time giving you a mind dump. > > I'm sorry, but I probably wouldn't have time to help. I'm not even making > progress on Suspend2 at the moment. > Anyway, swsusp code is already in klibc. -hpa ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans)
2006-06-07 4:25 ` Nigel Cunningham
2006-06-07 4:26 ` klibc H. Peter Anvin
@ 2006-06-07 6:51 ` Joshua Hudson
2006-06-07 21:12 ` H. Peter Anvin
1 sibling, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-07 6:51 UTC (permalink / raw)
To: linux-kernel
Did anybody ever fix the problem that you can't pivot_root() the rootfs
filesystem, and hence can't use it on a loopback system backed by NTFS?
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans) 2006-06-07 6:51 ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson @ 2006-06-07 21:12 ` H. Peter Anvin 2006-06-09 8:03 ` klibc Nix 0 siblings, 1 reply; 166+ messages in thread From: H. Peter Anvin @ 2006-06-07 21:12 UTC (permalink / raw) To: linux-kernel Followup to: <bda6d13a0606062351i5c94414fpa03ee2ce3dd180ae@mail.gmail.com> By author: "Joshua Hudson" <joshudson@gmail.com> In newsgroup: linux.dev.kernel > > Did anybody ever fix the can't pivot_root() the rootfs filesystem; > hense can't use on a loopback system backed by NTFS? > You shouldn't pivot_root the rootfs filesystem. Use the run-init utility or something similar instead (which does a mount with MS_MOVE.) -hpa ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc 2006-06-07 21:12 ` H. Peter Anvin @ 2006-06-09 8:03 ` Nix 2006-06-09 18:45 ` klibc H. Peter Anvin [not found] ` <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com> 0 siblings, 2 replies; 166+ messages in thread From: Nix @ 2006-06-09 8:03 UTC (permalink / raw) To: H. Peter Anvin; +Cc: linux-kernel On 7 Jun 2006, H. Peter Anvin noted: > Followup to: <bda6d13a0606062351i5c94414fpa03ee2ce3dd180ae@mail.gmail.com> > By author: "Joshua Hudson" <joshudson@gmail.com> > In newsgroup: linux.dev.kernel >> >> Did anybody ever fix the can't pivot_root() the rootfs filesystem; >> hense can't use on a loopback system backed by NTFS? > > You shouldn't pivot_root the rootfs filesystem. What happens if you do? I mean, it doesn't make even conceptual sense, really. The rootfs is always there: that's its entire purpose. > Use the run-init > utility or something similar instead (which does a mount with > MS_MOVE.) busybox has a switch_root tool which (conceptually) rm -rf's everything on the root filesystem and then does such a mount. (After all whatever is on that filesystem is inaccessible after the overmount, so keeping it around is just a waste of memory.) -- `Voting for any American political party is fundamentally incomprehensible.' --- Vadik ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc 2006-06-09 8:03 ` klibc Nix @ 2006-06-09 18:45 ` H. Peter Anvin [not found] ` <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com> 1 sibling, 0 replies; 166+ messages in thread From: H. Peter Anvin @ 2006-06-09 18:45 UTC (permalink / raw) To: linux-kernel Followup to: <8764ja7o2d.fsf@hades.wkstn.nix> By author: Nix <nix@esperi.org.uk> In newsgroup: linux.dev.kernel > > > > You shouldn't pivot_root the rootfs filesystem. > > What happens if you do? I mean, it doesn't make even conceptual sense, > really. The rootfs is always there: that's its entire purpose. > "What happens if you do"... well, it may work, it might not, it may break some functionality for you or break in a future kernel version. It's undefined behaviour. > > Use the run-init > > utility or something similar instead (which does a mount with > > MS_MOVE.) > > busybox has a switch_root tool which (conceptually) rm -rf's everything > on the root filesystem and then does such a mount. (After all whatever > is on that filesystem is inaccessible after the overmount, so keeping > it around is just a waste of memory.) What busybox calls switch_root is the same as the run-init tool from the klibc distribution. -hpa ^ permalink raw reply [flat|nested] 166+ messages in thread
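The run-init / switch_root approach discussed above comes down to a short
sequence of system calls: chdir into the new root, move that mount over /
with MS_MOVE, chroot into it, then exec the real init. What follows is a
minimal sketch only, assuming Linux and root privileges. The names
move_root_sketch and run_init_sketch are hypothetical, and the step that
frees the old rootfs contents (the `rm -rf' Nix describes) is left out.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* exposes chroot() under strict -std= modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: make new_root the root of the mount tree without
 * calling pivot_root(). Returns 0 on success, -1 on the first failure.
 */
static int move_root_sketch(const char *new_root)
{
	assert(new_root != NULL);

	if (chdir(new_root) != 0) {
		perror("chdir new root");
		return -1;
	}
	/* Move the mount we are standing in on top of "/". */
	if (mount(".", "/", NULL, MS_MOVE, NULL) != 0) {
		perror("mount MS_MOVE");
		return -1;
	}
	/* Make the moved mount this process's root directory. */
	if (chroot(".") != 0) {
		perror("chroot");
		return -1;
	}
	if (chdir("/") != 0) {
		perror("chdir /");
		return -1;
	}
	return 0;
}

/*
 * Hypothetical wrapper: switch roots, then hand over to the real init.
 * Only returns (with -1) if something failed.
 */
static int run_init_sketch(const char *new_root, const char *init,
			   char *const argv[])
{
	if (move_root_sketch(new_root) != 0)
		return -1;
	execv(init, argv);
	perror("execv");
	return -1;
}
```

Because rootfs itself is never unmounted or pivoted, this stays within
what hpa describes as defined behaviour; the real tools also check that
they are running from rootfs before deleting anything.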
[parent not found: <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com>]
[parent not found: <871wty6rl9.fsf@hades.wkstn.nix>]
* Re: klibc
[not found] ` <871wty6rl9.fsf@hades.wkstn.nix>
@ 2006-06-09 22:28 ` Joshua Hudson
2006-06-09 22:48 ` klibc H. Peter Anvin
0 siblings, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-09 22:28 UTC (permalink / raw)
To: linux-kernel

On 6/9/06, Nix <nix@esperi.org.uk> wrote:
> On Fri, 9 Jun 2006, Joshua Hudson whispered secretively:
> > On 6/9/06, Nix <nix@esperi.org.uk> wrote:
> >> What happens if you do? I mean, it doesn't make even conceptual sense,
> >> really. The rootfs is always there: that's its entire purpose.
> >
> > I just need it accessible somewhere else on the tree so that the system
> > init runs from that rather than the root filesystem, and so can unmount
> > root filesystem. Obviously, after a mount /, it is not.
>
> You cannot unmount rootfs: it's the first filesystem mounted, the
> ultimate parent of all attached mounts, the fallback used if you umount
> everything else, and is explicitly checked for at mount and pivot_root
> time.
>
> You also don't often want to leave anything in it after you've booted:
> unlike tmpfs, it's not swap-backed, so stuff in there stays in
> nonswappable memory, pinned in the page cache. This is generally
> undesirable. Yes, it stays around empty: but if you boot without an
> initramfs, it stays around empty *in any case*: the kernel builds an
> empty one and uses it automatically, then falls back to code which
> mounts a root filesystem for you (code which HPA's klibc patch removes
> in favour of doing everything it did from an initramfs).
>
>
> The end of my initramfs script (busybox / uclibc-based) reads
>
> # Unmount everything and switch root filesystems for good:
> # exec the real init and begin the real boot process.
> /bin/umount -l /proc
> /bin/umount -l /sys
> /bin/umount -l /dev
>
> exec switch_root /new-root $init $INIT_ARGS
>
> where switch_root is the aforementioned busybox `rm -rf everything on
> this filesystem and mount --move us into the new root'. (At the time
> it runs, it's PID 1 and there are no other non-kernel threads running:
> it execs init.)
>
>
> What are you trying to accomplish?

Once again. Loopback mount requires a clean unmount of root and
host filesystem. After remounting root read-only, host is still read-write
and cannot be remounted read-only.

It is necessary to provide access to the rootfs tree somewhere else
or use pivot_root, like the initrd solution below:

initrd: /linuxrc
#!/bin/sh
mount /dev/hda1 -o rw -t ntfs /host
mount /host/linux/root.img -o loop,ro -t ext3 /root
pivot_root /root /root/initrd
exec /initrd/bin/init

root:/etc/rc.d/rc.halt:
#!/bin/sh
pivot_root /initrd /initrd/root
cd /
exec /stop $RUNLEVEL

initrd:/stop
#!/bin/sh
kill -SIGUSR1 1
umount /root
umount /host
case $1 in
0) poweroff -f ;;
*) reboot -f ;;
esac

This requires static binaries of init, sh, mount, umount, an extant /etc, and a
few nodes in /dev.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc
2006-06-09 22:28 ` klibc Joshua Hudson
@ 2006-06-09 22:48 ` H. Peter Anvin
2006-06-09 23:13 ` klibc Joshua Hudson
0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-09 22:48 UTC (permalink / raw)
To: linux-kernel

Followup to: <bda6d13a0606091528h4e85265du8651818c73827b7d@mail.gmail.com>
By author: "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
>
> Once again. Loopback mount requires a clean unmount of root and
> host filesystem. After remounting root read-only, host is still read-write
> and cannot be remounted read-only.
>
> It is necessary to provide access to the rootfs tree somewhere else
> or use pivot_root, like the initrd solution below:
>
> initrd: /linuxrc
> #!/bin/sh
> mount /dev/hda1 -o rw -t ntfs /host
> mount /host/linux/root.img -o loop,ro -t ext3 /root
> pivot_root /root /root/initrd
> exec /initrd/bin/init
>
> root:/etc/rc.d/rc.halt:
> #!/bin/sh
> pivot_root /initrd /initrd/root
> cd /
> exec /stop $RUNLEVEL
>
> initrd:/stop
> #!/bin/sh
> kill -SIGUSR1 1
> umount /root
> umount /host
> case $1 in
> 0) poweroff -f ;;
> *) reboot -f ;;
> esac
>
> This requires static binaries of init, sh, mount, umount, an extant /etc, and a
> few nodes in /dev.

Another solution is to leave a process with its cwd parked in the
rootfs. Look at run_linuxrc() in usr/kinit/initrd.c of any klibc tree
to see how this can be used. (That is there to support old-style
/linuxrc, but should be applicable here, too.)

-hpa

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc
2006-06-09 22:48 ` klibc H. Peter Anvin
@ 2006-06-09 23:13 ` Joshua Hudson
2006-06-09 23:44 ` klibc H. Peter Anvin
0 siblings, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-09 23:13 UTC (permalink / raw)
To: linux-kernel

On 6/9/06, H. Peter Anvin <hpa@zytor.com> wrote:
> Followup to: <bda6d13a0606091528h4e85265du8651818c73827b7d@mail.gmail.com>
> By author: "Joshua Hudson" <joshudson@gmail.com>
> In newsgroup: linux.dev.kernel
> >
> > Once again. Loopback mount requires a clean unmount of root and
> > host filesystem. After remounting root read-only, host is still read-write
> > and cannot be remounted read-only.
> >
> > It is necessary to provide access to the rootfs tree somewhere else
> > or use pivot_root, like the initrd solution below:
> >
> > initrd: /linuxrc
> > #!/bin/sh
> > mount /dev/hda1 -o rw -t ntfs /host
> > mount /host/linux/root.img -o loop,ro -t ext3 /root
> > pivot_root /root /root/initrd
> > exec /initrd/bin/init
> >
> > root:/etc/rc.d/rc.halt:
> > #!/bin/sh
> > pivot_root /initrd /initrd/root
> > cd /
> > exec /stop $RUNLEVEL
> >
> > initrd:/stop
> > #!/bin/sh
> > kill -SIGUSR1 1
> > umount /root
> > umount /host
> > case $1 in
> > 0) poweroff -f ;;
> > *) reboot -f ;;
> > esac
> >
> > This requires static binaries of init, sh, mount, umount, an extant /etc, and a
> > few nodes in /dev.
>
> Another solution is to leave a process with its cwd parked in the
> rootfs. Look at run_linuxrc() in usr/kinit/initrd.c of any klibc tree
> to see how this can be used. (That is there to support old-style
> /linuxrc, but should be applicable here, too.)
>
> -hpa

Should work if the following is true:
if pwd is /, mount / followed by ls . returns the contents of initramfs.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc
2006-06-09 23:13 ` klibc Joshua Hudson
@ 2006-06-09 23:44 ` H. Peter Anvin
2006-06-16 6:02 ` klibc Joshua Hudson
0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-09 23:44 UTC (permalink / raw)
To: linux-kernel

Followup to: <bda6d13a0606091613h3334facbrcb86dbb2de01b412@mail.gmail.com>
By author: "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
> Should work if the following is true:
> if pwd is /, mount / followed by ls . returns the contents of initramfs.

It does, and it does work as described. Again, see the referenced
code. You can also fchdir() to the rootfs if you have a file
descriptor to any directory therein.

-hpa

^ permalink raw reply [flat|nested] 166+ messages in thread
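The fchdir() route mentioned above can be illustrated with a small
userspace sketch. It is not taken from run_linuxrc() or any klibc code:
the helper names park_directory and return_to_parked are hypothetical,
and an ordinary directory stands in for the rootfs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for O_DIRECTORY and fchdir() on glibc */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: remember a directory by holding an fd on it. */
int park_directory(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY);
}

/*
 * Hypothetical helper: make the parked directory the cwd again.
 * This works from the descriptor alone, so it does not depend on the
 * directory still being reachable by a path name.
 */
int return_to_parked(int dirfd)
{
    return fchdir(dirfd);
}
```

A process that called park_directory("/") while still on the rootfs
could later call return_to_parked() on that descriptor, provided the
descriptor is still open at that point.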
* Re: klibc
2006-06-09 23:44 ` klibc H. Peter Anvin
@ 2006-06-16 6:02 ` Joshua Hudson
2006-06-16 19:19 ` klibc H. Peter Anvin
0 siblings, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-16 6:02 UTC (permalink / raw)
To: linux-kernel

I've come to the conclusion that there is no good way to return to the
initramfs at all after init moves to the real root device. What I have
found is that the only way is for another process to keep a cwd or open
file handle on the initramfs, which plays very badly with killall.

Anybody got a way to make a user process other than init invulnerable
to kill -9? <g>

It would be dirt-simple if I could mount --rbind / /root/initrd
where / is the initramfs and /root is a mounted filesystem, but that
creates cycles and so breaks other things.

Oh, and mount / followed by ls / returns the contents of the initramfs.
Weird.

umount -l / has the extremely bizarre effect of leaving the process
stranded in / unless it currently has pwd or open directory handle
elsewhere.

Anybody want a patch that dumps the executor of umount / in the
initramfs, then does a lazy unmount? That, however, carries risks of
its own so might not be the best move.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc
2006-06-16 6:02 ` klibc Joshua Hudson
@ 2006-06-16 19:19 ` H. Peter Anvin
0 siblings, 0 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-16 19:19 UTC (permalink / raw)
To: linux-kernel

Followup to: <bda6d13a0606152302v6598ce84sf4c7066705c3284f@mail.gmail.com>
By author: "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
>
> I've come to the conclusion that there is no good way to return to the
> initramfs at all
> after init moves to the real root device. What I have found is that the only way
> is for another process to keep a cwd or open file handle on the initramfs which
> plays very badly with killall.
>
> Anybody got a way to make a user process other than init invulnerable
> to kill -9? <g>
>

Actually, does init close all its file descriptors? Otherwise you
could simply pass a file descriptor to init when init is executed. If
init explicitly closes file descriptors then that's not possible, but
perhaps that could be fixed in init.

On the other hand, if you killall -9 arbitrary processes as root, then
perhaps getting a dirty filesystem on reboot is ugly but not
catastrophic.

-hpa

^ permalink raw reply [flat|nested] 166+ messages in thread
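Passing a descriptor to init across exec, as suggested above, comes down
to making sure the close-on-exec flag is clear on it. A minimal sketch,
assuming a hypothetical helper name keep_fd_across_exec; whether a given
init then leaves the inherited descriptor open is exactly the open
question raised in the message and is not answered here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for O_DIRECTORY / O_CLOEXEC on glibc */
#endif
#include <fcntl.h>

/*
 * Hypothetical helper: clear FD_CLOEXEC so that fd stays open in the
 * program started by a later exec*() call. Returns 0 on success and
 * -1 if fd is not a valid descriptor.
 */
int keep_fd_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

The exec'ed program sees the descriptor under the same number, so that
number has to be agreed on in advance or passed on the command line or
in the environment.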
* Re: klibc (was: 2.6.18 -mm merge plans)
2006-06-06 20:56 ` Rafael J. Wysocki
2006-06-07 3:37 ` H. Peter Anvin
2006-06-07 4:00 ` Nigel Cunningham
@ 2006-06-07 8:44 ` Pavel Machek
2006-06-07 9:44 ` Rafael J. Wysocki
2 siblings, 1 reply; 166+ messages in thread
From: Pavel Machek @ 2006-06-07 8:44 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: H. Peter Anvin, Andrew Morton, linux-kernel

Hi!

> > > The original idea was due Al Viro; obviously, the
> > > implementation is mostly mine.
> > >
> > > It is of course my hope that this will be used for more
> > > than just plain initialization code, but that in itself
> > > is a significant step, and one has to start somewhere.
> >
> > It allows me to unify swsusp & uswsusp into one in future, for
> > example, reducing code duplication.
>
> [cough] How distant is the future you're referring to?

Year or two, I believe. Actually it is not as much as "unify swsusp &
uswsusp" as "drop kernel/power/swap.c" and possibly put parts of
uswsusp into initial userland so that user do not notice it was
dropped from kernel.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: klibc (was: 2.6.18 -mm merge plans)
2006-06-07 8:44 ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek
@ 2006-06-07 9:44 ` Rafael J. Wysocki
0 siblings, 0 replies; 166+ messages in thread
From: Rafael J. Wysocki @ 2006-06-07 9:44 UTC (permalink / raw)
To: Pavel Machek; +Cc: H. Peter Anvin, Andrew Morton, linux-kernel

On Wednesday 07 June 2006 10:44, Pavel Machek wrote:
> > > > The original idea was due Al Viro; obviously, the
> > > > implementation is mostly mine.
> > > >
> > > > It is of course my hope that this will be used for more
> > > > than just plain initialization code, but that in itself
> > > > is a significant step, and one has to start somewhere.
> > >
> > > It allows me to unify swsusp & uswsusp into one in future, for
> > > example, reducing code duplication.
> >
> > [cough] How distant is the future you're referring to?
>
> Year or two, I believe. Actually it is not as much as "unify swsusp &
> uswsusp" as "drop kernel/power/swap.c" and possibly put parts of
> uswsusp into initial userland so that user do not notice it was
> dropped from kernel.

OK

Rafael

^ permalink raw reply [flat|nested] 166+ messages in thread
* clocksource
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (4 preceding siblings ...)
2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
@ 2006-06-04 23:50 ` Roman Zippel
2006-06-05 20:20 ` clocksource john stultz
2006-06-05 0:02 ` utsname/hostname Randy.Dunlap
` (14 subsequent siblings)
20 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-04 23:50 UTC (permalink / raw)
To: Andrew Morton, johnstul; +Cc: linux-kernel

Hi,

On Sun, 4 Jun 2006, Andrew Morton wrote:

> time-use-clocksource-infrastructure-for-update_wall_time.patch

I still disagree with the update_wall_time() changes, they should be kept
the new separate from this. The error algorithm is a somewhat old version
and can cause oscillation and thus a confused clock.

> time-let-user-request-precision-from-current_tick_length.patch

This is broken, as it simply throws away resolution depending on the
clock.

These are two key problems, the rest can be fixed incrementally.

bye, Roman

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-04 23:50 ` clocksource Roman Zippel
@ 2006-06-05 20:20 ` john stultz
2006-06-05 20:53 ` clocksource john stultz
2006-06-05 21:07 ` clocksource Roman Zippel
0 siblings, 2 replies; 166+ messages in thread
From: john stultz @ 2006-06-05 20:20 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> Hi,
>
> On Sun, 4 Jun 2006, Andrew Morton wrote:
>
> > time-use-clocksource-infrastructure-for-update_wall_time.patch
>
> I still disagree with the update_wall_time() changes, they should be kept
> the new separate from this.

Is this directly related to the next item (if so, how?), or just
preference? I'd really like to avoid having multiple code paths for the
timekeeping core, so I'd like to see this unified. I'm willing to
optimize out bits w/ constants and whatnot, but I worry it will be a
nightmare to maintain if we have multiple generic update_wall_time
implementations.

> The error algorithm is a somewhat old version
> and can cause oscillation and thus a confused clock.

Would you mind elaborating on this? Which aspect of the error algorithm
is off? How does the clock become confused? Could you point to the line
numbers, etc? I assume your last patchset contains the current version?

> > time-let-user-request-precision-from-current_tick_length.patch
>
> This is broken, as it simply throws away resolution depending on the
> clock.

So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
true, and currently that's only the jiffies case.

Just to be clear, are you then suggesting that the accumulation in
update_wall_time should be done in a fixed shifted nanosecond unit
regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
your mind for this?

That seems not too difficult to do, and can be done w/ an incremental
patch. I'll try to crank that out today.

> These are two key problems, the rest can be fixed incrementally.

If these are really blockers, I want to get them resolved in the next
day or so (I'd really like to avoid having Andrew carry them for yet
another cycle). So I'd appreciate your help in correcting these issues.

thanks
-john

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-05 20:20 ` clocksource john stultz
@ 2006-06-05 20:53 ` john stultz
2006-06-05 21:07 ` clocksource Roman Zippel
1 sibling, 0 replies; 166+ messages in thread
From: john stultz @ 2006-06-05 20:53 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 13:20 -0700, john stultz wrote:
> On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> > > time-let-user-request-precision-from-current_tick_length.patch
> >
> > This is broken, as it simply throws away resolution depending on the
> > clock.
>
> So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
> true, and currently that's only the jiffies case.
>
> Just to be clear, are you then suggesting that the accumulation in
> update_wall_time should be done in a fixed shifted nanosecond unit
> regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
> your mind for this?
>
> That seems not too difficult to do, and can be done w/ an incremental
> patch. I'll try to crank that out today.

Just to quickly get some feedback on this. Currently untested (I'm
working on that part now - Andrew, I'll send it to you once it clears),
but it builds.

Roman: Your thoughts? Does it cover your concern?

thanks
-john

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4bc9428..884980a 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -146,14 +146,17 @@ static inline s64 cyc2ns(struct clocksou
 	return ret;
 }
 
+
+#define CLOCKSOURCE_INTERVAL_SHIFT (SHIFT_SCALE - 10)
+
 /**
  * clocksource_calculate_interval - Calculates a clocksource interval struct
  *
  * @c: Pointer to clocksource.
  * @length_nsec: Desired interval length in nanoseconds.
  *
- * Calculates a fixed cycle/nsec interval for a given clocksource/adjustment
- * pair and interval request.
+ * Calculates a fixed cycle/nsec interval (in CLOCKSOURCE_INTERVAL_SHIFT units)
+ * for a given clocksource/adjustment pair and interval request.
  *
  * Unless you're the timekeeping code, you should not be using this!
  */
@@ -164,7 +167,7 @@ static inline void clocksource_calculate
 
 	/* XXX - All of this could use a whole lot of optimization */
 	tmp = length_nsec;
-	tmp <<= c->shift;
+	tmp <<= CLOCKSOURCE_INTERVAL_SHIFT;
 	tmp += c->mult/2;
 	do_div(tmp, c->mult);
 
@@ -215,8 +218,8 @@ static inline int error_aproximation(u64
  * @cycles_delta: Current unacounted cycle delta
  * @error: Pointer to current error value
  *
- * Returns clock shifted nanosecond adjustment to be applied against
- * the accumulated time value (ie: xtime).
+ * Returns CLOCKSOURCE_INTERVAL_SHIFT shifted nanosecond adjustment to be
+ * applied against the accumulated time value (ie: xtime).
  *
  * If the error value is large enough, this function calulates the
  * (power of two) adjustment value, and adjusts the clock's mult and
diff --git a/kernel/timer.c b/kernel/timer.c
index 0569d40..588bfcd 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1029,8 +1029,8 @@ static void update_wall_time(void)
 	s64 snsecs_per_sec;
 	cycle_t now, offset;
 
-	snsecs_per_sec = (s64)NSEC_PER_SEC << clock->shift;
-	remainder_snsecs += (s64)xtime.tv_nsec << clock->shift;
+	snsecs_per_sec = (s64)NSEC_PER_SEC << CLOCKSOURCE_INTERVAL_SHIFT;
+	remainder_snsecs += (s64)xtime.tv_nsec << CLOCKSOURCE_INTERVAL_SHIFT;
 
 	now = clocksource_read(clock);
 	offset = (now - last_clock_cycle)&clock->mask;
@@ -1039,8 +1039,11 @@ static void update_wall_time(void)
 	 * case of lost or late ticks, it will accumulate correctly.
 	 */
 	while (offset > clock->interval_cycles) {
-		/* get the ntp interval in clock shifted nanoseconds */
-		s64 ntp_snsecs = current_tick_length(clock->shift);
+		/* get the ntp interval in CLOCKSOURCE_INTERVAL_SHIFT
+		 * shifted nanoseconds:
+		 */
+		s64 ntp_snsecs =
+			current_tick_length(CLOCKSOURCE_INTERVAL_SHIFT);
 
 		/* accumulate one interval */
 		remainder_snsecs += clock->interval_snsecs;
@@ -1049,7 +1052,7 @@ static void update_wall_time(void)
 
 		/* interpolator bits */
 		time_interpolator_update(clock->interval_snsecs
-						>> clock->shift);
+						>> CLOCKSOURCE_INTERVAL_SHIFT);
 
 		/* increment the NTP state machine */
 		update_ntp_one_tick();
@@ -1066,8 +1069,8 @@ static void update_wall_time(void)
 		}
 	}
 	/* store full nanoseconds into xtime */
-	xtime.tv_nsec = remainder_snsecs >> clock->shift;
-	remainder_snsecs -= (s64)xtime.tv_nsec << clock->shift;
+	xtime.tv_nsec = remainder_snsecs >> CLOCKSOURCE_INTERVAL_SHIFT;
+	remainder_snsecs -= (s64)xtime.tv_nsec << CLOCKSOURCE_INTERVAL_SHIFT;
 
 	/* check to see if there is a new clocksource to use */
 	if (change_clocksource()) {

^ permalink raw reply related [flat|nested] 166+ messages in thread
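The fixed-shift accumulation that the patch above moves
update_wall_time() towards can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions, not kernel code:
struct wall_clock and accumulate() are hypothetical names,
INTERVAL_SHIFT is set to 12 because the thread gives SHIFT_SCALE - 10
as 12, and the NTP adjustment, interpolator and clocksource switching
are all left out.

```c
#include <stdint.h>

#define NSEC_PER_SEC   1000000000LL
/* Assumed value: the thread gives SHIFT_SCALE - 10 as 12. */
#define INTERVAL_SHIFT 12

/* Hypothetical stand-in for xtime plus the static remainder. */
struct wall_clock {
    int64_t sec;        /* whole seconds */
    int64_t nsec;       /* whole nanoseconds, 0 <= nsec < NSEC_PER_SEC */
    int64_t remainder;  /* sub-nanosecond part, in 2^-INTERVAL_SHIFT ns */
};

/*
 * Add `count` intervals of interval_snsecs each, where the interval is
 * given in shifted nanoseconds and is assumed to be shorter than one
 * second. The unit is the same for every clock, whatever its own shift.
 */
void accumulate(struct wall_clock *wc, int64_t interval_snsecs, int count)
{
    const int64_t snsecs_per_sec = NSEC_PER_SEC << INTERVAL_SHIFT;
    int64_t snsecs = wc->remainder + (wc->nsec << INTERVAL_SHIFT);

    while (count-- > 0) {
        snsecs += interval_snsecs;
        if (snsecs >= snsecs_per_sec) {
            snsecs -= snsecs_per_sec;
            wc->sec++;
        }
    }
    /* Store full nanoseconds; keep the fraction for the next call. */
    wc->nsec = snsecs >> INTERVAL_SHIFT;
    wc->remainder = snsecs - (wc->nsec << INTERVAL_SHIFT);
}
```

With an interval of 1 ms plus one shifted unit (1/4096 ns), 4096
intervals add up to exactly one extra nanosecond, which a clock whose
own shift is 0 could not represent.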
* Re: clocksource
2006-06-05 20:20 ` clocksource john stultz
2006-06-05 20:53 ` clocksource john stultz
@ 2006-06-05 21:07 ` Roman Zippel
2006-06-06 19:42 ` clocksource john stultz
1 sibling, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-05 21:07 UTC (permalink / raw)
To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Mon, 5 Jun 2006, john stultz wrote:

> On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> > Hi,
> >
> > On Sun, 4 Jun 2006, Andrew Morton wrote:
> >
> > > time-use-clocksource-infrastructure-for-update_wall_time.patch
> >
> > I still disagree with the update_wall_time() changes, they should be kept
> > the new separate from this.
>
> Is this directly related to the next item (if so, how?), or just
> preference? I'd really like to avoid having multiple code paths for the
> timekeeping core, so I'd like to see this unified. I'm willing to
> optimize out bits w/ constants and whatnot, but I worry it will be a
> nightmare to maintain if we have multiple generic update_wall_time
> implementations.

One "unified" version will only be worse. Keeping the new path separate
from the old path will only make things clearer and more flexible.
Right now you have a mixture of old code, interpolator code and new
timekeeping code, which makes it a big mess.
_Please_ don't do this, it makes your code very hard to read, personally
I cannot guarantee that this thing does the right thing, with the
separate function we at least have a backup plan.

John, this is very sensitive code, I beg you not to fuck around with
it. :-(

> > The error algorithm is a somewhat old version
> > and can cause oscillation and thus a confused clock.
>
> Would you mind elaborating on this? Which aspect of the error algorithm
> is off? How does the clock become confused? Could you point to the line
> numbers, etc? I assume your last patchset contains the current version?

With large clock offsets the lookahead doesn't work correctly, basically
because it's already too late and it can cause overadjustment. Because of
this I do an extra lookahead in clocksource_bigadjust().

> > > time-let-user-request-precision-from-current_tick_length.patch
> >
> > This is broken, as it simply throws away resolution depending on the
> > clock.
>
> So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
> true, and currently that's only the jiffies case.
>
> Just to be clear, are you then suggesting that the accumulation in
> update_wall_time should be done in a fixed shifted nanosecond unit
> regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
> your mind for this?
>
> That seems not too difficult to do, and can be done w/ an incremental
> patch. I'll try to crank that out today.

I'd prefer you'd just take the update function from my patch, it's nicely
optimized and I'll try to address any concern you have about it.
For this I also posted a userspace test program, so that I know how it
behaves, do you have something similar for yours?

bye, Roman

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-05 21:07 ` clocksource Roman Zippel
@ 2006-06-06 19:42 ` john stultz
2006-06-07 0:41 ` clocksource Roman Zippel
0 siblings, 1 reply; 166+ messages in thread
From: john stultz @ 2006-06-06 19:42 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 23:07 +0200, Roman Zippel wrote:
> Hi,
>
> On Mon, 5 Jun 2006, john stultz wrote:
>
> > On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> > > Hi,
> > >
> > > On Sun, 4 Jun 2006, Andrew Morton wrote:
> > >
> > > > time-use-clocksource-infrastructure-for-update_wall_time.patch
> > >
> > > I still disagree with the update_wall_time() changes, they should be kept
> > > the new separate from this.
> >
> > Is this directly related to the next item (if so, how?), or just
> > preference? I'd really like to avoid having multiple code paths for the
> > timekeeping core, so I'd like to see this unified. I'm willing to
> > optimize out bits w/ constants and whatnot, but I worry it will be a
> > nightmare to maintain if we have multiple generic update_wall_time
> > implementations.
>
> One "unified" version will only be worse. Keeping the new path separate
> from the old path will only make things clearer and more flexible.
> Right now you have a mixture of old code, interpolator code and new
> timekeeping code, which makes it a big mess.

Eh? Are we looking at the same code? Right now there is *no* mixture of
old code. The update_wall_time() function is the same for *all* arches,
and jiffies is the common clocksource. This allows us to use alternate
clocksources to move to continuous timekeeping, while allowing the
accumulation function to stay the same. I don't see what part of this
you consider messy.

It's true the interpolator code is still there, but not for long, as
they will be easy to convert since the clocksource structure is very
similar. And that's just *one* line of code in the function in question.

> > > The error algorithm is a somewhat old version
> > > and can cause oscillation and thus a confused clock.
> >
> > Would you mind elaborating on this? Which aspect of the error algorithm
> > is off? How does the clock become confused? Could you point to the line
> > numbers, etc? I assume your last patchset contains the current version?
>
> With large clock offsets the lookahead doesn't work correctly, basically
> because it's already too late and it can cause overadjustment. Because of
> this I do an extra lookahead in clocksource_bigadjust().

Do you have a hard example for this with numbers? I don't mean to be a
pain, but I don't see this right off.

With the current code in -mm I can run a test app that disables
interrupts for 2 seconds at a time over and over and I'm still keeping
synched w/ an NTP server within 30 microseconds.

> > > > time-let-user-request-precision-from-current_tick_length.patch
> > >
> > > This is broken, as it simply throws away resolution depending on the
> > > clock.
> >
> > So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
> > true, and currently that's only the jiffies case.
> >
> > Just to be clear, are you then suggesting that the accumulation in
> > update_wall_time should be done in a fixed shifted nanosecond unit
> > regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
> > your mind for this?
> >
> > That seems not too difficult to do, and can be done w/ an incremental
> > patch. I'll try to crank that out today.
>
> I'd prefer you'd just take the update function from my patch, it's nicely
> optimized and I'll try to address any concern you have about it.

Even though as mentioned above I'm still in the dark on the need for it,
I spent a few hours last trying to convert the make_ntp_adj() in -mm to
use your bigadjust implementation. Unfortunately my attempts refuse to
boot. I'm not sure if this is the same problem that was keeping your
patchset from booting as well, or possibly just an implementation error
on my part. I'll send a patch for your review shortly and maybe you can
catch the issue?

> For this I also posted a userspace test program, so that I know how it
> behaves, do you have something similar for yours?

At your prodding awhile back I wrote a userspace simulator, but you
never commented on it.

Ok, so back to the critical issues:

1) You don't like the unified update_wall_time
2) Issue w/ the current make_ntp_adj and lost ticks.
3) NTP error may be limited to clock->shift resolution w/ the jiffies
clocksource (other clocksources have finer resolution than NTP's).

#1 I disagree with. I really think the unified approach is the way to
go, but I do understand the need to be conservative. This has gotten a
fair amount of testing in -mm with no issues reported. But it's not too
hard to re-add the old update_wall_time(), so I could be convinced to
push the unification off w/ a second opinion. Again, this would be via
an incremental patch.

So for #2 I'm working on trying to get your implementation functioning,
but I still don't quite understand the reason. I think this can be
worked out w/ an incremental patch. I'll send you some separate mail
shortly on this.

I posted an incremental patch for #3 yesterday, but it needs more work,
and changes in #2 could affect it, so I'm focusing on #2 at the moment.
Even so, NTP's resolution is 0.2 picoseconds, and the jiffies
clocksource keeps an error resolution of 4 picoseconds, so I'm not sure
if this is really a blocker. Further if we do re-add the old
update_wall_time for issue #1, this would keep the jiffies clocksource
from being used commonly, so this wouldn't be an issue.

Your thoughts?

-john

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource 2006-06-06 19:42 ` clocksource john stultz @ 2006-06-07 0:41 ` Roman Zippel 2006-06-08 8:05 ` clocksource john stultz 0 siblings, 1 reply; 166+ messages in thread From: Roman Zippel @ 2006-06-07 0:41 UTC (permalink / raw) To: john stultz; +Cc: Andrew Morton, linux-kernel Hi, On Tue, 6 Jun 2006, john stultz wrote: > > One "unified" version will only be worse. Keeping the new path separate > > from the old path will only make things clearer and more flexible. > > Right now you have a mixture of old code, interpolator code and new > > timekeeping code, which makes it a big mess. > > Eh? Are we looking at the same code? Right now there is *no* mixture of > old code. The update_wall_time() function is the same for *all* arches, > and jiffies is the common clocksource. Why is this so important??? Please, John, you driving me crazy with this. You can start with the jiffies clocksource everywhere else, why here? > This allows us to use alternate > clocksources to move to continuous timekeeping, while allowing the > accumulation function to stay the same. I don't see what part of this > you consider messy. _Please_ look at the update functions in my patch and try to understand its design, read the design document. The core part is: while (cycle_offset >= cycle_update) { cycle_offset -= cycle_update; clocksource_update_tick(); } clocksource_adjust(cycle_offset); For tick based design I can easily change this to: clocksource_update_tick(); clocksource_adjust(0); I can also rather easily exchange parts of it with 32 bit based calculations (without losing resolution). I know how to optimize this, but with your version I can't do this. John, I don't need a unified design, I want a _flexible_ design and your unified mess doesn't give me this. You're taking away any control over this, which would make my life easier from an arch perspective. 
As archs convert to the new timekeeping code they can easily switch to the new generic function or they can rather easily adapt it to their needs, you make this impossible. Later we can still unify things, but without having the majority of archs converted, without having really the complete picture, we need foremost flexibility. Doing a preemptive unification only because we can is the worst thing we can do at this time. > > With large clock offsets the lookahead doesn't work correctly, basically > > because it's already to late and it can cause overadjustment. Because of > > this I do an extra lookahead in clocksource_bigadjust(). > > Do you have a hard example for this with numbers? I don't mean to be a > pain, but I don't see this right off. > > With the current code in -mm I can run a test app that disables > interrupts for 2 seconds at a time over and over and I'm still keeping > synched w/ an NTP server within 30 microseconds. You need a clock source which doesn't generate it's own interrupts, so interrupts and clock updates can run asynchron. The key part above is "large clock offsets". In my test program disable the extra lookahead and run it with large offsets. This code gets only limited testing in -mm, it needs to run for weeks or months, which I don't expect from the average -mm kernel. This makes userspace simulations so damn important and if you don't do this, you're playing a very risky game with a kernel which is supposed to be stable. > > I'd prefer you'd just take the update function from my patch, it's nicely > > optimized and I'll try to address any concern you have about it. > > Even though as mentioned above I'm still in the dark on the need for it, > I spent a few hours last trying to convert the make_ntp_adj() in -mm to > use your bigadjust implementation. Unfortunately my attempts refuse to > boot. 
> I'm not sure if this is the same problem that was keeping your
> patchset from booting as well, or possibly just an implementation error
> on my part.

John, why don't you just take my function with as few modifications as
possible? Please don't take it as offense, but as long as you only take
small pieces of my code without understanding how it fits into the big
picture, I'm not surprised about such problems and this will take another
year before we get to something usable. In the meantime Andrew is
threatening to merge this anyway, which I definitely wouldn't trust with
anything which needs a stable time source.
Please try to understand me, usually I can be quite patient and I know I
have a hard time making myself understood at times, but I simply can't do
this under such pressure...

> > For this I also posted a userspace test program, so that I know how it
> > behaves, do you have something similar for yours?
>
> At your prodding awhile back I wrote a userspace simulator, but you
> never commented on it.

It was very hard to get running at all and in the meantime you had updated
patches and the whole thing didn't work anymore. Sorry that I didn't
comment on it more.
Note that I have different test programs to test various aspects - the NTP
part and the clock part. At that time you did still poke very deeply into
the NTP guts, where now the clock parts are more important.
Anyway, I did all this testing already, why are you simply throwing this
away?

> 1) You don't like the unified update_wall_time
> 2) Issue w/ the current make_ntp_adj and lost ticks.
> 3) NTP error may be limited to clock->shift resolution w/ the jiffies
> clocksource (other clocksources have finer resolution than NTP's).
>
>
> #1 I disagree with. I really think the unified approach is the way to
> go, but I do understand the need to be conservative. This has gotten a
> fair amount of testing in -mm with no issues reported.
> But it's not too
> hard to re-add the old update_wall_time(), so I could be convinced to
> push the unification off w/ a second opinion. Again, this would be via
> an incremental patch.

It wouldn't be an incremental patch, if this gets merged as is, creating
something actually flexible would come close to another rewrite with the
update function as the key part.
As we discussed previously there are other parts I don't agree with, but
these can be done in more incremental steps, but not what you've done with
update_wall_time().

> So for #2 I'm working on trying to get your implementation functioning,
> but I still don't quite understand the reason. I think this can be
> worked out w/ an incremental patch.

Because it's not about lost ticks.

> I posted an incremental patch for #3 yesterday, but it needs more work,
> and changes in #2 could affect it, so I'm focusing on #2 at the moment.
> Even so, NTP's resolution is 0.2 picoseconds, and the jiffies
> clocksource keeps an error resolution of 4 picoseconds, so I'm not sure
> if this is really a blocker. Further if we do re-add the old
> update_wall_time for issue #1, this would keep the jiffies clocksource
> from being used commonly, so this wouldn't be an issue.

You can use the jiffies clocksource everywhere else, why do you have to
start in the interrupt function? In general management code this is fine,
but I don't see a compelling reason to start in the most sensitive part.

bye, Roman

^ permalink raw reply [flat|nested] 166+ messages in thread
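The accumulation loop quoted in the message above (the `while (cycle_offset >= cycle_update)` form and its tick-based variant) can be tried out in user space. What follows is a minimal sketch under stated assumptions, not code from Roman's patch: `struct sim_clock`, its fields, and the bodies of the two helpers are hypothetical stand-ins, and the adjust step only records the leftover offset instead of correcting anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the clocksource state (not the kernel struct). */
struct sim_clock {
	uint64_t cycle_update;	/* cycles per update interval */
	uint32_t mult;		/* cycles -> shifted-nanosecond multiplier */
	uint64_t xtime_snsec;	/* accumulated shifted nanoseconds */
	uint64_t ticks;		/* number of intervals accumulated */
	uint64_t last_offset;	/* leftover cycles handed to the adjust step */
};

/* Accumulate exactly one interval. */
static void clocksource_update_tick(struct sim_clock *c)
{
	c->xtime_snsec += c->cycle_update * c->mult;
	c->ticks++;
}

/* Placeholder: a real implementation would correct mult against NTP here. */
static void clocksource_adjust(struct sim_clock *c, uint64_t cycle_offset)
{
	c->last_offset = cycle_offset;
}

/* Continuous variant: consume as many whole intervals as have elapsed. */
static uint64_t update_continuous(struct sim_clock *c, uint64_t cycle_offset)
{
	assert(c->cycle_update > 0);	/* otherwise the loop never ends */
	while (cycle_offset >= c->cycle_update) {
		cycle_offset -= c->cycle_update;
		clocksource_update_tick(c);
	}
	clocksource_adjust(c, cycle_offset);
	return cycle_offset;
}

/* Tick-based variant: one interval per call, no leftover offset. */
static void update_tick_based(struct sim_clock *c)
{
	clocksource_update_tick(c);
	clocksource_adjust(c, 0);
}
```

With `cycle_update = 1000`, a call such as `update_continuous(&c, 2500)` accumulates two intervals and passes the remaining 500 cycles to the adjust step, which is the case the tick-based variant never has to handle.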
* Re: clocksource
2006-06-07 0:41 ` clocksource Roman Zippel
@ 2006-06-08 8:05 ` john stultz
2006-06-15 11:40 ` clocksource Roman Zippel
0 siblings, 1 reply; 166+ messages in thread
From: john stultz @ 2006-06-08 8:05 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Wed, 2006-06-07 at 02:41 +0200, Roman Zippel wrote:
> On Tue, 6 Jun 2006, john stultz wrote:
> > > With large clock offsets the lookahead doesn't work correctly, basically
> > > because it's already too late and it can cause overadjustment. Because of
> > > this I do an extra lookahead in clocksource_bigadjust().
> >
> > Do you have a hard example for this with numbers? I don't mean to be a
> > pain, but I don't see this right off.
> >
> > With the current code in -mm I can run a test app that disables
> > interrupts for 2 seconds at a time over and over and I'm still keeping
> > synched w/ an NTP server within 30 microseconds.
>
> You need a clock source which doesn't generate its own interrupts, so
> interrupts and clock updates can run asynchronously. The key part above is
> "large clock offsets". In my test program disable the extra lookahead and
> run it with large offsets.

I'm not sure I'm following you here. Almost all clocksources on i386
(specifically, in the case above, I was using the acpi_pm) don't generate
interrupts and run asynchronously from the timer interrupt source.

I did re-review your documentation, and while it does go over the mult
adjustment code in nice understandable terms, the "why" of this additional
look-ahead isn't quite obvious.

> This code gets only limited testing in -mm, it needs to run for weeks
> or months, which I don't expect from the average -mm kernel. This makes
> userspace simulations so damn important and if you don't do this, you're
> playing a very risky game with a kernel which is supposed to be stable.

Agreed, simulation is nice. Thus, I've revived the old simulator which
builds using the existing code in -mm.
It's a bit fast/dirty and isn't exactly like your sim, but maybe you can
take a look at it and send patches to improve it?

You can find it at:
http://sr71.net/~jstultz/tod/simulator_C2.tar.bz2

I'm currently using it in testing my attempts to get your bigadjust code
working, so hopefully it will help there.

> > > For this I also posted a userspace test program, so that I know how it
> > > behaves, do you have something similar for yours?
> >
> > At your prodding awhile back I wrote a userspace simulator, but you
> > never commented on it.
>
> It was very hard to get running at all and in the meantime you had updated
> patches and the whole thing didn't work anymore. Sorry that I didn't
> comment on it more.

Please let me know if you still have difficulty getting this new one
running.

thanks
-john

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-08 8:05 ` clocksource john stultz
@ 2006-06-15 11:40 ` Roman Zippel
2006-06-16 3:21 ` clocksource john stultz
0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-15 11:40 UTC (permalink / raw)
To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Thu, 8 Jun 2006, john stultz wrote:

> > This code gets only limited testing in -mm, it needs to run for weeks
> > or months, which I don't expect from the average -mm kernel. This makes
> > userspace simulations so damn important and if you don't do this, you're
> > playing a very risky game with a kernel which is supposed to be stable.
>
> Agreed, simulation is nice. Thus, I've revived the old simulator which
> builds using the existing code in -mm. It's a bit fast/dirty and isn't
> exactly like your sim, but maybe you can take a look at it and send
> patches to improve it?
>
> You can find it at:
> http://sr71.net/~jstultz/tod/simulator_C2.tar.bz2

At http://www.xs4all.nl/~zippel/ntp/simulator_C2+patches.tar.bz2 is my
version where I added a number of patches (all p? patches) to get it into
an acceptable state. You have a number of bugs which actually didn't let
the clock oscillate that much but instead added random jitter (and in some
cases a lot of it).
I disabled the lost interrupt simulation, so the effect of adjustments is
more visible, the error should return to near zero after it. Look for
the "ppm" prints and watch the time difference.
In the series file you can enable some debug patches (d?) to add extra
prints or simulate large update delays to see the effect on the error
difference.

bye, Roman

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-15 11:40 ` clocksource Roman Zippel
@ 2006-06-16 3:21 ` john stultz
2006-06-16 3:35 ` clocksource john stultz
` (2 more replies)
0 siblings, 3 replies; 166+ messages in thread
From: john stultz @ 2006-06-16 3:21 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Thu, 2006-06-15 at 13:40 +0200, Roman Zippel wrote:
> On Thu, 8 Jun 2006, john stultz wrote:
>
> > > This code gets only limited testing in -mm, it needs to run for weeks
> > > or months, which I don't expect from the average -mm kernel. This makes
> > > userspace simulations so damn important and if you don't do this, you're
> > > playing a very risky game with a kernel which is supposed to be stable.
> >
> > Agreed, simulation is nice. Thus, I've revived the old simulator which
> > builds using the existing code in -mm. It's a bit fast/dirty and isn't
> > exactly like your sim, but maybe you can take a look at it and send
> > patches to improve it?
> >
> > You can find it at:
> > http://sr71.net/~jstultz/tod/simulator_C2.tar.bz2
>
> At http://www.xs4all.nl/~zippel/ntp/simulator_C2+patches.tar.bz2 is my
> version where I added a number of patches (all p? patches) to get it into
> an acceptable state. You have a number of bugs which actually didn't let
> the clock oscillate that much but instead added random jitter (and in some
> cases a lot of it).

I've been working on the simulator as well, and you're right, it caught a
few problems. I appreciate your prodding me to get it running again.

My current version is here:
http://sr71.net/~jstultz/tod/gtod-sim_C2.1.tar.bz2

Some of the improvements:
o I've added random offsets so increment_simulator_time doesn't always
  increment to an INTERVAL boundary.
o Improved the random "tick dropping" (it's a bad name, but it changes the
  frequency at which update_wall_time is called) so you can specify the
  frequency.
o Added a seed argument so the random results can be reproduced.
o Added the PPM randomization as suggested in your patch (I had to
  implement it differently as it collided w/ the random offset code).

Just quickly so you don't have to read the README:

	./todsim <drift> <seed> <droptick>

Where:
o drift is the ppm drift. If not specified or zero, it will be randomly
  changed as the test runs.
o seed seeds the random function. If not specified or zero, time() will be
  used.
o droptick is the frequency that ticks are taken. update_wall_time will be
  called randomly w/ 1/<droptick> frequency. Thus if droptick is 1000, we
  will on average only call update_wall_time once per 1000 ticks. If not
  specified, it will be set to one.

> I disabled the lost interrupt simulation, so the effect of adjustments is
> more visible, the error should return to near zero after it. Look for
> the "ppm" prints and watch the time difference.
> In the series file you can enable some debug patches (d?) to add extra
> prints or simulate large update delays to see the effect on the error
> difference.

Very cool. I appreciate the small incremental patches. I've looked over
them and am trying to see which ones make sense in light of the
following info.

I've also been working on improving the adjustment algorithm. Paul
McKenney pointed me to the established concepts in control theory, so I
started reading up on PID control (see:
http://en.wikipedia.org/wiki/PID_controller ). While I have understood
the basic concept, it was useful to read up on it. I've tried to rework
the adjustment code accordingly.

The method I came up with is really just P-D (proportional-derivative)
control, but that should be ok since the adjustments are all linear so I
don't think the integral control is necessary (control theorists can
pipe in here).

The basic algorithm is as follows:

	update_error = 0;
	interval_cycs = 0;
	while (offset >= clock->interval_cycles) {
		/* accumulate one interval */
		remainder_snsecs += clock->interval_snsecs;
		...
		/* accumulate error between NTP and clock interval */
		update_error += current_tick_length(clock->shift);
		update_error -= clock->interval_snsecs;
		interval_cycs += clock->interval_cycles;
		...
	}
	/* add error accumulated since last interrupt */
	total_error += update_error;

	if (total_error > (s64)clock->interval_cycles ||
	    total_error < -((s64)clock->interval_cycles)) {
		/* derivative control: fix the slope */
		freqadj = update_error/((s64)interval_cycs);
		/* proportional control: converge to zero */
		offadj = total_error/(s64)interval_cycs;

		/* limiter to avoid oscillation */
		if (offadj > MAXOFFADJ)
			offadj = MAXOFFADJ;
		else if(offadj < -MAXOFFADJ)
			offadj = -MAXOFFADJ;

		/* make the adjustment */
		multadj = freqadj + offadj;
		clock->mult += multadj;
		clock->interval_snsecs = clock->mult * clock->interval_cycles;
		remainder_snsecs -= multadj * (s64)offset;
		total_error += multadj * offset;
	}

Then using the same method in your bigadjust function, I can approximate
the divides and the rest is very similar to your suggestions.

The full patch for -mm is attached below (Andrew, please don't take this
just yet, I'm doing more testing and I'd like Roman's feedback first).

After applying this to -mm and generating the simulator from the result,
I've found this to be *very* robust (it keeps proportionally close with
the frequency of lost ticks, and doesn't have issue w/ ppm changes).

Please take a look at it and let me know what you think.

thanks
-john

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4bc9428..2993521 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -181,46 +181,57 @@ static inline void clocksource_calculate
  *
  * @error:	Error value (unsigned)
  * @unit:	Adjustment unit
+ * @max:	Limit on adjustment unit
  *
  * For a given error value, this function takes the adjustment unit
  * and uses binary approximation to return a power of two adjustment value.
  *
- * This function is only for use by the the make_ntp_adj() function
- * and you must hold a write on the xtime_lock when calling.
  */
-static inline int error_aproximation(u64 error, u64 unit)
+static int error_aproximation(u64 error, u64 unit, int max)
 {
-	static int saved_adj = 0;
-	u64 adjusted_unit = unit << saved_adj;
-
-	if (error > (adjusted_unit * 2)) {
-		/* large error, so increment the adjustment factor */
-		saved_adj++;
-	} else if (error > adjusted_unit) {
-		/* just right, don't touch it */
-	} else if (saved_adj) {
-		/* small error, so drop the adjustment factor */
-		saved_adj--;
-		return 0;
+	int adj = 0;
+	while (1) {
+		error >>= 1;
+		if (error <= unit)
+			return adj;
+		if (!max || adj < max)
+			adj++;
 	}
-
-	return saved_adj;
 }
 
+#define MAXOFFADJ 4 /* vary max oscillation vs convergance speed */
+
 /**
- * make_ntp_adj - Adjusts the specified clocksource for a given error
+ * clocksource_adj - Adjusts the specified clocksource for a given error
  *
  * @clock:		Pointer to clock to be adjusted
- * @cycles_delta:	Current unacounted cycle delta
- * @error:		Pointer to current error value
+ * @cycles_delta:	Current unaccumulated cycle delta
+ * @total_error:	Pointer to current total error value
+ * @interval_error:	Error accumulated since the last sample
+ * @interval_cycs:	Accumulated cycles since the last sample
  *
  * Returns clock shifted nanosecond adjustment to be applied against
  * the accumulated time value (ie: xtime).
  *
- * If the error value is large enough, this function calulates the
- * (power of two) adjustment value, and adjusts the clock's mult and
- * interval_snsecs values accordingly.
+ * If the error value is large enough, this function aproximates
+ * the frequency and offset adjustment, and applies it to the
+ * clock's mult and interval_snsecs values accordingly.
+ *
+ * This method of adjustment is similar to PID control.
+ * See http://en.wikipedia.org/wiki/PID_controller for more info.
+ * However we are really just doing P-D control, as since are adjustments
+ * are liniar, there is no need for the integral component of PID.
+ * The P-D control is done in two steps:
+ * 1) Proportonal control: (offset adjustment)
+ *	This makes adjustment based on the current error from NTP.
+ *	This adjustment is limited to avoid oscillation from missed
+ *	ticks.
+ * 2) Derivative control:
+ *	This makes adjustments based on the error accumulated in
+ *	the last period (in otherwords, the different in error from
+ *	the last period). This provides a frequency correction so no
+ *	additional error should be accumulated in the next period.
  *
  * However, since there may be some unaccumulated cycles, to avoid
  * time inconsistencies we must adjust the accumulation value
@@ -238,35 +249,89 @@ static inline int error_aproximation(u64
  *
  * Where mult_delta is the adjustment value made to mult
  *
+ * An aditional complication: Since we are adjusting the base value,
+ * we must also adjust the total_error value, as it is the distance
+ * of the base time from the NTP time. Thus we adjust the total_error
+ * by the negative amount we adjusted the base.
  */
-static inline s64 make_ntp_adj(struct clocksource *clock,
-				cycles_t cycles_delta, s64* error)
+static inline s64 clocksource_adj(struct clocksource *clock,
+				cycle_t cycles_delta, s64* total_error,
+				s64 interval_error, s64 interval_cycs)
 {
 	s64 ret = 0;
-	if (*error > ((s64)clock->interval_cycles+1)/2) {
-		/* calculate adjustment value */
-		int adjustment = error_aproximation(*error,
-						clock->interval_cycles);
-		/* adjust clock */
-		clock->mult += 1 << adjustment;
-		clock->interval_snsecs += clock->interval_cycles << adjustment;
-
-		/* adjust the base and error for the adjustment */
-		ret = -(cycles_delta << adjustment);
-		*error -= clock->interval_cycles << adjustment;
-		/* XXX adj error for cycle_delta offset? */
-	} else if ((-(*error)) > ((s64)clock->interval_cycles+1)/2) {
-		/* calculate adjustment value */
-		int adjustment = error_aproximation(-(*error),
-						clock->interval_cycles);
-		/* adjust clock */
-		clock->mult -= 1 << adjustment;
-		clock->interval_snsecs -= clock->interval_cycles << adjustment;
-
-		/* adjust the base and error for the adjustment */
-		ret = cycles_delta << adjustment;
-		*error += clock->interval_cycles << adjustment;
-		/* XXX adj error for cycle_delta offset? */
+	s64 error = *total_error;
+
+	if ((error > (s64)clock->interval_cycles)
+		||(error < -((s64)clock->interval_cycles)) ) {
+
+		int adj, multadj = 0;
+		s64 offset_update = 0, snsec_update = 0;
+
+		/* First do the frequency adjustment:
+		 *   The idea here is to look at the error
+		 *   accumulated since the last call to
+		 *   update_wall_time to determine the
+		 *   frequency adjustment needed so no new
+		 *   error will be incurred in the next
+		 *   interval.
+		 *
+		 *   This is basically derivative control
+		 *   using the PID terminology (we're calculating
+		 *   the derivative of the slope and correcting it).
+		 *
+		 *   The math is basically:
+		 *	multadj = interval_error/interval_cycles
+		 *   Which we fudge using binary approximation.
+		 */
+		if(interval_error >= 0) {
+			adj = error_aproximation(interval_error,
+						interval_cycs, 0);
+			multadj += 1 << adj;
+			snsec_update += clock->interval_cycles << adj;
+			offset_update += cycles_delta << adj;
+		} else {
+			adj = error_aproximation(-interval_error,
+						interval_cycs, 0);
+			multadj -= 1 << adj;
+			snsec_update -= clock->interval_cycles << adj;
+			offset_update -= cycles_delta << adj;
+		}
+		/* Now do the offset adjustment:
+		 *   Now that the frequncy is fixed, we
+		 *   want to look at the total error accumulated
+		 *   to move us back in sync using the same method.
+		 *   However, we must be careful as if we make too
+		 *   sudden an adjustment we might overshoot. So we
+		 *   limit the amount of change to spread the
+		 *   adjustment (using MAXOFFADJ) over a longer
+		 *   period of time.
+		 *
+		 *   This is basically proportional control
+		 *   using the PID terminology.
+		 *
+		 *   We use interval_cycs here as the divisor, which
+		 *   hopes that the next sample will be similar in
+		 *   distance from the last.
+		 */
+		if(error >= 0) {
+			adj = error_aproximation(error,
+						interval_cycs, MAXOFFADJ);
+			multadj += 1<<adj;
+			snsec_update += clock->interval_cycles <<adj;
+			offset_update += cycles_delta << adj;
+		} else {
+			adj = error_aproximation(-error,
+						interval_cycs, MAXOFFADJ);
+			multadj -= 1<<adj;
+			snsec_update -= clock->interval_cycles <<adj;
+			offset_update -= cycles_delta << adj;
+		}
+
+		clock->mult += multadj;
+		clock->interval_snsecs += snsec_update;;
+		ret -= offset_update;
+		*total_error += offset_update;
+
 	}
 	return ret;
 }
diff --git a/kernel/timer.c b/kernel/timer.c
index 0569d40..1345759 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1025,9 +1025,9 @@ device_initcall(timekeeping_init_device)
  */
 static void update_wall_time(void)
 {
-	static s64 remainder_snsecs, error;
-	s64 snsecs_per_sec;
-	cycle_t now, offset;
+	static s64 remainder_snsecs, total_error;
+	s64 snsecs_per_sec, interval_error = 0;
+	cycle_t now, offset, interval_cycs = 0;
 
 	snsecs_per_sec = (s64)NSEC_PER_SEC << clock->shift;
 	remainder_snsecs += (s64)xtime.tv_nsec << clock->shift;
@@ -1038,7 +1038,7 @@ static void update_wall_time(void)
 	/* normally this loop will run just once, however in the
 	 * case of lost or late ticks, it will accumulate correctly.
 	 */
-	while (offset > clock->interval_cycles) {
+	while (offset >= clock->interval_cycles) {
 		/* get the ntp interval in clock shifted nanoseconds */
 		s64 ntp_snsecs = current_tick_length(clock->shift);
 
@@ -1054,10 +1054,8 @@ static void update_wall_time(void)
 		update_ntp_one_tick();
 
 		/* accumulate error between NTP and clock interval */
-		error += (ntp_snsecs - (s64)clock->interval_snsecs);
-
-		/* correct the clock when NTP error is too big */
-		remainder_snsecs += make_ntp_adj(clock, offset, &error);
+		interval_error += (ntp_snsecs - (s64)clock->interval_snsecs);
+		interval_cycs += clock->interval_cycles;
 
 		if (remainder_snsecs >= snsecs_per_sec) {
 			remainder_snsecs -= snsecs_per_sec;
@@ -1065,13 +1063,20 @@ static void update_wall_time(void)
 			second_overflow();
 		}
 	}
+
+	total_error += interval_error;
+
+	/* correct the clock when NTP error is too big */
+	remainder_snsecs += clocksource_adj(clock, offset, &total_error,
+					interval_error, interval_cycs);
+
 	/* store full nanoseconds into xtime */
 	xtime.tv_nsec = remainder_snsecs >> clock->shift;
 	remainder_snsecs -= (s64)xtime.tv_nsec << clock->shift;
 
 	/* check to see if there is a new clocksource to use */
 	if (change_clocksource()) {
-		error = 0;
+		total_error = 0;
 		remainder_snsecs = 0;
 		clocksource_calculate_interval(clock, tick_nsec);
 	}

^ permalink raw reply related [flat|nested] 166+ messages in thread
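The binary approximation and the two-step (frequency, then offset) multiplier adjustment in the patch above can be exercised on their own. The following is a minimal user-space sketch under stated assumptions: `approx_shift()` mirrors the loop of the patched `error_aproximation()`, while `signed_pow2()` and `pd_mult_adjust()` are hypothetical helper names introduced here for illustration. The updates to `interval_snsecs`, the time base and `total_error` are left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAXOFFADJ 4	/* same limit the patch applies to the offset term */

/*
 * Halve the error until it no longer exceeds the unit, counting the
 * halvings. When max is nonzero the count is capped at max.
 */
static int approx_shift(uint64_t error, uint64_t unit, int max)
{
	int adj = 0;

	while (1) {
		error >>= 1;
		if (error <= unit)
			return adj;
		if (!max || adj < max)
			adj++;
	}
}

/* Hypothetical helper: signed power-of-two step for a signed error. */
static int64_t signed_pow2(int64_t error, uint64_t unit, int max)
{
	assert(error != INT64_MIN);	/* negating it would overflow */
	if (error >= 0)
		return (int64_t)1 << approx_shift((uint64_t)error, unit, max);
	return -((int64_t)1 << approx_shift((uint64_t)-error, unit, max));
}

/*
 * Hypothetical helper: change to apply to mult.
 * Frequency ("derivative") term from the error of the last interval,
 * unlimited; offset ("proportional") term from the total error,
 * limited by MAXOFFADJ.
 */
static int64_t pd_mult_adjust(int64_t total_error, int64_t interval_error,
			      uint64_t interval_cycs)
{
	return signed_pow2(interval_error, interval_cycs, 0) +
	       signed_pow2(total_error, interval_cycs, MAXOFFADJ);
}
```

For example, `pd_mult_adjust(1000, 10, 1)` combines a frequency step of 4 with an offset step capped at 16. One property the sketch makes visible: because the smallest step is `1 << 0`, each term contributes at least plus or minus 1 once the adjustment runs, even for an `interval_error` of zero.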
* Re: clocksource
2006-06-16 3:21 ` clocksource john stultz
@ 2006-06-16 3:35 ` john stultz
2006-06-16 15:33 ` clocksource Roman Zippel
2006-06-17 17:04 ` clocksource Andrew Morton
2 siblings, 0 replies; 166+ messages in thread
From: john stultz @ 2006-06-16 3:35 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Thu, 2006-06-15 at 20:21 -0700, john stultz wrote:
> Very cool. I appreciate the small incremental patches. I've looked over
> them and am trying to see which ones make sense in light of the
> following info.

Ah, forgot to mention that the patch included the changes from your p1
fix (some of which I had already made, but items like the
clocksource_adj() name change are a clear improvement).

And just to be sure you know I'm not brushing your patches off:
I'm looking at patches p2, p4, p5 and p7 as the next steps, and after I
get some feedback on the patch I just sent we can discuss p3, p6, and p8.

Sound good?

thanks
-john

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-16 3:21 ` clocksource john stultz
2006-06-16 3:35 ` clocksource john stultz
@ 2006-06-16 15:33 ` Roman Zippel
2006-06-16 18:48 ` clocksource john stultz
2006-06-17 17:04 ` clocksource Andrew Morton
2 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-16 15:33 UTC (permalink / raw)
To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Thu, 15 Jun 2006, john stultz wrote:

> I've also been working on improving the adjustment algorithm. Paul
> McKenney pointed me to the established concepts in control theory, so I
> started reading up on PID control (see:
> http://en.wikipedia.org/wiki/PID_controller ). While I have understood
> the basic concept, it was useful to read up on it. I've tried to rework
> the adjustment code accordingly.
>
> The method I came up with is really just P-D (proportional-derivative)
> control, but that should be ok since the adjustments are all linear so I
> don't think the integral control is necessary (control theorists can
> pipe in here).

This makes it more complex than necessary. AFAICT this controller
calculates the adjustment solely based on the current error, but we have
more information than this, which makes the current error rather
uninteresting.
We know the clock frequency and the NTP frequency so we can easily
precalculate what the error will look like at the next few ticks. Based on
this we can calculate how we have to adjust the clock frequency to reduce
the error. Overshooting is also not a real problem as long as the absolute
error gets smaller.

An important point about the last patch is not just robustness but also
speed, it tries to keep the fast path small, which is basically:

	interval = clock->cycle_interval;
	if (error > interval / 2) {
		adj = 1;
		if (unlikely(error > interval * 2)) {
			...
		}
	} else if (error < -interval / 2) {
		adj = -1;
		interval = -interval;
		offset = -offset;
		if (unlikely(error < interval * 2)) {
			...
		}
	} else
		return;

	clock->mult += adj;
	clock->xtime_interval += interval;
	clock->xtime_nsec -= offset;
	clock->error -= interval - offset;

You'll need a very good reason to do anything more than this for small
errors and I would suggest you start from something like this, as this is
the very core of the error adjustment.

bye, Roman

^ permalink raw reply [flat|nested] 166+ messages in thread
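The small-error fast path quoted in the message above can be written out as a self-contained function to see how one step moves the error. This is a minimal sketch under stated assumptions: `struct sim_clk` is a hypothetical stand-in for the clocksource fields the fragment touches, and the two `unlikely()` branches for large errors, which the message elides, are not reconstructed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields used by the quoted fragment. */
struct sim_clk {
	int64_t cycle_interval;	/* cycles per update interval */
	uint32_t mult;		/* cycles -> shifted-ns multiplier */
	int64_t xtime_interval;	/* shifted ns accumulated per interval */
	int64_t xtime_nsec;	/* shifted ns part of the current time */
	int64_t error;		/* clock vs. NTP error */
};

/*
 * Small-error step only: change mult by +1 or -1 when the error exceeds
 * half an interval, otherwise do nothing. Returns the step applied.
 * `offset` is the number of cycles not yet accumulated.
 */
static int adjust_small(struct sim_clk *c, int64_t offset)
{
	int64_t interval = c->cycle_interval;
	int adj;

	assert(interval > 0);
	if (c->error > interval / 2) {
		adj = 1;
	} else if (c->error < -interval / 2) {
		adj = -1;
		interval = -interval;
		offset = -offset;
	} else {
		return 0;
	}

	c->mult += adj;
	c->xtime_interval += interval;	/* one more (or less) ns unit per cycle */
	c->xtime_nsec -= offset;	/* keep the current time continuous */
	c->error -= interval - offset;	/* error expected at the next update */
	return adj;
}
```

With `cycle_interval = 1000`, `error = 800` and `offset = 200`, a single call raises `mult` by one and leaves an error of zero; the same magnitudes with a negative error lower `mult` by one, and anything within half an interval is left alone.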
* Re: clocksource
2006-06-16 15:33 ` clocksource Roman Zippel
@ 2006-06-16 18:48 ` john stultz
2006-06-17 19:45 ` clocksource Roman Zippel
0 siblings, 1 reply; 166+ messages in thread
From: john stultz @ 2006-06-16 18:48 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Fri, 2006-06-16 at 17:33 +0200, Roman Zippel wrote:
> On Thu, 15 Jun 2006, john stultz wrote:
> > I've also been working on improving the adjustment algorithm. Paul
> > McKenney pointed me to the established concepts in control theory, so I
> > started reading up on PID control (see:
> > http://en.wikipedia.org/wiki/PID_controller ). While I have understood
> > the basic concept, it was useful to read up on it. I've tried to rework
> > the adjustment code accordingly.
> >
> > The method I came up with is really just P-D (proportional-derivative)
> > control, but that should be ok since the adjustments are all linear so I
> > don't think the integral control is necessary (control theorists can
> > pipe in here).
>
> This makes it more complex than necessary. AFAICT this controller
> calculates the adjustment solely based on the current error, but we have
> more information than this, which makes the current error rather
> uninteresting.

Indeed it is the current error, but it's also taking the change in error
into account.

> We know the clock frequency and the NTP frequency so we can easily
> precalculate what the error will look like at the next few ticks. Based on
> this we can calculate how we have to adjust the clock frequency to reduce
> the error. Overshooting is also not a real problem as long as the absolute
> error gets smaller.

I'm not sure I agree here. Using your patch series, if I re-enable the
code that drops calls to update_wall_time (simulating lost ticks) the
clock does not appear very stable.
Robustness and features like dynamic/no_idle_hz are going to require that
we can handle taking something close to only one tick per second, so
overshoot is a big concern in my mind. Maybe I'm misunderstanding you?

However, I need to forward port your patchset to the new simulator to
really do a fair comparison as I know there were some issues w/ the
simulator that I addressed in order to get the new features working. If
you have already done this, let me know.

> An important point about the last patch is not just robustness but also
> speed, it tries to keep the fast path small, which is basically:
>
>	interval = clock->cycle_interval;
>	if (error > interval / 2) {
>		adj = 1;
>		if (unlikely(error > interval * 2)) {
>			...
>		}
>	} else if (error < -interval / 2) {
>		adj = -1;
>		interval = -interval;
>		offset = -offset;
>		if (unlikely(error < interval * 2)) {
>			...
>		}
>	} else
>		return;
>
>	clock->mult += adj;
>	clock->xtime_interval += interval;
>	clock->xtime_nsec -= offset;
>	clock->error -= interval - offset;
>
> You'll need a very good reason to do anything more than this for small
> errors and I would suggest you start from something like this, as this is
> the very core of the error adjustment.

I agree that the patch I sent could use some optimizations, and likely
even some tweaking (supposedly I can get rid of the proportional
adjustment limiter by using a gain value, but I need to test this a bit)
to improve it further.

Now trying to compare it to your code:

Looking at your description of the code above from your documentation
email:

	1) mult_adj = error / cycle_update;
	2) mult += mult_adj;
	3) xtime -= cycle_offset * mult_adj;
	4) error -= (cycle_update - cycle_offset) * mult_adj;

Lines 1 & 2 calculate the proportional error adjustment for the error at
the next interval.

Line 3 is also well understood, as it corrects the base for the new
adjustment value if there is an offset value.
So I see the proportional adjustment, but I don't see how the derivative
is included. I suspect the density of the error adjustment bit is what
makes this so opaque to me.

Breaking line 4 apart for a moment:

	4a) error += cycle_offset * mult_adj;
	4b) error -= cycle_update * mult_adj;

Line 4a is also clear, since if the base had been changed in line 3, the
error between the base and ntp has changed as well, so it must be changed
by the negative amount the base was changed to stay in sync.

Line 4b is a bit foggy. Just assuming cycle_offset is zero, we can
ignore line 3 and 4a. So we're reducing the error by the change in
length of the next interval. I see how this would in effect dampen the
next adjustment, but I'm not sure how that then maps the error value to
the actual distance from ntp_time.

Abstractly I understand how looking at the next tick is good for when
the NTP adjustment value changes, but I'm not sure I see how looking
ahead makes the clock more stable when the NTP adjustment isn't
changing.

Is there a way you can map the math above to the terms of PID control
(or maybe some other established concept that I can dig deeper on?).

thanks
-john

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: clocksource
2006-06-16 18:48 ` clocksource john stultz
@ 2006-06-17 19:45 ` Roman Zippel
0 siblings, 0 replies; 166+ messages in thread
From: Roman Zippel @ 2006-06-17 19:45 UTC (permalink / raw)
To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Fri, 16 Jun 2006, john stultz wrote:

> > This makes it more complex than necessary. AFAICT this controller
> > calculates the adjustment solely based on the current error, but we have
> > more information than this, which makes the current error rather
> > uninteresting.
>
> Indeed it is the current error, but it's also taking the change in error
> into account.

Which is not really important anymore, as soon as you already _know_ the
future error.

> > We know the clock frequency and the NTP frequency so we can easily
> > precalculate what the error will look like at the next few ticks. Based on
> > this we can calculate how we have to adjust the clock frequency to reduce
> > the error. Overshooting is also not a real problem as long as the absolute
> > error gets smaller.
>
> I'm not sure I agree here. Using your patch series, if I re-enable the
> code that drops calls to update_wall_time (simulating lost ticks) the
> clock does not appear very stable.

That's because the tick length is not applied in the same way in
sim-main.c and time.c: if the drop happens around a call to
second_overflow(), one gets the old value and the other gets the new value.
If you prevent the drop around a ppm change, it will work just fine, it's
a bug in the simulator.

> Robustness and features like
> dynamic/no_idle_hz are going to require that we can handle taking
> something close to only one tick per second, so overshoot is a big
> concern in my mind. Maybe I'm misunderstanding you?

Overshooting is not your real problem in this case, as large offsets and
lost interrupts are less of a problem. You're making it way too complex...
> However, I need to forward port your patchset to the new simulator to
> really do a fair comparison as I know there were some issues w/ the
> simulator that I addressed in order to get the new features working. If
> you have already done this, let me know.

The new simulator is far too complex and I don't want to spend the time
verifying it does a really correct simulation, I don't think it's really a
good idea to make the simulator more complex than the subject, otherwise
you need a simulator to test the simulator...
You are also using rather smallish adjustments, changes of up to 1ms/s are
more realistic (NTP allows a MAXPHASE offset which is spread over a period
of time). Your simulator is also fixed to 1GHz and 1000Hz, which are
rather convenient values.

> Line 4b is a bit foggy. Just assuming cycle_offset is zero, we can
> ignore line 3 and 4a. So we're reducing the error by the change in
> length of the next interval. I see how this would in effect dampen the
> next adjustment, but I'm not sure how that then maps the error value to
> the actual distance from ntp_time.

As I said it looks ahead to the next error, if we adjust the multiplier at
the current tick, it also changes the error at the next tick.

> Abstractly I understand how looking at the next tick is good for when
> the NTP adjustment value changes, but I'm not sure I see how looking
> ahead makes the clock more stable when the NTP adjustment isn't
> changing.

Huh? If the NTP adjustment isn't changing, clock and NTP are hopefully in
sync and there is no (significant) error to adjust.

> Is there a way you can map the math above to the terms of PID control

Please forget this one, it doesn't really apply here. We know more about
the model, which allows for simpler calculations. You have to throw away
information to fit it into the PID model.

> (or maybe some other established concept that I can dig deeper on?).

I guess it's basically a FLL.

Below is another patch, which better takes the update delays into account. bye, Roman take the offset better into account by applying the current adjustment to the error and anticipate a possible large adjustment at the next update due to a large offset. --- time.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) Index: simulator_C2/time.c =================================================================== --- simulator_C2.orig/time.c +++ simulator_C2/time.c @@ -227,7 +227,7 @@ device_initcall(timekeeping_init_device) * If the error is already larger, we look ahead another tick, * to compensate for late or lost adjustments. */ -static __always_inline int clocksource_bigadjust(int sign, s64 error, s64 interval) +static __always_inline int clocksource_bigadjust(int sign, s64 error, s64 interval, s64 offset) { int adj = 0; @@ -236,8 +236,12 @@ static __always_inline int clocksource_b while (1) { error >>= 1; - if (sign > 0 ? error <= interval : error >= interval) + if (sign > 0 ? error <= interval : error >= interval) { + error = (error << 1) - interval + offset; + if (sign > 0 ? error > interval : error < interval) + adj++; return adj; + } adj++; } } @@ -246,7 +250,8 @@ static __always_inline int clocksource_b int adj = sign; \ error >>= 2; \ if (unlikely(sign > 0 ? error > interval : error < interval)) { \ - adj = clocksource_bigadjust(sign, error, interval); \ + adj = clocksource_bigadjust(sign, error, \ + interval, offset); \ interval <<= adj; \ offset <<= adj; \ adj = sign << adj; \ ^ permalink raw reply [flat|nested] 166+ messages in thread
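The look-ahead step in the patch above can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: it covers the sign > 0 branch of each comparison, it does not model the two lines that the diff leaves out between its hunks, and bigadjust_pos is a hypothetical name rather than a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the patched clocksource_bigadjust() loop for the
 * sign > 0 case only. Whatever the real function does in the lines the
 * diff does not show is NOT modelled here.
 */
static int bigadjust_pos(int64_t error, int64_t interval, int64_t offset)
{
	int adj = 0;

	assert(error >= 0 && interval > 0);

	while (1) {
		error >>= 1;
		if (error <= interval) {
			/*
			 * Scale the halved error back up, subtract one
			 * interval and add the pending offset. If what is
			 * left still exceeds an interval, shift once more.
			 */
			error = (error << 1) - interval + offset;
			if (error > interval)
				adj++;
			return adj;
		}
		adj++;
	}
}
```

With error = 100 and interval = 10 this returns 3 when offset = 0 and 4 when offset = 20, so a large pending offset raises the shift by one.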
* Re: clocksource
2006-06-16 3:21 ` clocksource john stultz
2006-06-16 3:35 ` clocksource john stultz
2006-06-16 15:33 ` clocksource Roman Zippel
@ 2006-06-17 17:04 ` Andrew Morton
2 siblings, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-17 17:04 UTC (permalink / raw)
To: john stultz; +Cc: zippel, linux-kernel

On Thu, 15 Jun 2006 20:21:24 -0700
john stultz <johnstul@us.ibm.com> wrote:

> The method I came up with is really just P-D (proportional-derivative)
> control, but that should be ok since the adjustments are all linear so I
> don't think the integral control is necessary (control theorists can
> pipe in here).

Boy, that takes me back.

If you don't feed back the integral you'll end up with an output which has
a steady-state offset error against the control point (unless the forward
gain is infinite, and it never is). I don't know if that matters here, but
it cannot be good.

If you feed back the integral of the error then it introduces the
possibility of instability. Probably in this application you can just
overdamp the thing to avoid that. It'll make it slower to respond to
changes in the setpoint.

^ permalink raw reply [flat|nested] 166+ messages in thread
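The steady-state offset described above can be reproduced with a toy loop. This is a sketch under loud assumptions: the plant is a made-up one in which the error grows by a constant disturbance every step, the gains are arbitrary, and settle() is a hypothetical helper. It is not a model of the kernel's timekeeping code.

```c
#include <assert.h>

/*
 * Toy plant: the error x grows by a constant disturbance every step and
 * is reduced by the controller output. ki == 0 gives proportional-only
 * control; ki > 0 adds the integral term.
 */
static double settle(double kp, double ki, double disturbance, int steps)
{
	double x = 0.0;		/* current error against the set point */
	double integ = 0.0;	/* running sum of the error */
	int i;

	assert(steps >= 0);

	for (i = 0; i < steps; i++) {
		integ += x;
		x += disturbance - (kp * x + ki * integ);
	}
	return x;
}
```

In this toy model, settle(0.5, 0.0, 1.0, 200) ends up at disturbance / kp = 2 and stays there, while settle(0.5, 0.1, 1.0, 200) decays towards zero. Larger integral gains can make the same loop oscillate or diverge, which is the instability mentioned in the mail.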
* Re: utsname/hostname 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (5 preceding siblings ...) 2006-06-04 23:50 ` clocksource Roman Zippel @ 2006-06-05 0:02 ` Randy.Dunlap 2006-06-05 1:06 ` utsname/hostname Andrew Morton [not found] ` <20060605002807.GA4919@mail.ustc.edu.cn> ` (13 subsequent siblings) 20 siblings, 1 reply; 166+ messages in thread From: Randy.Dunlap @ 2006-06-05 0:02 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Sun, 4 Jun 2006 13:50:11 -0700 Andrew Morton wrote: > > It's time to take a look at the -mm queue for 2.6.18. > > > When replying to this email pleeeeeeze rewrite the Subject: to something > appropriate so we do not all go mad. Thanks. > > > proc-sysctl-add-_proc_do_string-helper.patch > namespaces-add-nsproxy.patch > namespaces-add-nsproxy-dont-include-compileh.patch > namespaces-incorporate-fs-namespace-into-nsproxy.patch > namespaces-utsname-introduce-temporary-helpers.patch > namespaces-utsname-switch-to-using-uts-namespaces.patch > namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch > namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch > namespaces-utsname-use-init_utsname-when-appropriate.patch > namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch > namespaces-utsname-implement-utsname-namespaces.patch > namespaces-utsname-implement-utsname-namespaces-export.patch > namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch > namespaces-utsname-sysctl-hack.patch > namespaces-utsname-sysctl-hack-cleanup.patch > namespaces-utsname-sysctl-hack-cleanup-2.patch > namespaces-utsname-sysctl-hack-cleanup-2-fix.patch > namespaces-utsname-remove-system_utsname.patch > namespaces-utsname-implement-clone_newuts-flag.patch > uts-copy-nsproxy-only-when-needed.patch > # needed if git-klibc isn't there: > #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch > #namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch > 
#namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch > > utsname virtualisation. This doesn't seem very pointful as a standalone > thing. That's a general problem with infrastructural work for a very > large new feature. > > So probably I'll continue to babysit these patches, unless someone can > identify a decent reason why mainline needs this work. Not a strong argument for mainline, but I have a patch to make <hostname> larger (up to 255 bytes, per POSIX). http://www.xenotime.net/linux/patches/hostname-2617-rc5b.patch I can either update my hostname patch against mm/utsname.. or not. But I don't really want to see some/any patch blocked due to a patch in -mm being borderline "pointful," so how do we deal with this? > I don't want to carry an ever-growing stream of OS-virtualisation > groundwork patches for ever and ever so if we're going to do this thing... > faster, please. --- ~Randy ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: utsname/hostname
2006-06-05 0:02 ` utsname/hostname Randy.Dunlap
@ 2006-06-05 1:06 ` Andrew Morton
2006-06-05 3:10 ` utsname/hostname Randy.Dunlap
0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05 1:06 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: linux-kernel

On Sun, 4 Jun 2006 17:02:18 -0700
"Randy.Dunlap" <rdunlap@xenotime.net> wrote:

> > utsname virtualisation. This doesn't seem very pointful as a standalone
> > thing. That's a general problem with infrastructural work for a very
> > large new feature.
> >
> > So probably I'll continue to babysit these patches, unless someone can
> > identify a decent reason why mainline needs this work.
>
> Not a strong argument for mainline, but I have a patch to make
> <hostname> larger (up to 255 bytes, per POSIX).
> http://www.xenotime.net/linux/patches/hostname-2617-rc5b.patch

My immediate reaction to that was to tell posix to go take a hike. I mean,
sheesh.

> I can either update my hostname patch against mm/utsname.. or not.
> But I don't really want to see some/any patch blocked due to a patch
> in -mm being borderline "pointful," so how do we deal with this?

Well first we need to work out if there's any vague reason why we need to
mucky up our kernel by implementing this dopey spec. If there is such a
reason then I guess I drop all the utsname patches and ask that they be
redone. They're a bit straggly and a refactoring/rechangelogging wouldn't
hurt.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: utsname/hostname
2006-06-05 1:06 ` utsname/hostname Andrew Morton
@ 2006-06-05 3:10 ` Randy.Dunlap
0 siblings, 0 replies; 166+ messages in thread
From: Randy.Dunlap @ 2006-06-05 3:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

On Sun, 4 Jun 2006 18:06:18 -0700 Andrew Morton wrote:

> On Sun, 4 Jun 2006 17:02:18 -0700
> "Randy.Dunlap" <rdunlap@xenotime.net> wrote:
>
> > > utsname virtualisation. This doesn't seem very pointful as a standalone
> > > thing. That's a general problem with infrastructural work for a very
> > > large new feature.
> > >
> > > So probably I'll continue to babysit these patches, unless someone can
> > > identify a decent reason why mainline needs this work.
> >
> > Not a strong argument for mainline, but I have a patch to make
> > <hostname> larger (up to 255 bytes, per POSIX).
> > http://www.xenotime.net/linux/patches/hostname-2617-rc5b.patch
>
> My immediate reaction to that was to tell posix to go take a hike. I mean,
> sheesh.

well thanks for finally replying then.
That's my reaction to some other patches (in -mm) as well
(not that it matters).

> > I can either update my hostname patch against mm/utsname.. or not.
> > But I don't really want to see some/any patch blocked due to a patch
> > in -mm being borderline "pointful," so how do we deal with this?
>
> Well first we need to work out if there's any vague reason why we need to
> mucky up our kernel by implementing this dopey spec. If there is such a
> reason then I guess I drop all the utsname patches and ask that they be
> redone. They're a bit straggly and a refactoring/rechangelogging wouldn't
> hurt.

Fixing the changelog is easy. What refactoring do you mean?

---
~Randy

^ permalink raw reply [flat|nested] 166+ messages in thread
[parent not found: <20060605002807.GA4919@mail.ustc.edu.cn>]
* readahead benchmark
[not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
@ 2006-06-05 0:28 ` Fengguang Wu
2006-06-05 1:02 ` Andrew Morton
0 siblings, 1 reply; 166+ messages in thread
From: Fengguang Wu @ 2006-06-05 0:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> readahead-kconfig-options.patch
[...]
> It's early days yet - needs heaps more performance testing. The results
> from "Linux Portal" <linportal@gmail.com> were discouraging.

I found this mail from the lkml archive, did you happen to have more
results?

------
Date: Mon, 29 May 2006 17:22:50 +0200
From: "Linux Portal" <linportal@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: The adaptive readahead patch benchmark

There is an interesting (although simple) benchmark of Wu's adaptive
readahead patchset (v12) together with graphs here:

http://linux.inet.hr/adaptive_readahead_benchmark.html

In that simple test it definitely looks promising (3x speedup).

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: readahead benchmark
2006-06-05 0:28 ` readahead benchmark Fengguang Wu
@ 2006-06-05 1:02 ` Andrew Morton
0 siblings, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-05 1:02 UTC (permalink / raw)
To: Fengguang Wu; +Cc: linux-kernel

On Mon, 5 Jun 2006 08:28:07 +0800
Fengguang Wu <fengguang.wu@gmail.com> wrote:

> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > readahead-kconfig-options.patch
> [...]
> > It's early days yet - needs heaps more performance testing. The results
> > from "Linux Portal" <linportal@gmail.com> were discouraging.
>
> I found this mail from the lkml archive, did you happen to have more
> results?
>

Sorry, I had the wrong tester. Voluspa <lista1@comhem.se>:

"Conclusion: On _this_ machine, with _these_ operations, Adaptive
Readahead in its current incarnation and default settings is a _loss_."

>
> There is an interesting (although simple) benchmark of Wu's adaptive
> readahead patchset (v12) together with graphs here:
>
> http://linux.inet.hr/adaptive_readahead_benchmark.html
>
> In that simple test it definitely looks promising (3x speedup).

That's postgresql again. We know there's a problem at present with
postgresql. Has anyone tried to fix it, without going and rewriting
everything?

^ permalink raw reply [flat|nested] 166+ messages in thread
* new SCSI drivers (was Re: 2.6.18 -mm merge plans)
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (7 preceding siblings ...)
[not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
@ 2006-06-05 0:32 ` Jeff Garzik
[not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
` (11 subsequent siblings)
20 siblings, 0 replies; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05 0:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-scsi

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> areca-raid-linux-scsi-driver.patch

> I'm going to start sending the Areca driver to James, too. The vendor
> has worked hard and the hardware is becoming more important - let's help
> them get it in.

The driver gets my ACK.

Also, I have the Promise 'stex' (previously 'shasta') SCSI RAID driver in
jgarzik/misc-2.6.git#stex that wants merging. It's been sent to linux-scsi
and linux-kernel several times, but never seemed to make it into a SCSI
tree. I kept it alive in #stex, and AFAICS it's been ready to merge for a
while now.

I'll send it to linux-scsi one more time, sometime this week.

Jeff

^ permalink raw reply [flat|nested] 166+ messages in thread
[parent not found: <20060605010501.GA4931@mail.ustc.edu.cn>]
* statistics infrastructure [not found] ` <20060605010501.GA4931@mail.ustc.edu.cn> @ 2006-06-05 1:05 ` Fengguang Wu 2006-06-05 16:30 ` Greg KH 1 sibling, 0 replies; 166+ messages in thread From: Fengguang Wu @ 2006-06-05 1:05 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Martin Peschke On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote: > statistics-infrastructure.patch > Another tough one. It offers generic intrastructure for non-task-related > instrumentation and it would really be good if someone who has an interest > in this for something other than the zfcp driver could stand up and say > "this works for us". I'm having a try of it. Looks good for my case, except some fixable issues/bugs. Here is a sample session for querying the readahead statistics: root ~# echo state=on > /debug/statistics/readahead/definition root ~# cat /debug/statistics/readahead/definition name=io_block state=on units=/ type=counter_inc data=[ 108.272356] started=[ 218.797000] stopped=[ 201.533118] name=read_random state=on units=pages/requests type=utilisation data=[ 108.272459] started=[ 218.797004] stopped=[ 201.533146] name=readahead-fadvise state=on units=pages/requests type=utilisation data=[ 108.272472] started=[ 218.797005] stopped=[ 201.533149] name=readahead-stock state=on units=pages/requests type=utilisation data=[ 108.272476] started=[ 218.797005] stopped=[ 201.533152] name=readaround-mmap state=on units=pages/requests type=utilisation data=[ 108.272479] started=[ 218.797006] stopped=[ 201.533155] name=readahead-mmap state=on units=pages/requests type=utilisation data=[ 108.272482] started=[ 218.797006] stopped=[ 201.533158] name=readahead-initial state=on units=pages/requests type=utilisation data=[ 108.272485] started=[ 218.797007] stopped=[ 201.533161] name=readahead-state state=on units=pages/requests type=utilisation data=[ 108.272488] started=[ 218.797007] stopped=[ 201.533164] name=readahead-context state=on units=pages/requests 
type=utilisation data=[ 108.272491] started=[ 218.797008] stopped=[ 201.533166] name=readahead-contexta state=on units=pages/requests type=utilisation data=[ 108.272494] started=[ 218.797008] stopped=[ 201.533169] name=readahead-backward state=on units=pages/requests type=utilisation data=[ 108.272497] started=[ 218.797009] stopped=[ 201.533172] name=readahead-onthrash state=on units=pages/requests type=utilisation data=[ 108.272500] started=[ 218.797009] stopped=[ 201.533175] name=readahead-onseek state=on units=pages/requests type=utilisation data=[ 108.272503] started=[ 218.797010] stopped=[ 201.533178] name=rescue state=on units=pages/chunks type=utilisation data=[ 108.272506] started=[ 218.797011] stopped=[ 201.533181] name=size_drop state=on units=from-pages/delta-pages type=counter_inc data=[ 108.272509] started=[ 218.797011] stopped=[ 201.533184] root ~# cat /debug/statistics/readahead/data io_block 40 read_random 0 0 0.000 0 readahead-fadvise 0 0 0.000 0 readahead-stock 0 0 0.000 0 readaround-mmap 7 1 28.286 88 readahead-mmap 0 0 0.000 0 readahead-initial 2 5 5.000 5 readahead-state 1331 256 256.010 269 readahead-context 0 0 0.000 0 readahead-contexta 0 0 0.000 0 readahead-backward 0 0 0.000 0 readahead-onthrash 0 0 0.000 0 readahead-onseek 0 0 0.000 0 rescue 0 0 0.000 0 size_drop 13 root ~# root ~# echo name=readahead-initial type=histogram_lin entries=32 range_min=8 base_interval=8 > /debug/statistics/readahead/definition root ~# cat /debug/statistics/readahead/data io_block 53 read_random 0 0 0.000 0 readahead-fadvise 0 0 0.000 0 readahead-stock 0 0 0.000 0 readaround-mmap 7 1 28.286 88 readahead-mmap 0 0 0.000 0 readahead-initial <=8 0 readahead-initial <=16 0 readahead-initial <=24 0 readahead-initial <=32 0 readahead-initial <=40 0 readahead-initial <=48 0 readahead-initial <=56 0 readahead-initial <=64 0 readahead-initial <=72 0 readahead-initial <=80 0 readahead-initial <=88 0 readahead-initial <=96 0 readahead-initial <=104 0 readahead-initial 
<=112 0 readahead-initial <=120 0 readahead-initial <=128 0 readahead-initial <=136 0 readahead-initial <=144 0 readahead-initial <=152 0 readahead-initial <=160 0 readahead-initial <=168 0 readahead-initial <=176 0 readahead-initial <=184 0 readahead-initial <=192 0 readahead-initial <=200 0 readahead-initial <=208 0 readahead-initial <=216 0 readahead-initial <=224 0 readahead-initial <=232 0 readahead-initial <=240 0 readahead-initial <=248 0 readahead-initial >248 0 readahead-state 1331 256 256.010 269 readahead-context 0 0 0.000 0 readahead-contexta 0 0 0.000 0 readahead-backward 0 0 0.000 0 readahead-onthrash 0 0 0.000 0 readahead-onseek 0 0 0.000 0 rescue 0 0 0.000 0 size_drop 13 ^ permalink raw reply [flat|nested] 166+ messages in thread
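The histogram_lin listing in the sample session above (entries=32, range_min=8, base_interval=8) has 31 buckets labelled "<=8" through "<=248" plus one catch-all ">248" bucket. The mapping from a value to a bucket can be sketched as follows; this is inferred from that output only, not taken from the statistics code, and hist_lin_bucket is a hypothetical name.

```c
#include <assert.h>

/*
 * Map a value to a bucket index for a linear histogram laid out like the
 * listing above: bucket 0 is "<=range_min", each further bucket is
 * base_interval wide, and the last of the `entries` buckets catches
 * everything larger.
 */
static int hist_lin_bucket(long value, long range_min, long base_interval,
			   int entries)
{
	long idx;

	assert(base_interval > 0 && entries > 0);

	if (value <= range_min)
		return 0;
	/* round up so that a value on a boundary stays in the lower bucket */
	idx = (value - range_min + base_interval - 1) / base_interval;
	if (idx > entries - 1)
		idx = entries - 1;
	return (int)idx;
}
```

Under these assumptions a value of 16 falls into bucket 1 ("<=16"), 17 into bucket 2 ("<=24"), 248 into bucket 30 ("<=248"), and anything above 248 into bucket 31 (">248").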
* Re: statistics infrastructure
[not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
2006-06-05 1:05 ` statistics infrastructure Fengguang Wu
@ 2006-06-05 16:30 ` Greg KH
2006-06-13 23:47 ` statistics infrastructure (in -mm tree) review Greg KH
1 sibling, 1 reply; 166+ messages in thread
From: Greg KH @ 2006-06-05 16:30 UTC (permalink / raw)
To: Fengguang Wu, Andrew Morton, linux-kernel, Martin Peschke

On Mon, Jun 05, 2006 at 09:05:01AM +0800, Fengguang Wu wrote:
> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > statistics-infrastructure.patch
>
> > Another tough one. It offers generic infrastructure for non-task-related
> > instrumentation and it would really be good if someone who has an interest
> > in this for something other than the zfcp driver could stand up and say
> > "this works for us".
>
> I'm having a try of it. Looks good for my case, except some fixable
> issues/bugs. Here is a sample session for querying the readahead statistics:

The last I looked at this, it seemed way too complex for what was needed.
A lot of the filtering and other parsing stuff should be done by a
userspace tool, not the kernel.

I'll take a second look at it and see what I can comment on. But I don't
think it's 2.6.18 material yet...

thanks,

greg k-h

^ permalink raw reply [flat|nested] 166+ messages in thread
* statistics infrastructure (in -mm tree) review
2006-06-05 16:30 ` Greg KH
@ 2006-06-13 23:47 ` Greg KH
2006-06-14 0:18 ` Randy.Dunlap
` (2 more replies)
0 siblings, 3 replies; 166+ messages in thread
From: Greg KH @ 2006-06-13 23:47 UTC (permalink / raw)
To: Martin Peschke; +Cc: akpm, linux-kernel

First cut at reviewing this code.

Initial impression is, "damn, that's a complex interface". I'd really like
to see some other, real-world usages of this. Like perhaps the
io-scheduler statistics? Some other /proc stats that have nothing to do
with processes?

And what does this mean for relayfs? Those developers tuned that code to
the nth degree to get speed and other goodness, and here you go just
ignoring that stuff and add yet another way to get stats out of the
kernel. Why should I use this instead of my own code with relayfs?

And is the need for the in-kernel parser really necessary? I know it makes
the userspace tools simpler (cat and echo), but should we be telling the
kernel how to filter and adjust the data? Shouldn't we just dump it all to
userspace and use tools there to manipulate it?

Oh, and use C99 structure initializers when creating the statistic
structures in the example code (and real code), it makes it much easier to
understand, and future proof when the api changes.

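As an illustration of the C99 initializer request, here is a minimal userspace sketch. The struct members mirror struct statistic_info as declared in the patch under review; the enum, the array name and every value are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as struct statistic_info in the patch under review. */
struct statistic_info {
	char *name;
	char *x_unit;
	char *y_unit;
	int flags;
	char *defaults;
};

/* Hypothetical indices and values, for illustration only. */
enum { EXAMPLE_STAT_LATENCY, EXAMPLE_STAT_SIZE, EXAMPLE_STAT_NR };

static struct statistic_info example_info[EXAMPLE_STAT_NR] = {
	[EXAMPLE_STAT_LATENCY] = {
		.name	= "latency",
		.x_unit	= "microsec",
		.y_unit	= "request",
	},
	[EXAMPLE_STAT_SIZE] = {
		.name	= "size",
		/* units, flags and defaults are left out, so they are zeroed */
	},
};
```

Because each member is named, adding or reordering members in the struct later does not silently shift values into the wrong slot, which is the "future proof" part of the request.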
Code comments now:

> diff -puN arch/s390/Kconfig~statistics-infrastructure arch/s390/Kconfig
> --- devel/arch/s390/Kconfig~statistics-infrastructure	2006-06-09 15:22:58.000000000 -0700
> +++ devel-akpm/arch/s390/Kconfig	2006-06-09 15:22:58.000000000 -0700
> @@ -490,8 +490,14 @@ source "drivers/net/Kconfig"
>
>  source "fs/Kconfig"
>
> +menu "Instrumentation Support"
> +
>  source "arch/s390/oprofile/Kconfig"
>
> +source "lib/Kconfig.statistic"
> +
> +endmenu
> +
>  source "arch/s390/Kconfig.debug"
>
>  source "security/Kconfig"
> diff -puN arch/s390/oprofile/Kconfig~statistics-infrastructure arch/s390/oprofile/Kconfig
> --- devel/arch/s390/oprofile/Kconfig~statistics-infrastructure	2006-06-09 15:22:58.000000000 -0700
> +++ devel-akpm/arch/s390/oprofile/Kconfig	2006-06-09 15:22:58.000000000 -0700
> @@ -1,6 +1,3 @@
> -
> -menu "Profiling support"
> -
>  config PROFILING
>  	bool "Profiling support"
>  	help
> @@ -18,5 +15,3 @@ config OPROFILE
>
>  	  If unsure, say N.
>
> -endmenu
> -

These two patches should probably go somewhere else, they don't have much
to do with this one. (well, adding Kconfig.statistic does, but the other
wording doesn't.)

> diff -puN /dev/null include/linux/statistic.h
> --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> +++ devel-akpm/include/linux/statistic.h	2006-06-09 15:22:58.000000000 -0700
> @@ -0,0 +1,348 @@
> +/*
> + * include/linux/statistic.h
> + *
> + * Statistics facility
> + *
> + * (C) Copyright IBM Corp. 2005, 2006
> + *
> + * Author(s): Martin Peschke <mpeschke@de.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.

Are you sure "any later version"?

> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Two not-needed paragraphs.

> +#ifndef STATISTIC_H
> +#define STATISTIC_H
> +
> +#include <linux/fs.h>
> +#include <linux/types.h>
> +#include <linux/percpu.h>
> +
> +#define STATISTIC_ROOT_DIR	"statistics"
> +
> +#define STATISTIC_FILENAME_DATA	"data"
> +#define STATISTIC_FILENAME_DEF	"definition"
> +
> +#define STATISTIC_NEED_BARRIER	1

Meta-comment about this file, does most of the stuff in this file, really
belong here? At first glance, this should only hold the public interface
to the statistic code, not everything else needed by the internal workings
of that code. It looks like it could be made a lot smaller.

> +enum statistic_state {
> +	STATISTIC_STATE_INVALID,
> +	STATISTIC_STATE_UNCONFIGURED,
> +	STATISTIC_STATE_RELEASED,
> +	STATISTIC_STATE_OFF,
> +	STATISTIC_STATE_ON
> +};
> +
> +enum statistic_type {
> +	STATISTIC_TYPE_COUNTER_INC,
> +	STATISTIC_TYPE_COUNTER_PROD,
> +	STATISTIC_TYPE_UTIL,
> +	STATISTIC_TYPE_HISTOGRAM_LIN,
> +	STATISTIC_TYPE_HISTOGRAM_LOG2,
> +	STATISTIC_TYPE_SPARSE,
> +	STATISTIC_TYPE_NONE
> +};

Make these bit-safe so sparse can catch mistakes?

> +#define STATISTIC_FLAGS_NOINCR	0x01

What's this for?

> +/**
> + * struct statistic_info - description of a class of statistics
> + * @name: pointer to name name string
> + * @x_unit: pointer to string describing unit of X of (X, Y) data pair
> + * @y_unit: pointer to string describing unit of Y of (X, Y) data pair
> + * @flags: only flag so far (distinction of incremental and other statistic)
> + * @defaults: pointer to string describing defaults setting for attributes
> + *
> + * Exploiters must setup an array of struct statistic_info for a
> + * corresponding array of struct statistic, which are then pointed to
> + * by struct statistic_interface.
> + *
> + * Struct statistic_info and all members and addressed strings must stay for
> + * the lifetime of corresponding statistics created with statistic_create().
> + *
> + * Except for the name string, all other members may be left blank.
> + * It would be nice of exploiters to fill it out completely, though.
> + */
> +struct statistic_info {
> +/* public: */
> +	char *name;
> +	char *x_unit;
> +	char *y_unit;
> +	int flags;
> +	char *defaults;
> +};

The whole "public:" and "private:" thing in these structures is not
needed. Just document it in the kernel-doc comments and you should be
fine. This isn't C++ :)

> +struct sgrb_seg {
> +	struct list_head list;
> +	char *address;
> +	int offset;
> +	int size;
> +};
> +
> +struct statistic_file_private {
> +	struct list_head read_seg_lh;
> +	struct list_head write_seg_lh;
> +	size_t write_seg_total_size;
> +};
> +
> +struct statistic_merge_private {
> +	struct statistic *stat;
> +	spinlock_t lock;
> +	void *dst;
> +};

I'm guessing these three structures aren't needed here. Otherwise, please
document them.

> +#ifdef CONFIG_STATISTICS

Why ifdef now, so late?

> +extern int statistic_create(struct statistic_interface *, const char *);
> +extern int statistic_remove(struct statistic_interface *);
> +
> +/**
> + * statistic_add - update statistic with incremental data in (X, Y) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + * @incr: Y
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * the definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant takes care of protecting per-cpu data. It is preferred whenever
> + * exploiters don't update several statistics of the same entity in one go.
> + */
> +static inline void statistic_add(struct statistic *stat, int i,
> +				 s64 value, u64 incr)
> +{
> +	unsigned long flags;
> +	local_irq_save(flags);
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> +	local_irq_restore(flags);
> +}

These are all inline, which I guess is acceptable. But see the current
inline-or-not comments on lkml which may make you rethink this.

> +/**
> + * statistic_add_nolock - update statistic with incremental data in (X, Y) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + * @incr: Y
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant leaves protecting per-cpu data to exploiters. It is preferred
> + * whenever exploiters update several statistics of the same entity in one go.
> + */
> +static inline void statistic_add_nolock(struct statistic *stat, int i,
> +					s64 value, u64 incr)
> +{
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> +}
> +
> +/**
> + * statistic_inc - update statistic with incremental data in (X, 1) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant takes care of protecting per-cpu data. It is preferred whenever
> + * exploiters don't update several statistics of the same entity in one go.
> + */
> +static inline void statistic_inc(struct statistic *stat, int i, s64 value)
> +{
> +	unsigned long flags;
> +	local_irq_save(flags);
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, 1);
> +	local_irq_restore(flags);
> +}

Shouldn't this just call statistic_add() with a incr of 1?

> +
> +/**
> + * statistic_inc_nolock - update statistic with incremental data in (X, 1) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant leaves protecting per-cpu data to exploiters. It is preferred
> + * whenever exploiters update several statistics of the same entity in one go.
> + */
> +static inline void statistic_inc_nolock(struct statistic *stat, int i,
> +					s64 value)
> +{
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, 1);
> +}

Shouldn't this just call statistic_add_nolock with a incr of 1?

> diff -puN /dev/null lib/Kconfig.statistic
> --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> +++ devel-akpm/lib/Kconfig.statistic	2006-06-09 15:22:58.000000000 -0700
> @@ -0,0 +1,11 @@
> +config STATISTICS
> +	bool "Statistics infrastructure"
> +	depends on DEBUG_FS
> +	help
> +	  The statistics infrastructure provides a debug-fs based user interface

No "-" in debugfs :)

> +	  for statistics of kernel components, that is, usually device drivers.

Why mention drivers? Other things might use this (see original comments at
the start of the message.)

> --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> +++ devel-akpm/lib/statistic.c	2006-06-09 15:22:58.000000000 -0700
> @@ -0,0 +1,1459 @@
> +/*
> + * lib/statistic.c
> + * statistics facility
> + *
> + * Copyright (C) 2005, 2006
> + * IBM Deutschland Entwicklung GmbH,
> + * IBM Corporation
> + *
> + * Author(s): Martin Peschke (mpeschke@de.ibm.com),
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Again with the verbose license :) > +static void _statistic_barrier(void *unused) > +{ > +} > + > +static inline int statistic_stop(struct statistic *stat) > +{ > + stat->stopped = sched_clock(); > + stat->state = STATISTIC_STATE_OFF; > + /* ensures that all CPUs have ceased updating statistics */ > + smp_mb(); > + on_each_cpu(_statistic_barrier, NULL, 0, 1); > + return 0; > +} Isn't there a way to use rcu for this instead? Just a suggestion, it might be totally wrong... > + > +static int statistic_transition(struct statistic *stat, > + struct statistic_info *info, > + enum statistic_state requested_state) > +{ > + int z = (requested_state < stat->state ? 1 : 0); > + int retval = -EINVAL; int retval = 0; > + > + while (stat->state != requested_state) { > + switch (stat->state) { > + case STATISTIC_STATE_INVALID: > + retval = ( z ? -EINVAL : statistic_initialise(stat) ); > + break; > + case STATISTIC_STATE_UNCONFIGURED: > + retval = ( z ? statistic_uninitialise(stat) > + : statistic_define(stat) ); > + break; > + case STATISTIC_STATE_RELEASED: > + retval = ( z ? statistic_initialise(stat) > + : statistic_alloc(stat, info) ); > + break; > + case STATISTIC_STATE_OFF: > + retval = ( z ? statistic_free(stat, info) > + : statistic_start(stat) ); > + break; > + case STATISTIC_STATE_ON: > + retval = ( z ? statistic_stop(stat) : -EINVAL ); > + break; > + } > + if (unlikely(retval)) > + return retval; delete these two lines. > + } > + return 0; return retval; > +static match_table_t statistic_match_type = { > + {1, "type=%s"}, > + {9, NULL} > +}; named field initializers please. > +static match_table_t statistic_match_common = { > + {STATISTIC_STATE_UNCONFIGURED, "state=unconfigured"}, > + {STATISTIC_STATE_RELEASED, "state=released"}, > + {STATISTIC_STATE_OFF, "state=off"}, > + {STATISTIC_STATE_ON, "state=on"}, > + {1001, "name=%s"}, > + {1002, "data=reset"}, > + {1003, "defaults"}, > + {9999, NULL} > +}; Same here. And why do you have numbers and a mix of enums here? 
Shouldn't you define the name=, data= and defaults too? Also, just null terminate the list, is 9999 really needed? > +static struct statistic_discipline statistic_discs[] = { > + { /* STATISTIC_TYPE_COUNTER_INC */ > + NULL, > + statistic_alloc_generic, > + NULL, > + statistic_reset_counter, > + statistic_merge_counter, > + statistic_fdata_counter, > + NULL, > + statistic_add_counter_inc, > + statistic_set_counter_inc, > + "counter_inc", sizeof(u64) > + }, named initializers please. That will let you not have to specify the NULL fields, making it much easier to read overall. thanks, greg k-h ^ permalink raw reply [flat|nested] 166+ messages in thread
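Two of the review suggestions above can be sketched in user-space C: deriving the inc helper from the add helper, and using designated (named) initializers so that NULL members need not be spelled out. This is a minimal sketch under stated assumptions, not the patch's code: `struct stat`, `stat_add()`, `stat_inc()`, `struct stat_discipline` and its member names are hypothetical stand-ins, and the per-cpu data and the local_irq_save()/local_irq_restore() protection of the real helpers are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's struct statistic. */
enum stat_state { STAT_OFF, STAT_ON };

struct stat {
	enum stat_state state;
	uint64_t total;		/* one plain counter, no per-cpu data */
	void (*add)(struct stat *s, int64_t value, uint64_t incr);
};

/* A plain counter ignores X (value) and sums up Y (incr). */
static void stat_add_counter(struct stat *s, int64_t value, uint64_t incr)
{
	(void)value;
	s->total += incr;
}

/* Update statistic i if it is switched on (locking omitted in this sketch). */
static inline void stat_add(struct stat *stat, int i, int64_t value,
			    uint64_t incr)
{
	if (stat[i].state == STAT_ON)
		stat[i].add(&stat[i], value, incr);
}

/* Review suggestion: inc is simply add with an incr of 1. */
static inline void stat_inc(struct stat *stat, int i, int64_t value)
{
	stat_add(stat, i, value, 1);
}

/* Hypothetical discipline table entry with several optional callbacks. */
struct stat_discipline {
	int (*parse)(const char *def);
	void (*reset)(struct stat *s);
	void (*add)(struct stat *s, int64_t value, uint64_t incr);
	const char *name;
	size_t size;
};

/* Designated initializers: members not named here default to 0 / NULL. */
static const struct stat_discipline stat_discs[] = {
	{
		.add  = stat_add_counter,
		.name = "counter_inc",
		.size = sizeof(uint64_t),
	},
};
```

With this layout the on/off check lives in one place, and a reader of the table sees only the callbacks a discipline actually implements.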
* Re: statistics infrastructure (in -mm tree) review 2006-06-13 23:47 ` statistics infrastructure (in -mm tree) review Greg KH @ 2006-06-14 0:18 ` Randy.Dunlap 2006-06-14 16:45 ` Greg KH 2006-06-14 22:48 ` Martin Peschke 2006-06-14 5:04 ` Andi Kleen 2006-06-17 10:30 ` Martin Peschke 2 siblings, 2 replies; 166+ messages in thread From: Randy.Dunlap @ 2006-06-14 0:18 UTC (permalink / raw) To: Greg KH; +Cc: mp3, akpm, linux-kernel On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote: > First cut at reviewing this code. > > Initial impression is, "damm, that's a complex interface". I'd really > like to see some other, real-world usages of this. Like perhaps the > io-schedular statistics? Some other /proc stats that have nothing to do > with processes? Agreed with complexity. > And what does this mean for relayfs? Those developers tuned that code > to the nth degree to get speed and other goodness, and here you go just > ignoring that stuff and add yet another way to get stats out of the > kernel. Why should I use this instead of my own code with relayfs? Good questions. > And is the need for the in-kernel parser really necessary? I know it > makes the userspace tools simpler (cat and echo), but should we be > telling the kernel how to filter and adjust the data? Shouldn't we just > dump it all to userspace and use tools there to manipulate it? I agree again. 
> Code comments now: > > > > diff -puN /dev/null include/linux/statistic.h > > --- /dev/null 2006-06-03 22:34:36.282200750 -0700 > > +++ devel-akpm/include/linux/statistic.h 2006-06-09 15:22:58.000000000 -0700 > > @@ -0,0 +1,348 @@ > > +/* > > + * include/linux/statistic.h > > + * > > + * Statistics facility > > +/** > > + * struct statistic_info - description of a class of statistics > > + * @name: pointer to name name string > > + * @x_unit: pointer to string describing unit of X of (X, Y) data pair > > + * @y_unit: pointer to string describing unit of Y of (X, Y) data pair > > + * @flags: only flag so far (distinction of incremental and other statistic) > > + * @defaults: pointer to string describing defaults setting for attributes > > + * > > + * Exploiters must setup an array of struct statistic_info for a > > + * corresponding array of struct statistic, which are then pointed to > > + * by struct statistic_interface. > > + * > > + * Struct statistic_info and all members and addressed strings must stay for > > + * the lifetime of corresponding statistics created with statistic_create(). > > + * > > + * Except for the name string, all other members may be left blank. > > + * It would be nice of exploiters to fill it out completely, though. > > + */ > > +struct statistic_info { > > +/* public: */ > > + char *name; > > + char *x_unit; > > + char *y_unit; > > + int flags; > > + char *defaults; > > +}; > > The whole "public:" and "private:" thing in these structures is not > needed. Just document it in the kernel-doc comments and you should be > fine. This isn't C++ :) but public: and private: are kernel-doc comments... Using "private:" causes those fields to be omitted from the generated documentation because those fields are for internal/private use of the (statistics) infrastructure code, not to be used by its clients (er, ugh, exploiters) etc. 
> > --- /dev/null 2006-06-03 22:34:36.282200750 -0700
> > +++ devel-akpm/lib/statistic.c 2006-06-09 15:22:58.000000000 -0700
> > @@ -0,0 +1,1459 @@
> > +/*
> > + * lib/statistic.c
> > + * statistics facility
> > + *
> Again with the verbose license :)

Well it's not uncommon in kernel source files.
Where do we document how licenses should be written?

---
~Randy

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: statistics infrastructure (in -mm tree) review 2006-06-14 0:18 ` Randy.Dunlap @ 2006-06-14 16:45 ` Greg KH 2006-06-14 22:48 ` Martin Peschke 1 sibling, 0 replies; 166+ messages in thread From: Greg KH @ 2006-06-14 16:45 UTC (permalink / raw) To: Randy.Dunlap; +Cc: mp3, akpm, linux-kernel On Tue, Jun 13, 2006 at 05:18:27PM -0700, Randy.Dunlap wrote: > On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote: > > > +/** > > > + * struct statistic_info - description of a class of statistics > > > + * @name: pointer to name name string > > > + * @x_unit: pointer to string describing unit of X of (X, Y) data pair > > > + * @y_unit: pointer to string describing unit of Y of (X, Y) data pair > > > + * @flags: only flag so far (distinction of incremental and other statistic) > > > + * @defaults: pointer to string describing defaults setting for attributes > > > + * > > > + * Exploiters must setup an array of struct statistic_info for a > > > + * corresponding array of struct statistic, which are then pointed to > > > + * by struct statistic_interface. > > > + * > > > + * Struct statistic_info and all members and addressed strings must stay for > > > + * the lifetime of corresponding statistics created with statistic_create(). > > > + * > > > + * Except for the name string, all other members may be left blank. > > > + * It would be nice of exploiters to fill it out completely, though. > > > + */ > > > +struct statistic_info { > > > +/* public: */ > > > + char *name; > > > + char *x_unit; > > > + char *y_unit; > > > + int flags; > > > + char *defaults; > > > +}; > > > > The whole "public:" and "private:" thing in these structures is not > > needed. Just document it in the kernel-doc comments and you should be > > fine. This isn't C++ :) > > but public: and private: are kernel-doc comments... 
> Using "private:" causes those fields to be omitted from the
> generated documentation because those fields are for internal/private
> use of the (statistics) infrastructure code, not to be used by
> its clients (er, ugh, exploiters) etc.

Oh, I didn't realize that kerneldoc could do that now, nice. And look,
it's even documented that it can support that, I'll shut up now :)

thanks,

greg k-h

^ permalink raw reply [flat|nested] 166+ messages in thread
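The kernel-doc markers discussed in this exchange can be sketched as below. This is only an illustration: `struct example_stat` and its members are hypothetical and not taken from the patch. The point is that members following a `/* private: */` comment are left out of the generated documentation, and a `/* public: */` comment switches back.

```c
#include <stddef.h>

/**
 * struct example_stat - hypothetical statistic descriptor
 * @name: name shown to users
 * @flags: behaviour flags set by the client
 *
 * Only @name and @flags are meant to be touched by clients.
 */
struct example_stat {
/* public: */
	const char *name;
	int flags;
/* private: */
	void *pdata;			/* owned by the infrastructure */
	unsigned long long started;	/* timestamp kept by the infrastructure */
};
```

To the compiler the markers are ordinary comments, so the struct layout is unaffected.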
* Re: statistics infrastructure (in -mm tree) review
2006-06-14 0:18 ` Randy.Dunlap
2006-06-14 16:45 ` Greg KH
@ 2006-06-14 22:48 ` Martin Peschke
2006-06-19 22:12 ` Greg KH
1 sibling, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-14 22:48 UTC (permalink / raw)
To: Randy.Dunlap, Greg KH, akpm, Andi Kleen; +Cc: linux-kernel

Randy.Dunlap wrote:
> On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote:
>
>> First cut at reviewing this code.
>>
>> Initial impression is, "damn, that's a complex interface". I'd really
>> like to see some other, real-world usages of this. Like perhaps the
>> io-scheduler statistics? Some other /proc stats that have nothing to do
>> with processes?
>
> Agreed with complexity.

Well, roughly 1500 lines of code is sort of complex, even if it has
already been reviewed and cleaned up several times.

Please, let's try to break it down into the design details that each add
their measure of complexity. A flat "too complex" doesn't help.

Could you ACK / NACK the following assumptions, so that we can
figure out how far our (dis)agreement goes and how to continue?

1) There are various kernel components that gather statistical data.
   (kernel/profile.c, network stack, genhd, memory management, taskstats,
   S390 DASD driver, zfcp driver, ...). Requirements for other statistics
   aren't unusual.

2) Basically, they all implement similar things (smp-safe and efficient
   data structures used for data gathering, algorithms for
   on-the-fly data preprocessing, delivery of data through some user
   interface, sometimes a switch for turning statistics on/off)

3) They all introduce their own macros / functions, resulting in code
   duplication (bad), while they usually have their unique way to show
   data to users (bad, too).

4) Possible ways to aggregate statistics data include plain counters,
   histograms, a utilisation indicator (min, max, average etc.), and
   potentially other algorithms people might come up with.

5) Statistics counters should be maintained in kernel.
   That's cheapest.
   No bursts of zillions of incremental updates relayed to user space.
   (please see also other comment at bottom of message)

6) Some library routines would suffice to take over data gathering
   and preprocessing. Avoids further code duplication, avoids bugs,
   speeds up development and test.

7) With regard to the delivery of statistic data to user land,
   a library maintaining statistic counters, histograms or whatever
   on behalf of exploiters doesn't need any help from the exploiter.
   We can avoid the usual callbacks and code bloat in exploiters
   this way.

8) If some library functions are responsible for showing data, and the
   exploiter is not, we can achieve a common format for statistics data.
   For example, a histogram about block I/O has the same format as
   a histogram about network I/O.
   This provides ease of use and minimises the effort of writing
   scripts that could do further processing (e.g. formatting as
   spreadsheets or bar charts, comparison and summarisation of
   statistics, ...)

9) For performance reasons, per-cpu data and minimal locking
   (local_irq_save/restore) should be used.
   Adds to complexity, though.

10) If data is per-cpu, we want to be very careful with regard to
    memory footprint. That is why memory is only allocated for online
    cpus (requires cpu hot(un)plug handling, which adds to complexity).

11) At least for data processing modes more expensive than plain
    counters, like histograms, an on/off state makes sense.

12) In order to minimise the memory footprint, a released/allocated
    state makes sense.

13) Unconfigured/released/off/on states should be handled by a tiny
    state machine and a single check on statistic updates.

14) Kernel code delivering statistics data through library routines
    can, at best, guess whether a user wants incremental updates to be
    aggregated in a single counter, a set of counters (histograms), or
    in the form of other results. Users might want to change how much
    detail is retained in aggregated statistic results.
    Adds to complexity.

15) Nonetheless, exploiters are kindly requested to provide some
    default settings that are a good starting point for general
    purpose use.

16) Aggregated statistic results, in many cases, don't need to be
    pushed to user space through a high-speed, high-volume interface.
    Debugfs, for example, is fine for this purpose.

17) If the requirement for pushing data comes up anyway, we could,
    for example, add relay-entries in debugfs anytime.
    (For example, we could implement forwarding of incremental
    updates to user space. Just another conceivable data processing
    mode that fits into the current design.)

18) The programming interface of a statistics library can be roughly as
    simple as statistic_create(), statistics_remove(), statistic_add().

19) statistic_add() should come in different flavours:
    statistic_add/inc() (just for convenience), and
    statistic_*_nolock() (more efficient locking for a bundle of updates)

20) statistic_add() takes an (X, Y) pair, with X being the main
    characteristic of the statistic (e.g. a request size) and with
    Y quantifying the update reported for a particular X (e.g. number
    of observed requests of a particular request size).

21) Processing of (X, Y) according to abstract rules imposed by
    counters, histograms etc. doesn't require any knowledge about the
    semantics of X or Y.

22) There might be statistic counters that exploiters want to use and
    maintain on their own, and which users still may want to have a look at
    along with other statistics. statistic_set() fits in here nicely.

>> And what does this mean for relayfs? Those developers tuned that code
>> to the nth degree to get speed and other goodness, and here you go just
>> ignoring that stuff and add yet another way to get stats out of the
>> kernel. Why should I use this instead of my own code with relayfs?
>
> Good questions.

Relayfs is a nice feature, but not appropriate here.
For example, during a performance measurement I have seen
SCSI I/O related statistics being updated millions of times while
I was just having a short lunch break. Some of them just increased
a counter, which is pretty fast if done immediately in the kernel.
If all these updates had to be relayed to user space
just to increase a counter maintained in user space.. urgh, surely
more expensive and not the way to go.

And what if user space isn't interested at all? Would we keep
pumping zillions of unused updates into buffers instead of
discarding them right away?

profile.c, taskstats, genhd and all the other statistics listed
above... they all maintain their counters in the kernel and
show aggregated statistics to users.

>> And is the need for the in-kernel parser really necessary? I know it
>> makes the userspace tools simpler (cat and echo), but should we be
>> telling the kernel how to filter and adjust the data? Shouldn't we just
>> dump it all to userspace and use tools there to manipulate it?
>
> I agree again.

Assuming we can agree on in-kernel counters, histograms etc.,
allowing for attributes being adjusted by users makes sense.

The parser stuff required for these attributes is implemented
using match_token() & friends, which should be acceptable.
But, I think that the standard way of using match_token() and
strsep() needs improvement (strsep is destructive to strings
parsed, which is painful).

Thanks,
Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
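The closing remark about strsep() can be made concrete. strsep() overwrites each delimiter it finds with a NUL byte, so the buffer being parsed is modified. A non-destructive scan is possible with the standard strspn() and strcspn() functions, as in the user-space sketch below. The helper `next_token()` is hypothetical and is not part of the patch or of the kernel's parser; unlike strsep(), it skips runs of delimiters instead of returning empty tokens.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the next token in *cursor and
 * store its length in *len, without writing to the string. Returns NULL
 * when no token is left. The token is NOT NUL-terminated.
 */
static const char *next_token(const char **cursor, const char *delims,
			      size_t *len)
{
	const char *p = *cursor;

	if (p == NULL)
		return NULL;
	p += strspn(p, delims);		/* skip leading delimiters */
	if (*p == '\0') {
		*cursor = NULL;		/* input exhausted */
		return NULL;
	}
	*len = strcspn(p, delims);	/* measure the token, no writes */
	*cursor = p + *len;
	return p;
}
```

The trade-off is that the caller gets a pointer plus a length rather than a terminated string. As far as I know, match_token() expects a NUL-terminated string, so a caller would still have to copy each token or use a length-aware comparison; treat that as an assumption to verify against lib/parser.c.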
* Re: statistics infrastructure (in -mm tree) review 2006-06-14 22:48 ` Martin Peschke @ 2006-06-19 22:12 ` Greg KH 2006-06-20 15:40 ` Martin Peschke 0 siblings, 1 reply; 166+ messages in thread From: Greg KH @ 2006-06-19 22:12 UTC (permalink / raw) To: Martin Peschke; +Cc: Randy.Dunlap, akpm, Andi Kleen, linux-kernel On Thu, Jun 15, 2006 at 12:48:29AM +0200, Martin Peschke wrote: > Randy.Dunlap wrote: > >On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote: > > > >>First cut at reviewing this code. > >> > >>Initial impression is, "damm, that's a complex interface". I'd really > >>like to see some other, real-world usages of this. Like perhaps the > >>io-schedular statistics? Some other /proc stats that have nothing to do > >>with processes? > > > >Agreed with complexity. > > Well, roughly 1500 lines of code is sort of complex, even if already > being reviewed and cleaned up several times. > > Please, let's try to break it down into design details that add > their measure of complexity. A flat "too comlex" doesn't help on. > > Could you ACK / NACK the following assumptions, so that we can > figure out how far our (dis)agreement goes and how to continue? Sure. > 1) There are various kernel components that gather statistical data. > (kernel/profile.c, network stack, genhd, memory management, taskstats, > S390 DASD driver, zfcp driver, ...). Requirements for other statistics > aren't unusual. Agreed. > 2) Basically, they all implement similar things (smp-safe and efficient > data structures used for data gathering, implement algorithms for > on-the-fly data preprocessing, delivery of data through some user > interface, sometimes a switch for turning statistics on/off) Agreed. > 3) They all introduce their own macros / functions, resulting in code > duplication (bad), while they usually have their unique way to show > data to users (bad, too). Agreed. 
> 4) Possible ways to aggregate statistics data include plain counters, > histograms, a utilisation indicator (min, max, average etc.), and > potentially other algorithms people might come up with. agreed. > 5) Statistics counters should be maintained in kernel. That's cheapest. > No bursts of zillions of incremental updates relayed to user space. > (please see also other comment at bottom of message) agreed. > 6) Some library routines would suffice to take over data gathering > and preprocessing. Avoids further code duplication, avoids bugs, > speeds up development and test. As long as the library functions do not cause any speed degradations, which I think your current ones do with the pointer dereference (which is very slow and measurable on some archs). > 7) With regard to the delivery of statistic data to user land, > a library maintaining statistic counters, histograms or whatever > on behalf of exploiters doesn't need any help from the exploiter. > We can avoid the usual callbacks and code bloat in exploiters > this way. I don't really understand what you are stating here. > 8) If some library functions are responsible for showing data, and the > exploiter is not, we can achieve a common format for statistics data. > For example, a histogram about block I/O has the same format as > a histogram about network I/O. > This provides ease of use and minimises the effort of writing > scripts that could do further processing (e.g. formatting as > spreadsheats or bar charts, comparison and summarisation of > statistics, ...) Common functionality and formats would be wonderful. But I'm not sure you can guarantee that we really want the network io and block io statistics in the same format, as they are fundimentally different things. Also, you will have to live with the existing interfaces, as we can't break them, so porting them will not happen. > 9) For performance reasons, per-cpu data and minimal locking > (local_irq_save/restore) should be used. 
> Adds to complexity, though. If necessary. Is this really necessary? > 10) If data is per-cpu, we want to be very careful with regard to > memory footprint. That is why, memory is only allocated for online > cpus (requires cpu hot(un)plug handling, which adds to complexity), Agreed. > 11) At least for data processing modes more expensive than plain > counters, like histograms, an on/off state makes sense. So that userspace can tell the kernel to go faster? I don't know why this is really necessary :) > 12) In order to minimise the memory footprint, a released/allocated > state makes sense. Again, telling userspace when to tell the kernel to free up memory can cause problems. > 13) Unconfigured/released/off/on states should be handled by a tiny > state machine and a single check on statistic updates. Ok, but you are now getting into implementation issues, like a few of the above ones... > 14) Kernel code delivering statistics data through library routines > can, at best, guess whether a user wants incremental updates be > aggregated in a single counter, a set of counters (histograms), or > in the form of other results. Users might want to change how much > detail is retained in aggregated statistic results. > Adds to complexity. Complexity where? Userspace or in the kernel? > 15) Nonetheless, exploiters are kindly requested to provide some > default settings that are a good starting point for general > purpose use. > > 16) Aggregated statistic results, in many cases, don't need to be > pushed to user space through a high-speed, high-volume interface. > Debugfs, for example, is fine for this purpose. > > 17) If the requirement for pushing data comes up anyway, we could, > for example, add relay-entries in debugfs anytime. > (For example, we could implement forwarding of incremental > updates to user space. Just another conceivable data processing > mode that fits into the current design.) 
> > 18) The programming interface of a statistics library can be rougly as > simple as statistic_create(), statistics_remove(), statistic_add(). > > 19) Statistic_add() should come in different flavours: > statistic_add/inc() (just for convenience), and > statistic_*_nolock() (more efficient locking for a bundle of updates) > > 20) Statistic_add() takes a (X, Y) pair, with X being the main > characteristics of the statistics (e.g. a request size) and with > Y quantifying the update reported for a particular X (e.g. number > of observed requests of a particular request size). > > 21) Processing of (X, Y) according to abstract rules imposed by > counters, histograms etc. doesn't require any knowledge about the > semantics of X or Y. > > 22) There might be statistic counters that exploiters want to use and > maintain on their own, and which users still may want to have a look at > along with other statistics. Statistic_set() fits in here nicely. Ok, these are all implementation details. Can you please step back a bit? What is the requirements that you are trying to achieve here? A kernel-wide statistic gathering library? If so, why? What has caused this to be needed? And if it's needed, would putting the stuff in debugfs for _all_ statistics really be a good idea (hint, I would say no...) > >>And what does this mean for relayfs? Those developers tuned that code > >>to the nth degree to get speed and other goodness, and here you go just > >>ignoring that stuff and add yet another way to get stats out of the > >>kernel. Why should I use this instead of my own code with relayfs? > > > > Good questions. > > Relayfs is a nice feature, but not appropriate here. > > For example, during a performance measurements I have seen > SCSI I/O related statistics being updated millions of times while > I was just having a short lunch break. Some of them just increased > a counter, which is pretty fast if done immediately in the kernel. 
> If all these updates update would have to be relayed to user space > to just increase a counter maintained in user space.. urgh, surely > more expensive and not the way to go. > > And what if user space isn't interested at all? Would we keep > pumping zillions of unused updates into buffers instead of > discarding them right away? Yes, for simple counters, relayfs is overkill. But so is an indirect function call through a pointer for every simple counter update :) > Profile.c, taskstats, genhd and all the other statistics listed > above... they all maintain their counters in the kernel and > show aggregated statistics to users. Yes, but will you be allowed to port the existing users over to your new framework without breaking any userspace stuff? I don't see that happening :( > >>And is the need for the in-kernel parser really necessary? I know it > >>makes the userspace tools simpler (cat and echo), but should we be > >>telling the kernel how to filter and adjust the data? Shouldn't we just > >>dump it all to userspace and use tools there to manipulate it? > > > >I agree again. > > Assumimg we can agree on in-kernel counters, histograms etc. > allowing for attributes being adjusted by users makes sense. > > The parser stuff required for these attributes is implemented > using match_token() & friends, which should be acceptible. > But, I think that the standard way of using match_token() and > strsep() needs improvement (strsep is destructive to strings > parsed, which is painful). Yeah, the parser isn't as bad as I originally thought it was. But overall, I'm still not sold on the real need for this kind of subsystem/library. thanks, greg k-h ^ permalink raw reply [flat|nested] 166+ messages in thread
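The question raised against point 11 (why an on/off state is worth having) comes down to the cost of one update. A rough user-space sketch follows, assuming a hypothetical histogram with upper bounds 0, 1, 2, 4, ..., 256 plus one overflow bucket; the names `hist_bound_log2()`, `hist_index_log2()`, `struct hist` and `hist_add()` are illustrative and not taken from the patch. A plain counter needs one addition per update, while a logarithmic histogram first has to search for the right bucket, and a single state check lets a switched-off histogram skip that search.

```c
#include <stdint.h>

#define HIST_LAST_INDEX 10	/* buckets 0..9 are bounded, 10 is overflow */

/* Upper bound of bucket i: 0, 1, 2, 4, ..., 256 (assumed layout). */
static int64_t hist_bound_log2(int i)
{
	return i == 0 ? 0 : (int64_t)1 << (i - 1);
}

/* Linear search for the first bucket whose bound is >= value. */
static int hist_index_log2(int64_t value)
{
	int i;

	for (i = 0; i < HIST_LAST_INDEX && value > hist_bound_log2(i); i++)
		;
	return i;
}

struct hist {
	int on;					/* 0 = off, 1 = on */
	uint64_t bucket[HIST_LAST_INDEX + 1];
};

/* Add incr observations of value; costs one check when switched off. */
static void hist_add(struct hist *h, int64_t value, uint64_t incr)
{
	if (!h->on)
		return;
	h->bucket[hist_index_log2(value)] += incr;
}
```

In this sketch a value of 3 lands in the "<=4" bucket (index 3) and anything above 256 lands in the overflow bucket.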
* Re: statistics infrastructure (in -mm tree) review
2006-06-19 22:12 ` Greg KH
@ 2006-06-20 15:40 ` Martin Peschke
2006-06-20 16:50 ` Randy.Dunlap
0 siblings, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-20 15:40 UTC (permalink / raw)
To: Greg KH; +Cc: Randy.Dunlap, akpm, Andi Kleen, linux-kernel

Greg KH wrote:
>> 6) Some library routines would suffice to take over data gathering
>> and preprocessing. Avoids further code duplication, avoids bugs,
>> speeds up development and test.
>
> As long as the library functions do not cause any speed degradations,
> which I think your current ones do with the pointer dereference (which
> is very slow and measurable on some archs).

Implementation detail ;-)

I will post another statistic_add() variant that requires the caller
to specify the way data aggregation should be done, and which,
consequently, won't have an indirect function call.

Then callers / clients / programmers can choose between higher flexibility
and best performance, depending on the requirements of their statistics.

>> 7) With regard to the delivery of statistic data to user land,
>> a library maintaining statistic counters, histograms or whatever
>> on behalf of exploiters doesn't need any help from the exploiter.
>> We can avoid the usual callbacks and code bloat in exploiters
>> this way.
>
> I don't really understand what you are stating here.

Sorry.
1,$s/exploiter/client/g

Any device driver or whatever gathering statistics data currently
has code dealing with showing the data. Usually, they have some
callbacks for procfs, sysfs or whatever.

My point is that, if a library keeps track of statistics on behalf
of its clients, no client needs to be called back in order to merge,
format, copy, etc. data being shown to users. The library can handle
that as a background operation without disturbing clients.
>> 8) If some library functions are responsible for showing data, and the
>> exploiter is not, we can achieve a common format for statistics data.
>> For example, a histogram about block I/O has the same format as
>> a histogram about network I/O.
>> This provides ease of use and minimises the effort of writing
>> scripts that could do further processing (e.g. formatting as
>> spreadsheets or bar charts, comparison and summarisation of
>> statistics, ...)
>
> Common functionality and formats would be wonderful. But I'm not sure
> you can guarantee that we really want the network io and block io
> statistics in the same format, as they are fundamentally different
> things.

Subsystems are free to gather as many/few statistics as required.
And I am not trying to enforce semantics.

All I am saying is that, if two statistics are aggregated using
similar algorithms, then the results should be presented or formatted
in a similar way.

My assumption is that the format of results doesn't depend on the
semantics of the data feeding a statistic. But it depends on the way
we aggregate data.

For example, there is no reason why statistic A of subsystem 1
aggregated in the form of a histogram should have a different format
than statistic B of subsystem 2 also being aggregated in the form
of a histogram.

A <=0 0
A <=1 0
A <=2 3
A <=4 7
A <=8 29
A <=16 285
A <=32 295
A <=64 96
A <=128 52
A <=256 3
A >256 1

B <=10 1
B <=20 3
B <=30 92
B <=40 251
...
B <=490 34462
B <=500 23434
B >500 0

Semantics are different; statistic names are different; number of buckets,
"diameter" of buckets, scale etc. might be different; basic format of
results is identical - as long as both statistics are aggregated the
same way (as histograms, in this case).

A library can provide a common format, because semantics just don't matter.
Its statistic_add() function (or whatever we want to call it) has no idea
about the actual semantics of the incremental statistic data it
accepts and processes according to abstract rules.

And I think a library should provide a common format, because that makes
it fun to poke around in the aggregated data and to write scripts that
do further processing of that data.

> Also, you will have to live with the existing interfaces, as we can't
> break them, so porting them will not happen.

Okay. A library could help to avoid a further proliferation of
interfaces.

>> 9) For performance reasons, per-cpu data and minimal locking
>> (local_irq_save/restore) should be used.
>> Adds to complexity, though.
>
> If necessary. Is this really necessary?

I would think so. My initial patch was criticised for not using
per-cpu data and, therefore, requiring more expensive locking.

Besides, all other serious statistic implementations use per-cpu
data (kernel/profile.c, include/linux/genhd.h, ...)

>> 10) If data is per-cpu, we want to be very careful with regard to
>> memory footprint. That is why, memory is only allocated for online
>> cpus (requires cpu hot(un)plug handling, which adds to complexity),
>
> Agreed.
>
>> 11) At least for data processing modes more expensive than plain
>> counters, like histograms, an on/off state makes sense.
>
> So that userspace can tell the kernel to go faster?
> I don't know why
> this is really necessary :)

Okay, here are two functions implementing two different ways data
can be aggregated:

static void statistic_add_counter_inc(struct statistic *stat, int cpu,
				      s64 value, u64 incr)
{
	*(u64*)stat->pdata->ptrs[cpu] += incr;
}

static void statistic_add_histogram_log2(struct statistic *stat, int cpu,
					 s64 value, u64 incr)
{
	int i = statistic_histogram_calc_index_log2(stat, value);
	((u64*)stat->pdata->ptrs[cpu])[i] += incr;
}

with statistic_histogram_calc_index_log2 expanding to:

static int statistic_histogram_calc_index_log2(struct statistic *stat,
					       s64 value)
{
	unsigned long long i;
	for (i = 0;
	     i < stat->u.histogram.last_index &&
	     value > statistic_histogram_calc_value_log2(stat, i);
	     i++);
	return i;
}

While incrementing a counter might be cheap, updating a histogram is
more expensive. First, we need to identify the counter out of a set
of counters that is to be incremented. For logarithmic scale, this
requires a loop.

Checking whether data gathering has been enabled at all might look
expensive in the context of a plain counter. It certainly saves cycles
for a histogram that users aren't interested in and that hasn't been
switched on.

>> 12) In order to minimise the memory footprint, a released/allocated
>> state makes sense.
>
> Again, telling userspace when to tell the kernel to free up memory can
> cause problems.

We have to make sure that released memory isn't used anymore.
That's what _statistic_barrier() is for. Do you see other issues?

>> 14) Kernel code delivering statistics data through library routines
>> can, at best, guess whether a user wants incremental updates be
>> aggregated in a single counter, a set of counters (histograms), or
>> in the form of other results. Users might want to change how much
>> detail is retained in aggregated statistic results.
>> Adds to complexity.
>
> Complexity where? Userspace or in the kernel?

Complexity in the kernel. Sorry.
When a statistics library allows users to choose from about half a
dozen ways of aggregating data, then this adds to the complexity
of that library to some degree.

>> 15) Nonetheless, exploiters are kindly requested to provide some
>>     default settings that are a good starting point for general
>>     purpose use.
>>
>> 16) Aggregated statistic results, in many cases, don't need to be
>>     pushed to user space through a high-speed, high-volume interface.
>>     Debugfs, for example, is fine for this purpose.
>>
>> 17) If the requirement for pushing data comes up anyway, we could,
>>     for example, add relay-entries in debugfs anytime.
>>     (For example, we could implement forwarding of incremental
>>     updates to user space. Just another conceivable data processing
>>     mode that fits into the current design.)
>>
>> 18) The programming interface of a statistics library can be roughly as
>>     simple as statistic_create(), statistics_remove(), statistic_add().
>>
>> 19) Statistic_add() should come in different flavours:
>>     statistic_add/inc() (just for convenience), and
>>     statistic_*_nolock() (more efficient locking for a bundle of updates)
>>
>> 20) Statistic_add() takes a (X, Y) pair, with X being the main
>>     characteristics of the statistics (e.g. a request size) and with
>>     Y quantifying the update reported for a particular X (e.g. number
>>     of observed requests of a particular request size).
>>
>> 21) Processing of (X, Y) according to abstract rules imposed by
>>     counters, histograms etc. doesn't require any knowledge about the
>>     semantics of X or Y.
>>
>> 22) There might be statistic counters that exploiters want to use and
>>     maintain on their own, and which users still may want to have a look at
>>     along with other statistics. Statistic_set() fits in here nicely.
>
>
> Ok, these are all implementation details.

Maybe.
But at least 21) is fundamental, as it provides a base for writing such a library: The library deals with a defined form of data, regardless of the semantics of the data. > Can you please step back a bit? What is the requirements that you are > trying to achieve here? Our customers have serious concerns that Linux has no means to gather SCSI performance data. Making sure we can get data from subsystems, we both provide for better service and give customers a good feeling. Statistics, and SCSI statistics in particular, are seen here as one of the more urgent things and real inhibitors on enterprise level. > A kernel-wide statistic gathering library? Yes, as a by-product of the specific SCSI requirement, so to speak. And, why not :) > If so, why? What has caused this to be needed? A clear distinction between code measuring statistics data and code handling statistics data makes for better code. There is no point in intermixing algorithms for processing statistics data and the semantics of statistics data. So what would you do if you got to write the N-th set of statistic functions? To me it looks like the next logical step to fully abstract statistics code out of a device driver. > And if it's needed, would > putting the stuff in debugfs for _all_ statistics really be a good idea > (hint, I would say no...) May I ask you why you think so. Well, so far I don't see a serious limitation in using debugfs. I think relayfs entries could be used to cover other requirements, if they pop up. And as I have explained, replacing debugfs by something else shouldn't be too difficult. But, I don't see a clear direction regarding this discussion. Or do you suggest that it would make sense to modularise that part of the code, so as to allow for other user interface code being "plugged in" and statistics data being shown through debugfs, procfs, netlink or whatever? >>>> And what does this mean for relayfs? 
Those developers tuned that code >>>> to the nth degree to get speed and other goodness, and here you go just >>>> ignoring that stuff and add yet another way to get stats out of the >>>> kernel. Why should I use this instead of my own code with relayfs? >>> Good questions. >> Relayfs is a nice feature, but not appropriate here. >> >> For example, during a performance measurements I have seen >> SCSI I/O related statistics being updated millions of times while >> I was just having a short lunch break. Some of them just increased >> a counter, which is pretty fast if done immediately in the kernel. >> If all these updates update would have to be relayed to user space >> to just increase a counter maintained in user space.. urgh, surely >> more expensive and not the way to go. >> >> And what if user space isn't interested at all? Would we keep >> pumping zillions of unused updates into buffers instead of >> discarding them right away? > > Yes, for simple counters, relayfs is overkill. But so is an indirect > function call through a pointer for every simple counter update :) Got it. >> Profile.c, taskstats, genhd and all the other statistics listed >> above... they all maintain their counters in the kernel and >> show aggregated statistics to users. > > Yes, but will you be allowed to port the existing users over to your new > framework without breaking any userspace stuff? I don't see that > happening :( Would it be me porting...? ;-) I see this library as an offering to anybody who is looking for a comfortable and established way to dump statistic data, including me. >>>> And is the need for the in-kernel parser really necessary? I know it >>>> makes the userspace tools simpler (cat and echo), but should we be >>>> telling the kernel how to filter and adjust the data? Shouldn't we just >>>> dump it all to userspace and use tools there to manipulate it? >>> I agree again. >> Assumimg we can agree on in-kernel counters, histograms etc. 
>> allowing for attributes being adjusted by users makes sense.
>>
>> The parser stuff required for these attributes is implemented
>> using match_token() & friends, which should be acceptable.
>> But, I think that the standard way of using match_token() and
>> strsep() needs improvement (strsep is destructive to strings
>> parsed, which is painful).
>
> Yeah, the parser isn't as bad as I originally thought it was. But
> overall, I'm still not sold on the real need for this kind of
> subsystem/library.

In my eyes, there are several indications that a library makes sense:

We want statistics for various components.
Many of the reinvent-the-wheel statistics have similar programming interfaces
(e.g. compare disk_stat_add(), dasd_profile_counter(), profile_hit()).
There is unnecessary code duplication.
There is no need to have statistics user interface code spread throughout
the kernel.
A library can achieve a common output format, simplifying user space.
A defined programming interface makes it much easier to get a general
idea of the statistics being around. An API gives more control and
might help to avoid introducing redundant statistics or statistics of
lesser importance.

I am not saying that such a library has to look exactly like the
proposed patches. I think that these patches contain some concepts
worth considering.

Thanks,
Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
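The on/off argument made in the message above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not code from the patches: the names demo_histogram, demo_bucket_bound(), demo_bucket_index() and demo_histogram_add() are hypothetical, the bucket bounds (0, 1, 2, 4, ..., 256, then an open-ended bucket) are guessed from the sample output quoted elsewhere in this thread, and a fixed two-CPU array stands in for real per-cpu data.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO 2   /* hypothetical: fixed CPU count for this sketch */
#define NR_BUCKETS   11  /* <=0, <=1, <=2, <=4, ..., <=256, >256 */

struct demo_histogram {
	int      enabled;                          /* on/off state */
	uint64_t bucket[NR_CPUS_DEMO][NR_BUCKETS]; /* one bucket set per CPU */
};

/* Upper bound of bucket i: 0, 1, 2, 4, ..., 256 (assumed scale). */
static int64_t demo_bucket_bound(int i)
{
	return i == 0 ? 0 : (int64_t)1 << (i - 1);
}

/* Linear walk over the bounds; the last bucket catches everything larger. */
static int demo_bucket_index(int64_t value)
{
	int i;

	for (i = 0; i < NR_BUCKETS - 1 && value > demo_bucket_bound(i); i++)
		;
	return i;
}

/* The cheap enabled check comes first, so a switched-off histogram
 * never pays for the bucket search. */
static void demo_histogram_add(struct demo_histogram *h, int cpu,
			       int64_t value, uint64_t incr)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	if (!h->enabled)
		return;
	h->bucket[cpu][demo_bucket_index(value)] += incr;
}

/* Reading merges the per-CPU buckets; updates never touch another CPU. */
static uint64_t demo_histogram_total(const struct demo_histogram *h, int index)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		sum += h->bucket[cpu][index];
	return sum;
}
```

The design point is the order of operations in demo_histogram_add(): one flag test is negligible next to the loop in demo_bucket_index(), which is the reason given above for having an on/off state for the more expensive aggregation modes.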
* Re: statistics infrastructure (in -mm tree) review 2006-06-20 15:40 ` Martin Peschke @ 2006-06-20 16:50 ` Randy.Dunlap 2006-06-21 18:51 ` Martin Peschke 0 siblings, 1 reply; 166+ messages in thread From: Randy.Dunlap @ 2006-06-20 16:50 UTC (permalink / raw) To: Martin Peschke; +Cc: greg, akpm, ak, linux-kernel On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote: (I haven't forgotten that I owe you some review/feedback. It's on my long todo list.) > Greg KH wrote: > > >> 7) With regard to the delivery of statistic data to user land, > >> a library maintaining statistic counters, histograms or whatever > >> on behalf of exploiters doesn't need any help from the exploiter. > >> We can avoid the usual callbacks and code bloat in exploiters > >> this way. > > > > I don't really understand what you are stating here. > > Sorry. > 1,$s/exploiter/client/g > > Any device driver or whatever gathering statistics data currently > has code dealing with showing the data. Usually, they have some > callbacks for procfs, sysfs or whatever. > > My point is that, if a library keeps track of statistics on behalf > of its clients, no client needs to be called back in order to > merge, format, copy, etc. data being shown to users. The library > can handle as a background operation without disturbing clients. That could be a good thing. OTOH, it means that the library has to be either all-ways flexible or willing to change to accommodate clients since you can't predict the universe of all clients' requirements. > >> 8) If some library functions are responsible for showing data, and the > >> exploiter is not, we can achieve a common format for statistics data. > >> For example, a histogram about block I/O has the same format as > >> a histogram about network I/O. > >> This provides ease of use and minimises the effort of writing > >> scripts that could do further processing (e.g. formatting as > >> spreadsheats or bar charts, comparison and summarisation of > >> statistics, ...) 
> > > > Common functionality and formats would be wonderful. But I'm not sure > > you can guarantee that we really want the network io and block io > > statistics in the same format, as they are fundimentally different > > things. > > Subsystems are free to gather as many/few statistics as required. > And I am not trying to enforce semantics. > > All I am saying is that, if two statistics are aggregated using similar > algorithms, then the results should be presented or formatted in a > similar way. Am I reading this correctly? Are you trying to put presentation format in the statistics library in the kernel??? > My assumption is that the format of results doesn't depend on the > the semantics of the data feeding a statistic. But it depends on the > way we aggregate data. > > For example, there is no reason why statistic A of subsystem 1 > aggregated in the form of a histogram should have a different format > than statistic B of subsystem 2 also being aggregated in the form > of a histogram. > > A <=0 0 > A <=1 0 > A <=2 3 > A <=4 7 > A <=8 29 > A <=16 285 > A <=32 295 > A <=64 96 > A <=128 52 > A <=256 3 > A >256 1 > > > B <=10 1 > B <=20 3 > B <=30 92 > B <=40 251 > ... > B <=490 34462 > B <=500 23434 > B >500 0 > > Semantics are different; statistic names are different; > number of buckets, "diameter" of buckets, scale etc. might be different; > basic format of results is identical - as long as both statistics are > aggregated the same way (as histograms, in this case). > > A library can provide a common format, because semantics just don't > matter. Its statistic_add() function (or whatever we want to call it) > has no idea about the actual semantics of the incremental statistic data > it accepts and processes according to abstract rules. > > And I think a library should provide a common format, because it > makes it fun poking in the aggregated data, and writing a script that > does further processing of that data. Do you mean a userspace library here? 
The statements still apply to a userspace library. > > Also, you will have to live with the existing interfaces, as we can't > > break them, so porting them will not happen. > > Okay. > A library could help to avoid a further proliferation of interfaces. > > >> 9) For performance reasons, per-cpu data and minimal locking > >> (local_irq_save/restore) should be used. > >> Adds to complexity, though. > > > > If necessary. Is this really necessary? > > I would think so. Do your converted clients use all of the stat. infrastructure interfaces or are some of them added just to round out the full API? > >> 14) Kernel code delivering statistics data through library routines > >> can, at best, guess whether a user wants incremental updates be > >> aggregated in a single counter, a set of counters (histograms), or > >> in the form of other results. Users might want to change how much > >> detail is retained in aggregated statistic results. > >> Adds to complexity. > > > > Complexity where? Userspace or in the kernel? > > Complexity in the kernel. Sorry. > > When a statistics library allows users to chose from about half a > dozen ways of aggregating data, then this adds to the complexity > of that library to some degree. > >> 21) Processing of (X, Y) according to abstract rules imposed by > >> counters, histograms etc. doesn't require any knowledge about the > >> semantics of X or Y. > >> > >> 22) There might be statistic counters that exploiters want to use and > >> maintain on their own, and which users still may want to have a look at > >> along with other statistics. Statistic_set() fits in here nicely. > > > > > > Ok, these are all implementation details. > > Maybe. But at least 21) is fundamental, as it provides a base for > writing such a library: The library deals with a defined form of > data, regardless of the semantics of the data. Does 22) make the library somewhat extensible? If not, does anything do that? > > Can you please step back a bit? 
What is the requirements that you are > > trying to achieve here? > > Our customers have serious concerns that Linux has no means > to gather SCSI performance data. Making sure we can get data from > subsystems, we both provide for better service and give customers > a good feeling. > > Statistics, and SCSI statistics in particular, are seen here as one > of the more urgent things and real inhibitors on enterprise level. > > > A kernel-wide statistic gathering library? > > Yes, as a by-product of the specific SCSI requirement, so to speak. > And, why not :) > > > If so, why? What has caused this to be needed? > > A clear distinction between code measuring statistics data and > code handling statistics data makes for better code. > There is no point in intermixing algorithms for processing > statistics data and the semantics of statistics data. > > So what would you do if you got to write the N-th set of statistic > functions? > > To me it looks like the next logical step to fully abstract > statistics code out of a device driver. > > > And if it's needed, would > > putting the stuff in debugfs for _all_ statistics really be a good idea > > (hint, I would say no...) > > May I ask you why you think so. > > Well, so far I don't see a serious limitation in using debugfs. > I think relayfs entries could be used to cover other requirements, > if they pop up. > > And as I have explained, replacing debugfs by something else > shouldn't be too difficult. > But, I don't see a clear direction regarding this discussion. > > Or do you suggest that it would make sense to modularise that > part of the code, so as to allow for other user interface code > being "plugged in" and statistics data being shown through > debugfs, procfs, netlink or whatever? > > >>>> And what does this mean for relayfs? 
Those developers tuned that code > >>>> to the nth degree to get speed and other goodness, and here you go just > >>>> ignoring that stuff and add yet another way to get stats out of the > >>>> kernel. Why should I use this instead of my own code with relayfs? > >>> Good questions. > >> Relayfs is a nice feature, but not appropriate here. > >> > >> For example, during a performance measurements I have seen > >> SCSI I/O related statistics being updated millions of times while > >> I was just having a short lunch break. Some of them just increased > >> a counter, which is pretty fast if done immediately in the kernel. > >> If all these updates update would have to be relayed to user space > >> to just increase a counter maintained in user space.. urgh, surely > >> more expensive and not the way to go. Oh really, I wouldn't expect such a poor design (of pushing each counter update to userspace) to be considered seriously. It should be more like a procfs^W sysfs entry at least, or something similar to a MIB, or what iostat does. Does iostat not even come close to what you want for SCSI I/O statistics? > >> And what if user space isn't interested at all? Would we keep > >> pumping zillions of unused updates into buffers instead of > >> discarding them right away? > > > > Yes, for simple counters, relayfs is overkill. But so is an indirect > > function call through a pointer for every simple counter update :) > > Got it. > > >> Profile.c, taskstats, genhd and all the other statistics listed > >> above... they all maintain their counters in the kernel and > >> show aggregated statistics to users. > > > > Yes, but will you be allowed to port the existing users over to your new > > framework without breaking any userspace stuff? I don't see that > > happening :( > > Would it be me porting...? ;-) > > I see this library as an offering to anybody who is looking > for a comfortable and established way to dump statistic data, > including me. 
> > >>>> And is the need for the in-kernel parser really necessary? I know it > >>>> makes the userspace tools simpler (cat and echo), but should we be > >>>> telling the kernel how to filter and adjust the data? Shouldn't we just > >>>> dump it all to userspace and use tools there to manipulate it? > >>> I agree again. > >> Assumimg we can agree on in-kernel counters, histograms etc. > >> allowing for attributes being adjusted by users makes sense. > >> > >> The parser stuff required for these attributes is implemented > >> using match_token() & friends, which should be acceptible. > >> But, I think that the standard way of using match_token() and > >> strsep() needs improvement (strsep is destructive to strings > >> parsed, which is painful). > > > > Yeah, the parser isn't as bad as I originally thought it was. But > > overall, I'm still not sold on the real need for this kind of > > subsystem/library. > > In my eyes, there are several indications that a library makes sense: > > We want statistics for various components. > Many of the reinvent-the-wheel statistics have similar programming interfaces > (e.g. compare disk_stat_add(), dasd_profile_counter(), profile_hit()). > There is unnecessary code duplication. > There is no need to have statistics user interface code spread throughout > the kernel. > A library can achieve a common output format, simplyfing user space. > A defined programming interface makes it much easier to get a general > idea of the statistics being around. An API gives more control and > might help to avoid introducing redundant statistics or statistics of > lesser importance. > > I am not saying that such a library has to look exactly like the > proposed patches. I think that these patches contain some concepts > worth considering. Thanks. --- ~Randy ^ permalink raw reply [flat|nested] 166+ messages in thread
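Point 21) quoted above, that (X, Y) pairs can be processed without knowing what X or Y mean, can be illustrated with a small user-space C sketch. Everything here is hypothetical and not taken from the patches: demo_stat, demo_add_counter(), demo_add_util() and demo_stat_add() are made-up names, and the "utilisation indicator" is assumed to track num, min, max and a running sum from which an average can be derived. The function pointer is the kind of indirect call mentioned in the quoted discussion.

```c
#include <assert.h>
#include <stdint.h>

struct demo_stat;
typedef void (*demo_add_fn)(struct demo_stat *s, int64_t x, uint64_t y);

struct demo_util {
	uint64_t num;  /* number of observations */
	int64_t  min;
	int64_t  max;
	int64_t  sum;  /* avg = sum / num, computed when shown */
};

struct demo_stat {
	demo_add_fn      add;     /* aggregation rule chosen for this statistic */
	uint64_t         counter;
	struct demo_util util;
};

/* Counter: X is ignored, Y is accumulated. */
static void demo_add_counter(struct demo_stat *s, int64_t x, uint64_t y)
{
	(void)x;
	s->counter += y;
}

/* Utilisation indicator: X seen Y times updates num/min/max/sum. */
static void demo_add_util(struct demo_stat *s, int64_t x, uint64_t y)
{
	struct demo_util *u = &s->util;

	if (y == 0)
		return;
	if (u->num == 0 || x < u->min)
		u->min = x;
	if (u->num == 0 || x > u->max)
		u->max = x;
	u->num += y;
	u->sum += x * (int64_t)y;
}

/* The client reports (X, Y); the library applies whatever rule is set. */
static void demo_stat_add(struct demo_stat *s, int64_t x, uint64_t y)
{
	assert(s && s->add);
	s->add(s, x, y);
}
```

A client would call demo_stat_add(&stat, request_size, 1) in the same way regardless of which rule is installed, which is the separation between measuring and processing argued for in the thread.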
* Re: statistics infrastructure (in -mm tree) review 2006-06-20 16:50 ` Randy.Dunlap @ 2006-06-21 18:51 ` Martin Peschke 2006-06-21 19:38 ` Matthew Frost 0 siblings, 1 reply; 166+ messages in thread From: Martin Peschke @ 2006-06-21 18:51 UTC (permalink / raw) To: Randy.Dunlap; +Cc: greg, akpm, ak, linux-kernel On Tue, 2006-06-20 at 09:50 -0700, Randy.Dunlap wrote: > On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote: > > Greg KH wrote: > > >> 7) With regard to the delivery of statistic data to user land, > > >> a library maintaining statistic counters, histograms or whatever > > >> on behalf of exploiters doesn't need any help from the exploiter. > > >> We can avoid the usual callbacks and code bloat in exploiters > > >> this way. > > > > > > I don't really understand what you are stating here. > > > > Sorry. > > 1,$s/exploiter/client/g > > > > Any device driver or whatever gathering statistics data currently > > has code dealing with showing the data. Usually, they have some > > callbacks for procfs, sysfs or whatever. > > > > My point is that, if a library keeps track of statistics on behalf > > of its clients, no client needs to be called back in order to > > merge, format, copy, etc. data being shown to users. The library > > can handle as a background operation without disturbing clients. > > That could be a good thing. OTOH, it means that the library > has to be either all-ways flexible or willing to change to > accommodate clients since you can't predict the universe of all > clients' requirements. Right. I have made provisions for that to some degree. First, I could imagine that the statistics data of a client requires a new way its data should be aggregated and, therewith, requires a new form of statistic result being shown to users. I have scanned through the kernel sources for ways of aggregating and showing statistics data. 
The usual constructs appear to be:

- counter
- histogram (for intervals), linear scale
- histogram (for intervals), logarithmic scale
- "histogram" for discrete and sparse values
- "utilisation indicator" or "fill level indicator" (num-min-avg-max)

These are implemented in my patches. I would expect these to cover most
requirements of possible new clients.

If another construct were needed anyway, it can be added to the
statistics library by implementing about half a dozen routines
described by struct statistic_discipline. I might be wrong, but I don't
think we would see an inflationary growth there.


Second, if a client needs to know anyway when users read statistics
data, e.g. because it wants to update some statistic then, it can
register an optional callback with the statistic infrastructure.
This callback is described in struct statistic_interface.


Third, if a client preferred its data being exported to user land
through a transport other than debugfs ... okay, then I will need to
enhance the statistics library. Moderate effort, I guess. Actually,
I already had a private patch that made the library use the evil procfs
instead of debugfs.


Fourth, if a client would like to take advantage of the library's
existing aggregation code, e.g. the library compiles a histogram on
behalf of the client, _but_ the client doesn't like the way the result
is shown, e.g. the client wants a sysfs file for each bucket instead of
a single debugfs file containing all data... well, that would defeat
the purpose of the library, if this kind of requirement gets out of
hand.

OTOH, I don't see a real need for allowing that. Data can be reformatted
and rearranged in any possible way in user space.

> > >> 8) If some library functions are responsible for showing data, and the
> > >>    exploiter is not, we can achieve a common format for statistics data.
> > >>    For example, a histogram about block I/O has the same format as
> > >>    a histogram about network I/O.
> > >> This provides ease of use and minimises the effort of writing > > >> scripts that could do further processing (e.g. formatting as > > >> spreadsheats or bar charts, comparison and summarisation of > > >> statistics, ...) > > > > > > Common functionality and formats would be wonderful. But I'm not sure > > > you can guarantee that we really want the network io and block io > > > statistics in the same format, as they are fundimentally different > > > things. > > > > Subsystems are free to gather as many/few statistics as required. > > And I am not trying to enforce semantics. > > > > All I am saying is that, if two statistics are aggregated using similar > > algorithms, then the results should be presented or formatted in a > > similar way. > > Am I reading this correctly? Are you trying to put presentation > format in the statistics library in the kernel??? Aehm, no. What's needed is simply an understanding between kernel and user space on how statistics data reads. If the interface were ioctl-based, I would need to define some structures containing the data. If the interface was netlink based, I would need to define some packet headers and fields (see taskstats, for example). And so on... Since my proposed interface uses debugfs, I have defined a minimal set of rules describing the ASCII output (compare sample output below): - A file contains all statistics of the measured entity. So, each output line is labeled with the name of the statistic it belongs to. - Each statistic may consist of several pieces, that is, output lines. So, each line of a multi-line statistic has another label, e.g. ">256" marking a histogram's bucket for values >256. These rules merely strive for unambiguousness of the file content. Coincidentally, readability isn't that bad, as well. > > My assumption is that the format of results doesn't depend on the > > the semantics of the data feeding a statistic. But it depends on the > > way we aggregate data. 
> > > > For example, there is no reason why statistic A of subsystem 1 > > aggregated in the form of a histogram should have a different format > > than statistic B of subsystem 2 also being aggregated in the form > > of a histogram. > > > > A <=0 0 > > A <=1 0 > > A <=2 3 > > A <=4 7 > > A <=8 29 > > A <=16 285 > > A <=32 295 > > A <=64 96 > > A <=128 52 > > A <=256 3 > > A >256 1 > > > > > > B <=10 1 > > B <=20 3 > > B <=30 92 > > B <=40 251 > > ... > > B <=490 34462 > > B <=500 23434 > > B >500 0 > > > > Semantics are different; statistic names are different; > > number of buckets, "diameter" of buckets, scale etc. might be different; > > basic format of results is identical - as long as both statistics are > > aggregated the same way (as histograms, in this case). > > > > A library can provide a common format, because semantics just don't > > matter. Its statistic_add() function (or whatever we want to call it) > > has no idea about the actual semantics of the incremental statistic data > > it accepts and processes according to abstract rules. > > > > And I think a library should provide a common format, because it > > makes it fun poking in the aggregated data, and writing a script that > > does further processing of that data. > > Do you mean a userspace library here? The statements still apply > to a userspace library. No, I am still talking about the kernel's statistic library functions. > > > Also, you will have to live with the existing interfaces, as we can't > > > break them, so porting them will not happen. > > > > Okay. > > A library could help to avoid a further proliferation of interfaces. > > > > >> 9) For performance reasons, per-cpu data and minimal locking > > >> (local_irq_save/restore) should be used. > > >> Adds to complexity, though. > > > > > > If necessary. Is this really necessary? > > > > I would think so. > > Do your converted clients use all of the stat. 
infrastructure > interfaces or are some of them added just to round out the > full API? My client patches for zfcp and scsi together use all of the statistic infrastructure's features and interfaces, except for the optional callback in statistic_interface and the related statistic_set(). > > >> 21) Processing of (X, Y) according to abstract rules imposed by > > >> counters, histograms etc. doesn't require any knowledge about the > > >> semantics of X or Y. > > >> > > >> 22) There might be statistic counters that exploiters want to use and > > >> maintain on their own, and which users still may want to have a look at > > >> along with other statistics. Statistic_set() fits in here nicely. > > > > > > > > > Ok, these are all implementation details. > > > > Maybe. But at least 21) is fundamental, as it provides a base for > > writing such a library: The library deals with a defined form of > > data, regardless of the semantics of the data. > > Does 22) make the library somewhat extensible? If not, does > anything do that? 21) or 22) ?? I don't understand what you are asking. Extensible in what regard? > > >>>> And what does this mean for relayfs? Those developers tuned that code > > >>>> to the nth degree to get speed and other goodness, and here you go just > > >>>> ignoring that stuff and add yet another way to get stats out of the > > >>>> kernel. Why should I use this instead of my own code with relayfs? > > >>> Good questions. > > >> Relayfs is a nice feature, but not appropriate here. > > >> > > >> For example, during a performance measurements I have seen > > >> SCSI I/O related statistics being updated millions of times while > > >> I was just having a short lunch break. Some of them just increased > > >> a counter, which is pretty fast if done immediately in the kernel. > > >> If all these updates update would have to be relayed to user space > > >> to just increase a counter maintained in user space.. urgh, surely > > >> more expensive and not the way to go. 
> > Oh really, I wouldn't expect such a poor design (of pushing each > counter update to userspace) to be considered seriously. > It should be more like a procfs^W sysfs entry at least, or something > similar to a MIB, or what iostat does. We are in agreement. > Does iostat not even > come close to what you want for SCSI I/O statistics? Not really. First, it measures from __make_request to end_that_request_last. So it's not very close to the point in time when I/Os hit the wire. Second, it merely provides sums and average values for all requests of a device. It doesn't provide much detail about the actual traffic pattern, like histograms for request latencies and request sizes. Thanks, Martin ^ permalink raw reply [flat|nested] 166+ messages in thread
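The two output rules described in the message above (every line carries the statistic's name, and every line of a multi-line statistic carries its own label such as ">256") can be sketched as a small formatter. This is a user-space illustration only: demo_format_histogram() is a hypothetical name, the exact spacing of the real debugfs output is not known from this thread, and the layout simply mirrors the "A <=16 285" sample shown earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a histogram with nr_bounds bounded buckets plus one open-ended
 * bucket, so count[] must hold nr_bounds + 1 entries.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int demo_format_histogram(char *buf, size_t size, const char *name,
				 const int64_t *bound, const uint64_t *count,
				 int nr_bounds)
{
	size_t used = 0;

	assert(nr_bounds > 0);
	for (int i = 0; i <= nr_bounds; i++) {
		int n;

		if (i < nr_bounds)
			n = snprintf(buf + used, size - used, "%s <=%lld %llu\n",
				     name, (long long)bound[i],
				     (unsigned long long)count[i]);
		else	/* last line: everything above the final bound */
			n = snprintf(buf + used, size - used, "%s >%lld %llu\n",
				     name, (long long)bound[nr_bounds - 1],
				     (unsigned long long)count[i]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Because the function only sees a name, bounds and counts, it would produce the same layout for a block I/O histogram and a network I/O histogram, which is the point being made about semantics not mattering to the format.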
* Re: statistics infrastructure (in -mm tree) review 2006-06-21 18:51 ` Martin Peschke @ 2006-06-21 19:38 ` Matthew Frost 2006-06-22 11:43 ` Martin Peschke 0 siblings, 1 reply; 166+ messages in thread From: Matthew Frost @ 2006-06-21 19:38 UTC (permalink / raw) To: Martin Peschke; +Cc: Randy.Dunlap, greg, akpm, ak, linux-kernel Martin Peschke wrote: > On Tue, 2006-06-20 at 09:50 -0700, Randy.Dunlap wrote: >> On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote: >>> Greg KH wrote: >>>>> 7) With regard to the delivery of statistic data to user land, >>>>> a library maintaining statistic counters, histograms or whatever >>>>> on behalf of exploiters doesn't need any help from the exploiter. >>>>> We can avoid the usual callbacks and code bloat in exploiters >>>>> this way. >>>> I don't really understand what you are stating here. >>> Sorry. >>> 1,$s/exploiter/client/g >>> >>> Any device driver or whatever gathering statistics data currently >>> has code dealing with showing the data. Usually, they have some >>> callbacks for procfs, sysfs or whatever. >>> >>> My point is that, if a library keeps track of statistics on behalf >>> of its clients, no client needs to be called back in order to >>> merge, format, copy, etc. data being shown to users. The library >>> can handle as a background operation without disturbing clients. >> That could be a good thing. OTOH, it means that the library >> has to be either all-ways flexible or willing to change to >> accommodate clients since you can't predict the universe of all >> clients' requirements. > > Right. I have made provisions for that to some degree. > > > First, I could imagine that the statistics data of a client requires > a new way its data should be aggregated and, therewith, requires > a new form of statistic result being shown to users. > > I have scanned through the kernel sources for ways of aggregating > and showing statistics data. 
The usual constructs appear to be: > > - counter > - histogram (for intervals), linear scale > - histogram (for intervals), logarithmic scale > - "histogram" for discrete and sparse values > - "utilisation indicator" or "fill level indicator" (num-min-avg-max) > > These are implemented in my patches. I would expect these to cover most > requirements of possible new clients. So you're saying, as regards "putting presentation format in ... the kernel", that we already have presentation formats specified pell-mell in the kernel. That should then be a non-issue, because you aren't introducing anything new, just centralizing an existing kernel behavior. Do I have you right? > > If another construct would be needed anyway, it can be added to the > statistics library by implemententing about half a dozen routines > described by struct statistic_discipline. I might be wrong, but I don't > think we would see an inflationary growth there. > > -- elision -- > > OTOH, I don't see a real need for allowing that. Data can be reformatted > and rearranged in any possible way in user space. Because you're just providing a range of basic output formats, standardized. So anybody can ask for statistics from the kernel in a preferred output to then massage as needed in userland. ACK? Am I oversimplifying? Matt ^ permalink raw reply [flat|nested] 166+ messages in thread
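To illustrate the "massage as needed in userland" idea from the message above, here is a hedged sketch of how a user-space tool might parse one histogram line of the common format. It assumes lines look like the "A <=16 285" and "B >500 0" samples quoted in this thread; the struct demo_bucket_line and the function demo_parse_bucket_line() are hypothetical names, and other line types (plain counters, for instance) are not handled because their exact layout is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_bucket_line {
	char               name[32];   /* statistic name, e.g. "A" */
	int                open_ended; /* 1 for the final ">bound" bucket */
	long long          bound;
	unsigned long long count;
};

/* Parse "<name> <=<bound> <count>" or "<name> ><bound> <count>".
 * Returns 0 on success, -1 if the line does not match. */
static int demo_parse_bucket_line(const char *line,
				  struct demo_bucket_line *out)
{
	char op[3];

	assert(line && out);
	if (sscanf(line, "%31s %2[<=>]%lld %llu",
		   out->name, op, &out->bound, &out->count) != 4)
		return -1;
	if (strcmp(op, "<=") == 0)
		out->open_ended = 0;
	else if (strcmp(op, ">") == 0)
		out->open_ended = 1;
	else
		return -1;
	return 0;
}
```

Since every histogram would share this layout, one such parser could serve scripts for any subsystem, whereas the existing per-driver formats each need their own.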
* Re: statistics infrastructure (in -mm tree) review 2006-06-21 19:38 ` Matthew Frost @ 2006-06-22 11:43 ` Martin Peschke 0 siblings, 0 replies; 166+ messages in thread From: Martin Peschke @ 2006-06-22 11:43 UTC (permalink / raw) To: artusemrys; +Cc: Randy.Dunlap, greg, akpm, ak, linux-kernel Matthew Frost wrote: > Martin Peschke wrote: >> On Tue, 2006-06-20 at 09:50 -0700, Randy.Dunlap wrote: >>> On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote: >>>> Greg KH wrote: >>>>>> 7) With regard to the delivery of statistic data to user land, >>>>>> a library maintaining statistic counters, histograms or whatever >>>>>> on behalf of exploiters doesn't need any help from the exploiter. >>>>>> We can avoid the usual callbacks and code bloat in exploiters >>>>>> this way. >>>>> I don't really understand what you are stating here. >>>> Sorry. >>>> 1,$s/exploiter/client/g >>>> >>>> Any device driver or whatever gathering statistics data currently >>>> has code dealing with showing the data. Usually, they have some >>>> callbacks for procfs, sysfs or whatever. >>>> >>>> My point is that, if a library keeps track of statistics on behalf >>>> of its clients, no client needs to be called back in order to >>>> merge, format, copy, etc. data being shown to users. The library >>>> can handle as a background operation without disturbing clients. >>> That could be a good thing. OTOH, it means that the library >>> has to be either all-ways flexible or willing to change to >>> accommodate clients since you can't predict the universe of all >>> clients' requirements. >> >> Right. I have made provisions for that to some degree. >> >> >> First, I could imagine that the statistics data of a client requires >> a new way its data should be aggregated and, therewith, requires >> a new form of statistic result being shown to users. >> >> I have scanned through the kernel sources for ways of aggregating >> and showing statistics data. 
The usual constructs appear to be: >> >> - counter >> - histogram (for intervals), linear scale >> - histogram (for intervals), logarithmic scale >> - "histogram" for discrete and sparse values >> - "utilisation indicator" or "fill level indicator" (num-min-avg-max) >> >> These are implemented in my patches. I would expect these to cover most >> requirements of possible new clients. > > So you're saying, as regards "putting presentation format in ... the > kernel", that we already have presentation formats specified pell-mell > in the kernel. That should then be a non-issue, because you aren't > introducing anything new, just centralizing an existing kernel behavior. > Do I have you right? Yes, there seem to be as many formats as statistics. See examples below. My patches can help to improve usuability by providing some common basic formats. IMO, it would not make sense to "enhance" the statistics library in an attempt to emulate all the preexisting statistic output formats. [root@t2930041 ~]# cat /proc/dasd/statistics 31 dasd I/O requests with 392 sectors(512B each) __<4 ___8 __16 __32 __64 _128 _256 _512 __1k __2k __4k __8k _16k _32k _64k 128k _256 _512 __1M __2M __4M __8M _16M _32M _64M 128M 256M 512M __1G __2G __4G _>4G Histogram of sizes (512B secs) 0 0 21 7 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Histogram of I/O times (microseconds) 0 0 0 0 0 0 0 3 6 1 5 6 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 <snip> [root@t2930041 ~]# cat /proc/diskstats <snip> 94 0 dasda 67389 1478 1142520 281080 78260 461181 4326752 10280570 0 6392030 10565980 94 1 dasda1 68849 1142272 540838 4326704 94 4 dasdb 27 29 448 0 0 0 0 0 0 0 0 94 5 dasdb1 27 216 0 0 94 8 dasdc 28 29 456 40 0 0 0 0 0 30 40 94 9 dasdc1 28 224 0 0 9 0 md0 0 0 0 0 0 0 0 0 0 0 0 8 0 sda 35423 12268 4340826 284540 8605 275966 2276792 980810 0 219260 1265370 8 16 sdb 36741 12626 4588754 293140 10090 277678 2302400 440010 0 221990 733140 8 32 sdc 36621 11748 4548722 298170 10394 272680 2264736 303580 
0 223110 601730 [root@t2930041 ~]# cat /proc/net/stat/arp_cache entries allocs destroys hash_grows lookups hits res_failed rcv_probes_mcast rcv_probes_ucast periodic_gc_runs forced_gc_runs 00000002 0000002c 000000cd 00000000 00000082 00000056 00000000 00000000 00000000 00017306 00000000 00000002 00000031 00000000 00000000 0000007d 0000004c 00000000 00000000 00000000 00000000 00000000 00000002 0000002f 00000000 00000001 00000074 00000045 00000000 00000000 00000000 00000000 00000000 00000002 00000043 00000000 00000000 00000084 00000041 00000000 00000000 00000000 00000000 00000000 >> If another construct would be needed anyway, it can be added to the >> statistics library by implemententing about half a dozen routines >> described by struct statistic_discipline. I might be wrong, but I don't >> think we would see an inflationary growth there. >> >> > -- elision -- >> >> OTOH, I don't see a real need for allowing that. Data can be reformatted >> and rearranged in any possible way in user space. > > Because you're just providing a range of basic output formats, > standardized. So anybody can ask for statistics from the kernel in a > preferred output to then massage as needed in userland. ACK? Am I > oversimplifying? Sounds reasonable. Thanks, Martin ^ permalink raw reply [flat|nested] 166+ messages in thread
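The logarithmic-scale histogram mentioned in the list of constructs can be sketched in a few lines of C. This is only an illustration loosely modelled on the bucket labels of the /proc/dasd/statistics output above (first bucket for values below 4, each further bucket doubling the upper limit, last bucket catching everything larger). The names NBUCKETS and log2_bucket() are hypothetical and are not taken from the patches, and the exact bucket boundaries of the DASD driver are an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 32	/* hypothetical: matches the 32 columns shown above */

/*
 * Map a value to a bucket of a logarithmic-scale histogram.
 * Bucket 0 takes values below 4; every following bucket doubles the
 * upper limit; the last bucket takes everything that is left.
 */
unsigned int log2_bucket(uint64_t value)
{
	unsigned int i = 0;
	uint64_t limit = 4;

	while (i < NBUCKETS - 1 && value >= limit) {
		limit <<= 1;
		i++;
	}
	assert(i < NBUCKETS);
	return i;
}
```

A caller would then do something like hist[log2_bucket(sectors)]++ for each request and leave any further formatting to user space.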
* Re: statistics infrastructure (in -mm tree) review
2006-06-13 23:47 ` statistics infrastructure (in -mm tree) review Greg KH
2006-06-14 0:18 ` Randy.Dunlap
@ 2006-06-14 5:04 ` Andi Kleen
2006-06-14 22:49 ` Martin Peschke
2006-06-17 10:30 ` Martin Peschke
2 siblings, 1 reply; 166+ messages in thread
From: Andi Kleen @ 2006-06-14 5:04 UTC (permalink / raw)
To: Greg KH, mp3; +Cc: akpm, linux-kernel

Greg KH <greg@kroah.com> writes:

> > + * exploiters don't update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_add(struct statistic *stat, int i,
> > +                                 s64 value, u64 incr)
> > +{
> > +        unsigned long flags;
> > +        local_irq_save(flags);
> > +        if (stat[i].state == STATISTIC_STATE_ON)
> > +                stat[i].add(&stat[i], smp_processor_id(), value, incr);

Indirect call in statistics hotpath? You know how slow this is
on IA64 and even on other architectures it tends to disrupt
the pipeline.

Also on i386 the u64s generate quite bad code.

That looks like a really bad implementation that shouldn't be used
anywhere.

-Andi

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: statistics infrastructure (in -mm tree) review
2006-06-14 5:04 ` Andi Kleen
@ 2006-06-14 22:49 ` Martin Peschke
2006-06-16 20:40 ` Greg KH
2006-06-17 6:51 ` Andi Kleen
0 siblings, 2 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-14 22:49 UTC (permalink / raw)
To: Andi Kleen; +Cc: Greg KH, akpm, linux-kernel, rdunlap

Andi Kleen wrote:
> Greg KH <greg@kroah.com> writes:
>>> + * exploiters don't update several statistics of the same entity in one go.
>>> + */
>>> +static inline void statistic_add(struct statistic *stat, int i,
>>> +                                 s64 value, u64 incr)
>>> +{
>>> +        unsigned long flags;
>>> +        local_irq_save(flags);
>>> +        if (stat[i].state == STATISTIC_STATE_ON)
>>> +                stat[i].add(&stat[i], smp_processor_id(), value, incr);
>
>
> Indirect call in statistics hotpath? You know how slow this is
> on IA64 and even on other architectures it tends to disrupt
> the pipeline.

Okay, let's try to improve it then. The options here are:

a) Replace the indirect function call by a switch statement which directly
   calls the add function of the data processing mode chosen by the user
   (e.g. simple counter, histogram, utilisation indicator etc.).

   No loss in functionality, slightly uglier code, acceptable performance(?).
   This would be my choice.

b) Export statistic_add_counter(), statistic_add_histogram() and the like
   as part of the programming API (maybe in addition to the flexible
   statistic_add()) for those exploiters that definitely can't afford
   branching into a function.

   Loss in functionality (exploiting kernel code dictates how users see
   the data), a bit faster than option a).

What do you think? Did I miss an option?

Thanks, Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
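For illustration, option a) could look roughly like the following user-space sketch. It is not the code from the patches: struct stat, enum stat_type and the add_*() helpers are simplified, hypothetical stand-ins, and the "counter adds incr, histogram sorts by value" semantics are an assumption. The only point is that the dispatcher picks the add routine with a switch on the type instead of calling through a function pointer.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BUCKETS 8	/* hypothetical bucket count */

/* Hypothetical, cut-down stand-ins for the types in the patch. */
enum stat_type { STAT_COUNTER_INC, STAT_HISTOGRAM_LIN };

struct stat {
	enum stat_type type;
	int on;				/* stands in for STATISTIC_STATE_ON */
	uint64_t counter;		/* STAT_COUNTER_INC: running total */
	uint64_t bucket[NR_BUCKETS];	/* STAT_HISTOGRAM_LIN */
	int64_t base;			/* lower bound of the histogram */
	uint64_t width;			/* bucket width, must be > 0 */
};

static void add_counter(struct stat *s, int64_t value, uint64_t incr)
{
	(void)value;			/* a plain counter ignores X */
	s->counter += incr;
}

static void add_histogram_lin(struct stat *s, int64_t value, uint64_t incr)
{
	uint64_t i = 0;

	assert(s->width > 0);
	if (value >= s->base) {
		i = ((uint64_t)value - (uint64_t)s->base) / s->width;
		if (i > NR_BUCKETS - 1)
			i = NR_BUCKETS - 1;	/* clamp to last bucket */
	}
	s->bucket[i] += incr;
}

/* Direct calls selected by a switch; no function pointer involved. */
void stat_add(struct stat *s, int64_t value, uint64_t incr)
{
	if (!s->on)
		return;
	switch (s->type) {
	case STAT_COUNTER_INC:
		add_counter(s, value, incr);
		break;
	case STAT_HISTOGRAM_LIN:
		add_histogram_lin(s, value, incr);
		break;
	}
}
```

The user can still switch the type at run time, which is what distinguishes this from option b), where the caller hard-codes add_counter() or add_histogram_lin().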
* Re: statistics infrastructure (in -mm tree) review
2006-06-14 22:49 ` Martin Peschke
@ 2006-06-16 20:40 ` Greg KH
2006-06-16 21:34 ` Martin Peschke
2006-06-17 6:51 ` Andi Kleen
1 sibling, 1 reply; 166+ messages in thread
From: Greg KH @ 2006-06-16 20:40 UTC (permalink / raw)
To: Martin Peschke; +Cc: Andi Kleen, akpm, linux-kernel, rdunlap

On Thu, Jun 15, 2006 at 12:49:54AM +0200, Martin Peschke wrote:
> Andi Kleen wrote:
> >Greg KH <greg@kroah.com> writes:
> >>>+ * exploiters don't update several statistics of the same entity in one
> >>>go.
> >>>+ */
> >>>+static inline void statistic_add(struct statistic *stat, int i,
> >>>+                                 s64 value, u64 incr)
> >>>+{
> >>>+        unsigned long flags;
> >>>+        local_irq_save(flags);
> >>>+        if (stat[i].state == STATISTIC_STATE_ON)
> >>>+                stat[i].add(&stat[i], smp_processor_id(), value, incr);
> >
> >
> >Indirect call in statistics hotpath? You know how slow this is
> >on IA64 and even on other architectures it tends to disrupt
> >the pipeline.
>
> Okay, let's try to improve it then. The options here are:
>
> a) Replace the indirect function call by a switch statement which directly
>    calls the add function of the data processing mode chosen by user.
>    (e.g. simple counter, histogram, utilisation indicator etc.).
>
>    No loss in functionality, slightly uglier code, acceptable
>    performance(?).
>    This would be my choice.

Probably best. Just don't make it an inline function :)

thanks,

greg k-h

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: statistics infrastructure (in -mm tree) review
2006-06-16 20:40 ` Greg KH
@ 2006-06-16 21:34 ` Martin Peschke
0 siblings, 0 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-16 21:34 UTC (permalink / raw)
To: Andi Kleen; +Cc: Greg KH, akpm, linux-kernel, rdunlap

Greg KH wrote:
> On Thu, Jun 15, 2006 at 12:49:54AM +0200, Martin Peschke wrote:
>> Andi Kleen wrote:
>>> Greg KH <greg@kroah.com> writes:
>>>>> + * exploiters don't update several statistics of the same entity in one
>>>>> go.
>>>>> + */
>>>>> +static inline void statistic_add(struct statistic *stat, int i,
>>>>> +                                 s64 value, u64 incr)
>>>>> +{
>>>>> +        unsigned long flags;
>>>>> +        local_irq_save(flags);
>>>>> +        if (stat[i].state == STATISTIC_STATE_ON)
>>>>> +                stat[i].add(&stat[i], smp_processor_id(), value, incr);
>>>
>>> Indirect call in statistics hotpath? You know how slow this is
>>> on IA64 and even on other architectures it tends to disrupt
>>> the pipeline.
>> Okay, let's try to improve it then. The options here are:
>>
>> a) Replace the indirect function call by a switch statement which directly
>>    calls the add function of the data processing mode chosen by user.
>>    (e.g. simple counter, histogram, utilisation indicator etc.).
>>
>>    No loss in functionality, slightly uglier code, acceptable
>>    performance(?).
>>    This would be my choice.
>
> Probably best. Just don't make it an inline function :)

Andi, would this be fine with you?

Thanks, Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: statistics infrastructure (in -mm tree) review
2006-06-14 22:49 ` Martin Peschke
2006-06-16 20:40 ` Greg KH
@ 2006-06-17 6:51 ` Andi Kleen
2006-06-17 11:03 ` Martin Peschke
1 sibling, 1 reply; 166+ messages in thread
From: Andi Kleen @ 2006-06-17 6:51 UTC (permalink / raw)
To: Martin Peschke; +Cc: Greg KH, akpm, linux-kernel, rdunlap

> b) Export statistic_add_counter(), statistic_add_histogram() and the like
>    as part of the programming API (maybe in addition to the flexible
>    statistic_add()) for those exploiters that definitely can't afford
>    branching into a function.
>
>    Loss in functionality (exploiting kernel code dictates how users see
>    the data), a bit faster than option a).

(b) if anything.

But do we really need all these weird options anyways?
For me it seems you're far overdesigning.

> What do you think? Did I miss an option?

I think your whole approach is about 10x too complicated.

The normal approach in Linux is to start simple and add complexity
later as needed. You seem to try to do it the other way around.

-Andi

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: statistics infrastructure (in -mm tree) review
2006-06-17 6:51 ` Andi Kleen
@ 2006-06-17 11:03 ` Martin Peschke
0 siblings, 0 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-17 11:03 UTC (permalink / raw)
To: Andi Kleen; +Cc: Greg KH, akpm, linux-kernel, rdunlap

On Sat, 2006-06-17 at 08:51 +0200, Andi Kleen wrote:
> > b) Export statistic_add_counter(), statistic_add_histogram() and the like
> >    as part of the programming API (maybe in addition to the flexible
> >    statistic_add()) for those exploiters that definitely can't afford
> >    branching into a function.
> >
> >    Loss in functionality (exploiting kernel code dictates how users see
> >    the data), a bit faster than option a).
>
> (b) if anything.

Yes, I have anticipated this choice. I am looking into this option.

> But do we really need all these weird options anyways?

Which options?

Assuming you refer to the distinction of counter, histogram,
utilisation indicator etc. ... well, that's what I found when I was
looking into existing approaches: counters everywhere, histograms
for example in the s390 DASD driver, some with linear scale, others with
logarithmic scale, counters that only make sense if seen in
combination (which made me come up with this utilisation indicator
thing), ...

I have just been trying to find a simple concept to reconcile
various ways of preprocessing statistics data. This is reflected by
struct statistic_discipline.

> For me it seems you're far overdesigning.
> I think your whole approach is about 10x too complicated.

I disagree.

The programming interface is simple.

The modularisation of data processing modes is straightforward.

I have tried to break down my design into a dozen and a half assumptions
in my other mail. I am happy to discuss which of them make sense,
which of them might be overkill, which might be deferred, etc.
But please understand that it is hard for me to guess which 10th part
of my design is okay for you, if you don't go into details.

A fair share of complexity is caused by performance considerations
(per-cpu data). Which should be fine. And, in that regard, my code
isn't quite as complex yet as lib/profile.c.

Thanks, Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
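To make the "utilisation indicator" (num-min-avg-max) and the per-cpu remark a bit more concrete, here is a small user-space sketch. It is an assumption about how such a construct could be kept, not the code from the patches: struct util, util_add() and util_merge() are hypothetical names, and a plain array stands in for real per-cpu data. Each CPU updates only its own slot, and the slots are merged only when the data is read; the average is derived as sum / num at that point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU slot of a num-min-avg-max indicator. */
struct util {
	uint64_t num;		/* number of samples */
	int64_t min, max;	/* valid only if num > 0 */
	int64_t sum;		/* avg = sum / num, computed on read */
};

/* Update path: touches only the slot of the current CPU. */
void util_add(struct util *u, int64_t value)
{
	if (u->num == 0 || value < u->min)
		u->min = value;
	if (u->num == 0 || value > u->max)
		u->max = value;
	u->num++;
	u->sum += value;
}

/* Read path: fold all per-CPU slots into one result. */
struct util util_merge(const struct util *percpu, int ncpus)
{
	struct util total = { 0 };
	int cpu;

	assert(ncpus >= 0);
	for (cpu = 0; cpu < ncpus; cpu++) {
		const struct util *u = &percpu[cpu];

		if (u->num == 0)
			continue;	/* this CPU never saw a sample */
		if (total.num == 0 || u->min < total.min)
			total.min = u->min;
		if (total.num == 0 || u->max > total.max)
			total.max = u->max;
		total.num += u->num;
		total.sum += u->sum;
	}
	return total;
}
```

Keeping the merge out of the update path is what lets the update stay cheap, at the price of some extra code on the read side.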
* Re: statistics infrastructure (in -mm tree) review 2006-06-13 23:47 ` statistics infrastructure (in -mm tree) review Greg KH 2006-06-14 0:18 ` Randy.Dunlap 2006-06-14 5:04 ` Andi Kleen @ 2006-06-17 10:30 ` Martin Peschke 2 siblings, 0 replies; 166+ messages in thread From: Martin Peschke @ 2006-06-17 10:30 UTC (permalink / raw) To: Greg KH; +Cc: Wu Fengguang, akpm, linux-kernel On Tue, 2006-06-13 at 16:47 -0700, Greg KH wrote: > ... I'd really > like to see some other, real-world usages of this. Like perhaps the > io-schedular statistics? Some other /proc stats that have nothing to do > with processes? Wu is trying it out for readahead statistics: http://marc.theaimsgroup.com/?l=linux-kernel&m=114946958531310&w=2 I am working on SCSI I/O statistics: http://marc.theaimsgroup.com/?l=linux-kernel&m=114780190921567&w=2 The zfcp driver (FCP HBA driver for s390) in -mm exports statistics through this infrastructure. I could imagine that this code might be exploited by other s390 device drivers, once we are forced to find a replacement for homegrown statistics in procfs. > Oh, and use C99 structure initializers for when creating the statisic > structures in the example code (and real code), it makes it much easier > to understand, and future proof when the api changes. 
good point - done > Code comments now: > > > > diff -puN arch/s390/Kconfig~statistics-infrastructure arch/s390/Kconfig > > --- devel/arch/s390/Kconfig~statistics-infrastructure 2006-06-09 15:22:58.000000000 -0700 > > +++ devel-akpm/arch/s390/Kconfig 2006-06-09 15:22:58.000000000 -0700 > > @@ -490,8 +490,14 @@ source "drivers/net/Kconfig" > > > > source "fs/Kconfig" > > > > +menu "Instrumentation Support" > > + > > source "arch/s390/oprofile/Kconfig" > > > > +source "lib/Kconfig.statistic" > > + > > +endmenu > > + > > source "arch/s390/Kconfig.debug" > > > > source "security/Kconfig" > > diff -puN arch/s390/oprofile/Kconfig~statistics-infrastructure arch/s390/oprofile/Kconfig > > --- devel/arch/s390/oprofile/Kconfig~statistics-infrastructure 2006-06-09 15:22:58.000000000 -0700 > > +++ devel-akpm/arch/s390/oprofile/Kconfig 2006-06-09 15:22:58.000000000 -0700 > > @@ -1,6 +1,3 @@ > > - > > -menu "Profiling support" > > - > > config PROFILING > > bool "Profiling support" > > help > > @@ -18,5 +15,3 @@ config OPROFILE > > > > If unsure, say N. > > > > -endmenu > > - > > These two patches should probably go somewhere else, they don't have > much to do with this one. (well, adding Kconfig.statistic" does, but > the other wording doesn't.) sorry, not sure what you mean > > diff -puN /dev/null include/linux/statistic.h > > --- /dev/null 2006-06-03 22:34:36.282200750 -0700 > > +++ devel-akpm/include/linux/statistic.h 2006-06-09 15:22:58.000000000 -0700 > > @@ -0,0 +1,348 @@ > > +/* > > + * include/linux/statistic.h > > + * > > + * Statistics facility > > + * > > + * (C) Copyright IBM Corp. 2005, 2006 > > + * > > + * Author(s): Martin Peschke <mpeschke@de.ibm.com> > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2, or (at your option) > > + * any later version. > > Are you sure "any later version"? 
well, let me get back to an IBM lawyer first... ;-) > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License > > + * along with this program; if not, write to the Free Software > > + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. > > Two not-needed paragraphs. ditto. > > +#ifndef STATISTIC_H > > +#define STATISTIC_H > > + > > +#include <linux/fs.h> > > +#include <linux/types.h> > > +#include <linux/percpu.h> > > + > > +#define STATISTIC_ROOT_DIR "statistics" > > + > > +#define STATISTIC_FILENAME_DATA "data" > > +#define STATISTIC_FILENAME_DEF "definition" > > + > > +#define STATISTIC_NEED_BARRIER 1 > > Meta-comment about this file, does most of the stuff in this file, > really belong here? At first glance, this should only hold the public > interface to the statistic code, not everything else needed by the > internal workings of that code. It looks like it could be made a lot > smaller. I slimmed the header file down, for example by moving some structures to lib/statistic.c. Do you think a lib/statistic.h would be a better place? > > +enum statistic_state { > > + STATISTIC_STATE_INVALID, > > + STATISTIC_STATE_UNCONFIGURED, > > + STATISTIC_STATE_RELEASED, > > + STATISTIC_STATE_OFF, > > + STATISTIC_STATE_ON > > +}; > > + > > +enum statistic_type { > > + STATISTIC_TYPE_COUNTER_INC, > > + STATISTIC_TYPE_COUNTER_PROD, > > + STATISTIC_TYPE_UTIL, > > + STATISTIC_TYPE_HISTOGRAM_LIN, > > + STATISTIC_TYPE_HISTOGRAM_LOG2, > > + STATISTIC_TYPE_SPARSE, > > + STATISTIC_TYPE_NONE > > +}; > > Make these bit-safe so sparse can catch mistakes? > > > +#define STATISTIC_FLAGS_NOINCR 0x01 > > What's this for? 
added comment with explaination > > +struct sgrb_seg { > > + struct list_head list; > > + char *address; > > + int offset; > > + int size; > > +}; > > + > > +struct statistic_file_private { > > + struct list_head read_seg_lh; > > + struct list_head write_seg_lh; > > + size_t write_seg_total_size; > > +}; > > + > > +struct statistic_merge_private { > > + struct statistic *stat; > > + spinlock_t lock; > > + void *dst; > > +}; > > I'm guessing these three structures aren't needed here. Otherwise, > please document them. moved to lib/statistic.c > > +#ifdef CONFIG_STATISTICS > > Why ifdef now, so late? added comment with explaination > > +extern int statistic_create(struct statistic_interface *, const char *); > > +extern int statistic_remove(struct statistic_interface *); > > + > > +/** > > + * statistic_add - update statistic with incremental data in (X, Y) pair > > + * @stat: struct statistic array > > + * @i: index of statistic to be updated > > + * @value: X > > + * @incr: Y > > + * > > + * The actual processing of the (X, Y) data pair is determined by the current > > + * the definition applied to the statistic. See Documentation/statistics.txt. > > + * > > + * This variant takes care of protecting per-cpu data. It is preferred whenever > > + * exploiters don't update several statistics of the same entity in one go. > > + */ > > +static inline void statistic_add(struct statistic *stat, int i, > > + s64 value, u64 incr) > > +{ > > + unsigned long flags; > > + local_irq_save(flags); > > + if (stat[i].state == STATISTIC_STATE_ON) > > + stat[i].add(&stat[i], smp_processor_id(), value, incr); > > + local_irq_restore(flags); > > +} > > These are all inline, which I guess is acceptable. But see the current > inline-or-not comments on lkml which may make you rethink this. Still got to lookup this thread. I might change it later. 
> > +/** > > + * statistic_add_nolock - update statistic with incremental data in (X, Y) pair > > + * @stat: struct statistic array > > + * @i: index of statistic to be updated > > + * @value: X > > + * @incr: Y > > + * > > + * The actual processing of the (X, Y) data pair is determined by the current > > + * definition applied to the statistic. See Documentation/statistics.txt. > > + * > > + * This variant leaves protecting per-cpu data to exploiters. It is preferred > > + * whenever exploiters update several statistics of the same entity in one go. > > + */ > > +static inline void statistic_add_nolock(struct statistic *stat, int i, > > + s64 value, u64 incr) > > +{ > > + if (stat[i].state == STATISTIC_STATE_ON) > > + stat[i].add(&stat[i], smp_processor_id(), value, incr); > > +} > > + > > +/** > > + * statistic_inc - update statistic with incremental data in (X, 1) pair > > + * @stat: struct statistic array > > + * @i: index of statistic to be updated > > + * @value: X > > + * > > + * The actual processing of the (X, Y) data pair is determined by the current > > + * definition applied to the statistic. See Documentation/statistics.txt. > > + * > > + * This variant takes care of protecting per-cpu data. It is preferred whenever > > + * exploiters don't update several statistics of the same entity in one go. > > + */ > > +static inline void statistic_inc(struct statistic *stat, int i, s64 value) > > +{ > > + unsigned long flags; > > + local_irq_save(flags); > > + if (stat[i].state == STATISTIC_STATE_ON) > > + stat[i].add(&stat[i], smp_processor_id(), value, 1); > > + local_irq_restore(flags); > > +} > > Shouldn't this just call statistic_add() with a incr of 1? 
correct - changed > > + > > +/** > > + * statistic_inc_nolock - update statistic with incremental data in (X, 1) pair > > + * @stat: struct statistic array > > + * @i: index of statistic to be updated > > + * @value: X > > + * > > + * The actual processing of the (X, Y) data pair is determined by the current > > + * definition applied to the statistic. See Documentation/statistics.txt. > > + * > > + * This variant leaves protecting per-cpu data to exploiters. It is preferred > > + * whenever exploiters update several statistics of the same entity in one go. > > + */ > > +static inline void statistic_inc_nolock(struct statistic *stat, int i, > > + s64 value) > > +{ > > + if (stat[i].state == STATISTIC_STATE_ON) > > + stat[i].add(&stat[i], smp_processor_id(), value, 1); > > +} > > Shouldn't this just call statistic_add_nolock with a incr of 1? ditto > > diff -puN /dev/null lib/Kconfig.statistic > > --- /dev/null 2006-06-03 22:34:36.282200750 -0700 > > +++ devel-akpm/lib/Kconfig.statistic 2006-06-09 15:22:58.000000000 -0700 > > @@ -0,0 +1,11 @@ > > +config STATISTICS > > + bool "Statistics infrastructure" > > + depends on DEBUG_FS > > + help > > + The statistics infrastructure provides a debug-fs based user interface > > No "-" in debugfs :) sorry, has been fixed. > > + for statistics of kernel components, that is, usually device drivers. > > Why mention drivers? Other things might use this (see original comments > at the start of the message.) 
yep, changed that as well > > --- /dev/null 2006-06-03 22:34:36.282200750 -0700 > > +++ devel-akpm/lib/statistic.c 2006-06-09 15:22:58.000000000 -0700 > > @@ -0,0 +1,1459 @@ > > +/* > > + * lib/statistic.c > > + * statistics facility > > + * > > + * Copyright (C) 2005, 2006 > > + * IBM Deutschland Entwicklung GmbH, > > + * IBM Corporation > > + * > > + * Author(s): Martin Peschke (mpeschke@de.ibm.com), > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2, or (at your option) > > + * any later version. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License > > + * along with this program; if not, write to the Free Software > > + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. > > Again with the verbose license :) I will see... > > +static void _statistic_barrier(void *unused) > > +{ > > +} > > + > > +static inline int statistic_stop(struct statistic *stat) > > +{ > > + stat->stopped = sched_clock(); > > + stat->state = STATISTIC_STATE_OFF; > > + /* ensures that all CPUs have ceased updating statistics */ > > + smp_mb(); > > + on_each_cpu(_statistic_barrier, NULL, 0, 1); > > + return 0; > > +} > > Isn't there a way to use rcu for this instead? Just a suggestion, it > might be totally wrong... I am not an rcu expert. But I think rcu doesn't help here. My barrier makes sure that all the concurrent updates have ceased before I go on to free the underlying memory. It's a "many writers"-scenario. 
> > + > > +static int statistic_transition(struct statistic *stat, > > + struct statistic_info *info, > > + enum statistic_state requested_state) > > +{ > > + int z = (requested_state < stat->state ? 1 : 0); > > + int retval = -EINVAL; > > int retval = 0; > > > + > > + while (stat->state != requested_state) { > > + switch (stat->state) { > > + case STATISTIC_STATE_INVALID: > > + retval = ( z ? -EINVAL : statistic_initialise(stat) ); > > + break; > > + case STATISTIC_STATE_UNCONFIGURED: > > + retval = ( z ? statistic_uninitialise(stat) > > + : statistic_define(stat) ); > > + break; > > + case STATISTIC_STATE_RELEASED: > > + retval = ( z ? statistic_initialise(stat) > > + : statistic_alloc(stat, info) ); > > + break; > > + case STATISTIC_STATE_OFF: > > + retval = ( z ? statistic_free(stat, info) > > + : statistic_start(stat) ); > > + break; > > + case STATISTIC_STATE_ON: > > + retval = ( z ? statistic_stop(stat) : -EINVAL ); > > + break; > > + } > > + if (unlikely(retval)) > > + return retval; > > delete these two lines. > > > + } > > + return 0; > > return retval; I have simplified this loop. > > +static match_table_t statistic_match_type = { > > + {1, "type=%s"}, > > + {9, NULL} > > +}; > > named field initializers please. done > > +static match_table_t statistic_match_common = { > > + {STATISTIC_STATE_UNCONFIGURED, "state=unconfigured"}, > > + {STATISTIC_STATE_RELEASED, "state=released"}, > > + {STATISTIC_STATE_OFF, "state=off"}, > > + {STATISTIC_STATE_ON, "state=on"}, > > + {1001, "name=%s"}, > > + {1002, "data=reset"}, > > + {1003, "defaults"}, > > + {9999, NULL} > > +}; > > Same here. Well, no one appears to do this with match_table_t. And agree that this would be overkill. > And why do you have numbers and a mix of enums here? Shouldn't you > define the name=, data= and defaults too? Just for my convenience. It simplifies the (single) function using it. > Also, just null terminate the list, is 9999 really needed? 
match_token() requires this array to be terminated.

> > +static struct statistic_discipline statistic_discs[] = {
> > +        { /* STATISTIC_TYPE_COUNTER_INC */
> > +                NULL,
> > +                statistic_alloc_generic,
> > +                NULL,
> > +                statistic_reset_counter,
> > +                statistic_merge_counter,
> > +                statistic_fdata_counter,
> > +                NULL,
> > +                statistic_add_counter_inc,
> > +                statistic_set_counter_inc,
> > +                "counter_inc", sizeof(u64)
> > +        },
>
> named initializers please. That will let you not have to specify the
> NULL fields, making it much easier to read overall.

You are right. Done.

Thanks, Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
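For readers following along, the named (C99 designated) initializers Greg asks for would look roughly like this. The struct below is a hypothetical, cut-down stand-in for struct statistic_discipline with invented member names (parse, reset, add, name, size); the real member names in the patch are not shown in this thread. Members that are not named are zero-initialised, so the NULL entries simply disappear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct statistic_discipline. */
struct disc {
	int (*parse)(const char *def);		/* optional */
	void (*reset)(void *data);
	void (*add)(void *data, int64_t value, uint64_t incr);
	const char *name;
	size_t size;
};

static void counter_reset(void *data)
{
	*(uint64_t *)data = 0;
}

static void counter_add(void *data, int64_t value, uint64_t incr)
{
	(void)value;			/* a plain counter ignores X */
	*(uint64_t *)data += incr;
}

enum { DISC_COUNTER_INC, DISC_NR };

/* Unnamed members (here: .parse) are implicitly NULL. */
static const struct disc discs[DISC_NR] = {
	[DISC_COUNTER_INC] = {
		.reset	= counter_reset,
		.add	= counter_add,
		.name	= "counter_inc",
		.size	= sizeof(uint64_t),
	},
};

/* Small accessor so callers do not index the table directly. */
const struct disc *disc_get(int type)
{
	assert(type >= 0 && type < DISC_NR);
	return &discs[type];
}
```

Indexing the array by enum value ([DISC_COUNTER_INC] = ...) also removes the need for the /* STATISTIC_TYPE_COUNTER_INC */ style comments, and adding a member to the struct later does not silently shift the remaining fields.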
* wireless (was Re: 2.6.18 -mm merge plans)
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (9 preceding siblings ...)
[not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
@ 2006-06-05 1:06 ` Jeff Garzik
2006-06-05 1:15 ` Andrew Morton
2006-06-05 8:54 ` Christoph Hellwig
2006-06-05 1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
` (9 subsequent siblings)
20 siblings, 2 replies; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05 1:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, netdev, linville

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> acx1xx-wireless-driver.patch
> fix-tiacx-on-alpha.patch
> tiacx-fix-attribute-packed-warnings.patch
> tiacx-pci-build-fix.patch
> tiacx-ia64-fix.patch
>
> It is about time we did something with this large and presumably useful
> wireless driver.

I've never had technical objections to merging this, just AFAIK it had a
highly questionable origin, namely being reverse-engineered in a
non-clean-room environment that might leave Linux legally vulnerable.

If we can clear that hurdle, by all means pass it on to John Linville
and get it moving towards upstream.

Jeff

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
@ 2006-06-05 1:15 ` Andrew Morton
2006-06-05 8:33 ` Andreas Mohr
2006-06-05 8:54 ` Christoph Hellwig
1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05 1:15 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-kernel, netdev, linville, Denis Vlasenko

On Sun, 4 Jun 2006 21:06:36 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > acx1xx-wireless-driver.patch
> > fix-tiacx-on-alpha.patch
> > tiacx-fix-attribute-packed-warnings.patch
> > tiacx-pci-build-fix.patch
> > tiacx-ia64-fix.patch
> >
> > It is about time we did something with this large and presumably useful
> > wireless driver.
>
> I've never had technical objections to merging this, just AFAIK it had a
> highly questionable origin, namely being reverse-engineered in a
> non-clean-room environment that might leave Linux legally vulnerable.

I never knew that.

<reads changelog>
<reads website>
<reads wiki>

I still don't know that. Denis, do you know the details?

> If we can clear that hurdle, by all means pass it on to John Linville
> and get it moving towards upstream.

OK, thanks.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 1:15 ` Andrew Morton
@ 2006-06-05 8:33 ` Andreas Mohr
2006-06-05 8:45 ` Arjan van de Ven
0 siblings, 1 reply; 166+ messages in thread
From: Andreas Mohr @ 2006-06-05 8:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko, acx100-devel, acx100-users

Hi,

On Sun, Jun 04, 2006 at 06:15:15PM -0700, Andrew Morton wrote:
> On Sun, 4 Jun 2006 21:06:36 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> > > It is about time we did something with this large and presumably useful
> > > wireless driver.
> >
> > I've never had technical objections to merging this, just AFAIK it had a
> > highly questionable origin, namely being reverse-engineered in a
> > non-clean-room environment that might leave Linux legally vulnerable.
>
> I never knew that.
>
> <reads changelog>
> <reads website>
> <reads wiki>
>
> I still don't know that. Denis, do you know the details?

The acx100 project was started by about 5 people examining the various
acx100 binary Linux driver "releases" for distro kernels around 2.4.18 etc.
Since this might fail to comply with usual "clean-room" practices
(e.g. one party examining a driver and then a separate party implementing
a new driver with the data gained from examining the original driver),
it may fail to be seen as acceptable for Linux inclusion.

Since missing kernel inclusion is both a maintenance overhead and
(most importantly!) a huge user-level issue, I'd see this as a big problem.

In case there are development-unrelated obstacles against kernel inclusion,
I see (at least?) two possibilities:

a) asking TI to sprinkle our driver effort with the (ahem) holy penguin pee
   required to have it blessed sufficiently for kernel inclusion
   (preferably in combination with nice firmware blob licensing and
   specs for those chipsets).
   This might be a problem given that Theo de Raadt and many other people
   had fun repeatedly trying to contact TI for a useful statement concerning
   WLAN support.

b) abandoning our unfortunately not as blessed as intended (stability,
   community involvement, ...) big-effort driver efforts
   ("3 years and still going strong...") [1] and suggesting donating about
   100000 OEM WLAN cards equipped with TI chipsets to various beautiful
   landfills in various countries ;-)

Whichever way this irons out, at this point I'm quite indifferent to
what happens, given that I really don't feel like spending too many
endless weekends with hardware and driver puzzles any more in exchange for
rather dubious gains.
There's also a lot of fun in generic Linux kernel hacking, so...

Andreas Mohr

[1] we're *still* having issues with spotty ACK reception and radio temperature
drift recalibration on those unsupported chipsets, which requires quite some
focused development efforts and close examination of WLAN traffic in order
to really find out what the heck is going wrong here.
And please note that there's now the newer TNETW1450 chipset variant
(most prominently used by AVM hardware with its initial x86-only Linux USB2.0
driver) with similar support issues which would require even more development.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 8:33 ` Andreas Mohr
@ 2006-06-05 8:45 ` Arjan van de Ven
2006-06-05 10:26 ` Alan Cox
0 siblings, 1 reply; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 8:45 UTC (permalink / raw)
To: Andreas Mohr
Cc: Andrew Morton, Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko, acx100-devel, acx100-users
On Mon, 2006-06-05 at 10:33 +0200, Andreas Mohr wrote:
> Hi,
>
> On Sun, Jun 04, 2006 at 06:15:15PM -0700, Andrew Morton wrote:
> > On Sun, 4 Jun 2006 21:06:36 -0400
> > Jeff Garzik <jeff@garzik.org> wrote:
> > > > It is about time we did something with this large and presumably useful
> > > > wireless driver.
> > >
> > > I've never had technical objections to merging this, just AFAIK it had a
> > > highly questionable origin, namely being reverse-engineered in a
> > > non-clean-room environment that might leave Linux legally vulnerable.
> >
> > I never knew that.
> >
> > <reads changelog>
> > <reads website>
> > <reads wiki>
> >
> > I still don't know that. Denis, do you know the details?
>
> The acx100 project was started by about 5 people examining the various
> acx100 binary Linux driver "releases" for distro kernels around 2.4.18 etc.
> Since this might fail to comply with usual "clean-room" practices
> (e.g. one party examining a driver and then a separate party implementing
> a new driver with the data gained from examining the original driver),
> it may fail to be seen as acceptable for Linux inclusion.
I disagree there (not speaking for any company just for myself here):
the "clean room" thing is ONLY a USA thing, and is not even required in
the USA. It is a "we want to be extra safe in the USA" thing only.
Eg if you want to be triple safe and do this in the USA, the clean room
is a good way to be sure.
If you do things in Europe or elsewhere, and/or as long as you don't
copy from the original, only use it to learn how it works, you should be
fine as well. It's just that a cleanroom approach is a sure way to prove
you didn't copy. That's all.
If "clean room" now is a requirement for a driver to hit the kernel,
then we need to remove about half the drivers in the kernel I suspect;
that'd just be silly.
I would say that as long as you and the others can certify that you
didn't copy from the original driver, but only used it to learn how it
worked, the kernel should be fine with it.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 8:45 ` Arjan van de Ven
@ 2006-06-05 10:26 ` Alan Cox
2006-06-05 10:35 ` Arjan van de Ven
0 siblings, 1 reply; 166+ messages in thread
From: Alan Cox @ 2006-06-05 10:26 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko, acx100-devel, acx100-users
Ar Llu, 2006-06-05 am 10:45 +0200, ysgrifennodd Arjan van de Ven:
> It's just that a cleanroom approach is a sure way to prove
> you didn't copy. That's all.
Which is an extremely important detail especially if you have been
reverse engineering another driver for the same or similar OS where it
is likely that people will retain knowledge and copy rather than
re-implement things.
We've had "fun" with this before and the pwc camera driver. I don't want
to see a repeat of that.
Alan
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 10:26 ` Alan Cox
@ 2006-06-05 10:35 ` Arjan van de Ven
2006-06-05 10:59 ` Alan Cox
2006-06-10 6:58 ` Pavel Machek
0 siblings, 2 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 10:35 UTC (permalink / raw)
To: Alan Cox
Cc: Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko, acx100-devel, acx100-users
On Mon, 2006-06-05 at 11:26 +0100, Alan Cox wrote:
> Ar Llu, 2006-06-05 am 10:45 +0200, ysgrifennodd Arjan van de Ven:
> > It's just that a cleanroom approach is a sure way to prove
> > you didn't copy. That's all.
>
> Which is an extremely important detail especially if you have been
> reverse engineering another driver for the same or similar OS where it
> is likely that people will retain knowledge and copy rather than
> re-implement things.
oh don't get me wrong, it's important to not copy from the original.
(even if that original did copy from linux ;)
> We've had "fun" with this before and the pwc camera driver. I don't want
> to see a repeat of that.
yet at the same time, the cleanroom approach is not the ONLY way to do
it right. And making following that exact approach a strict requirement
is just silly. And it would mean we'd need to remove quite a few drivers
from the tree if you follow that logic.
And to be fair the pwc camera driver was just a guy with a personality
problem rather than any real legal standing.
Again doing things right is important. But I would say that if you do
the rev-engineering in Europe, just being careful and avoiding copying
should be enough (well and certifying that you were in fact careful and
didn't do any copying).
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 10:35 ` Arjan van de Ven
@ 2006-06-05 10:59 ` Alan Cox
2006-06-10 6:58 ` Pavel Machek
1 sibling, 0 replies; 166+ messages in thread
From: Alan Cox @ 2006-06-05 10:59 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko, acx100-devel, acx100-users
Ar Llu, 2006-06-05 am 12:35 +0200, ysgrifennodd Arjan van de Ven:
> And to be fair the pwc camera driver was just a guy with a personality
> problem rather than any real legal standing.
I must disagree there having reviewed the code in question and been
directly involved in the fallout.
Alan
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 10:35 ` Arjan van de Ven
2006-06-05 10:59 ` Alan Cox
@ 2006-06-10 6:58 ` Pavel Machek
1 sibling, 0 replies; 166+ messages in thread
From: Pavel Machek @ 2006-06-10 6:58 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Alan Cox, Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko, acx100-devel, acx100-users
Hi!
> > > It's just that a cleanroom approach is a sure way to prove
> > > you didn't copy. That's all.
> >
> > Which is an extremely important detail especially if you have been
> > reverse engineering another driver for the same or similar OS where it
> > is likely that people will retain knowledge and copy rather than
> > re-implement things.
>
> oh don't get me wrong, it's important to not copy from the original.
> (even if that original did copy from linux ;)
Well, if original did copy from linux, it surely is GPLed and case
closed, no? Being sued from vendor not respecting the GPL would
probably only do harm to them. Like US courts are crazy, but hopefully
not _that_ crazy.
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
2006-06-05 1:15 ` Andrew Morton
@ 2006-06-05 8:54 ` Christoph Hellwig
2006-06-05 12:33 ` Jeff Garzik
2006-06-05 13:27 ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
1 sibling, 2 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-05 8:54 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, netdev, linville
On Sun, Jun 04, 2006 at 09:06:36PM -0400, Jeff Garzik wrote:
> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > acx1xx-wireless-driver.patch
> > fix-tiacx-on-alpha.patch
> > tiacx-fix-attribute-packed-warnings.patch
> > tiacx-pci-build-fix.patch
> > tiacx-ia64-fix.patch
> >
> > It is about time we did something with this large and presumably useful
> > wireless driver.
>
> I've never had technical objections to merging this, just AFAIK it had a
> highly questionable origin, namely being reverse-engineered in a
> non-clean-room environment that might leave Linux legally vulnerable.
As are at least a fourth of linux drivers. Andrew, please just go ahead
and merge it (I'll do another review ASAP).
Please don't let this reverse engineering idiocy hinder wireless driver
adoption, we're already falling far behind openbsd who are very successful
reverse engineering lots of wireless chipsets.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 8:54 ` Christoph Hellwig
@ 2006-06-05 12:33 ` Jeff Garzik
2006-06-05 12:48 ` Arjan van de Ven
2006-06-05 13:27 ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
1 sibling, 1 reply; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05 12:33 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville
On Mon, Jun 05, 2006 at 09:54:51AM +0100, Christoph Hellwig wrote:
> On Sun, Jun 04, 2006 at 09:06:36PM -0400, Jeff Garzik wrote:
> > On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > > acx1xx-wireless-driver.patch
> > > fix-tiacx-on-alpha.patch
> > > tiacx-fix-attribute-packed-warnings.patch
> > > tiacx-pci-build-fix.patch
> > > tiacx-ia64-fix.patch
> > >
> > > It is about time we did something with this large and presumably useful
> > > wireless driver.
> >
> > I've never had technical objections to merging this, just AFAIK it had a
> > highly questionable origin, namely being reverse-engineered in a
> > non-clean-room environment that might leave Linux legally vulnerable.
>
> As are at least a fourth of linux drivers. Andrew, please just go ahead
Hardly. The -vast majority- of drivers I've dealt with in my time
hacking the kernel are either blessed by the vendor, or are of
unquestionably legal origin.
It's a good thing I pay attention to this issue, too, Mr. Just Go Ahead
And Merge It.
> Please don't let this reverse engineering idiocy hinder wireless driver
> adoption, we're already falling far behind openbsd who are very successful
> reverse engineering lots of wireless chipsets.
Thanks for your highly professional, legal opinion :)
Jeff
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 12:33 ` Jeff Garzik
@ 2006-06-05 12:48 ` Arjan van de Ven
2006-06-05 12:52 ` Jeff Garzik
0 siblings, 1 reply; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 12:48 UTC (permalink / raw)
To: Jeff Garzik
Cc: Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville
>
> It's a good thing I pay attention to this issue, too, Mr. Just Go Ahead
> And Merge It.
dude, name calling is way out of line here.
Why is it a good thing you are blocking this driver? Do you have ANY
indication AT ALL that there is anything fishy about it? (and don't say
"they didn't follow cleanroom procedure", because you know that
cleanroom is not the only way to do reverse engineering properly).
Paying attention to proper reverse engineering is good. Being
overzealous is not.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 12:48 ` Arjan van de Ven
@ 2006-06-05 12:52 ` Jeff Garzik
2006-06-05 14:02 ` Linux kernel and laws Adrian Bunk
0 siblings, 1 reply; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05 12:52 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville
On Mon, Jun 05, 2006 at 02:48:27PM +0200, Arjan van de Ven wrote:
> Why is it a good thing you are blocking this driver? Do you have ANY
> indication AT ALL that there is anything fishy about it?
Yes.
> Paying attention to proper reverse engineering is good. Being
> overzealous is not.
Being overzealous about merging drivers without first checking the legal
ramifications is a good way to torpedo Linux.
Far too many people have a careless "U.S.A. laws suck, merge it anyway"
attitude.
Jeff
^ permalink raw reply [flat|nested] 166+ messages in thread
* Linux kernel and laws
2006-06-05 12:52 ` Jeff Garzik
@ 2006-06-05 14:02 ` Adrian Bunk
2006-06-05 14:21 ` linux-os (Dick Johnson)
2006-06-06 5:33 ` Evgeniy Polyakov
0 siblings, 2 replies; 166+ messages in thread
From: Adrian Bunk @ 2006-06-05 14:02 UTC (permalink / raw)
To: Jeff Garzik
Cc: Arjan van de Ven, Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville
On Mon, Jun 05, 2006 at 08:52:35AM -0400, Jeff Garzik wrote:
>...
> > Paying attention to proper reverse engineering is good. Being
> > overzealous is not.
>
> Being overzealous about merging drivers without first checking the legal
> ramifications is a good way to torpedo Linux.
>
> Far too many people have a careless "U.S.A. laws suck, merge it anyway"
> attitude.
Independent of this issue:
An interesting question is how to handle legal issues properly.
Where is the borderline for rejecting code due to legal issues?
Might not be 100% correct according to laws in the USA.
Might not be 100% correct according to laws in Germany.
Might not be 100% correct according to laws in Finland.
Might not be 100% correct according to laws in Norway.
Might not be 100% correct according to laws in Brasil.
Might not be 100% correct according to laws in Japan.
Might not be 100% correct according to laws in India.
Might not be 100% correct according to laws in Russia.
Might not be 100% correct according to laws in China.
Might not be 100% correct according to laws in Saudi Arabia.
Might not be 100% correct according to laws in Iran.
For me living in Germany, none of these laws except for the German one
has any relevance.
I've never seen people on this list pointing to probable problems with
Chinese laws although these laws are relevant for four times as many
people as US laws.
If someone would state a submission to the kernel might have issues
according to Chinese laws, or Iranian laws, or Russian laws, would this
be enough for keeping code out of the kernel?
This might sound like a theoretical question, but e.g. considering that
the kernel contains cryptography code it's a question that might have
wide practical implications.
> Jeff
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Linux kernel and laws
2006-06-05 14:02 ` Linux kernel and laws Adrian Bunk
@ 2006-06-05 14:21 ` linux-os (Dick Johnson)
2006-06-06 5:33 ` Evgeniy Polyakov
1 sibling, 0 replies; 166+ messages in thread
From: linux-os (Dick Johnson) @ 2006-06-05 14:21 UTC (permalink / raw)
To: Adrian Bunk
Cc: Jeff Garzik, Arjan van de Ven, Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville
On Mon, 5 Jun 2006, Adrian Bunk wrote:
> On Mon, Jun 05, 2006 at 08:52:35AM -0400, Jeff Garzik wrote:
>> ...
>>> Paying attention to proper reverse engineering is good. Being
>>> overzealous is not.
>>
>> Being overzealous about merging drivers without first checking the legal
>> ramifications is a good way to torpedo Linux.
>>
>> Far too many people have a careless "U.S.A. laws suck, merge it anyway"
>> attitude.
>
> Independent of this issue:
>
> An interesting question is how to handle legal issues properly.
>
> Where is the borderline for rejecting code due to legal issues?
> Might not be 100% correct according to laws in the USA.
> Might not be 100% correct according to laws in Germany.
> Might not be 100% correct according to laws in Finland.
> Might not be 100% correct according to laws in Norway.
> Might not be 100% correct according to laws in Brasil.
> Might not be 100% correct according to laws in Japan.
> Might not be 100% correct according to laws in India.
> Might not be 100% correct according to laws in Russia.
> Might not be 100% correct according to laws in China.
> Might not be 100% correct according to laws in Saudi Arabia.
> Might not be 100% correct according to laws in Iran.
>
> For me living in Germany, none of these laws except for the German one
> has any relevance.
>
> I've never seen people on this list pointing to probable problems with
> Chinese laws although these laws are relevant for four times as many
> people as US laws.
>
> If someone would state a submission to the kernel might have issues
> according to Chinese laws, or Iranian laws, or Russian laws, would this
> be enough for keeping code out of the kernel?
>
> This might sound like a theoretical question, but e.g. considering that
> the kernel contains cryptography code it's a question that might have
> wide practical implications.
>
>> Jeff
>
> cu
> Adrian
If the kernel represented simply a knowledge base, then the
burden about whether or not someone could use it used to
rest entirely upon the user. That's why some Pacific rim
governments are reportedly fire-walling information.
In most western cultures, knowledge was not a crime. For
many years, someone could write a book, telling you how to
kill somebody and, as long as he didn't carry it out, he
could not be held culpable.
Recently, in the US and some other countries, knowledge has
become criminalized. If you know how to defeat copy protection,
and you are not in a protected industry, you could be tried and
convicted of a federal crime.
That's one of the reasons why there are now no general guidelines
about kernel code, or any intellectual property use, for that
matter. The conditions could occur where the government thinks
that you know too much and are, therefore, a threat to "national
security".
So, again, see a lawyer. The fact that you sought and accepted
legal opinion may in the future be your only viable defense as
governments bring charges against you! Sorry state of affairs
for sure.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
****************************************************************
The information transmitted in this message is confidential and may be
privileged. Any review, retransmission, dissemination, or other use of this
information by persons or entities other than the intended recipient is
prohibited. If you are not the intended recipient, please notify Analogic
Corporation immediately - by replying to this message or by sending an email
to DeliveryErrors@analogic.com - and destroy all copies of this information,
including any attachments, without reading or disclosing them.
Thank you.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Linux kernel and laws
2006-06-05 14:02 ` Linux kernel and laws Adrian Bunk
2006-06-05 14:21 ` linux-os (Dick Johnson)
@ 2006-06-06 5:33 ` Evgeniy Polyakov
1 sibling, 0 replies; 166+ messages in thread
From: Evgeniy Polyakov @ 2006-06-06 5:33 UTC (permalink / raw)
To: Adrian Bunk
Cc: Jeff Garzik, Arjan van de Ven, Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville
On Mon, Jun 05, 2006 at 04:02:26PM +0200, Adrian Bunk (bunk@stusta.de) wrote:
> > Far too many people have a careless "U.S.A. laws suck, merge it anyway"
> > attitude.
> If someone would state a submission to the kernel might have issues
> according to Chinese laws, or Iranian laws, or Russian laws, would this
> be enough for keeping code out of the kernel?
Btw, did kernel hackers consult with Papua New Guinea or bloody Russian
laws? It is possible that they have a law which forbids writing open
source code. So we should stop Linux kernel development and completely
remove its sources from the Internet ASAP.
P.S. It is explicitly permitted to make reverse engineering in Russia.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 8:54 ` Christoph Hellwig
2006-06-05 12:33 ` Jeff Garzik
@ 2006-06-05 13:27 ` John W. Linville
2006-06-05 13:31 ` Christoph Hellwig
` (2 more replies)
1 sibling, 3 replies; 166+ messages in thread
From: John W. Linville @ 2006-06-05 13:27 UTC (permalink / raw)
To: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel, netdev
On Mon, Jun 05, 2006 at 09:54:51AM +0100, Christoph Hellwig wrote:
> On Sun, Jun 04, 2006 at 09:06:36PM -0400, Jeff Garzik wrote:
> > On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > > acx1xx-wireless-driver.patch
> > > fix-tiacx-on-alpha.patch
> > > tiacx-fix-attribute-packed-warnings.patch
> > > tiacx-pci-build-fix.patch
> > > tiacx-ia64-fix.patch
> > >
> > > It is about time we did something with this large and presumably useful
> > > wireless driver.
> >
> > I've never had technical objections to merging this, just AFAIK it had a
> > highly questionable origin, namely being reverse-engineered in a
> > non-clean-room environment that might leave Linux legally vulnerable.
>
> As are at least a fourth of linux drivers. Andrew, please just go ahead
> and merge it (I'll do another review ASAP).
Actually, I was planning to merge the softmac-based version for 2.6.18.
It looks like I may want some of Andrew's patches on top (ia64, alpha, etc).
http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/master/
0003-wireless-add-acx-driver.txt
0004-acxsm-merge-from-acx-0.3.32.txt
0005-tiacx-Let-only-ACX_PCI-ACX_USB-be-user-visible.txt
0007-tiacx-revert-neither-PCI-nor-USB-is-selected-change.txt
0008-tiacx-implement-much-more-flexible-firmware-statistics-parsing.txt
0009-tiacx-Change-acx_ioctl_-get-set-_encode-to-use-kernel-80211-stack.txt
0010-tiacx-fix-breakage-of-Get-rid-of-circular-list-of-adev-s.txt
0011-tiacx-split-module-into-acx-common-acx-pci-acx-usb.txt
Of course, I didn't know there were serious concerns about this
driver's origin. I hope we aren't confusing this with the atheros
driver...?
> Please don't let this reverse engineering idiocy hinder wireless driver
> adoption, we're already falling far behind openbsd who are very successful
> reverse engineering lots of wireless chipsets.
This bugbear does seem to keep visiting us. It is a bit of a
minefield.
I'm inclined to think that Christoph and Arjan are right, that we
have been too cautious. Of course, neither of these fine gentlemen
are known for their timidity... :-)
Does not the Signed-off-by: line on a patch submission give us some
level of "good faith" protection?
I'm tempted to take contributors at their word, that they have produced
their own work and not copied from others. What else do we need?
John
--
John W. Linville
linville@tuxdriver.com
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 13:27 ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
@ 2006-06-05 13:31 ` Christoph Hellwig
2006-06-05 13:42 ` Arjan van de Ven
2006-06-05 16:24 ` Alan Cox
2 siblings, 0 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-05 13:31 UTC (permalink / raw)
To: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel, netdev
On Mon, Jun 05, 2006 at 09:27:37AM -0400, John W. Linville wrote:
> Actually, I was planning to merge the softmac-based version for 2.6.18.
> It looks like I may want some of Andrew's patches on top (ia64, alpha, etc).
duh, didn't know that wasn't in -mm. we want the softmac version of course.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 13:27 ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
2006-06-05 13:31 ` Christoph Hellwig
@ 2006-06-05 13:42 ` Arjan van de Ven
2006-06-05 16:24 ` Alan Cox
2 siblings, 0 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 13:42 UTC (permalink / raw)
To: John W. Linville
Cc: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel, netdev
> Of course, I didn't know there were serious concerns about this
> driver's origin. I hope we aren't confusing this with the atheros
> driver...?
>
> > Please don't let this reverse engineering idiocy hinder wireless driver
> > adoption, we're already falling far behind openbsd who are very successful
> > reverse engineering lots of wireless chipsets.
>
> This bugbear does seem to keep visiting us. It is a bit of a
> minefield.
>
> I'm inclined to think that Christoph and Arjan are right, that we
> have been too cautious. Of course, neither of these fine gentlemen
> are known for their timidity... :-)
>
> Does not the Signed-off-by: line on a patch submission give us some
> level of "good faith" protection?
I would suggest asking them an explicit "did you copy anything" and make
sure their "we didn't copy" answer is in the description of the original
patch submission.
>
> I'm tempted to take contributors at their word, that they have produced
> their own work and not copied from others. What else do we need?
to a large degree that's all you can do.
(of course you can look at the code for something that looks "obviously
not from here" as well, and we all tend to do that anyway since such
stuff tends to highly violate coding style anyway)
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 13:27 ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
2006-06-05 13:31 ` Christoph Hellwig
2006-06-05 13:42 ` Arjan van de Ven
@ 2006-06-05 16:24 ` Alan Cox
2006-06-29 14:26 ` ACX100 (softmac-based) driver ready to merge, but is it legal? -- " John W. Linville
2 siblings, 1 reply; 166+ messages in thread
From: Alan Cox @ 2006-06-05 16:24 UTC (permalink / raw)
To: John W. Linville
Cc: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel, netdev
Ar Llu, 2006-06-05 am 09:27 -0400, ysgrifennodd John W. Linville:
> Does not the Signed-off-by: line on a patch submission give us some
> level of "good faith" protection?
>
> I'm tempted to take contributors at their word, that they have produced
> their own work and not copied from others. What else do we need?
To keep an eye out for problems. Given the questions raised the tiacx
people need to clarify their position and someone needs to look into it.
Until that is done it certainly isn't "good faith" any more.
Alan
^ permalink raw reply [flat|nested] 166+ messages in thread
* ACX100 (softmac-based) driver ready to merge, but is it legal? -- Re: wireless (was Re: 2.6.18 -mm merge plans)
2006-06-05 16:24 ` Alan Cox
@ 2006-06-29 14:26 ` John W. Linville
[not found] ` <20060629144233.GB24463@tuxdriver.com>
0 siblings, 1 reply; 166+ messages in thread
From: John W. Linville @ 2006-06-29 14:26 UTC (permalink / raw)
To: netdev, linux-kernel
Cc: Denis Vlasenko, Carlos Martin, Andreas Mohr, acx100-devel, acx100-users, Arjan van de Ven, Adrian Bunk, Alan Cox, Christoph Hellwig, linux-os (Dick Johnson), Evgeniy Polyakov, Jeff Garzik, Andrew Morton, Linus Torvalds
On Mon, Jun 05, 2006 at 05:24:51PM +0100, Alan Cox wrote:
> Ar Llu, 2006-06-05 am 09:27 -0400, ysgrifennodd John W. Linville:
> > Does not the Signed-off-by: line on a patch submission give us some
> > level of "good faith" protection?
> >
> > I'm tempted to take contributors at their word, that they have produced
> > their own work and not copied from others. What else do we need?
>
> To keep an eye out for problems. Given the questions raised the tiacx
> people need to clarify their position and someone needs to look into it.
> Until that is done it certainly isn't "good faith" any more.
I apologize for the long copy list. I have tried to include all
known interested parties.
This is a follow-up to a thread started by Andrew a few weeks ago about
what should be merged for 2.6.18. One of the topics he cited was the
ACX100 driver which he has carried in -mm for quite some time. I have
a slightly different (softmac based) version of that driver in
wireless-2.6 which I think is worth merging now.
In the aforementioned thread, some questions were raised about the
legality of the ACX100 driver (i.e. tiacx) code base, but no one had
any specific points other than that it is not 100% "clean room" derived.
Others point out that this is not strictly a requirement. The matter
dropped without a strong defense from the tiacx team.
I hereby invite the tiacx team to defend their work by making public,
affirmative statements indicating a) how they produced their code; and,
b) that they have the legal right to license it as part of the Linux
kernel under the GPL.
As an incentive to this, I have already made the necessary preparations
for this driver to be merged immediately. This is the softmac-based
tiacx that has been in wireless-2.6 for some time, with the addition
of a few patches that akpm had in -mm which I did not previously have.
For easy review, a tarball with the full driver is available here:
	http://www.kernel.org/pub/linux/kernel/people/linville/tiacx.tar.gz
A git pull request follows. I am confident that if the legal status
of this code can be confirmed, it will be merged upstream ASAP.
Comments welcome!
Thanks,
John
---
The following changes since commit 70a332b048e4d90635dfa47fc5d91cf87b5cc3a5:
  John W. Linville:
        softmac: fix build-break from 881ee6999d66c8fc903b429b73bbe6045b38c549
are found in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git tiacx
Andreas Mohr:
      tiacx: implement much more flexible firmware statistics parsing
Andrew Morton:
      tiacx: pci build fix
Carlos Martin:
      tiacx: fix breakage of "Get rid of circular list of adev's"
      tiacx: split module into acx-common + acx-pci + acx-usb
Denis Vlasenko:
      acxsm: merge from acx 0.3.32
      tiacx: revert "neither PCI nor USB is selected" change
      tiacx: Change acx_ioctl_{get,set}_encode to use kernel 80211 stack
      fix tiacx on alpha
      tiacx: fix attribute packed warnings
John W. Linville:
      wireless: add acx driver
      tiacx: Let only ACX_PCI/ACX_USB be user-visible
      tiacx: support ia64
 drivers/net/wireless/Kconfig             |    1
 drivers/net/wireless/Makefile            |    2
 drivers/net/wireless/tiacx/Changelog     |  114
 drivers/net/wireless/tiacx/Kconfig       |   65
 drivers/net/wireless/tiacx/Makefile      |    6
 drivers/net/wireless/tiacx/README        |   61
 drivers/net/wireless/tiacx/acx.h         |   11
 drivers/net/wireless/tiacx/acx_config.h  |   40
 drivers/net/wireless/tiacx/acx_func.h    |  598 ++
 drivers/net/wireless/tiacx/acx_struct.h  | 2048 ++++++++
 drivers/net/wireless/tiacx/common.c      | 7542 ++++++++++++++++++++++++++++++
 drivers/net/wireless/tiacx/ioctl.c       | 2738 +++++++++++
 drivers/net/wireless/tiacx/pci.c         | 4243 +++++++++++++++++
 drivers/net/wireless/tiacx/setrate.c     |  213 +
 drivers/net/wireless/tiacx/usb.c         | 1954 ++++++++
 drivers/net/wireless/tiacx/wlan.c        |  422 ++
 drivers/net/wireless/tiacx/wlan_compat.h |  267 +
 drivers/net/wireless/tiacx/wlan_hdr.h    |  497 ++
 drivers/net/wireless/tiacx/wlan_mgmt.h   |  582 ++
 19 files changed, 21404 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/wireless/tiacx/Changelog
 create mode 100644 drivers/net/wireless/tiacx/Kconfig
 create mode 100644 drivers/net/wireless/tiacx/Makefile
 create mode 100644 drivers/net/wireless/tiacx/README
 create mode 100644 drivers/net/wireless/tiacx/acx.h
 create mode 100644 drivers/net/wireless/tiacx/acx_config.h
 create mode 100644 drivers/net/wireless/tiacx/acx_func.h
 create mode 100644 drivers/net/wireless/tiacx/acx_struct.h
 create mode 100644 drivers/net/wireless/tiacx/common.c
 create mode 100644 drivers/net/wireless/tiacx/ioctl.c
 create mode 100644 drivers/net/wireless/tiacx/pci.c
 create mode 100644 drivers/net/wireless/tiacx/setrate.c
 create mode 100644 drivers/net/wireless/tiacx/usb.c
 create mode 100644 drivers/net/wireless/tiacx/wlan.c
 create mode 100644 drivers/net/wireless/tiacx/wlan_compat.h
 create mode 100644 drivers/net/wireless/tiacx/wlan_hdr.h
 create mode 100644 drivers/net/wireless/tiacx/wlan_mgmt.h
The complete (history-free) patch is available here:
	http://www.kernel.org/pub/linux/kernel/people/linville/tiacx.patch.gz
--
John W. Linville
linville@tuxdriver.com
^ permalink raw reply [flat|nested] 166+ messages in thread
[parent not found: <20060629144233.GB24463@tuxdriver.com>]
* Re: [Acx100-users] Denis Vlasenko, where are you? (mail bounced) [not found] ` <20060629144233.GB24463@tuxdriver.com> @ 2006-06-29 14:47 ` Andreas Mohr 0 siblings, 0 replies; 166+ messages in thread From: Andreas Mohr @ 2006-06-29 14:47 UTC (permalink / raw) To: acx100-users; +Cc: netdev, linux-kernel, acx100-devel Hi, On Thu, Jun 29, 2006 at 10:42:39AM -0400, John W. Linville wrote: > If anyone knows how to get in touch w/ Denis, I'd appreciate it... He sent me (and few other addresses) his new address recently (*important* mails only!): vda.linux AT a server called googlemail.com (he got a new job and moved) Andreas Mohr ^ permalink raw reply [flat|nested] 166+ messages in thread
* merging new drivers (was Re: 2.6.18 -mm merge plans) 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (10 preceding siblings ...) 2006-06-05 1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik @ 2006-06-05 1:32 ` Jeff Garzik 2006-06-05 1:47 ` Andrew Morton 2006-06-05 6:58 ` Francois Romieu 2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse ` (8 subsequent siblings) 20 siblings, 2 replies; 166+ messages in thread From: Jeff Garzik @ 2006-06-05 1:32 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote: > areca-raid-linux-scsi-driver.patch > I'm going to start sending the Areca driver to James, too. The vendor > has worked hard and the hardware is becoming more important - let's help > them get it in. In general, I'm a bit disappointed at the time it takes new drivers to reach the upstream kernel. I grant that a lot of vendor drivers are unreadable, unmergable shite... but on the other side of the coin, I see a lot of decent drivers get stalled simply because they aren't perfect. I would rather see a driver get "95% there" -- because once a driver is merged into the upstream kernel, it has a lot more visibility, and will inevitably receive the remaining changes and cleanups anyway. Jeff ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans) 2006-06-05 1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik @ 2006-06-05 1:47 ` Andrew Morton 2006-06-05 8:59 ` Christoph Hellwig 2006-06-05 6:58 ` Francois Romieu 1 sibling, 1 reply; 166+ messages in thread From: Andrew Morton @ 2006-06-05 1:47 UTC (permalink / raw) To: Jeff Garzik; +Cc: linux-kernel On Sun, 4 Jun 2006 21:32:23 -0400 Jeff Garzik <jeff@garzik.org> wrote: > On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote: > > areca-raid-linux-scsi-driver.patch > > > I'm going to start sending the Areca driver to James, too. The vendor > > has worked hard and the hardware is becoming more important - let's help > > them get it in. > > > In general, I'm a bit disappointed at the time it takes new drivers to > reach the upstream kernel. I grant that a lot of vendor drivers are > unreadable, unmergable shite... but on the other side of the coin, I > see a lot of decent drivers get stalled simply because they aren't > perfect. > > I would rather see a driver get "95% there" -- because once a driver is > merged into the upstream kernel, it has a lot more visibility, and will > inevitably receive the remaining changes and cleanups anyway. > Yes, I agree. As long as we reasonably think that a piece of code *will* become acceptable within a reasonable amount of time then going early is safe. A large part of that calculation is non-technical: do we believe that the originator will do the work to get things finished off. Often, that's a pretty easy call to make. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-05 1:47 ` Andrew Morton
@ 2006-06-05 8:59 ` Christoph Hellwig
2006-06-05 9:10 ` Andrew Morton
2006-06-05 11:10 ` Ivan Novick
0 siblings, 2 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-05 8:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jeff Garzik, linux-kernel

On Sun, Jun 04, 2006 at 06:47:11PM -0700, Andrew Morton wrote:
> Yes, I agree. As long as we reasonably think that a piece of code *will*
> become acceptable within a reasonable amount of time then going early is
> safe.

Definitely not the case for areca. The only progress at all is where people
like Arjan, Randy or me did very intensive babysitting. And it's still far
from being there.

And especially in scsi land I'm absolutely against putting in more substandard
drivers. The subsystem is still badly plagued from lots of old drivers that
aren't up to any standards, and we need to decrease the maintenance load due
to odd drivers, not increase it even further.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-05 8:59 ` Christoph Hellwig
@ 2006-06-05 9:10 ` Andrew Morton
2006-06-05 9:16 ` Arjan van de Ven
2006-06-05 11:10 ` Ivan Novick
1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05 9:10 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: jeff, linux-kernel

On Mon, 5 Jun 2006 09:59:18 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Sun, Jun 04, 2006 at 06:47:11PM -0700, Andrew Morton wrote:
> > Yes, I agree. As long as we reasonably think that a piece of code *will*
> > become acceptable within a reasonable amount of time then going early is
> > safe.
>
> Definitely not the case for areca. The only progress at all is where people
> like Arjan, Randy or me did very intensive babysitting. And it's still far
> from being there.
>
> And especially in scsi land I'm absolutely against putting in more substandard
> drivers. The subsystem is still badly plagued from lots of old drivers that
> aren't up to any standards, and we need to decrease the maintenance load due
> to odd drivers, not increase it even further.

So.. How are we going to get the Areca controllers supported in Linux?
The code's been sitting in -mm for over a year and the vendor does have
staff assigned to work on it.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-05 9:10 ` Andrew Morton
@ 2006-06-05 9:16 ` Arjan van de Ven
0 siblings, 0 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 9:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Hellwig, jeff, linux-kernel

On Mon, 2006-06-05 at 02:10 -0700, Andrew Morton wrote:
> On Mon, 5 Jun 2006 09:59:18 +0100
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > On Sun, Jun 04, 2006 at 06:47:11PM -0700, Andrew Morton wrote:
> > > Yes, I agree. As long as we reasonably think that a piece of code *will*
> > > become acceptable within a reasonable amount of time then going early is
> > > safe.
> >
> > Definitely not the case for areca. The only progress at all is where people
> > like Arjan, Randy or me did very intensive babysitting. And it's still far
> > from being there.
> >
> > And especially in scsi land I'm absolutely against putting in more substandard
> > drivers. The subsystem is still badly plagued from lots of old drivers that
> > aren't up to any standards, and we need to decrease the maintenance load due
> > to odd drivers, not increase it even further.
>
> So.. How are we going to get the Areca controllers supported in Linux?
> The code's been sitting in -mm for over a year and the vendor does have
> staff assigned to work on it.

The driver is improving for sure. What seems to work well is when we make
a work-to-do list; the vendor then goes about and fixes most of that quite
quickly.

I think I'm approaching the end of useful review input I can give (they
fixed most if not all the stuff I flagged before). It would be really nice
if Christoph or some other scsi person would do a review again and make a
list of "these should be fixed and then we can merge" (and a list of
"these can be fixed post merge" as well).

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-05 8:59 ` Christoph Hellwig
2006-06-05 9:10 ` Andrew Morton
@ 2006-06-05 11:10 ` Ivan Novick
2006-06-05 11:26 ` Adrian Bunk
1 sibling, 1 reply; 166+ messages in thread
From: Ivan Novick @ 2006-06-05 11:10 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton; +Cc: Jeff Garzik, linux-kernel

> And especially in scsi land I'm absolutely against putting in more
> substandard drivers. The subsystem is still badly plagued from lots of
> old drivers that aren't up to any standards, and we need to decrease the
> maintenance load due to odd drivers, not increase it even further.

Is there a hit-list of old drivers that need work, in case someone is
interested in helping?

Ivan

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-05 11:10 ` Ivan Novick
@ 2006-06-05 11:26 ` Adrian Bunk
0 siblings, 0 replies; 166+ messages in thread
From: Adrian Bunk @ 2006-06-05 11:26 UTC (permalink / raw)
To: Ivan Novick; +Cc: Christoph Hellwig, Andrew Morton, Jeff Garzik, linux-kernel

On Mon, Jun 05, 2006 at 12:10:20PM +0100, Ivan Novick wrote:
> > And especially in scsi land I'm absolutely against putting in more
> > substandard drivers. The subsystem is still badly plagued from lots of
> > old drivers that aren't up to any standards, and we need to decrease the
> > maintenance load due to odd drivers, not increase it even further.
>
> Is there a hit-list of old drivers that need work, in case someone is
> interested in helping?

Not a complete list, but a good way for finding 50 drivers that need work:

	grep scsi_module.c drivers/scsi/*

> Ivan

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans) 2006-06-05 1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik 2006-06-05 1:47 ` Andrew Morton @ 2006-06-05 6:58 ` Francois Romieu 2006-06-05 10:32 ` Alan Cox 1 sibling, 1 reply; 166+ messages in thread From: Francois Romieu @ 2006-06-05 6:58 UTC (permalink / raw) To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel Jeff Garzik <jeff@garzik.org> : [...] > In general, I'm a bit disappointed at the time it takes new drivers to > reach the upstream kernel. I grant that a lot of vendor drivers are > unreadable, unmergable shite... but on the other side of the coin, I > see a lot of decent drivers get stalled simply because they aren't > perfect. Could you provide an informal list of a few drivers which are currently stalled ? -- Ueimor ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-05 6:58 ` Francois Romieu
@ 2006-06-05 10:32 ` Alan Cox
2006-06-05 10:36 ` Arjan van de Ven
0 siblings, 1 reply; 166+ messages in thread
From: Alan Cox @ 2006-06-05 10:32 UTC (permalink / raw)
To: Francois Romieu; +Cc: Jeff Garzik, Andrew Morton, linux-kernel

Ar Llu, 2006-06-05 am 08:58 +0200, ysgrifennodd Francois Romieu:
> Jeff Garzik <jeff@garzik.org> :
> [...]
> > In general, I'm a bit disappointed at the time it takes new drivers to
> > reach the upstream kernel. I grant that a lot of vendor drivers are
> > unreadable, unmergable shite... but on the other side of the coin, I
> > see a lot of decent drivers get stalled simply because they aren't
> > perfect.
>
> Could you provide an informal list of a few drivers which are currently
> stalled ?

It isn't just drivers. Xen has the same problem. All large code blocks
have this problem. The older policy was to get stuff roughly right, merge
it into a tree then beat on it. Now everyone is blocking anything that is
the slightest bit imperfect, which makes it impossible to add anything
large to the tree because it will *never* be perfect before a merge and
hack session and it will never be perfect in everyone's eyes. Plus of
course some people have personal dislikes of Xen, and of various other
projects that get in the way.

Perfection is the enemy of progress and of success. We risk moving back to
the case we got into in 2.4 when merging got so hard that most vendors
shipped kernels bearing no relationship to the "upstream" tree. Probably
worse this time as there is no common "unofficial" tree like -ac so they
will all ship different variants and combinations.

Perfect is the wrong test. In the overall interest of the kernel is the
right test.

Alan

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans) 2006-06-05 10:32 ` Alan Cox @ 2006-06-05 10:36 ` Arjan van de Ven 2006-06-06 2:02 ` Chris Wright 0 siblings, 1 reply; 166+ messages in thread From: Arjan van de Ven @ 2006-06-05 10:36 UTC (permalink / raw) To: Alan Cox; +Cc: Francois Romieu, Jeff Garzik, Andrew Morton, linux-kernel On Mon, 2006-06-05 at 11:32 +0100, Alan Cox wrote: > Ar Llu, 2006-06-05 am 08:58 +0200, ysgrifennodd Francois Romieu: > > Jeff Garzik <jeff@garzik.org> : > > [...] > > > In general, I'm a bit disappointed at the time it takes new drivers to > > > reach the upstream kernel. I grant that a lot of vendor drivers are > > > unreadable, unmergable shite... but on the other side of the coin, I > > > see a lot of decent drivers get stalled simply because they aren't > > > perfect. > > > > Could you provide an informal list of a few drivers which are currently > > stalled ? > > It isn't just drivers. Xen has the same problem. Xen has many problems. This is not nearly their biggest ;) ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans) 2006-06-05 10:36 ` Arjan van de Ven @ 2006-06-06 2:02 ` Chris Wright 2006-06-06 7:01 ` Andi Kleen 0 siblings, 1 reply; 166+ messages in thread From: Chris Wright @ 2006-06-06 2:02 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Cox, Francois Romieu, Jeff Garzik, Andrew Morton, linux-kernel * Arjan van de Ven (arjan@infradead.org) wrote: > On Mon, 2006-06-05 at 11:32 +0100, Alan Cox wrote: > > It isn't just drivers. Xen has the same problem. > > Xen has many problems. This is not nearly their biggest ;) What is the biggest, or even top 3 or 5? I've a todo list of some 140-odd entries which are being worked on. It's slow and tedious, but in progress. I'd be happy to hear some top issues (guess we're talking high-level technical categories here) to help prioritize and perhaps even augment that todo list. I've got my ideas and priorities, but I'm interested to hear yours. thanks, -chris ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
2006-06-06 2:02 ` Chris Wright
@ 2006-06-06 7:01 ` Andi Kleen
2006-06-06 13:04 ` Steven Rostedt
0 siblings, 1 reply; 166+ messages in thread
From: Andi Kleen @ 2006-06-06 7:01 UTC (permalink / raw)
To: Chris Wright
Cc: Alan Cox, Francois Romieu, Jeff Garzik, Andrew Morton, linux-kernel

Chris Wright <chrisw@sous-sol.org> writes:

> * Arjan van de Ven (arjan@infradead.org) wrote:
> > On Mon, 2006-06-05 at 11:32 +0100, Alan Cox wrote:
> > > It isn't just drivers. Xen has the same problem.
> >
> > Xen has many problems. This is not nearly their biggest ;)
>
> What is the biggest, or even top 3 or 5? I've a todo list of some

I would say the biggest is that things haven't gotten submitted for
so long and aren't resubmitted quickly.

e.g. Xen code needs a lot of arch/* cleanups in small patches that should
be just submitted, fixed, resubmitted quickly. Many of them could be
already merged.

For example Jan Beulich has been sending many of the cleanups he needed
for x86-64/i386 Xen immediately and at least for x86-64 I merged most of
them. If the other things were submitted earlier a lot of it could be
already merged too.

Then Xen net/block/char etc. drivers should be submitted to the respective
maintainers independently (they are useful even without the rest of Xen
in HVM guests).

> 140-odd entries which are being worked on. It's slow and tedious,
> but in progress.

What I would do is to concentrate on the small cleanup patches first
and post them as soon as you fix them. I think a lot of them were
actually ok without changes. Then bigger stuff piece by piece. You
don't need to wait to fix everything first.

-Andi

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: merging new drivers (was Re: 2.6.18 -mm merge plans) 2006-06-06 7:01 ` Andi Kleen @ 2006-06-06 13:04 ` Steven Rostedt 0 siblings, 0 replies; 166+ messages in thread From: Steven Rostedt @ 2006-06-06 13:04 UTC (permalink / raw) To: Andi Kleen Cc: Chris Wright, Alan Cox, Francois Romieu, Jeff Garzik, Andrew Morton, linux-kernel On Tue, 2006-06-06 at 09:01 +0200, Andi Kleen wrote: > > > 140-odd entries which are being worked on. It's slow and tedious, > > but in progress. > > What I would do is to concentrate on the small cleanup patches first > and post them as soon as you fix them. I think a lot of them were > actually ok without changes. Then bigger stuff piece by piece. You > don't need to wait to fix everything first. I totally agree with this approach. There are probably over 900 patches that have been accepted into mainline as cleanups and bug fixes that originated from Ingo's -rt patch set. A lot of developers don't realize how much the -rt patch has helped the mainline kernel. But to keep the -rt patch maintainable, when a problem or cleanup is discovered, we try very hard to get that fix into mainline. That way we don't still need to maintain it. Doesn't the Xen team do the same? -- Steve ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans -- GFS
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (11 preceding siblings ...)
2006-06-05 1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
@ 2006-06-05 13:38 ` David Woodhouse
2006-06-05 14:10 ` Russell King
2006-06-05 15:01 ` Steven Whitehouse
2006-06-05 14:08 ` 2.6.18 -mm merge plans Oleg Nesterov
` (7 subsequent siblings)
20 siblings, 2 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-05 13:38 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, Patrick Caulfield, Steven Whitehouse, davej, David Teigland

On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> It's time to take a look at the -mm queue for 2.6.18.

You didn't mention GFS2 either -- is that ready to go upstream?
It contains this in its user<->kernel communication (whitespace sic)...

/* struct passed to the lock write */
struct dlm_lock_params {
	__u8 mode;
	__u16 flags;
	__u32 lkid;
	__u32 parent;
	__u8 namelen;
	void __user *castparam;
	void __user *castaddr;
	void __user *bastparam;
	void __user *bastaddr;
	struct dlm_lksb __user *lksb;
	char lvb[DLM_USER_LVB_LEN];
	char name[1];
};

struct dlm_lspace_params {
	__u32 flags;
	__u32 minor;
	char name[1];
};

struct dlm_write_request {
	__u32 version[3];
	__u8 cmd;

	union {
		struct dlm_lock_params lock;
		struct dlm_lspace_params lspace;
	} i;
};

/* struct read from the "device" fd,
   consists mainly of userspace pointers for the library to use */
struct dlm_lock_result {
	__u32 length;
	void __user * user_astaddr;
	void __user * user_astparam;
	struct dlm_lksb __user * user_lksb;
	struct dlm_lksb lksb;
	__u8 bast_mode;
	/* Offsets may be zero if no data is present */
	__u32 lvb_offset;
};

Now, the intention seems to be that instead of doing CONFIG_COMPAT stuff
in the kernel for backwards-compatibility with 32-bit userspace on 64-bit
kernels, we _instead_ attempt to make the 32-bit userspace _forward_
compatible with 64-bit kernels.

The userspace side of this is implemented (for sparc and s390 only) at
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/dlm/lib/dlm32.c?rev=1.3&content-type=text/plain&cvsroot=cluster

That approach looks broken when i386 binaries are run on x86_64, because
the offset of the 'qinfo' member in a struct which starts like this...

struct dlm_query_params64 {
	uint32_t query;
	uint64_t qinfo;
	...

... is going to be _different_ between the 32-bit userspace code and the
64-bit kernel anyway, despite the fact that this structure is supposed to
match the 64-bit kernel's idea of the structure.

The hdrcleanup and hdrinstall stuff helped to highlight this, because it
now stands out as being immediately obvious that we're adding pointers to
user-visible structures. I'm being asked to add the GFS2 headers to
Fedora's glibc-kernheaders package, but I _don't_ want to diverge from
upstream -- part of the reason for 'make headers_install' was that all the
distros will be able to ship a _consistent_ set of headers.

So if we're going to barf at the compat stuff above, can we do it ASAP and
get it fixed, or if we're going to accept the "userspace must be
forward-compatible" approach and send it to Linus as is, can we reach a
consensus on that instead?

-- 
dwmw2

^ permalink raw reply [flat|nested] 166+ messages in thread
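A minimal sketch of the 'qinfo' offset mismatch described in the message
above. It assumes GCC or Clang, because it uses __attribute__((aligned))
on typedefs to emulate the two ABIs on a single host: the i386 rule that a
64-bit integer inside a struct is aligned to 4 bytes, and the x86_64 rule
that it is aligned to 8. The struct and function names here
(query_i386_like, query_lp64_like, query_explicit, and the helpers) are
hypothetical; only the two leading members 'query' and 'qinfo' are taken
from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with the 4-byte in-struct alignment of the i386 ABI. */
typedef uint64_t u64_align4 __attribute__((aligned(4)));
/* 64-bit integer with the 8-byte alignment of the x86_64 ABI. */
typedef uint64_t u64_align8 __attribute__((aligned(8)));

/* Layout a 32-bit i386 userspace compiler would produce. */
struct query_i386_like {
	uint32_t   query;
	u64_align4 qinfo;	/* directly after 'query' */
};

/* Layout a 64-bit x86_64 kernel would expect. */
struct query_lp64_like {
	uint32_t   query;
	u64_align8 qinfo;	/* 4 bytes of implicit padding before it */
};

/* One possible repair (an assumption, not the thread's decision):
 * spell the padding out so both ABIs agree on the layout. */
struct query_explicit {
	uint32_t query;
	uint32_t pad;
	uint64_t qinfo;
};

size_t qinfo_offset_i386_like(void)
{
	return offsetof(struct query_i386_like, qinfo);
}

size_t qinfo_offset_lp64_like(void)
{
	return offsetof(struct query_lp64_like, qinfo);
}

size_t qinfo_offset_explicit(void)
{
	return offsetof(struct query_explicit, qinfo);
}

/* Aborts if the two emulated layouts ever agree on the offset of 'qinfo'. */
void check_layouts_differ(void)
{
	assert(qinfo_offset_i386_like() != qinfo_offset_lp64_like());
}
```

With explicit padding the member offsets no longer depend on how each ABI
aligns 64-bit integers, which is the property a structure shared between
32-bit userspace and a 64-bit kernel needs.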
* Re: 2.6.18 -mm merge plans -- GFS
2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
@ 2006-06-05 14:10 ` Russell King
2006-06-05 15:01 ` Steven Whitehouse
1 sibling, 0 replies; 166+ messages in thread
From: Russell King @ 2006-06-05 14:10 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, linux-kernel, Patrick Caulfield, Steven Whitehouse, davej, David Teigland

On Mon, Jun 05, 2006 at 02:38:50PM +0100, David Woodhouse wrote:
> You didn't mention GFS2 either -- is that ready to go upstream?
> It contains this in its user<->kernel communication (whitespace sic)...
>
> /* struct passed to the lock write */
> struct dlm_lock_params {
>	__u8 mode;
>	__u16 flags;
>	__u32 lkid;
>	__u32 parent;
>	__u8 namelen;

Hmm. This is going to be subject to random compiler padding. It would
be much better to have:

	__u8 mode;
	__u8 namelen;
	__u16 flags;
	__u32 lkid;
	__u32 parent;

which should be less subject to compiler padding.

> struct dlm_write_request {
>	__u32 version[3];
>	__u8 cmd;

Ditto - though maybe following this by:

	__u8 unused[3];

would be a sane solution.

> struct dlm_lock_result {
>	__u32 length;
>	void __user * user_astaddr;
>	void __user * user_astparam;
>	struct dlm_lksb __user * user_lksb;
>	struct dlm_lksb lksb;
>	__u8 bast_mode;

Ditto.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply [flat|nested] 166+ messages in thread
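The reordering suggested in the message above can be checked with a small
sketch. It covers only the five leading integer members and the head of
the write request, and it assumes a typical ABI in which a 16-bit integer
is aligned to 2 bytes and a 32-bit integer to 4 (other ABIs may pad
differently, which is the concern being raised). The struct and function
names are hypothetical; the member names and orderings come from the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading members in the order originally posted. */
struct lock_params_posted {
	uint8_t  mode;
	uint16_t flags;
	uint32_t lkid;
	uint32_t parent;
	uint8_t  namelen;
};

/* The same members in the order suggested in the reply. */
struct lock_params_reordered {
	uint8_t  mode;
	uint8_t  namelen;
	uint16_t flags;
	uint32_t lkid;
	uint32_t parent;
};

/* Head of the write request with the suggested explicit padding. */
struct write_request_head {
	uint32_t version[3];
	uint8_t  cmd;
	uint8_t  unused[3];
};

/* Sum of the sizes of the five members, with no padding at all. */
#define LOCK_PARAMS_PAYLOAD \
	(2 * sizeof(uint8_t) + sizeof(uint16_t) + 2 * sizeof(uint32_t))

/* Bytes the compiler inserted into the posted ordering. */
size_t hidden_padding_posted(void)
{
	return sizeof(struct lock_params_posted) - LOCK_PARAMS_PAYLOAD;
}

/* Bytes the compiler inserted into the reordered version. */
size_t hidden_padding_reordered(void)
{
	return sizeof(struct lock_params_reordered) - LOCK_PARAMS_PAYLOAD;
}

/* Aborts if reordering fails to remove all implicit padding. */
void check_reordering_helps(void)
{
	assert(hidden_padding_reordered() == 0);
	assert(hidden_padding_posted() > hidden_padding_reordered());
}
```

Sorting members so that each one already sits on its natural alignment
leaves the compiler nothing to insert, and naming the remaining padding
(as with 'unused[3]') makes the layout part of the declared interface
rather than a side effect of the compiler.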
* Re: 2.6.18 -mm merge plans -- GFS
2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
2006-06-05 14:10 ` Russell King
@ 2006-06-05 15:01 ` Steven Whitehouse
2006-06-07 7:12 ` Steven Whitehouse
1 sibling, 1 reply; 166+ messages in thread
From: Steven Whitehouse @ 2006-06-05 15:01 UTC (permalink / raw)
To: David Woodhouse
Cc: Andrew Morton, linux-kernel, Patrick Caulfield, davej, David Teigland

Hi,

On Mon, 2006-06-05 at 14:38 +0100, David Woodhouse wrote:
> On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> > It's time to take a look at the -mm queue for 2.6.18.
>
> You didn't mention GFS2 either -- is that ready to go upstream?

Assuming that 2.6.18 is imminent, as I understand it is, then my
preferred option would be for GFS2 to spend one more cycle in -mm,
assuming that nobody would disagree with that. I'm pretty sure that we'll
be ready by the time 2.6.19 comes around to request inclusion upstream
but 2.6.18 might be just a bit too soon,

Steve.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans -- GFS
2006-06-05 15:01 ` Steven Whitehouse
@ 2006-06-07 7:12 ` Steven Whitehouse
0 siblings, 0 replies; 166+ messages in thread
From: Steven Whitehouse @ 2006-06-07 7:12 UTC (permalink / raw)
To: Steven Whitehouse
Cc: David Teigland, davej, Patrick Caulfield, linux-kernel, Andrew Morton, David Woodhouse

Hi,

[at the risk of appearing to be mad by replying to myself...]

On Mon, 2006-06-05 at 16:01 +0100, Steven Whitehouse wrote:
> Hi,
>
> On Mon, 2006-06-05 at 14:38 +0100, David Woodhouse wrote:
> > On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> > > It's time to take a look at the -mm queue for 2.6.18.
> >
> > You didn't mention GFS2 either -- is that ready to go upstream?
>
> Assuming that 2.6.18 is imminent, as I understand it is, then my
> preferred option would be for GFS2 to spend one more cycle in -mm,
> assuming that nobody would disagree with that. I'm pretty sure that we'll
> be ready by the time 2.6.19 comes around to request inclusion upstream
> but 2.6.18 might be just a bit too soon,
>
> Steve.
>

To clarify the above a bit more, there is one regression in GFS2 in the
current -mm tree that needs to be fixed. Since Linus has announced
2.6.17-rc6, there is now in fact time for that to happen, so that we will
be ready to merge for 2.6.18. Sorry for any confusion my earlier comment
might have caused,

Steve.

[now recovered from jet lag :-) ]

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (12 preceding siblings ...) 2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse @ 2006-06-05 14:08 ` Oleg Nesterov 2006-06-05 14:43 ` Serge E. Hallyn ` (6 subsequent siblings) 20 siblings, 0 replies; 166+ messages in thread From: Oleg Nesterov @ 2006-06-05 14:08 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On 06/04, Andrew Morton wrote: > > de_thread-fix-lockless-do_each_thread.patch > coredump-optimize-mm-users-traversal.patch > coredump-speedup-sigkill-sending.patch > coredump-kill-ptrace-related-stuff.patch > coredump-kill-ptrace-related-stuff-fix.patch > coredump-dont-take-tasklist_lock.patch > coredump-some-code-relocations.patch > coredump-shutdown-current-process-first.patch > coredump-copy_process-dont-check-signal_group_exit.patch > > Will merge. I have a note here that Roland had issues with > coredump-kill-ptrace-related-stuff.patch? Should be solved by coredump-kill-ptrace-related-stuff-fix.patch. (There was no explicit ack from Roland though). Oleg. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (13 preceding siblings ...) 2006-06-05 14:08 ` 2.6.18 -mm merge plans Oleg Nesterov @ 2006-06-05 14:43 ` Serge E. Hallyn 2006-06-08 19:56 ` Eric W. Biederman 2006-06-06 0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh ` (5 subsequent siblings) 20 siblings, 1 reply; 166+ messages in thread From: Serge E. Hallyn @ 2006-06-05 14:43 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, Eric W. Biederman, Kirill Korotaev, Dave Hansen, Hubertus Franke, Cedric Le Goater Quoting Andrew Morton (akpm@osdl.org): > proc-sysctl-add-_proc_do_string-helper.patch > namespaces-add-nsproxy.patch > namespaces-add-nsproxy-dont-include-compileh.patch > namespaces-incorporate-fs-namespace-into-nsproxy.patch > namespaces-utsname-introduce-temporary-helpers.patch > namespaces-utsname-switch-to-using-uts-namespaces.patch > namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch > namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch > namespaces-utsname-use-init_utsname-when-appropriate.patch > namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch > namespaces-utsname-implement-utsname-namespaces.patch > namespaces-utsname-implement-utsname-namespaces-export.patch > namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch > namespaces-utsname-sysctl-hack.patch > namespaces-utsname-sysctl-hack-cleanup.patch > namespaces-utsname-sysctl-hack-cleanup-2.patch > namespaces-utsname-sysctl-hack-cleanup-2-fix.patch > namespaces-utsname-remove-system_utsname.patch > namespaces-utsname-implement-clone_newuts-flag.patch > uts-copy-nsproxy-only-when-needed.patch > # needed if git-klibc isn't there: > #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch > #namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch > #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch > > utsname 
virtualisation. This doesn't seem very pointful as a standalone > thing. That's a general problem with infrastructural work for a very > large new feature. > > So probably I'll continue to babysit these patches, unless someone can > identify a decent reason why mainline needs this work. > > I don't want to carry an ever-growing stream of OS-virtualisation > groundwork patches for ever and ever so if we're going to do this thing... > faster, please. Eric, Kirill, Dave, Hubertus, In the spirit of 'faster, please', does someone care to port and resubmit a pidspace patch? I'll do it if noone else wants to, just don't want to step on anyone's toes if you were already working on it. thanks, -serge ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-05 14:43 ` Serge E. Hallyn @ 2006-06-08 19:56 ` Eric W. Biederman 2006-06-09 13:02 ` Serge E. Hallyn 2006-06-09 23:25 ` Serge E. Hallyn 0 siblings, 2 replies; 166+ messages in thread From: Eric W. Biederman @ 2006-06-08 19:56 UTC (permalink / raw) To: Serge E. Hallyn Cc: Andrew Morton, linux-kernel, Kirill Korotaev, Dave Hansen, Hubertus Franke, Cedric Le Goater "Serge E. Hallyn" <serue@us.ibm.com> writes: > Quoting Andrew Morton (akpm@osdl.org): >> proc-sysctl-add-_proc_do_string-helper.patch >> namespaces-add-nsproxy.patch >> namespaces-add-nsproxy-dont-include-compileh.patch >> namespaces-incorporate-fs-namespace-into-nsproxy.patch >> namespaces-utsname-introduce-temporary-helpers.patch >> namespaces-utsname-switch-to-using-uts-namespaces.patch >> namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch >> namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch >> namespaces-utsname-use-init_utsname-when-appropriate.patch >> namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch >> namespaces-utsname-implement-utsname-namespaces.patch >> namespaces-utsname-implement-utsname-namespaces-export.patch >> namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch >> namespaces-utsname-sysctl-hack.patch >> namespaces-utsname-sysctl-hack-cleanup.patch >> namespaces-utsname-sysctl-hack-cleanup-2.patch >> namespaces-utsname-sysctl-hack-cleanup-2-fix.patch >> namespaces-utsname-remove-system_utsname.patch >> namespaces-utsname-implement-clone_newuts-flag.patch >> uts-copy-nsproxy-only-when-needed.patch >> # needed if git-klibc isn't there: >> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch >> #namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch >> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch >> >> utsname virtualisation. This doesn't seem very pointful as a standalone >> thing. 
That's a general problem with infrastructural work for a very >> large new feature. >> >> So probably I'll continue to babysit these patches, unless someone can >> identify a decent reason why mainline needs this work. >> >> I don't want to carry an ever-growing stream of OS-virtualisation >> groundwork patches for ever and ever so if we're going to do this thing... >> faster, please. Ack. I agree we need to start moving faster. I had a couple of distractions but I should be sending out some relevant patches in a bit. The more we can get out for review before kernel summit the better the conversation will be I suspect. > Eric, Kirill, Dave, Hubertus, > > In the spirit of 'faster, please', does someone care to port and > resubmit a pidspace patch? I think I can get that one. Except for the very tail end though most of my patches probably won't be directly pidspace patches. I'm going to work on killing sys_sysctl a little before I get to far into that. A pidspace is one of the most controversial patches so it is a bit tricky. > I'll do it if noone else wants to, just don't want to step on anyone's > toes if you were already working on it. If you want to help with the bare pid to struct pid conversion I don't have any outstanding patches, and getting that done kills some theoretical pid wrap around problems as well as laying the ground work for a simple pidspace implementation. Eric ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-08 19:56 ` Eric W. Biederman @ 2006-06-09 13:02 ` Serge E. Hallyn 2006-06-09 23:25 ` Serge E. Hallyn 1 sibling, 0 replies; 166+ messages in thread From: Serge E. Hallyn @ 2006-06-09 13:02 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Andrew Morton, linux-kernel, Kirill Korotaev, Dave Hansen, Hubertus Franke, Cedric Le Goater Quoting Eric W. Biederman (ebiederm@xmission.com): > "Serge E. Hallyn" <serue@us.ibm.com> writes: > > Eric, Kirill, Dave, Hubertus, > > > > In the spirit of 'faster, please', does someone care to port and > > resubmit a pidspace patch? > > I think I can get that one. Except for the very tail end though > most of my patches probably won't be directly pidspace patches. > I'm going to work on killing sys_sysctl a little before I > get to far into that. A pidspace is one of the most controversial > patches so it is a bit tricky. > > > I'll do it if noone else wants to, just don't want to step on anyone's > > toes if you were already working on it. > > If you want to help with the bare pid to struct pid conversion I > don't have any outstanding patches, and getting that done kills > some theoretical pid wrap around problems as well as laying the ground > work for a simple pidspace implementation. Yeah, I'll get going on that over the next week. A quick lxr search shows quite a few remaining hits on pid_t :) -serge ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-08 19:56 ` Eric W. Biederman 2006-06-09 13:02 ` Serge E. Hallyn @ 2006-06-09 23:25 ` Serge E. Hallyn 2006-06-10 0:39 ` Eric W. Biederman 2006-06-10 9:53 ` Christoph Hellwig 1 sibling, 2 replies; 166+ messages in thread From: Serge E. Hallyn @ 2006-06-09 23:25 UTC (permalink / raw) To: Eric W. Biederman; +Cc: linux-kernel Quoting Eric W. Biederman (ebiederm@xmission.com): > If you want to help with the bare pid to struct pid conversion I > don't have any outstanding patches, and getting that done kills > some theoretical pid wrap around problems as well as laying the ground > work for a simple pidspace implementation. > > Eric Is this the sort of thing you are looking for? Is this worthwhile for kernel_threads, or only for userspace threads - i.e. do we expect kernel threads to live? If we do want to do this for kernel threads, then I assume that eventually we'll want to change kernel_thread() itself. I actually started to do that earlier, but of course that way every user would have to be changed in the same patch :) Subject: [PATCH] struct pid: convert ieee1394 to hold struct pid ieee1394 driver caches pid_t's for kernel threads. Switch to holding a reference to a struct pid. This prevents concern about the cached pid pointing to the wrong process after the kernel thread dies and pids wrap around. 
Signed-off-by: Serge Hallyn <serue@us.ibm.com> --- drivers/ieee1394/ieee1394_core.c | 16 ++++++++++------ drivers/ieee1394/nodemgr.c | 12 ++++++++---- 2 files changed, 18 insertions(+), 10 deletions(-) ca429eb5558988a34815c8cdfcecd26a06170f4f diff --git a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c index be6854e..4db5c54 100644 --- a/drivers/ieee1394/ieee1394_core.c +++ b/drivers/ieee1394/ieee1394_core.c @@ -33,6 +33,7 @@ #include <linux/kdev_t.h> #include <linux/skbuff.h> #include <linux/suspend.h> +#include <linux/pid.h> #include <asm/byteorder.h> #include <asm/semaphore.h> @@ -997,7 +998,8 @@ void abort_timedouts(unsigned long __opa * packets that have a "complete" function are sent here. This way, the * completion is run out of kernel context, and doesn't block the rest of * the stack. */ -static int khpsbpkt_pid = -1, khpsbpkt_kill; +static int khpsbpkt_kill; +static struct pid *khpsbpkt_pid; static DECLARE_COMPLETION(khpsbpkt_complete); static struct sk_buff_head hpsbpkt_queue; static DECLARE_MUTEX_LOCKED(khpsbpkt_sig); @@ -1056,6 +1058,7 @@ static int hpsbpkt_thread(void *__hi) static int __init ieee1394_init(void) { int i, ret; + pid_t nr; skb_queue_head_init(&hpsbpkt_queue); @@ -1065,12 +1068,13 @@ static int __init ieee1394_init(void) HPSB_ERR("Some features may not be available\n"); } - khpsbpkt_pid = kernel_thread(hpsbpkt_thread, NULL, CLONE_KERNEL); - if (khpsbpkt_pid < 0) { + nr = kernel_thread(hpsbpkt_thread, NULL, CLONE_KERNEL); + if (nr < 0) { HPSB_ERR("Failed to start hpsbpkt thread!\n"); ret = -ENOMEM; goto exit_cleanup_config_roms; } + khpsbpkt_pid = get_pid(nr); if (register_chrdev_region(IEEE1394_CORE_DEV, 256, "ieee1394")) { HPSB_ERR("unable to register character device major %d!\n", IEEE1394_MAJOR); @@ -1148,8 +1152,8 @@ release_all_bus: release_chrdev: unregister_chrdev_region(IEEE1394_CORE_DEV, 256); exit_release_kernel_thread: - if (khpsbpkt_pid >= 0) { - kill_proc(khpsbpkt_pid, SIGTERM, 1); + if (khpsbpkt_pid) { + 
kill_proc(khpsbpkt_pid->nr, SIGTERM, 1); wait_for_completion(&khpsbpkt_complete); } exit_cleanup_config_roms: @@ -1172,7 +1176,7 @@ static void __exit ieee1394_cleanup(void bus_remove_file(&ieee1394_bus_type, fw_bus_attrs[i]); bus_unregister(&ieee1394_bus_type); - if (khpsbpkt_pid >= 0) { + if (khpsbpkt_pid) { khpsbpkt_kill = 1; mb(); up(&khpsbpkt_sig); diff --git a/drivers/ieee1394/nodemgr.c b/drivers/ieee1394/nodemgr.c index 082c7fd..d33f2fe 100644 --- a/drivers/ieee1394/nodemgr.c +++ b/drivers/ieee1394/nodemgr.c @@ -19,6 +19,7 @@ #include <linux/delay.h> #include <linux/pci.h> #include <linux/moduleparam.h> +#include <linux/pid.h> #include <asm/atomic.h> #include "ieee1394_types.h" @@ -115,7 +116,7 @@ struct host_info { struct list_head list; struct completion exited; struct semaphore reset_sem; - int pid; + struct pid *pid; char daemon_name[15]; int kill_me; }; @@ -1705,6 +1706,7 @@ int hpsb_node_write(struct node_entry *n static void nodemgr_add_host(struct hpsb_host *host) { struct host_info *hi; + pid_t nr; hi = hpsb_create_hostinfo(&nodemgr_highlevel, host, sizeof(*hi)); @@ -1719,14 +1721,15 @@ static void nodemgr_add_host(struct hpsb sprintf(hi->daemon_name, "knodemgrd_%d", host->id); - hi->pid = kernel_thread(nodemgr_host_thread, hi, CLONE_KERNEL); + nr = kernel_thread(nodemgr_host_thread, hi, CLONE_KERNEL); - if (hi->pid < 0) { + if (nr < 0) { HPSB_ERR ("NodeMgr: failed to start %s thread for %s", hi->daemon_name, host->driver->name); hpsb_destroy_hostinfo(&nodemgr_highlevel, host); return; } + hi->pid = find_get_pid(nr); return; } @@ -1749,11 +1752,12 @@ static void nodemgr_remove_host(struct h struct host_info *hi = hpsb_get_hostinfo(&nodemgr_highlevel, host); if (hi) { - if (hi->pid >= 0) { + if (hi->pid->nr >= 0) { hi->kill_me = 1; mb(); up(&hi->reset_sem); wait_for_completion(&hi->exited); + put_pid(hi->pid); nodemgr_remove_host_dev(&host->device); } } else -- 1.1.6 ^ permalink raw reply related [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-09 23:25 ` Serge E. Hallyn @ 2006-06-10 0:39 ` Eric W. Biederman 2006-06-10 1:23 ` Serge E. Hallyn 2006-06-10 9:53 ` Christoph Hellwig 1 sibling, 1 reply; 166+ messages in thread From: Eric W. Biederman @ 2006-06-10 0:39 UTC (permalink / raw) To: Serge E. Hallyn; +Cc: linux-kernel "Serge E. Hallyn" <serue@us.ibm.com> writes: > Quoting Eric W. Biederman (ebiederm@xmission.com): >> If you want to help with the bare pid to struct pid conversion I >> don't have any outstanding patches, and getting that done kills >> some theoretical pid wrap around problems as well as laying the ground >> work for a simple pidspace implementation. >> >> Eric > > Is this the sort of thing you are looking for? Is this worthwhile for > kernel_threads, or only for userspace threads - i.e. do we expect kernel > threads to live? For kernel threads we should simply be able to use their task struct. In this instance we have hit upon a different problem. Anything using the kernel_thread API instead of the kthread api needs to be updated. The basic problem is that for kernel_threads can show up inside of containers. We can fix that by updating daemonize or we can simply universally use the kthread api. Since the kernel_thread api is deprecated because of these kinds of reasons what really makes sense is to work on the transition to the kthread api. > If we do want to do this for kernel threads, then I assume that > eventually we'll want to change kernel_thread() itself. I actually > started to do that earlier, but of course that way every user would > have to be changed in the same patch :) > > Subject: [PATCH] struct pid: convert ieee1394 to hold struct pid > > ieee1394 driver caches pid_t's for kernel threads. Switch to > holding a reference to a struct pid. This prevents concern > about the cached pid pointing to the wrong process after the > kernel thread dies and pids wrap around. 
> > Signed-off-by: Serge Hallyn <serue@us.ibm.com> Ok a couple of comments. As I recall there are some pretty sane ways of going from struct pid to a task_struct and then we can use things like group_send_sig. But otherwise you seem to be using struct pid ok. Eric ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-10 0:39 ` Eric W. Biederman @ 2006-06-10 1:23 ` Serge E. Hallyn 2006-06-10 7:52 ` Eric W. Biederman 2006-06-10 8:09 ` Eric W. Biederman 0 siblings, 2 replies; 166+ messages in thread From: Serge E. Hallyn @ 2006-06-10 1:23 UTC (permalink / raw) To: Eric W. Biederman; +Cc: linux-kernel Quoting Eric W. Biederman (ebiederm@xmission.com): > "Serge E. Hallyn" <serue@us.ibm.com> writes: > > > Quoting Eric W. Biederman (ebiederm@xmission.com): > >> If you want to help with the bare pid to struct pid conversion I > >> don't have any outstanding patches, and getting that done kills > >> some theoretical pid wrap around problems as well as laying the ground > >> work for a simple pidspace implementation. > >> > >> Eric > > > > Is this the sort of thing you are looking for? Is this worthwhile for > > kernel_threads, or only for userspace threads - i.e. do we expect kernel > > threads to live? > > For kernel threads we should simply be able to use their task > struct. > > In this instance we have hit upon a different problem. Anything > using the kernel_thread API instead of the kthread api needs > to be updated. > > The basic problem is that for kernel_threads can show up > inside of containers. > > We can fix that by updating daemonize or we can simply > universally use the kthread api. Since the kernel_thread > api is deprecated because of these kinds of reasons > what really makes sense is to work on the transition > to the kthread api. Egads, I apologize. Apparently I was in a daze, as I'd forgotten that converting all kernel_thread users to kthread was something else we wanted to work towards, and which Christoph had explicitly asked for help with. > Ok a couple of comments. > > As I recall there are some pretty sane ways of going > from struct pid to a task_struct and then we can use things > like group_send_sig. Oh, you mean instead of doing kill_proc(struct pid->nr), which I guess was pretty braindead? 
:) Ok, futile as this may have seemed overall, I think it's helped me figure out what to actually do. thanks, -serge ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans
2006-06-10 1:23 ` Serge E. Hallyn
@ 2006-06-10 7:52 ` Eric W. Biederman
2006-06-10 8:09 ` Eric W. Biederman
1 sibling, 0 replies; 166+ messages in thread
From: Eric W. Biederman @ 2006-06-10 7:52 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: linux-kernel

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Egads, I apologize.
>
> Apparently I was in a daze, as I'd forgotten that converting
> all kernel_thread users to kthread was something else we wanted
> to work towards, and which Christoph had explicitly asked for
> help with.

Yep. And the linux-vserver guys discovered the hard way.

>> Ok a couple of comments.
>>
>> As I recall there are some pretty sane ways of going
>> from struct pid to a task_struct and then we can use things
>> like group_send_sig.
>
> Oh, you mean instead of doing kill_proc(struct pid->nr), which
> I guess was pretty braindead? :)

I think it defeats half our purpose.

> Ok, futile as this may have seemed overall, I think it's helped
> me figure out what to actually do.

Sure and that is what it was aimed to do.

You want to attack the kernel_thread -> kthread thing?

Eric

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans
2006-06-10 1:23 ` Serge E. Hallyn
2006-06-10 7:52 ` Eric W. Biederman
@ 2006-06-10 8:09 ` Eric W. Biederman
1 sibling, 0 replies; 166+ messages in thread
From: Eric W. Biederman @ 2006-06-10 8:09 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: linux-kernel

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Oh, you mean instead of doing kill_proc(struct pid->nr), which
> I guess was pretty braindead? :)

For a single process we should be able to do:

	struct pid *pid = ( some value ... )
	struct task_struct *task;

	rcu_read_lock();
	task = pid_task(pid);
	if (task)
		group_send_sig_info(sig, info, task);
	rcu_read_unlock();

If it comes up very often that looks like an idiom that would
appreciate a helper function.

For process groups we must get a read_lock on the task_list_lock
because otherwise the atomicity guarantees of sending a signal
to a process group are broken.

Eric

^ permalink raw reply [flat|nested] 166+ messages in thread
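
The pid wrap-around concern raised in this sub-thread can be illustrated
with a small userspace model. This is only a sketch, not kernel code:
every toy_* name below is hypothetical, there is no locking or RCU, and
the "process table" is a plain array. It shows why a cached numeric id
can end up naming an unrelated task after the id is reused, while a
reference-counted handle simply stops resolving to a task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PID_MAX 8			/* tiny id space so reuse is easy to model */

struct toy_task {
	const char *name;
};

/* Hypothetical stand-in for a reference-counted pid handle. */
struct toy_pid {
	int nr;				/* numeric id */
	int refcount;			/* table reference plus any holders */
	struct toy_task *task;		/* NULL once the task has exited */
};

static struct toy_pid *toy_table[TOY_PID_MAX];	/* nr -> live handle */

/* Attach a task to a free id; the table holds the first reference. */
struct toy_pid *toy_attach(int nr, struct toy_task *task)
{
	struct toy_pid *pid;

	assert(nr >= 0 && nr < TOY_PID_MAX && toy_table[nr] == NULL);
	pid = malloc(sizeof(*pid));
	assert(pid != NULL);
	pid->nr = nr;
	pid->refcount = 1;
	pid->task = task;
	toy_table[nr] = pid;
	return pid;
}

/* Drop one reference; free the handle when nobody holds it any more. */
void toy_put_pid(struct toy_pid *pid)
{
	if (pid != NULL && --pid->refcount == 0)
		free(pid);
}

/* Look up the live handle for an id and take a reference on it. */
struct toy_pid *toy_find_get_pid(int nr)
{
	struct toy_pid *pid = NULL;

	if (nr >= 0 && nr < TOY_PID_MAX)
		pid = toy_table[nr];
	if (pid != NULL)
		pid->refcount++;
	return pid;
}

/* Task exit: detach the task, free the id for reuse, drop the table ref. */
void toy_detach(int nr)
{
	struct toy_pid *pid;

	if (nr < 0 || nr >= TOY_PID_MAX || toy_table[nr] == NULL)
		return;
	pid = toy_table[nr];
	pid->task = NULL;
	toy_table[nr] = NULL;
	toy_put_pid(pid);
}

/* Resolve a held handle: NULL if the original task is gone. */
struct toy_task *toy_pid_task(const struct toy_pid *pid)
{
	return pid != NULL ? pid->task : NULL;
}

/* Resolve a bare number: whatever task owns that id right now. */
struct toy_task *toy_task_by_nr(int nr)
{
	if (nr < 0 || nr >= TOY_PID_MAX || toy_table[nr] == NULL)
		return NULL;
	return toy_table[nr]->task;
}
```

In this model a caller that cached only the number would, after the id
is reused, get the new task back from toy_task_by_nr(), whereas a caller
holding a handle from toy_find_get_pid() gets NULL from toy_pid_task()
and can skip the signal.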
* Re: 2.6.18 -mm merge plans
2006-06-09 23:25 ` Serge E. Hallyn
2006-06-10 0:39 ` Eric W. Biederman
@ 2006-06-10 9:53 ` Christoph Hellwig
1 sibling, 0 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-10 9:53 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Eric W. Biederman, linux-kernel

On Fri, Jun 09, 2006 at 06:25:51PM -0500, Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > If you want to help with the bare pid to struct pid conversion I
> > don't have any outstanding patches, and getting that done kills
> > some theoretical pid wrap around problems as well as laying the ground
> > work for a simple pidspace implementation.
> >
> > Eric
>
> Is this the sort of thing you are looking for? Is this worthwhile for
> kernel_threads, or only for userspace threads - i.e. do we expect kernel
> threads to live?
>
> If we do want to do this for kernel threads, then I assume that
> eventually we'll want to change kernel_thread() itself. I actually
> started to do that earlier, but of course that way every user would
> have to be changed in the same patch :)
>
> Subject: [PATCH] struct pid: convert ieee1394 to hold struct pid
>
> ieee1394 driver caches pid_t's for kernel threads. Switch to
> holding a reference to a struct pid. This prevents concern
> about the cached pid pointing to the wrong process after the
> kernel thread dies and pids wrap around.

NACK. Please convert to the kthread_ API instead. A reference to a pid_t
in a driver should generally be treated as a bug; the few exceptions should
be discussed on lkml and commented verbosely.

^ permalink raw reply [flat|nested] 166+ messages in thread
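
The kthread API asked for above (kthread_run(), kthread_should_stop(),
kthread_stop()) replaces the "cache a pid, set a kill flag, wake the
thread, wait for completion" sequence with a handle the creator holds
and a single stop call. The following is a rough userspace analogy built
on POSIX threads, not the kernel implementation; the toy_kthread_* names
are invented for illustration, and older C libraries may need -pthread
when linking.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical handle: the creator keeps this instead of a numeric pid. */
struct toy_kthread {
	pthread_t thread;
	atomic_bool should_stop;	/* set by the stopper, polled by the thread */
	atomic_long iterations;		/* work counter, only for demonstration */
};

/* Polled by the thread body, in the spirit of kthread_should_stop(). */
bool toy_kthread_should_stop(struct toy_kthread *kt)
{
	return atomic_load(&kt->should_stop);
}

/* Thread body: loop until asked to stop, then return. */
static void *toy_thread_fn(void *arg)
{
	struct toy_kthread *kt = arg;

	while (!toy_kthread_should_stop(kt)) {
		atomic_fetch_add(&kt->iterations, 1);
		sched_yield();
	}
	return NULL;
}

/* Create and start the thread; returns 0 on success, an errno value otherwise. */
int toy_kthread_run(struct toy_kthread *kt)
{
	atomic_init(&kt->should_stop, false);
	atomic_init(&kt->iterations, 0);
	return pthread_create(&kt->thread, NULL, toy_thread_fn, kt);
}

/* Ask the thread to stop and wait until it has actually exited. */
int toy_kthread_stop(struct toy_kthread *kt)
{
	atomic_store(&kt->should_stop, true);
	return pthread_join(kt->thread, NULL);
}
```

Because the stopper works through the handle and joins the thread, there
is no window in which a stale numeric id could be signalled by mistake.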
* Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (14 preceding siblings ...) 2006-06-05 14:43 ` Serge E. Hallyn @ 2006-06-06 0:54 ` Balbir Singh 2006-06-06 22:28 ` Shailabh Nagar 2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt ` (4 subsequent siblings) 20 siblings, 1 reply; 166+ messages in thread From: Balbir Singh @ 2006-06-06 0:54 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Jay Lan, Peter Chubb Andrew Morton wrote: > per-task-delay-accounting-setup.patch > per-task-delay-accounting-setup-fix-1.patch > per-task-delay-accounting-setup-fix-2.patch > per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch > per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch > per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch > per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch > per-task-delay-accounting-utilities-for-genetlink-usage.patch > per-task-delay-accounting-taskstats-interface.patch > per-task-delay-accounting-taskstats-interface-fix-1.patch > per-task-delay-accounting-taskstats-interface-fix-2.patch > per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch > per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch > per-task-delay-accounting-documentation.patch > per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch > per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch > > I just don't know. There are a number of groups who pop up with various > enhanced accounting requirements and patches (all quite different) but I > haven't heard a lot of enthusiasm from any of them over this work, which > attempts to provide an extensible framework for accumulation and querying > of per-task metrics. 
> > But then again, we cannot just sit there and wait for everyone to be 100% > happy. So I'm 51% inclined to push this along. > > Anyone else who has an interest in this sort of thing needs to be aware > that there will be an expectation that any future statistics submissions > should use these interfaces. So the time to pay attention is right now. > Hi, Andrew, Here is a brief summary of the status of the response we have received from the stakeholders (some of it has been duplicated in previous postings) Project Response 1. CSA accounting/PAGG/JOB: Has agreed to use taskstats Jay Lan <jlan@engr.sgi.com> interface 2. per-process IO statistics: None Levent Serinol <lserinol@gmail.com> Needs are subset of CSA 3. per-cpu time statistics: None (email bounced) Erich Focht <efocht@ess.nec.de> Needs can be met by taskstats Statistics not yet submitted 4. Microstate accounting: None Peter Chubb <peterc@gelato.unsw.edu.au> overlap with delay accounting prefers /proc due to convenience taskstats can meet the needs 5. ELSA: Guillaume Thouvenin None <guillaume.thouvenin@bull.net> ELSA is not a direct user of new kernel statistics Consumer of CSA/BSD accounting statistics 6. pnotify: Jes Sorensen <jes@sgi.com> None (taken over pnotify from Erik Jacobson) Informed over private email that pnotify replacement is being worked on. pnotify or its replacement will not be concerned with exporting data to user space or collecting any statistics. 7. Scalable statistics counters with /proc Not working on it reporting: anymore Ravikiran G Thirumalai, Dipankar Sarma <dipankar@in.ibm.com> Studying the responses from all stake holders, Jay Lan's was the most encouraging. Peter Chubb prefers the /proc interface due to the text interface and ease of parsing. (in our opinion, taskstats can meet the needs easily and the getdelays utility can provide the same ease for parsing). The others did not respond. Some performance numbers of taskstats were posted at http://lkml.org/lkml/2006/3/23/141. 
The result highlights are included below Results highlights - Configuring delay accounting adds < 0.5% overhead in most cases and even reduces overhead in some cases - Enabling delay accounting has similar results with a maximum overhead of 1.2% for hackbench, most other overheads < 1% and reduction in overhead in some cases These statistics are _per task_ and can be extended easily by anyone who wishes to obtain per task data. An example of per task improved scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381 (I am not sure if the email refers to our per-task statistics). If not, the new statistics could easily use the taskstats interface. These statistics can be used by software product stacks to monitor usage information about the various tasks they create and control. I also informally spoke to a group of students (verbally), who were excited at the possibility of using the per-task statistics to do dynamic deadline based power management. They want to use the delay data (CPU and IO) to predict deadlines for a task and then use these results for dynamically scaling CPU frequency. The ability to monitor the CPU run and delay data and IO delay data is useful. I would request you to consider the inclusion per-task delay accounting into 2.6.18. -- Thanks, Balbir Singh, Linux Technology Center, IBM Software Labs ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) 2006-06-06 0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh @ 2006-06-06 22:28 ` Shailabh Nagar 2006-06-06 22:40 ` Andrew Morton 2006-06-06 22:52 ` Jay Lan 0 siblings, 2 replies; 166+ messages in thread From: Shailabh Nagar @ 2006-06-06 22:28 UTC (permalink / raw) To: balbir; +Cc: Andrew Morton, linux-kernel, Jay Lan, Peter Chubb Balbir Singh wrote: > Andrew Morton wrote: > >> per-task-delay-accounting-setup.patch >> per-task-delay-accounting-setup-fix-1.patch >> per-task-delay-accounting-setup-fix-2.patch >> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch >> >> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch >> >> per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch >> per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch >> per-task-delay-accounting-utilities-for-genetlink-usage.patch >> per-task-delay-accounting-taskstats-interface.patch >> per-task-delay-accounting-taskstats-interface-fix-1.patch >> per-task-delay-accounting-taskstats-interface-fix-2.patch >> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch >> >> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch >> >> per-task-delay-accounting-documentation.patch >> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch >> >> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch >> >> >> I just don't know. There are a number of groups who pop up with various >> enhanced accounting requirements and patches (all quite different) but I >> haven't heard a lot of enthusiasm from any of them over this work, which >> attempts to provide an extensible framework for accumulation and >> querying >> of per-task metrics. 
>> >> But then again, we cannot just sit there and wait for everyone to be >> 100% >> happy. So I'm 51% inclined to push this along. >> >> Anyone else who has an interest in this sort of thing needs to be aware >> that there will be an expectation that any future statistics submissions >> should use these interfaces. So the time to pay attention is right now. >> > > Hi, Andrew, > > Here is a brief summary of the status of the response we have received from > the stakeholders (some of it has been duplicated in previous postings) > > Project Response > > 1. CSA accounting/PAGG/JOB: Has agreed to use taskstats > Jay Lan <jlan@engr.sgi.com> interface > > 2. per-process IO statistics: None > Levent Serinol <lserinol@gmail.com> Needs are subset of CSA > > 3. per-cpu time statistics: None (email bounced) > Erich Focht <efocht@ess.nec.de> Needs can be met by taskstats > Statistics not yet submitted > > 4. Microstate accounting: None > Peter Chubb <peterc@gelato.unsw.edu.au> overlap with delay accounting > prefers /proc due to > convenience > taskstats can meet the needs > > > 5. ELSA: Guillaume Thouvenin None > <guillaume.thouvenin@bull.net> ELSA is not a direct user > of new kernel statistics > Consumer of CSA/BSD > accounting > statistics > > 6. pnotify: Jes Sorensen <jes@sgi.com> None > (taken over pnotify from Erik Jacobson) Informed over private email > that pnotify replacement is > being worked on. pnotify > or its replacement will > not be concerned with > exporting data to user space > or collecting any statistics. > > > 7. Scalable statistics counters with /proc Not working on it > reporting: anymore > Ravikiran G Thirumalai, > Dipankar Sarma <dipankar@in.ibm.com> > > Studying the responses from all stake holders, Jay Lan's was the most > encouraging. Peter Chubb prefers the /proc interface due to the text > interface > and ease of parsing. (in our opinion, taskstats can meet the needs easily > and the getdelays utility can provide the same ease for parsing). 
> The others did not respond. > Some performance numbers of taskstats were posted at > http://lkml.org/lkml/2006/3/23/141. The result highlights are included > below > > Results highlights > > - Configuring delay accounting adds < 0.5% > overhead in most cases and even reduces overhead > in some cases > > - Enabling delay accounting has similar results > with a maximum overhead of 1.2% for hackbench, > most other overheads < 1% and reduction in > overhead in some cases > > These statistics are _per task_ and can be extended easily by anyone > who wishes to obtain per task data. An example of per task improved > scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381 > (I am not sure if the email refers to our per-task statistics). If not, > the new statistics could easily use the taskstats interface. > > These statistics can be used by software product stacks to monitor > usage information about the various tasks they create and control. > I also informally spoke to a group of students (verbally), who were > excited at the possibility of using the per-task statistics to do > dynamic deadline based power management. They want to use the delay data > (CPU and IO) to predict deadlines for a task and then use these results > for dynamically scaling CPU frequency. > > > The ability to monitor the CPU run and delay data and IO delay data is > useful. > > I would request you to consider the inclusion per-task delay accounting > into > 2.6.18. > Andrew, The only other new set of patches to be discussed in this context are the statistics-infrastructure patches from Martin Peschke. That infrastructure cannot meet the needs of delay accounting, CSA etc. because - it only provides "user pull" model of getting stats whereas "kernel push" is needed for delay accounting - it uses a relatively slow interface unsuitable for high volumes of data. 
Each statistic has its own definition, needs to be read separately using ASCII, reading data continuously means open/read/close each time.....all of which is not very conducive to large structures being sent to userspace. - its oriented towards sampled data whereas taskstats isn't. So, we have a good consensus from existing/potential users of taskstats and would very much appreciate it being included in 2.6.18. --Shailabh ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-06 22:28 ` Shailabh Nagar
@ 2006-06-06 22:40 ` Andrew Morton
2006-06-08 14:27 ` Shailabh Nagar
2006-06-06 22:52 ` Jay Lan
1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-06 22:40 UTC (permalink / raw)
To: Shailabh Nagar; +Cc: balbir, linux-kernel, jlan, peterc

On Tue, 06 Jun 2006 18:28:15 -0400
Shailabh Nagar <nagar@watson.ibm.com> wrote:

> So, we have a good consensus from existing/potential users of taskstats and would
> very much appreciate it being included in 2.6.18.

Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play
wait-and-see on the statistics infrastructure. Greg is taking a look at
the stats code, which is good.

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) 2006-06-06 22:40 ` Andrew Morton @ 2006-06-08 14:27 ` Shailabh Nagar 2006-06-08 17:42 ` Andrew Morton 0 siblings, 1 reply; 166+ messages in thread From: Shailabh Nagar @ 2006-06-08 14:27 UTC (permalink / raw) To: Andrew Morton; +Cc: balbir, linux-kernel, jlan, peterc Andrew Morton wrote: >On Tue, 06 Jun 2006 18:28:15 -0400 >Shailabh Nagar <nagar@watson.ibm.com> wrote: > > > >>So, we have a good consensus from existing/potential users of taskstats and would >>very much appreciate it being included in 2.6.18. >> >> > >Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play >wait-and-see on the statistics infrastructure. Greg is taking a look at >the stats code, which is good. > > > Thanks ! The suggestion from Jay Lan to extend the interface by making sending of tgid stats configurable is quite reasonable and can be done relatively simply: set some parameter, either by sending a separate command (verify sender is privileged) or by some sysfs parameter and use that to control sending of tgid stats on task exit (as well as allocation of any tgid stat related structures). Would you recommend we submit a patch for it now or wait till after delay accounting has gone into 2.6.18 ? Such requests for extending the interface are likely to happen as more users start using the interface. But since any patch will need some testing etc. and we are very close to the 2.6.18 merge window, I wanted your advice on whether this should wait until later. Regards, Shailabh ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) 2006-06-08 14:27 ` Shailabh Nagar @ 2006-06-08 17:42 ` Andrew Morton 2006-06-08 18:36 ` Shailabh Nagar 0 siblings, 1 reply; 166+ messages in thread From: Andrew Morton @ 2006-06-08 17:42 UTC (permalink / raw) To: Shailabh Nagar; +Cc: balbir, linux-kernel, jlan, peterc On Thu, 08 Jun 2006 10:27:46 -0400 Shailabh Nagar <nagar@watson.ibm.com> wrote: > Andrew Morton wrote: > > >On Tue, 06 Jun 2006 18:28:15 -0400 > >Shailabh Nagar <nagar@watson.ibm.com> wrote: > > > > > > > >>So, we have a good consensus from existing/potential users of taskstats and would > >>very much appreciate it being included in 2.6.18. > >> > >> > > > >Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play > >wait-and-see on the statistics infrastructure. Greg is taking a look at > >the stats code, which is good. > > > > > > > Thanks ! > > The suggestion from Jay Lan to extend the interface by making sending > of tgid stats configurable > is quite reasonable and can be done relatively simply: > set some parameter, either by sending a separate command (verify sender > is privileged) or by > some sysfs parameter and use that to control sending of tgid stats on > task exit (as well as allocation of > any tgid stat related structures). hm. Is it possible to check the privileges of a netlink message sender? > Would you recommend we submit a patch for it now or wait till after > delay accounting has gone into > 2.6.18 ? Earlier, please. > Such requests for extending the interface are likely to happen as more > users start using the interface. > But since any patch will need some testing etc. and we are very close to > the 2.6.18 merge window, I > wanted your advice on whether this should wait until later. If it's merged, we'll have a couple more months to test it, and to fix any little problems. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-08 17:42 ` Andrew Morton
@ 2006-06-08 18:36 ` Shailabh Nagar
2006-06-08 19:33 ` Balbir Singh
0 siblings, 1 reply; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-08 18:36 UTC (permalink / raw)
To: Andrew Morton; +Cc: balbir, linux-kernel, jlan, peterc

Andrew Morton wrote:

>On Thu, 08 Jun 2006 10:27:46 -0400
>Shailabh Nagar <nagar@watson.ibm.com> wrote:
>
>>Andrew Morton wrote:
>>
>>>On Tue, 06 Jun 2006 18:28:15 -0400
>>>Shailabh Nagar <nagar@watson.ibm.com> wrote:
>>>
>>>>So, we have a good consensus from existing/potential users of taskstats and would
>>>>very much appreciate it being included in 2.6.18.
>>>
>>>Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play
>>>wait-and-see on the statistics infrastructure. Greg is taking a look at
>>>the stats code, which is good.
>>
>>Thanks !
>>
>>The suggestion from Jay Lan to extend the interface by making sending
>>of tgid stats configurable is quite reasonable and can be done
>>relatively simply: set some parameter, either by sending a separate
>>command (verify sender is privileged) or by some sysfs parameter and
>>use that to control sending of tgid stats on task exit (as well as
>>allocation of any tgid stat related structures).
>
>hm. Is it possible to check the privileges of a netlink message sender?

Not entirely sure. But there's a check in net/netlink/genetlink.c:
genl_rcv_msg()
for
	if ((ops->flags & GENL_ADMIN_PERM) && security_netlink_recv(skb)) {
		err = -EPERM;
		goto errout;
	}

and security_netlink_recv(skb), normally set to cap_netlink_recv, checks
on the skb's effective capability being CAP_NET_ADMIN which I thought
would be sufficient. Need to look further.

If it doesn't turn out to fit properly, sysfs config variable can be used.

>>Would you recommend we submit a patch for it now or wait till after
>>delay accounting has gone into 2.6.18 ?
>
>Earlier, please.

Ok. will submit asap.

>>Such requests for extending the interface are likely to happen as more
>>users start using the interface.
>>But since any patch will need some testing etc. and we are very close to
>>the 2.6.18 merge window, I wanted your advice on whether this should
>>wait until later.
>
>If it's merged, we'll have a couple more months to test it, and to fix any
>little problems.

Sounds good.

Thanks,
Shailabh

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-08 18:36 ` Shailabh Nagar
@ 2006-06-08 19:33 ` Balbir Singh
0 siblings, 0 replies; 166+ messages in thread
From: Balbir Singh @ 2006-06-08 19:33 UTC (permalink / raw)
To: Shailabh Nagar; +Cc: Andrew Morton, linux-kernel, jlan, peterc

Shailabh Nagar wrote:

>> hm. Is it possible to check the privileges of a netlink message sender?
>
> Not entirely sure. But there's a check in net/netlink/genetlink.c:
> genl_rcv_msg() for
>
>     if ((ops->flags & GENL_ADMIN_PERM) && security_netlink_recv(skb)) {
>         err = -EPERM;
>         goto errout;
>     }
>
> and security_netlink_recv(skb), normally set to cap_netlink_recv, checks
> on the skb's effective capability being CAP_NET_ADMIN which I thought
> would be sufficient. Need to look further.
>
> If it doesn't turn out to fit properly, sysfs config variable can be used.

The genl_ops has a flags field. If the flags field is initialized to
GENL_ADMIN_PERM, then privileges are checked as pointed out by you.

--
Balbir Singh,
Linux Technology Center,
IBM Software Labs

^ permalink raw reply [flat|nested] 166+ messages in thread
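The check discussed above can be pictured with a small userspace model. This is only a sketch under stated assumptions: the model_* names, the flag value and the structs are hypothetical stand-ins, not the kernel's definitions, and it only mirrors the shape of the test quoted from genl_rcv_msg() (an operation flagged as admin-only is refused with -EPERM unless the sender holds the capability).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GENL_ADMIN_PERM; the value is illustrative. */
#define MODEL_GENL_ADMIN_PERM 0x01u

/* Models the sender side of an skb: does it carry CAP_NET_ADMIN? */
struct model_sender {
    bool has_cap_net_admin;
};

/* Models a genl_ops entry: a flags field plus the handler to run. */
struct model_genl_ops {
    unsigned int flags;
    int (*doit)(const struct model_sender *sender);
};

/* Models security_netlink_recv(): 0 when allowed, nonzero when refused. */
static int model_netlink_recv(const struct model_sender *sender)
{
    return sender->has_cap_net_admin ? 0 : -EPERM;
}

/* Models the permission gate in front of the handler. */
static int model_genl_rcv_msg(const struct model_genl_ops *ops,
                              const struct model_sender *sender)
{
    assert(ops != NULL && sender != NULL);

    if ((ops->flags & MODEL_GENL_ADMIN_PERM) && model_netlink_recv(sender))
        return -EPERM;

    return ops->doit(sender);
}

/* Hypothetical handler, e.g. "toggle sending of tgid stats on exit". */
static int model_set_tgid_stats(const struct model_sender *sender)
{
    (void)sender;
    return 0;
}
```

With an ops entry whose flags include MODEL_GENL_ADMIN_PERM, a sender without the capability gets -EPERM before the handler runs; an ops entry with flags 0 is not gated at all. Whether this is sufficient for the tgid-stats command is exactly the open question in the thread.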
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-06 22:28 ` Shailabh Nagar
2006-06-06 22:40 ` Andrew Morton
@ 2006-06-06 22:52 ` Jay Lan
2006-06-06 22:55 ` Shailabh Nagar
2006-06-12 12:02 ` Martin Peschke
1 sibling, 2 replies; 166+ messages in thread
From: Jay Lan @ 2006-06-06 22:52 UTC (permalink / raw)
To: Shailabh Nagar
Cc: balbir, Andrew Morton, linux-kernel, Chris Sturtivant, Peter Chubb

Shailabh Nagar wrote:
> Balbir Singh wrote:
>
>>Andrew Morton wrote:
>>
>>>per-task-delay-accounting-setup.patch
>>>per-task-delay-accounting-setup-fix-1.patch
>>>per-task-delay-accounting-setup-fix-2.patch
>>>per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
>>>per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
>>>per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
>>>per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
>>>per-task-delay-accounting-utilities-for-genetlink-usage.patch
>>>per-task-delay-accounting-taskstats-interface.patch
>>>per-task-delay-accounting-taskstats-interface-fix-1.patch
>>>per-task-delay-accounting-taskstats-interface-fix-2.patch
>>>per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
>>>per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
>>>per-task-delay-accounting-documentation.patch
>>>per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
>>>per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
>>>
>>> I just don't know. There are a number of groups who pop up with various
>>> enhanced accounting requirements and patches (all quite different) but I
>>> haven't heard a lot of enthusiasm from any of them over this work, which
>>> attempts to provide an extensible framework for accumulation and querying
>>> of per-task metrics.
>>>
>>> But then again, we cannot just sit there and wait for everyone to be 100%
>>> happy. So I'm 51% inclined to push this along.
>>>
>>> Anyone else who has an interest in this sort of thing needs to be aware
>>> that there will be an expectation that any future statistics submissions
>>> should use these interfaces. So the time to pay attention is right now.
>>
>>Hi, Andrew,
>>
>>Here is a brief summary of the status of the response we have received from
>>the stakeholders (some of it has been duplicated in previous postings)
>>
>>Project                                      Response
>>
>>1. CSA accounting/PAGG/JOB:                  Has agreed to use taskstats
>>   Jay Lan <jlan@engr.sgi.com>               interface
>>
>>2. per-process IO statistics:                None
>>   Levent Serinol <lserinol@gmail.com>       Needs are subset of CSA
>>
>>3. per-cpu time statistics:                  None (email bounced)
>>   Erich Focht <efocht@ess.nec.de>           Needs can be met by taskstats
>>                                             Statistics not yet submitted
>>
>>4. Microstate accounting:                    None
>>   Peter Chubb <peterc@gelato.unsw.edu.au>   overlap with delay accounting
>>                                             prefers /proc due to convenience
>>                                             taskstats can meet the needs
>>
>>5. ELSA: Guillaume Thouvenin                 None
>>   <guillaume.thouvenin@bull.net>            ELSA is not a direct user
>>                                             of new kernel statistics
>>                                             Consumer of CSA/BSD accounting
>>                                             statistics
>>
>>6. pnotify: Jes Sorensen <jes@sgi.com>       None
>>   (taken over pnotify from Erik Jacobson)   Informed over private email
>>                                             that pnotify replacement is
>>                                             being worked on. pnotify
>>                                             or its replacement will
>>                                             not be concerned with
>>                                             exporting data to user space
>>                                             or collecting any statistics.
>>
>>7. Scalable statistics counters with /proc   Not working on it
>>   reporting:                                anymore
>>   Ravikiran G Thirumalai,
>>   Dipankar Sarma <dipankar@in.ibm.com>
>>
>>Studying the responses from all stakeholders, Jay Lan's was the most
>>encouraging. Peter Chubb prefers the /proc interface due to the text
>>interface and ease of parsing. (in our opinion, taskstats can meet the
>>needs easily and the getdelays utility can provide the same ease for
>>parsing). The others did not respond.
>>
>>Some performance numbers of taskstats were posted at
>>http://lkml.org/lkml/2006/3/23/141. The result highlights are included
>>below
>>
>> Results highlights
>>
>> - Configuring delay accounting adds < 0.5%
>>   overhead in most cases and even reduces overhead
>>   in some cases
>>
>> - Enabling delay accounting has similar results
>>   with a maximum overhead of 1.2% for hackbench,
>>   most other overheads < 1% and reduction in
>>   overhead in some cases
>>
>>These statistics are _per task_ and can be extended easily by anyone
>>who wishes to obtain per task data. An example of per task improved
>>scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381
>>(I am not sure if the email refers to our per-task statistics). If not,
>>the new statistics could easily use the taskstats interface.
>>
>>These statistics can be used by software product stacks to monitor
>>usage information about the various tasks they create and control.
>>I also informally spoke to a group of students (verbally), who were
>>excited at the possibility of using the per-task statistics to do
>>dynamic deadline based power management. They want to use the delay data
>>(CPU and IO) to predict deadlines for a task and then use these results
>>for dynamically scaling CPU frequency.
>>
>>The ability to monitor the CPU run and delay data and IO delay data is
>>useful.
>>
>>I would request you to consider the inclusion of per-task delay
>>accounting into 2.6.18.
>
> Andrew,
>
> The only other new set of patches to be discussed in this context are the
> statistics-infrastructure patches from Martin Peschke.
>
> That infrastructure cannot meet the needs of delay accounting, CSA etc.
> because
> - it only provides "user pull" model of getting stats whereas "kernel push" is
>   needed for delay accounting

Doesn't taskstats interface provide "user pull" request-reply model
also? Serious accounting needs to push accounting data as soon as
possible.

> - it uses a relatively slow interface unsuitable for high volumes of data. Each
>   statistic has its own definition, needs to be read separately using ASCII,
>   reading data continuously means open/read/close each time.....all of
>   which is not very conducive to large structures being sent to userspace.

Yes, I second the point. It won't be able to keep up with the traffic.

> - it's oriented towards sampled data whereas taskstats isn't.
>
> So, we have a good consensus from existing/potential users of taskstats and would
> very much appreciate it being included in 2.6.18.

Andrew, it has become clear that the community wants to see accounting
data processing being moved to userspace. Thus there is a need for a
common accounting interface to do minimal work in the kernel (via
hooks at fork and exit) and deliver data to userspace.

The delayacct patchset provides a good framework and example that
I believe CSA/Job can follow and build upon to move most of our work
to userspace and thus cut off the dependency on PAGG. We will submit
CSA patch soon based on the taskstats interface.

Thanks,
 - jay

P.S. Balbir and Shailabh, Chris Sturtivant will continue the CSA work
at SGI. Please also cc Chris <csturtiv@sgi.com> in the future.
Thanks!

>
> --Shailabh

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-06 22:52 ` Jay Lan
@ 2006-06-06 22:55 ` Shailabh Nagar
2006-06-12 12:02 ` Martin Peschke
1 sibling, 0 replies; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-06 22:55 UTC (permalink / raw)
To: Jay Lan; +Cc: balbir, Andrew Morton, linux-kernel, Chris Sturtivant, Peter Chubb

Jay Lan wrote:

> Shailabh Nagar wrote:
>
>> Balbir Singh wrote:
>>
>>> Andrew Morton wrote:
>>>
>>>> per-task-delay-accounting-setup.patch
>>>> per-task-delay-accounting-setup-fix-1.patch
>>>> per-task-delay-accounting-setup-fix-2.patch
>>>> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
>>>> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
>>>> per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
>>>> per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
>>>> per-task-delay-accounting-utilities-for-genetlink-usage.patch
>>>> per-task-delay-accounting-taskstats-interface.patch
>>>> per-task-delay-accounting-taskstats-interface-fix-1.patch
>>>> per-task-delay-accounting-taskstats-interface-fix-2.patch
>>>> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
>>>> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
>>>> per-task-delay-accounting-documentation.patch
>>>> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
>>>> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
>>>>
>>>> I just don't know. There are a number of groups who pop up with
>>>> various enhanced accounting requirements and patches (all quite
>>>> different) but I haven't heard a lot of enthusiasm from any of them
>>>> over this work, which attempts to provide an extensible framework
>>>> for accumulation and querying of per-task metrics.
>>>>
>>>> But then again, we cannot just sit there and wait for everyone to be
>>>> 100% happy. So I'm 51% inclined to push this along.
>>>>
>>>> Anyone else who has an interest in this sort of thing needs to be aware
>>>> that there will be an expectation that any future statistics
>>>> submissions should use these interfaces. So the time to pay attention
>>>> is right now.
>>>
>>> Hi, Andrew,
>>>
>>> Here is a brief summary of the status of the response we have
>>> received from the stakeholders (some of it has been duplicated in
>>> previous postings)
>>>
>>> Project                                      Response
>>>
>>> 1. CSA accounting/PAGG/JOB:                  Has agreed to use taskstats
>>>    Jay Lan <jlan@engr.sgi.com>               interface
>>>
>>> 2. per-process IO statistics:                None
>>>    Levent Serinol <lserinol@gmail.com>       Needs are subset of CSA
>>>
>>> 3. per-cpu time statistics:                  None (email bounced)
>>>    Erich Focht <efocht@ess.nec.de>           Needs can be met by taskstats
>>>                                              Statistics not yet submitted
>>>
>>> 4. Microstate accounting:                    None
>>>    Peter Chubb <peterc@gelato.unsw.edu.au>   overlap with delay accounting
>>>                                              prefers /proc due to convenience
>>>                                              taskstats can meet the needs
>>>
>>> 5. ELSA: Guillaume Thouvenin                 None
>>>    <guillaume.thouvenin@bull.net>            ELSA is not a direct user
>>>                                              of new kernel statistics
>>>                                              Consumer of CSA/BSD accounting
>>>                                              statistics
>>>
>>> 6. pnotify: Jes Sorensen <jes@sgi.com>       None
>>>    (taken over pnotify from Erik Jacobson)   Informed over private email
>>>                                              that pnotify replacement is
>>>                                              being worked on. pnotify
>>>                                              or its replacement will
>>>                                              not be concerned with
>>>                                              exporting data to user space
>>>                                              or collecting any statistics.
>>>
>>> 7. Scalable statistics counters with /proc   Not working on it
>>>    reporting:                                anymore
>>>    Ravikiran G Thirumalai,
>>>    Dipankar Sarma <dipankar@in.ibm.com>
>>>
>>> Studying the responses from all stakeholders, Jay Lan's was the most
>>> encouraging. Peter Chubb prefers the /proc interface due to the text
>>> interface and ease of parsing. (in our opinion, taskstats can meet the
>>> needs easily and the getdelays utility can provide the same ease for
>>> parsing). The others did not respond.
>>>
>>> Some performance numbers of taskstats were posted at
>>> http://lkml.org/lkml/2006/3/23/141. The result highlights are included
>>> below
>>>
>>> Results highlights
>>>
>>> - Configuring delay accounting adds < 0.5%
>>>   overhead in most cases and even reduces overhead
>>>   in some cases
>>>
>>> - Enabling delay accounting has similar results
>>>   with a maximum overhead of 1.2% for hackbench,
>>>   most other overheads < 1% and reduction in
>>>   overhead in some cases
>>>
>>> These statistics are _per task_ and can be extended easily by anyone
>>> who wishes to obtain per task data. An example of per task improved
>>> scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381
>>> (I am not sure if the email refers to our per-task statistics). If not,
>>> the new statistics could easily use the taskstats interface.
>>>
>>> These statistics can be used by software product stacks to monitor
>>> usage information about the various tasks they create and control.
>>> I also informally spoke to a group of students (verbally), who were
>>> excited at the possibility of using the per-task statistics to do
>>> dynamic deadline based power management. They want to use the delay data
>>> (CPU and IO) to predict deadlines for a task and then use these results
>>> for dynamically scaling CPU frequency.
>>>
>>> The ability to monitor the CPU run and delay data and IO delay data is
>>> useful.
>>>
>>> I would request you to consider the inclusion of per-task delay
>>> accounting into 2.6.18.
>>
>> Andrew,
>>
>> The only other new set of patches to be discussed in this context are the
>> statistics-infrastructure patches from Martin Peschke.
>>
>> That infrastructure cannot meet the needs of delay accounting, CSA
>> etc. because
>> - it only provides "user pull" model of getting stats whereas "kernel
>>   push" is needed for delay accounting
>
> Doesn't taskstats interface provide "user pull" request-reply model
> also? Serious accounting needs to push accounting data as soon as
> possible.

Yes, I meant to say "kernel push" is also needed for delay accounting.
So taskstats provides both pull and push whereas statistics
infrastructure, on account of use of fs-based interface, provides only
user-pull.

>> - it uses a relatively slow interface unsuitable for high volumes of
>>   data. Each statistic has its own definition, needs to be read
>>   separately using ASCII, reading data continuously means
>>   open/read/close each time.....all of which is not very conducive to
>>   large structures being sent to userspace.
>
> Yes, I second the point. It won't be able to keep up with the traffic.
>
>> - it's oriented towards sampled data whereas taskstats isn't.
>>
>> So, we have a good consensus from existing/potential users of
>> taskstats and would very much appreciate it being included in 2.6.18.
>
> Andrew, it has become clear that the community wants to see accounting
> data processing being moved to userspace. Thus there is a need for a
> common accounting interface to do minimal work in the kernel (via
> hooks at fork and exit) and deliver data to userspace.
>
> The delayacct patchset provides a good framework and example that
> I believe CSA/Job can follow and build upon to move most of our work
> to userspace and thus cut off the dependency on PAGG. We will submit
> CSA patch soon based on the taskstats interface.
>
> Thanks,
> - jay
>
> P.S. Balbir and Shailabh, Chris Sturtivant will continue the CSA work
> at SGI. Please also cc Chris <csturtiv@sgi.com> in the future.
> Thanks!

Sure.

Thanks,
Shailabh

>>
>> --Shailabh

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-06 22:52 ` Jay Lan
2006-06-06 22:55 ` Shailabh Nagar
@ 2006-06-12 12:02 ` Martin Peschke
2006-06-12 13:28 ` Shailabh Nagar
1 sibling, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-12 12:02 UTC (permalink / raw)
To: Jay Lan
Cc: Shailabh Nagar, balbir, Andrew Morton, linux-kernel, Chris Sturtivant, Peter Chubb

Jay Lan wrote:
> Shailabh Nagar wrote:
>> Balbir Singh wrote:

>> Andrew,
>>
>> The only other new set of patches to be discussed in this context are the
>> statistics-infrastructure patches from Martin Peschke.
>>
>> That infrastructure cannot meet the needs of delay accounting, CSA
>> etc. because
>> - it only provides "user pull" model of getting stats whereas "kernel
>>   push" is needed for delay accounting
>
> Doesn't taskstats interface provide "user pull" request-reply model
> also? Serious accounting needs to push accounting data as soon as
> possible.
>
>> - it uses a relatively slow interface unsuitable for high volumes of
>>   data.

By design.

I think it would be fatal to report every event relevant
to statistical data gathering up to user space. It's fine to have
the kernel maintain counters and to provide preprocessed data.

Given that, is there a need for a high-speed interface for a
huge amount of unprocessed statistical data?

However, the user interface is just one building block,
which can be enhanced or replaced with moderate effort, if
there is a need.

>> Each statistic has its own definition,

Allowing users to restrict accounting to what they need in their
particular case. Sensible defaults are usually available.

>> needs to be read separately using ASCII,
>> reading data continuously means open/read/close each time.....all of
>> which is not very conducive to large structures being sent to userspace.

Debugfs files are fine for larger structures.

Unless one keeps reading statistics dozens of times per second,
I don't see an issue with that.

The question is: what are the requirements to be covered?

> Yes, I second the point. It won't be able to keep up with the traffic.
>
>> - it's oriented towards sampled data whereas taskstats isn't.
>>
>> So, we have a good consensus from existing/potential users of
>> taskstats and would very much appreciate it being included in 2.6.18.
>
> Andrew, it has become clear that the community wants to see accounting
> data processing being moved to userspace. Thus there is a need for a
> common accounting interface to do minimal work in the kernel (via
> hooks at fork and exit) and deliver data to userspace.

Both the statistics infrastructure on behalf of its exploiters and
the exploiters of the taskstats interface do data preprocessing,
that is, maintain counters in the kernel.
User space counters won't perform, of course.

AFAICS, actual differences are:
- triggers for data delivery to user space
  (statistics infrastructure: when user reads statistics through file,
  taskstats: on certain task related events, right?)
- and, therewith, frequency of data delivery to user space

Martin

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
2006-06-12 12:02 ` Martin Peschke
@ 2006-06-12 13:28 ` Shailabh Nagar
0 siblings, 0 replies; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-12 13:28 UTC (permalink / raw)
To: Martin Peschke
Cc: Jay Lan, balbir, Andrew Morton, linux-kernel, Chris Sturtivant, Peter Chubb

Martin Peschke wrote:

> Jay Lan wrote:
>
>> Shailabh Nagar wrote:
>>
>>> Balbir Singh wrote:
>
>>> Andrew,
>>>
>>> The only other new set of patches to be discussed in this context
>>> are the statistics-infrastructure patches from Martin Peschke.
>>>
>>> That infrastructure cannot meet the needs of delay accounting, CSA
>>> etc. because
>>> - it only provides "user pull" model of getting stats whereas
>>>   "kernel push" is needed for delay accounting
>>
>> Doesn't taskstats interface provide "user pull" request-reply model
>> also? Serious accounting needs to push accounting data as soon as
>> possible.
>>
>>> - it uses a relatively slow interface unsuitable for high volumes of
>>>   data.
>
> By design.
>
> I think it would be fatal to report every event relevant
> to statistical data gathering up to user space. It's fine to have
> the kernel maintain counters and to provide preprocessed data.
>
> Given that, is there a need for a high-speed interface for a
> huge amount of unprocessed statistical data?

Broadly speaking, yes. Inserting policy into the kernel will no doubt
save the data being sent to userspace but also
- limits flexibility of what userspace can do with it and
- adds to the kernel code base unnecessarily

Specifically for taskstats, there is a need for a high-speed interface
because of the potential volume of data resulting from
- large number of tasks in a single kernel
- high frequency of task exits

Some kernel-based preprocessing, such as per-tgid aggregation, can help
cut down the volume but as long as we have a need for getting per-task
data, an efficient interface will matter, at least for our needs.

> However, the user interface is just one building block,
> which can be enhanced or replaced with moderate effort, if
> there is a need.

True. This is not to suggest statistical infrastructure's interface
choice isn't correct.. just that it's not enough for the needs we seek
to serve. A filesystem based interface has plenty of usability benefits
so it's primarily a question of which stats you want to export using
its interface.

>>> Each statistic has its own definition,
>
> Allowing users to restrict accounting to what they need in their
> particular case. Sensible defaults are usually available.
>
>>> needs to be read separately using ASCII,
>>> reading data continuously means open/read/close each time.....all of
>>> which is not very conducive to large structures being sent to
>>> userspace.
>
> Debugfs files are fine for larger structures.
>
> Unless one keeps reading statistics dozens of times per second,
> I don't see an issue with that.

Since delay accounting stats can be exploited for resource management at
user space, if one wants to get data for all tasks/processes
periodically, it could add up to a fairly high demand for user<->kernel
bandwidth even if the frequency need for reading one task's stats isn't
that high. It's a question of scalability as number of tasks increase.

For systemwide stats, your point is well taken...unlikely to be an
issue, unless of course, one needs to read lots of them.

> The question is: what are the requirements to be covered?

Yup...I think the infrastructures are serving differing needs.

>> Yes, I second the point. It won't be able to keep up with the traffic.
>>
>>> - it's oriented towards sampled data whereas taskstats isn't.
>>>
>>> So, we have a good consensus from existing/potential users of
>>> taskstats and would very much appreciate it being included in 2.6.18.
>>
>> Andrew, it has become clear that the community wants to see accounting
>> data processing being moved to userspace. Thus there is a need for a
>> common accounting interface to do minimal work in the kernel (via
>> hooks at fork and exit) and deliver data to userspace.
>
> Both the statistics infrastructure on behalf of its exploiters and
> the exploiters of the taskstats interface do data preprocessing,
> that is, maintain counters in the kernel.
> User space counters won't perform, of course.
>
> AFAICS, actual differences are:
> - triggers for data delivery to user space
>   (statistics infrastructure: when user reads statistics through file,
>   taskstats: on certain task related events, right?)

Taskstats data delivery is triggered by
- user asking for data (i.e. akin to reading through a file, only done
  through a command-response interface)
- on task exit event

> - and, therewith, frequency of data delivery to user space
>
> Martin

--Shailabh

^ permalink raw reply [flat|nested] 166+ messages in thread
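The two delivery triggers named above (a reply to a user request, and an unsolicited message on task exit) can be contrasted in a small userspace model. This is a sketch only: every model_* name and the struct layout are assumptions made for illustration and are not the taskstats structures or its netlink protocol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task record; field names are illustrative only. */
struct model_stats {
    int pid;
    unsigned long long cpu_delay_ns;
    unsigned long long blkio_delay_ns;
};

/* Callback standing in for a userspace listener on the exit channel. */
typedef void (*model_listener)(const struct model_stats *st, void *ctx);

struct model_taskstats {
    model_listener listener;  /* NULL when nobody has registered */
    void *ctx;
};

/* "User pull": userspace asks about one task and gets a reply. */
static struct model_stats model_query(const struct model_stats *task)
{
    assert(task != NULL);
    return *task;
}

/* "Kernel push": invoked from the exit path, with no request pending. */
static void model_task_exit(const struct model_taskstats *ts,
                            const struct model_stats *task)
{
    assert(ts != NULL && task != NULL);
    if (ts->listener)
        ts->listener(task, ts->ctx);
}

/* Example listener: counts how many exit records were delivered. */
static void model_count_listener(const struct model_stats *st, void *ctx)
{
    (void)st;
    ++*(int *)ctx;
}
```

In this model a file-style interface corresponds to model_query() alone: data moves only when userspace asks. The exit hook is what lets a record reach userspace for a task that is gone before anyone could have polled it.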
* Re: 2.6.18 -mm pi-futex merge
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (15 preceding siblings ...)
2006-06-06 0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh
@ 2006-06-06 12:32 ` Steven Rostedt
2006-06-06 13:34 ` Roman Zippel
2006-06-06 14:42 ` genirq Ingo Molnar
` (3 subsequent siblings)
20 siblings, 1 reply; 166+ messages in thread
From: Steven Rostedt @ 2006-06-06 12:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:

> pi-futex-futex-code-cleanups.patch
> pi-futex-robust-futex-docs-fix.patch
> pi-futex-introduce-debug_check_no_locks_freed.patch
> pi-futex-introduce-warn_on_smp.patch
> pi-futex-add-plist-implementation.patch
> pi-futex-scheduler-support-for-pi.patch
> pi-futex-rt-mutex-core.patch
> pi-futex-rt-mutex-docs.patch
> pi-futex-rt-mutex-docs-update.patch
> pi-futex-rt-mutex-debug.patch
> pi-futex-rt-mutex-tester.patch
> pi-futex-rt-mutex-futex-api.patch
> pi-futex-futex_lock_pi-futex_unlock_pi-support.patch
> #
> futex_requeue-optimization.patch
>
> Priority-inheriting futexes. I don't have a clue how this code works,
> but it sure has a lot of trylocks for something which allegedly works.
> Will merge.

Andrew,

I wrote the rt-mutex-design.txt just so you would have a clue :)

If you have any questions, I would be happy to update it to make it
clearer.

-- Steve

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm pi-futex merge
2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
@ 2006-06-06 13:34 ` Roman Zippel
2006-06-06 13:44 ` Steven Rostedt
0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-06 13:34 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Andrew Morton, linux-kernel, tglx

Hi,

On Tue, 6 Jun 2006, Steven Rostedt wrote:

> On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
>
> > pi-futex-futex-code-cleanups.patch
> > pi-futex-robust-futex-docs-fix.patch
> > pi-futex-introduce-debug_check_no_locks_freed.patch
> > pi-futex-introduce-warn_on_smp.patch
> > pi-futex-add-plist-implementation.patch
> > pi-futex-scheduler-support-for-pi.patch
> > pi-futex-rt-mutex-core.patch
> > pi-futex-rt-mutex-docs.patch
> > pi-futex-rt-mutex-docs-update.patch
> > pi-futex-rt-mutex-debug.patch
> > pi-futex-rt-mutex-tester.patch
> > pi-futex-rt-mutex-futex-api.patch
> > pi-futex-futex_lock_pi-futex_unlock_pi-support.patch
> > #
> > futex_requeue-optimization.patch
> >
> > Priority-inheriting futexes. I don't have a clue how this code works,
> > but it sure has a lot of trylocks for something which allegedly works.
> > Will merge.

Please also include the patch below to fix defaults and dependencies.
Thomas, could you please also provide a little more verbose help text?
BTW what's the correct spelling - RT Mutex, rt mutex or rt-mutex?

bye, Roman

[PATCH] fix rt-mutex defaults and dependencies

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>

---

 lib/Kconfig.debug | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux-2.6-mm/lib/Kconfig.debug
===================================================================
--- linux-2.6-mm.orig/lib/Kconfig.debug 2006-06-06 15:24:45.000000000 +0200
+++ linux-2.6-mm/lib/Kconfig.debug 2006-06-06 15:25:30.000000000 +0200
@@ -158,7 +158,6 @@ config DEBUG_MUTEX_DEADLOCKS
 
 config DEBUG_RT_MUTEXES
 	bool "RT Mutex debugging, deadlock detection"
-	default y
 	depends on DEBUG_KERNEL && RT_MUTEXES
 	help
 	 This allows rt mutex semantics violations and rt mutex related
@@ -171,8 +170,7 @@ config DEBUG_PI_LIST
 
 config RT_MUTEX_TESTER
 	bool "Built-in scriptable tester for rt-mutexes"
-	depends on RT_MUTEXES
-	default n
+	depends on DEBUG_KERNEL && RT_MUTEXES
 	help
 	  This option enables a rt-mutex tester.
 

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm pi-futex merge
2006-06-06 13:34 ` Roman Zippel
@ 2006-06-06 13:44 ` Steven Rostedt
0 siblings, 0 replies; 166+ messages in thread
From: Steven Rostedt @ 2006-06-06 13:44 UTC (permalink / raw)
To: Roman Zippel; +Cc: Andrew Morton, linux-kernel, tglx

On Tue, 2006-06-06 at 15:34 +0200, Roman Zippel wrote:

> BTW what's the correct spelling - RT Mutex, rt mutex or rt-mutex?

I'd recommend "RT Mutex" for when it is in titles (as it is now) and
rt-mutex when explaining the code. IMO "rt mutex" is just wrong.

-- Steve

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: genirq
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (16 preceding siblings ...)
2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
@ 2006-06-06 14:42 ` Ingo Molnar
2006-06-06 16:56 ` genirq Daniel Walker
2006-06-07 3:46 ` genirq Benjamin Herrenschmidt
2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar
` (2 subsequent siblings)
20 siblings, 2 replies; 166+ messages in thread
From: Ingo Molnar @ 2006-06-06 14:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Thomas Gleixner

* Andrew Morton <akpm@osdl.org> wrote:

> genirq-rename-desc-handler-to-desc-chip.patch
> genirq-rename-desc-handler-to-desc-chip-power-fix.patch
> genirq-rename-desc-handler-to-desc-chip-ia64-fix.patch
> genirq-rename-desc-handler-to-desc-chip-ia64-fix-2.patch
> genirq-sem2mutex-probe_sem-probing_active.patch
> genirq-cleanup-merge-irq_affinity-into-irq_desc.patch
> genirq-cleanup-remove-irq_descp.patch
> genirq-cleanup-remove-irq_descp-fix.patch
> genirq-cleanup-remove-fastcall.patch
> genirq-cleanup-misc-code-cleanups.patch
> genirq-cleanup-reduce-irq_desc_t-use-mark-it-obsolete.patch
> genirq-cleanup-include-linux-irqh.patch
> genirq-cleanup-merge-irq_dir-smp_affinity_entry-into-irq_desc.patch
> genirq-cleanup-merge-pending_irq_cpumask-into-irq_desc.patch
> genirq-cleanup-turn-arch_has_irq_per_cpu-into-config_irq_per_cpu.patch
> genirq-debug-better-debug-printout-in-enable_irq.patch
> genirq-add-retrigger-irq-op-to-consolidate-hw_irq_resend.patch
> genirq-doc-comment-include-linux-irqh-structures.patch
> genirq-doc-handle_irq_event-and-__do_irq-comments.patch
> genirq-cleanup-no_irq_type-cleanups.patch
> genirq-doc-add-design-documentation.patch
> genirq-add-genirq-sw-irq-retrigger.patch
> genirq-add-irq_noprobe-support.patch
> genirq-add-irq_norequest-support.patch
> genirq-add-irq_noautoen-support.patch
> genirq-update-copyrights.patch
> genirq-core.patch
> genirq-msi-fixes-2.patch
> genirq-add-irq-chip-support.patch
> genirq-add-irq-chip-support-fix.patch
> genirq-add-handle_bad_irq.patch
> genirq-add-irq-wake-power-management-support.patch
> genirq-add-sa_trigger-support.patch
> genirq-cleanup-no_irq_type-no_irq_chip-rename.patch
> genirq-convert-the-x86_64-architecture-to-irq-chips.patch
> genirq-convert-the-i386-architecture-to-irq-chips.patch
> genirq-convert-the-i386-architecture-to-irq-chips-fix-2.patch
> genirq-more-verbose-debugging-on-unexpected-irq-vectors.patch
> genirq-add-chip-eoi-fastack-fasteoi.patch
> genirq-add-chip-eoi-fastack-fasteoi-fix.patch
>
> Still stabilising. It's looking more like 2.6.19 material. Needs
> more review from arch maintainers too.

There hasn't been any real problem since the MSI one. The core bits are
rather stable. The patch-queue had positive input from the maintainers
of the two architectures with the most complex IRQ hardware (arm and
ppc*), and that's reassuring. But in any case, other architectures are
not affected at all (sans brown-paperbag build bugs and typos), their
__do_IRQ() handling remains unchanged. So I'd like to see this in
2.6.18. (there's a good deal of stuff we have on top of genirq)

(the irqpoll discussions are unrelated to genirq - they are fixes for an
irqpoll problem that the lock validator uncovered, and naturally those
patches were on top of genirq.)

Ingo

^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: genirq 2006-06-06 14:42 ` genirq Ingo Molnar @ 2006-06-06 16:56 ` Daniel Walker 2006-06-07 8:42 ` genirq Ingo Molnar 2006-06-07 3:46 ` genirq Benjamin Herrenschmidt 1 sibling, 1 reply; 166+ messages in thread From: Daniel Walker @ 2006-06-06 16:56 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner On Tue, 2006-06-06 at 16:42 +0200, Ingo Molnar wrote: > there hasnt been any real problem since the MSI one. The core bits are > rather stable. The patch-queue had positive input from the maintainers > of the two architectures with the most complex IRQ hardware (arm and > ppc*), and that's reassuring. But in any case, other architectures are > not affected at all (sans brow paperbag build bugs and typos), their > __do_IRQ() handling remains unchanged. So i'd like to see this in > 2.6.18. (there a good deal of stuff we have ontop of genirq) There was a problem reported by Kevin Hillman , the -v5 version was not functional on ARM omap boards .. Was that handled already in -v6? Daniel ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: genirq 2006-06-06 16:56 ` genirq Daniel Walker @ 2006-06-07 8:42 ` Ingo Molnar 0 siblings, 0 replies; 166+ messages in thread From: Ingo Molnar @ 2006-06-07 8:42 UTC (permalink / raw) To: Daniel Walker; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner * Daniel Walker <dwalker@mvista.com> wrote: > On Tue, 2006-06-06 at 16:42 +0200, Ingo Molnar wrote: > > > there hasnt been any real problem since the MSI one. The core bits are > > rather stable. The patch-queue had positive input from the maintainers > > of the two architectures with the most complex IRQ hardware (arm and > > ppc*), and that's reassuring. But in any case, other architectures are > > not affected at all (sans brow paperbag build bugs and typos), their > > __do_IRQ() handling remains unchanged. So i'd like to see this in > > 2.6.18. (there a good deal of stuff we have ontop of genirq) > > There was a problem reported by Kevin Hillman , the -v5 version was > not functional on ARM omap boards .. Was that handled already in -v6? Daniel - you should be aware that the -mm genirq lineup does _not_ include the ARM bits. Those changes go via the normal ARM QA and merge path, i.e. via rmk. The -mm lineup only includes the generic bits (for type-based platforms). In any case, they dont impact upstream genirq merging plans. (The omap thing in the armirq queue is likely some small thing. Thomas is checking it.) Ingo ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: genirq
2006-06-06 14:42 ` genirq Ingo Molnar
2006-06-06 16:56 ` genirq Daniel Walker
@ 2006-06-07 3:46 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07 3:46 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner
> > Still stabilising. It's looking more like 2.6.19 material. Needs
> > more review from arch maintainers too.
>
> there hasnt been any real problem since the MSI one. The core bits are
> rather stable. The patch-queue had positive input from the maintainers
> of the two architectures with the most complex IRQ hardware (arm and
> ppc*), and that's reassuring. But in any case, other architectures are
> not affected at all (sans brow paperbag build bugs and typos), their
> __do_IRQ() handling remains unchanged. So i'd like to see this in
> 2.6.18. (there a good deal of stuff we have ontop of genirq)
>
> (the irqpoll discussions are unrelated to genirq - they are fixes for an
> irqpoll problem that the lock validator uncovered, and naturally those
> patches were ontop of genirq.)
I vote for genirq inclusion in 2.6.18 too. I've almost finished porting
powerpc over to it and so far it looks good. In addition, I'm pretty
confident the patches have a very low impact (if any at all) on archs that
haven't been ported over (the old mechanism is still there, mostly
untouched).
In addition, I'm doing some fairly heavy rework of some of the powerpc
irq management that is based on top of the genirq port and I'd really
want it in 2.6.18...
Finally, we are doing some crash-work on MSI (trying to get some basic
support for powerpc separate from the current unusable
drivers/pci/msi.c) so we can at least get something working in 2.6.18
and that too will be based on my work mentioned above.
So as far as I'm concerned, genirq is pretty important to have in right
at the beginning.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (17 preceding siblings ...) 2006-06-06 14:42 ` genirq Ingo Molnar @ 2006-06-06 14:53 ` Ingo Molnar 2006-06-06 16:02 ` Andrew Morton 2006-06-07 3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt 2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig 20 siblings, 1 reply; 166+ messages in thread From: Ingo Molnar @ 2006-06-06 14:53 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Arjan van de Ven * Andrew Morton <akpm@osdl.org> wrote: > lock-validator-floppyc-irq-release-fix.patch > lock-validator-floppyc-irq-release-fix-fix.patch > lock-validator-forcedethc-fix.patch > lock-validator-mutex-section-binutils-workaround.patch > lock-validator-add-__module_address-method.patch > lock-validator-better-lock-debugging.patch > lock-validator-locking-api-self-tests.patch > lock-validator-locking-api-self-tests-self-test-fix.patch > lock-validator-locking-init-debugging-improvement.patch > lock-validator-beautify-x86_64-stacktraces.patch > lock-validator-beautify-x86_64-stacktraces-fix.patch > lock-validator-beautify-x86_64-stacktraces-fix-2.patch > lock-validator-beautify-x86_64-stacktraces-fix-3.patch > lock-validator-beautify-x86_64-stacktraces-fix-4.patch > lock-validator-x86_64-document-stack-frame-internals.patch > lock-validator-stacktrace.patch > lock-validator-stacktrace-build-fix.patch > lock-validator-stacktrace-warning-fix.patch > lock-validator-stacktrace-fix-on-x86_64.patch > lock-validator-fown-locking-workaround.patch > lock-validator-sk_callback_lock-workaround.patch > lock-validator-irqtrace-core.patch > lock-validator-irqtrace-core-powerpc-fix-1.patch > lock-validator-irqtrace-core-non-x86-fix.patch > lock-validator-irqtrace-core-non-x86-fix-2.patch > lock-validator-irqtrace-core-non-x86-fix-3.patch > lock-validator-irqtrace-entrys-fix.patch > lock-validator-irqtrace-core-remove-softirqc-warn_on.patch > 
lock-validator-irqtrace-cleanup-include-asm-i386-irqflagsh.patch > lock-validator-irqtrace-cleanup-include-asm-x86_64-irqflagsh.patch > lock-validator-x86_64-irqflags-trace-entrys-fix.patch > lock-validator-lockdep-add-local_irq_enable_in_hardirq-api.patch > lock-validator-add-per_cpu_offset.patch > lock-validator-add-per_cpu_offset-fix.patch > lock-validator-core.patch > lock-validator-core-early_boot_irqs_-build-fix.patch > lock-validator-core-fix-compiler-warning.patch > lock-validator-procfs.patch > lock-validator-core-multichar-fix.patch > lock-validator-core-count_matching_names-fix.patch > lock-validator-design-docs.patch > lock-validator-prove-rwsem-locking-correctness.patch > lock-validator-prove-rwsem-locking-correctness-fix.patch > lock-validator-prove-rwsem-locking-correctness-powerpc-fix.patch > lock-validator-prove-spinlock-rwlock-locking-correctness.patch > lock-validator-prove-mutex-locking-correctness.patch > lock-validator-prove-mutex-locking-correctness-fix-null-type-name-bug.patch > lock-validator-print-all-lock-types-on-sysrq-d.patch > lock-validator-x86_64-early-init.patch > lock-validator-smp-alternatives-workaround.patch > lock-validator-do-not-recurse-in-printk.patch > lock-validator-disable-nmi-watchdog-if-config_lockdep.patch > lock-validator-disable-nmi-watchdog-if-config_lockdep-i386.patch > lock-validator-disable-nmi-watchdog-if-config_lockdep-x86_64.patch > lock-validator-special-locking-bdev.patch > lock-validator-special-locking-direct-io.patch > lock-validator-special-locking-serial.patch > lock-validator-special-locking-serial-fix.patch > lock-validator-special-locking-dcache.patch > lock-validator-special-locking-i_mutex.patch > lock-validator-special-locking-s_lock.patch > lock-validator-special-locking-futex.patch > lock-validator-special-locking-genirq.patch > lock-validator-special-locking-completions.patch > lock-validator-special-locking-waitqueues.patch > lock-validator-special-locking-mm.patch > 
lock-validator-special-locking-serio.patch > lock-validator-special-locking-slab.patch > lock-validator-special-locking-skb_queue_head_init.patch > lock-validator-special-locking-net-ipv4-igmpcpatch.patch > lock-validator-special-locking-net-ipv4-igmpc-2.patch > lock-validator-special-locking-timerc.patch > lock-validator-special-locking-schedc.patch > lock-validator-special-locking-hrtimerc.patch > lock-validator-special-locking-sock_lock_init.patch > lock-validator-special-locking-af_unix.patch > lock-validator-special-locking-bh_lock_sock.patch > lock-validator-special-locking-mmap_sem.patch > lock-validator-special-locking-sb-s_umount.patch > lock-validator-special-locking-sb-s_umount-fix.patch > lock-validator-special-locking-sb-s_umount-2.patch > lock-validator-special-locking-sb-s_umount-2-fix.patch > lockdep-annotate-rpc_populate-for.patch > lock-validator-special-locking-jbd.patch > lock-validator-special-locking-posix-timers.patch > lock-validator-special-locking-sch_genericc.patch > lock-validator-special-locking-xfrm.patch > lockdep-add-i_mutex-ordering-annotations-to-the-sunrpc.patch > lockdep-add-parent-child-annotations-to-usbfs.patch > lock-validator-special-locking-sound-core-seq-seq_portsc.patch > lock-validator-special-locking-sound-core-seq-seq_devicec.patch > lock-validator-special-locking-sound-core-seq-seq_devicec-fix.patch > lock-validator-fix-rt_hash_lock_sz.patch > lock-validator-introduce-irq__lockdep.patch > locking-validator-special-rule-8390c-disable_irq.patch > locking-validator-special-rule-3c59xc-disable_irq.patch > lock-validator-enable-lock-validator-in-kconfig.patch > lock-validator-enable-lock-validator-in-kconfig-require-trace_irqflags_support.patch > lock-validator-enable-lock-validator-in-kconfig-not-yet.patch > lockdep-one-stacktrace-column-if-config_lockdep=y.patch > i386-remove-multi-entry-backtraces.patch > lockdep-further-improve-stacktrace-output.patch > lock-validator-irqtrace-support-non-x86-architectures.patch > 
lock-validator-disable-oprofile-if-lockdep=y.patch > lock-validator-select-kallsyms_all.patch > > I'm not really sure that this has as good a bugfixes/effort ratio as > would, say, working on our ever-growing bugzilla list. well, the two sets of bugs are pretty much disjunct. Deadlocks that trigger (and produce an NMI watchdog output) are easy to fix. But the overwhelming majority of the deadlocks the lock validator found were not actually triggered. > But given that it exists, and that it'll fix (or rather prevent) > future bugs at a constant-but-low rate for a long time, I guess it's > something we want. > > I think it's more like 2.6.19 material. The number of > teach-lockdep-about-this-unusual-but-correct-locking-code patches > continues to grow and I don't think we fully have a handle on how > it'll all end up looking. the biggest proportion of fixlets were due to out-of-order unlocking, which i took care of with CONFIG_DEBUG_NON_NESTED_UNLOCKS. Note that most of those annotations are trivial, and i think we've now got most of them. Also, those annotations are definitely useful in documenting "unusual" locking sequences - and we very much want to document the locking details of Linux. Also note that for example the local_irq_enable_in_hardirq() annotation found at least one real deadlock as well. So unless something unexpected happens in -mm, i'd like to see this merged into 2.6.18 too. Ingo ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans 2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar @ 2006-06-06 16:02 ` Andrew Morton 2006-06-06 16:35 ` Arjan van de Ven 2006-06-06 20:47 ` lock validator [2.6.18 -mm merge plans] Ingo Molnar 0 siblings, 2 replies; 166+ messages in thread From: Andrew Morton @ 2006-06-06 16:02 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel, arjan On Tue, 6 Jun 2006 16:53:37 +0200 Ingo Molnar <mingo@elte.hu> wrote: > > [ lockdep ] > > So unless something unexpected happens in -mm, i'd like to see this > merged into 2.6.18 too. Well, we _could_, and I guess that we'd get things acceptably sorted out in time for release. But it'll be pretty chaotic and we don't want chaos happening in Linus's tree. I don't think there's any rush here - the code is only now reaching sort-of-ready-for-mm status. And.. - I think we still have a problem with the raid/bdev changes in block_dev.c. - the changes to block_dev.c _do_ impact non-lockdep kernels - we need to take a second look to see which other dont-affect-non-lockdep-kernels patches are in fact affecting non-lockdep kernels - the changes to block_dev.c were pretty awful anyway - did the various review comments I sent get disposed of in some fashion? My overarching concern is the rate at which false-positive workaround patches are piling up. At some point we need to step back and decide whether the goodness justifies the badness. I expect we'll be OK, but I don't think we're yet in a position to know that for sure. (I'm actually quite surprised at how few real bugs this checker has revealed. We must rock, or something). ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: 2.6.18 -mm merge plans
2006-06-06 16:02 ` Andrew Morton
@ 2006-06-06 16:35 ` Arjan van de Ven
2006-06-06 20:47 ` lock validator [2.6.18 -mm merge plans] Ingo Molnar
1 sibling, 0 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-06 16:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel
On Tue, 2006-06-06 at 09:02 -0700, Andrew Morton wrote:
>
> (I'm actually quite surprised at how few real bugs this checker has
> revealed. We must rock, or something).
>
in part that is because we sent fixes (or bugreports) to the various
maintainers for issues that we found already during the development;
over the last few months that is.
^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: lock validator [2.6.18 -mm merge plans] 2006-06-06 16:02 ` Andrew Morton 2006-06-06 16:35 ` Arjan van de Ven @ 2006-06-06 20:47 ` Ingo Molnar 1 sibling, 0 replies; 166+ messages in thread From: Ingo Molnar @ 2006-06-06 20:47 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, arjan * Andrew Morton <akpm@osdl.org> wrote: > - I think we still have a problem with the raid/bdev changes in > block_dev.c. > - the changes to block_dev.c _do_ impact non-lockdep kernels yes, there are a few more functions that do things explicitly. If you worry about the stack footprint, there should be little if any impact: the stack footprint you looked at yesterday was on a kernel that included patches that are not in -mm (the -fno-sibling-calls patch) and the lockdep tracer (-pg) which both increase stack footprint. > - we need to take a second look to see which other > dont-affect-non-lockdep-kernels patches are in fact affecting > non-lockdep kernels there should be no change to the logic of non-lockdep kernels. There might be some minimal impact to the code built. Wanna have an explicit list of what the affected functions are? [but unless you think it causes problems i dont think we should worry about it - we do changes all the time that affect the generated code but dont affect the logic. The inlining overhead thing is a red herring i believe.] > - the changes to block_dev.c were pretty awful anyway yeah - but it just matches the code there. I'll think about finding better ways. > - did the various review comments I sent get disposed of in some > fashion? they were not forgotten at all (and there are others too who have sent feedback), they are next on my list. > My overarching concern is the rate at which false-positive workaround > patches are piling up. [...] i feel a bit frustrated that my arguments regarding these "false positives" remain apparently unanswered. 
I expressed it lots of times that i find most of the semantics restrictions a necessary step towards having a more robust kernel. The restrictions are: - unordered unlocks. I still think we want to document all of them. They pointed to real bugs multiple times. They pointed to suboptimal code. There has been _one_ case so far that i'd declare a true false positive. Nevertheless i gave up resistance and implemented the CONFIG_DEBUG_NON_NESTED_UNLOCKS (default-off) option, which makes these messages totally voluntary. - stealth locking via disable_irq(). We know that this is both wrong and dangerous. It's wrong because it affects all other handlers on the same IRQ line, for a possibly long period of time. This also uncovered the real deadlock and design flaw related to irqpoll. - nested locking of the same lock-type. These are not broken, nevertheless it's useful to extend validation to these categories too - it found real bugs (about 5) in the networking code for example - some of them were in core networking code. The majority of these are related to i_mutex: here too i think we want to document all the locking rules. (and there are some other restrictions too, which i started enforcing via earlier cleanups of the locking code. As i mentioned in the big mutex flamewar^H^H^H^Hdiscussion, restriction of semantics to a natural model is what i believe leads to a kernel that can be trusted more.) > (I'm actually quite surprised at how few real bugs this checker has > revealed. We must rock, or something). The number of deadlocks found was actually much higher (and happened much sooner) than i expected. I expected there to be less than 10 bugs left, which would trickle in the timeframe of weeks. (because we thought we covered alot of code in our testing) What happened is that we are well above 10 deadlocks found so far, in just a few days. The focus of the validator is on _new code_. 
Lock dependencies can now reach near-perfect quality almost immediately, while they needed to sit in the kernel for many months before. (and even then the validator found deadlock bugs that were years old) The fact that we now check and document the existing and pretty well-tested kernel too is just an added bonus. Also, the validator was in the works for months and we fixed a bunch of bugs before we posted the patches. In fact in the past month it was based on -mm and was tested on an allyesconfig bzImage bootup which further decreased the rate of detection. I guess i should quote Davem's blog: http://vger.kernel.org/~davem/cgi-bin/blog.cgi "[...] I've known about this for some time, because as he was writing it Ingo passed along some networking locking bugs that this sucker has found. " The networking code is the subsystem with probably the best locking design in place. It's nearly half a million lines of code (not counting drivers) and needed only about 5 annotations so far. Another thing that also reduced the number of deadlock bugs is the effect of the -rt kernel's deadlock checker and agressive preemption model: we found dozens of deadlocks there too. The -rt kernel's deadlock checker covers all lock types too. (while the upstream kernel only covered about 50% of all locking APIs via in-situ deadlock checking.) So there has been alot of focus on the locking APIs in the past year or so, so dont be surprised that the quality of locking in existing kernel code isnt all that bad. Regarding annotations, my current estimation is that we'll have at most ~0.2% rate of explicit annotations (== 'false positives'), which with the ~50,000 locking APIs will be at most 100 places. 
The deadlock rate for newly released locking code is at least 0.5%-2%,
and we introduce about 2000 new locking API uses per kernel release
(about 5% of all the existing locking code in the kernel), which means
the validator can find 10-40 new deadlocks per kernel release, at the
price of 2 new annotations. (where 90% of the annotations are trivial,
and the rest is only difficult because no-one knows/remembers the
locking rules anymore ...)

Furthermore, chances are that problematic locking constructs will be
introduced with a lower probability due to the validator. Also, code
around existing annotations could be cleaned up too. (as it happened in
a number of cases already) So maybe the annotation rate will go down as
well.

Furthermore, an untold amount of developer time is wasted on finding
deadlocks. The fresher the code is, the more likely it is that it has a
deadlock bug. The bug rate of lock uses could be as high as 10% in
totally new code. With the validator, many of these bugs will be found
_much_ earlier, improving productivity - and putting less crap into your
tree! That will also mean that we wont see most of those bugs.

Plus Linux support engineers at sw/hw vendors are spending a significant
amount of time fixing deadlocks. This has the added twist that changing
locking code is always risky in a product, and it's not bad to have some
good proof in place that the code they trigger during re-QA is actually
correct in terms of locking dependencies.

Users also get totally frustrated at deadlocks. If something doesnt work
as expected that's usually an annoyance, but if the box locks up
totally, which necessitates the destruction of its current volatile data
(its memory, via a hard reset), that's a complete and immediate
showstopper. If you look at the kind of bugs that annoy users most
you'll find 'lockups' really high on the list.

The validator found bugs in my _own code_ that i thought to be
production quality, and i thought i can write correct locking code.
We should really let the computer do this job.
Ingo
^ permalink raw reply [flat|nested] 166+ messages in thread
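For readers unfamiliar with the idea being argued over, the core of lock-order validation can be sketched in a few lines of C. This is only an illustrative model, assuming a tiny fixed number of lock classes; the names `dep`, `reachable` and `record_dependency` are hypothetical and are not taken from the lock validator patches. It records "B was taken while A was held" edges and refuses any edge that would close a cycle, which is how an ordering bug can be reported without the deadlock ever triggering at runtime.

```c
#include <stdbool.h>

#define MAX_CLASSES 8   /* hypothetical small limit, for the sketch only */

/* dep[a][b] is true once lock class b has been taken while a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/*
 * Record the dependency "held -> taken".
 * Returns false (and records nothing) if the new edge would close a
 * cycle in the dependency graph, i.e. a potential deadlock.
 */
static bool record_dependency(int held, int taken)
{
    bool seen[MAX_CLASSES] = { false };

    if (reachable(taken, held, seen))
        return false;
    dep[held][taken] = true;
    return true;
}
```

In this model, taking class 1 under class 0 and class 2 under class 1 is accepted, while a later attempt to take class 0 under class 2 is rejected. Nesting a class inside itself is rejected as well, which is the kind of case that needs an explicit annotation when the nesting is in fact correct.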
* mutex vs. local irqs (Was: 2.6.18 -mm merge plans) 2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton ` (18 preceding siblings ...) 2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar @ 2006-06-07 3:52 ` Benjamin Herrenschmidt 2006-06-07 4:29 ` Andrew Morton 2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig 20 siblings, 1 reply; 166+ messages in thread From: Benjamin Herrenschmidt @ 2006-06-07 3:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Ingo Molnar > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch > kernel-kernel-cpuc-to-mutexes.patch > > ug. We cannot convert the cpu.c semaphore into a mutex until we work out > why power4 goes titsup if you enable local interrupts during boot. What is the exact problem ? Some mutex is forcing local irqs enabled before init_IRQ() ? (Before the normal enabling of IRQ done by init/main.c just after init_IRQ() more precisely ?) This is bad for any architecture. Basically, at this point, the interrupt controller can be in _any_ state, with possible pending interrupts for whatever sources, etc... As we discussed before, that problem should really be fixed in the mutex code by not hard-enabling. There is an incredible amount of crap that could be cleaned up for example by re-ordering a bit the init code and making things like slab available before init_IRQ/time_init etc... but all of those will break because of that. In addition, even without that re-ordering, I'm pretty sure we are hitting semaphores/mutexes early, before init_IRQ(), already and if not in generic code, in arch code somewhere down the call stacks. I don't think that whole pile of problems lurking around the corner is worth the couple of cycles saved by hard-enabling irq in the mutex instead of doing a save/restore. Ben. ^ permalink raw reply [flat|nested] 166+ messages in thread
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans) 2006-06-07 3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt @ 2006-06-07 4:29 ` Andrew Morton 2006-06-07 5:04 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 166+ messages in thread From: Andrew Morton @ 2006-06-07 4:29 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo On Wed, 07 Jun 2006 13:52:58 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch > > kernel-kernel-cpuc-to-mutexes.patch > > > > ug. We cannot convert the cpu.c semaphore into a mutex until we work out > > why power4 goes titsup if you enable local interrupts during boot. > > What is the exact problem ? Some mutex is forcing local irqs enabled > before init_IRQ() ? (Before the normal enabling of IRQ done by > init/main.c just after init_IRQ() more precisely ?) Any code which does mutex_lock() will have interrupts reenabled if the mutex code was compiled in debug mode. > This is bad for any architecture. Basically, at this point, the > interrupt controller can be in _any_ state, with possible pending > interrupts for whatever sources, etc... > > As we discussed before, that problem should really be fixed in the mutex > code by not hard-enabling. > > There is an incredible amount of crap that could be cleaned up for > example by re-ordering a bit the init code and making things like slab > available before init_IRQ/time_init etc... but all of those will break > because of that. > > In addition, even without that re-ordering, I'm pretty sure we are > hitting semaphores/mutexes early, before init_IRQ(), already and if not > in generic code, in arch code somewhere down the call stacks. > > I don't think that whole pile of problems lurking around the corner is > worth the couple of cycles saved by hard-enabling irq in the mutex > instead of doing a save/restore. 
A couple of cycles repeated a zillion times per second for the entire
uptime, just because we cannot get our act together in the first few
seconds of booting. How much does that suck?

And how much does it suck that we require that an attempt to take a
sleeping lock must keep local interrupts disabled if the lock wasn't
contended?

Fortunately, it only happens (or at least, is only _known_ to happen) when
mutex debugging is enabled, so the performance loss is moot.

I do not know where the offending mutex_lock()s are occurring (although it
would be super-simple to find out).

By far the best solution to this would be to remove this requirement that
local interrupts remain disabled for impractical amounts of time during boot.
Either whack the PIC in setup_arch() or reorganise start_kernel() in some
appropriate manner.

But I'll be merging
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
so we'll just continue to suck I guess.
^ permalink raw reply [flat|nested] 166+ messages in thread
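The difference being debated (hard-enabling interrupts versus saving and restoring the caller's state) can be shown with a small userspace model. This is a sketch under stated assumptions, not the mutex debugging code: `irqs_enabled` is a plain variable standing in for the CPU interrupt flag, and both function names are hypothetical. The comments name the kernel primitives each assignment is meant to model.

```c
#include <stdbool.h>

/* Model of the CPU interrupt-enable flag; not real kernel state. */
static bool irqs_enabled;

/* Variant 1: unconditional disable/enable around the debug section. */
static void debug_section_hard_enable(void)
{
    irqs_enabled = false;       /* models local_irq_disable() */
    /* ... debug bookkeeping would happen here ... */
    irqs_enabled = true;        /* models local_irq_enable() */
}

/* Variant 2: remember the caller's state and put it back afterwards. */
static void debug_section_save_restore(void)
{
    bool flags = irqs_enabled;  /* models local_irq_save(flags) */

    irqs_enabled = false;
    /* ... debug bookkeeping would happen here ... */
    irqs_enabled = flags;       /* models local_irq_restore(flags) */
}
```

If a caller enters with the flag cleared, as early boot code does before init_IRQ(), the first variant returns with it set while the second returns with it still cleared. When the caller already had interrupts on, the two variants end in the same state.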
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 4:29 ` Andrew Morton
@ 2006-06-07 5:04 ` Benjamin Herrenschmidt
2006-06-07 5:29 ` Andrew Morton
0 siblings, 1 reply; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07 5:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, mingo, Paul Mackerras
> A couple of cycles repeated a zillion times per second for the entire
> uptime, just because we cannot get our act together in the first few
> seconds of booting. How much does that suck?
I don't follow you... what would you call "getting our act together"
then ? Being able to initialize half of the kernel data structures
without ever going near a mutex ? The current stuff is just crap.
> And how much does it suck that we require that an attempt to take a
> sleeping lock must keep local interrupts disabled if the lock wasn't
> contended?
That is a more interesting point :)
> Fortunately, it only happens (or at least, is only _known_ to happen) when
> mutex debugging is enabled, so the performance loss is moot.
>
> I do not know where the offending mutex_lock()s are occuring (although it
> would be super-simple to find out).
And even if we fix those, we'll ultimately get more.
> By far the best solution to this would be to remove this requirement that
> local interrupts remain disabled for impractical amounts of time during boot.
That is not possible in any remotely sane way across the board.
> Either whack the PIC in setup_arch() or reorganise start_kernel() in some
> appropriate manner.
Neither would be satisfactory. Whacking the PIC means accessing
hardware, which for a lot of architectures means having page tables up,
some kind of ioremap, etc... Hence the bunch of workarounds done by
various archs like having their PTE allocation function do horrors like
if (mem_init_done) kmalloc() else alloc_bootmem().
It's just too ugly for words. As you said, we need to get our act
together.
That means having basic kernel services that do _not_ rely on
any hardware (interrupts, timer, whatever...) be initialized first,
before we start needing ioremap's and friends. That means having things
like init_IRQ(), which has to handle allocating and initializing PICs all
over the range and all sorts of data structures that are related to
interrupt handling, be able to use said kernel services instead of
having dodgy things like do half init now, and another half later from a
hook somewhere or an initcall while hoping that nobody will get in the
middle.

It would make so much more sense to have the init code do something
like:

  setup_arch();
  init_basic_kernel_services();  <--- that's the blob you spotted with mem
                                      init, slab init, ...
  init_arch();                   <--- new arch hook

and later on, as part of the various inits, you get init_IRQ() and so
on...

In my example, init_arch() would be where the arch code moves the bits
currently in setup_arch() that do things like ioremap system devices and
do things that may want to use the slab etc... thus leaving setup_arch()
to very basic initialisations.

Not being able to do all of those because we have this
hyper-optimized-mutex-blah thing that hard enables interrupts all over
the place seems like a stupid thing to me. In fact, as you mentioned, it
only affects a debug code path which thus could perfectly take the
performance hit.
> But I'll be merging
> work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> so we'll just continue to suck I guess.
How so ? Can you tell me how making the mutex debug code path do
something sane makes it 'suck' ? Don't argue about the couple of cycles
benefit, as you mentioned yourself, it's a debug code path.
Ben.
^ permalink raw reply [flat|nested] 166+ messages in thread
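The ordering proposed above can be written out as a small userspace model. This is a sketch only: init_basic_kernel_services() and init_arch() are the hypothetical hooks named in the mail (they do not exist in the kernel under discussion), the `_model` suffixes mark every function here as a stand-in, and the `boot_log` bookkeeping is invented purely to make the order observable.

```c
#include <stddef.h>

/* Boot steps in the order the proposal wants them to run. */
enum boot_step {
    STEP_SETUP_ARCH,        /* very basic arch initialisation only */
    STEP_BASIC_SERVICES,    /* mem init, slab init, ...: no hardware needed */
    STEP_INIT_ARCH,         /* proposed new hook: ioremap of system devices etc. */
    STEP_INIT_IRQ,          /* may now allocate PIC data structures normally */
    STEP_COUNT
};

static enum boot_step boot_log[STEP_COUNT];
static size_t boot_steps;

/* Append one step to the log so the resulting order can be inspected. */
static void note(enum boot_step step)
{
    if (boot_steps < STEP_COUNT)
        boot_log[boot_steps++] = step;
}

static void setup_arch_model(void)                 { note(STEP_SETUP_ARCH); }
static void init_basic_kernel_services_model(void) { note(STEP_BASIC_SERVICES); }
static void init_arch_model(void)                  { note(STEP_INIT_ARCH); }
static void init_IRQ_model(void)                   { note(STEP_INIT_IRQ); }

/* The proposed sequence: hardware-independent services come up first. */
static void start_kernel_model(void)
{
    setup_arch_model();
    init_basic_kernel_services_model();
    init_arch_model();
    init_IRQ_model();
}
```

The point of the ordering is that everything from init_arch() onwards can rely on the allocator and other basic services, instead of each architecture carrying its own early-boot special cases.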
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 5:04 ` Benjamin Herrenschmidt
@ 2006-06-07 5:29 ` Andrew Morton
2006-06-07 6:44 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-07 5:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, paulus

On Wed, 07 Jun 2006 15:04:07 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> > Either whack the PIC in setup_arch() or reorganise start_kernel() in some
> > appropriate manner.
>
> Neither would be satisfactory. Whacking the PIC means accessing
> hardware, which for a lot of architectures means having page tables up,
> some kind of ioremap, etc... Hence the bunch of workarounds done by
> various archs like having their PTE allocation function do horrors like
> if (mem_init_done) kmalloc() else alloc_bootmem().

Why on earth does the PIC come up pulling an interrupt when it hasn't been
spoken to yet?

> It would make so much more sense to have the init code do something
> like:
>
> 	setup_arch();
> 	init_basic_kernel_services(); <--- that's the blob you spotted with mem
> 					   init, slab init, ...
> 	init_arch(); <--- new arch hook
>
> and later on, as part of the various inits, you get init_IRQ() and so
> on...
>
> In my example, init_arch() would be where the arch code moves the bits
> currently in setup_arch() that do things like ioremap system devices and
> do things that may want to use the slab etc... thus leaving setup_arch()
> to very basic initialisations.
>
> Not being able to do all of those because we have this
> hyper-optimized-mutex-blah thing that hard enables interrupt all over
> the place seems like a stupid thing to me. In fact, as you mentioned, it
> only affects a debug code path which thus could perfectly take the
> performance hit.

Nonsense. mutex_lock() can sleep. Sleeping will enable interrupts.
Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
that's it.

But now, because some broken piece of hardware is coming out of
reset/firmware asserting an interrupt we need to change the rules to be
"mutex_lock() must preserve local interrupts if the lock is uncontended".
Ditto down(), down_read() and down_write().

And why does this bizarre restriction upon the implementation of our
locking primitives exist? Because of your broken PIC and because of our
inability to sort out the early boot code. And because the early boot code
has this implicit knowledge that the locks will be uncontended, else we're
toast.

We're doing mutex_lock(), down(), down_read() and down_write() with local
interrupts disabled, which is a bug. We have explicit code in there to
*disable* our runtime debugging checks because we know about this bug but
don't know how to fix it.

I call that sucky.

> > But I'll be merging
> > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> > so we'll just continue to suck I guess.
>
> How so ? Can you tell me how making the mutex debug code path do
> something sane makes it 'suck' ? Don't argue about the couple of cycles
> benefit, as you mentioned yourself, it's a debug code path.
>

Would you prefer "wildly idiotic"?
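The rule under dispute here ("preserve local interrupts if the lock is
uncontended") can be modelled in userspace roughly as follows. This is a
minimal sketch, not the kernel's mutex code: irqs_enabled, struct
model_mutex and both model_lock_* functions are hypothetical stand-ins,
and the comments only point at the kernel primitives
(local_irq_disable()/local_irq_enable() versus
local_irq_save()/local_irq_restore()) that each variant is meant to
resemble.

```c
#include <stdbool.h>

/* Hypothetical model of the CPU's local interrupt-enable flag. */
static bool irqs_enabled;

struct model_mutex {
	bool locked;
};

/*
 * Model of a lock path that disables interrupts and then hard-enables
 * them on the way out, whatever the caller's state was.
 */
static bool model_lock_hard_enable(struct model_mutex *m)
{
	bool got;

	irqs_enabled = false;		/* resembles local_irq_disable() */
	got = !m->locked;
	if (got)
		m->locked = true;
	irqs_enabled = true;		/* resembles local_irq_enable() */
	return got;
}

/*
 * Model of the same path in save/restore style: the caller's interrupt
 * state is put back exactly as it was found.
 */
static bool model_lock_save_restore(struct model_mutex *m)
{
	bool saved = irqs_enabled;	/* resembles local_irq_save() */
	bool got;

	irqs_enabled = false;
	got = !m->locked;
	if (got)
		m->locked = true;
	irqs_enabled = saved;		/* resembles local_irq_restore() */
	return got;
}
```

With irqs_enabled set to false beforehand, as during early boot, the
first variant returns with interrupts on while the second leaves them
off. That difference is all the workaround patch named in this message
is about; the model says nothing about the contended (sleeping) case.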
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 5:29 ` Andrew Morton
@ 2006-06-07 6:44 ` Benjamin Herrenschmidt
2006-06-07 7:03 ` Andrew Morton
2006-06-07 13:21 ` Ingo Molnar
0 siblings, 2 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07 6:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, mingo, paulus

On Tue, 2006-06-06 at 22:29 -0700, Andrew Morton wrote:

> Why on earth does the PIC come up pulling an interrupt when it hasn't been
> spoken to yet?

You live in an ideal world :) Unfortunately the harsh reality is broken
firmwares or bootloaders, PICs that can't mask interrupts, virtual PICs
from hypervisors that want to talk to you as soon as you can take it,
kexec/kdump-style boots which didn't or couldn't completely shut the PIC
up, etc etc etc ....

In addition, on PowerPC (and possibly others), there is the decrementer
too that never stops ticking and interrupting (it simply can't be
stopped). I'm sure other archs, especially embedded, can come up with a
gazillion other reasons.

> Nonsense. mutex_lock() can sleep. Sleeping will enable interrupts.
> Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
> that's it.

Yes, except that we are talking about a well defined path which is the
kernel initialization here where it is guaranteed that there will be no
contention. I think it's a fairly sane thing to require mutexes and
other synchronisation primitives not to hard enable interrupts when
there is no contention.

> But now, because some broken piece of hardware is coming out of
> reset/firmware asserting an interrupt we need to change the rules to be
> "mutex_lock() must preserve local interrupts if the lock is uncontended".
> Ditto down(), down_read() and down_write().

I'm fairly convinced that "broken piece of hardware" is the general case
and that an idle PIC the exception :) Ask Linus why we have carefully
kept interrupts disabled until after init_IRQ() in the first place ?

> And why does this bizarre restriction upon the implementation of our
> locking primitives exist? Because of your broken PIC and because of our
> inability to sort out the early boot code. And because the early boot code
> has this implicit knowledge that the locks will be uncontended, else we're
> toast.

Because we live in a world where you can't assume that a PIC will be
well behaved. That's not a ppc specific problem, by far. I remember
having similar issues on some ARM bits I did ages ago for example. And
I'm sure kdump kind of things will bite on x86 as well.

> We're doing mutex_lock(), down(), down_read() and down_write() with local
> interrupts disabled, which is a bug. We have explicit code in there to
> *disable* our runtime debugging checks because we know about this bug but
> don't know how to fix it.
>
> I call that sucky.

So what can we do ? There is simply no other option I can see else of
having an entire pile of infrastructure and hacks (which is the case
today) dedicated to being able to do IOs and smashing the PIC down
before we can allocate memory in a remotely sane way... I really
need/want to clean that shit up. It all got triggered by the fact that I
need a radix tree at init_IRQ() time, but then I went looking for all
the cases where we hack around the lack of slab from there and it's
really not funny...

> > > But I'll be merging
> > > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> > > so we'll just continue to suck I guess.
> >
> > How so ? Can you tell me how making the mutex debug code path do
> > something sane makes it 'suck' ? Don't argue about the couple of cycles
> > benefit, as you mentioned yourself, it's a debug code path.
> >
>
> Would you prefer "wildly idiotic"?

Heh. I still think not but you seem to have a pretty firm idea about
what makes sense and what not... I just happen not to share it ;)

Now, let's talk about solutions :)

One is to ignore the problem, I can do a band aid for PowerPC, I think I
know what the problem is (probably the decrementer, not the PIC) and
consider that we must be always safe to have interrupts enabled during
the entire boot sequence of the kernel. If you do that, I strongly
recommend that you put a local_irq_enable() somewhere in start_kernel(),
maybe as early as setup_arch(), in -mm and see what breaks :)

I do have a solution in mind though that could work around the problem
of a badly behaved PIC or spurious decrementer interrupts on powerpc,
and other architectures could probably do something similar.

Basically, the idea is to keep irqs disabled by default at startup (so
we still need your test to silence might_sleep() at least until the main
local_irq_enable() is done, not later, so we still get useful warnings
during boot). In addition to that, archs need to add something to their
actual interrupt entry:

	if (no_irq_boot) {
		local_irq_disable();
		return;
	}

That will have the effect of "cleaning" up after a mutex/semaphore
re-enabling interrupts at least until no_irq_boot is cleared which would
be done right before the local_irq_enable() in init/main.c.

Ben.
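The no_irq_boot idea above can be modelled in userspace as below. This
is a hedged sketch only: the flag name no_irq_boot comes from the
message, but irqs_enabled, handled and both model_* functions are
hypothetical, and a real interrupt entry would have to change the
interrupt state that gets restored on return rather than a plain
variable.

```c
#include <stdbool.h>

/* Hypothetical model state; only the name no_irq_boot is from the thread. */
static bool no_irq_boot = true;	/* set until boot reaches the main irq enable */
static bool irqs_enabled;	/* models the CPU interrupt-enable flag */
static int handled;		/* interrupts that reached normal handling */

/* Model of an arch interrupt entry with the proposed early-boot check. */
static void model_irq_entry(void)
{
	if (no_irq_boot) {
		/* Undo whatever re-enabled interrupts, and do nothing else. */
		irqs_enabled = false;
		return;
	}
	handled++;		/* normal interrupt handling would start here */
}

/* Model of the point in init/main.c just before the main enable. */
static void model_boot_enable_irqs(void)
{
	no_irq_boot = false;
	irqs_enabled = true;
}
```

In this model, an interrupt that slips in while no_irq_boot is set gets
dropped and leaves interrupts disabled again; once
model_boot_enable_irqs() has run, interrupts are handled normally.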
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 6:44 ` Benjamin Herrenschmidt
@ 2006-06-07 7:03 ` Andrew Morton
2006-06-07 13:21 ` Ingo Molnar
1 sibling, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-07 7:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, paulus

On Wed, 07 Jun 2006 16:44:31 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> So what can we do ?

Well my plan is to keep being sucky, hence
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch.

The rule is that sleeping locks need to preserve local IRQs in the
non-contended case. So be it, move on to more pressing things.
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 6:44 ` Benjamin Herrenschmidt
2006-06-07 7:03 ` Andrew Morton
@ 2006-06-07 13:21 ` Ingo Molnar
2006-06-08 0:31 ` Benjamin Herrenschmidt
2006-06-08 22:59 ` Paul Mackerras
1 sibling, 2 replies; 166+ messages in thread
From: Ingo Molnar @ 2006-06-07 13:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Andrew Morton, linux-kernel, paulus

* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> during boot). In addition to that, archs need to add something to their
> actual interrupt entry:
>
> 	if (no_irq_boot) {
> 		local_irq_disable();
> 		return;
> 	}

that just moves the suckage from the mutex-debugging slowpath to the
irq-handling hotpath. (at which point i still prefer to have that in the
mutex-debugging path)

a better solution would be to install boot-time IRQ vectors that just do
nothing but return. They dont mask, they dont ACK nor EOI - they just
return. The only thing that could break this is a screaming interrupt,
and even that one probably just slows things down a tiny bit until we
get so far in the init sequence to set up the PIC.

	Ingo
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 13:21 ` Ingo Molnar
@ 2006-06-08 0:31 ` Benjamin Herrenschmidt
2006-06-08 10:49 ` David Woodhouse
2006-06-08 11:17 ` Roman Zippel
2006-06-08 22:59 ` Paul Mackerras
1 sibling, 2 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-08 0:31 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, paulus

> a better solution would be to install boot-time IRQ vectors that just do
> nothing but return. They dont mask, they dont ACK nor EOI - they just
> return. The only thing that could break this is a screaming interrupt,
> and even that one probably just slows things down a tiny bit until we
> get so far in the init sequence to set up the PIC.

Changing vectors on the fly is hard on some platforms.... We could
change our toplevel ppc_md.get_irq() on powerpc, but we still need to do
something about decrementer interrupts.

A screaming level interrupt will lock up the machine at least on some
platforms.

The problem with all those approaches is that they require changes to
all archs interrupt handling to make the situation safe vs. mutexes...

I still don't think where is the suckage in just not hard-enabling in
the mutex debug code...

Ben.
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 0:31 ` Benjamin Herrenschmidt
@ 2006-06-08 10:49 ` David Woodhouse
2006-06-08 10:53 ` Ingo Molnar
2006-06-08 11:17 ` Roman Zippel
1 sibling, 1 reply; 166+ messages in thread
From: David Woodhouse @ 2006-06-08 10:49 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 10:31 +1000, Benjamin Herrenschmidt wrote:
> I still don't think where is the suckage in just not hard-enabling in
> the mutex debug code...

If the mutex debugging code is hard-enabling interrupts before
init_IRQ() ever got called, that's just broken. Fixing that can hardly
be called 'suckage'.

--
dwmw2
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 10:49 ` David Woodhouse
@ 2006-06-08 10:53 ` Ingo Molnar
2006-06-08 11:01 ` David Woodhouse
0 siblings, 1 reply; 166+ messages in thread
From: Ingo Molnar @ 2006-06-08 10:53 UTC (permalink / raw)
To: David Woodhouse
Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel, paulus

* David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2006-06-08 at 10:31 +1000, Benjamin Herrenschmidt wrote:
> > I still don't think where is the suckage in just not hard-enabling in
> > the mutex debug code...
>
> If the mutex debugging code is hard-enabling interrupts before
> init_IRQ() ever got called, that's just broken. Fixing that can hardly
> be called 'suckage'.

to quote Andrew:

--------------->

Nonsense. mutex_lock() can sleep. Sleeping will enable interrupts.
Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
that's it.

But now, because some broken piece of hardware is coming out of
reset/firmware asserting an interrupt we need to change the rules to be
"mutex_lock() must preserve local interrupts if the lock is uncontended".
Ditto down(), down_read() and down_write().

And why does this bizarre restriction upon the implementation of our
locking primitives exist? Because of your broken PIC and because of our
inability to sort out the early boot code. And because the early boot code
has this implicit knowledge that the locks will be uncontended, else we're
toast.

We're doing mutex_lock(), down(), down_read() and down_write() with local
interrupts disabled, which is a bug. We have explicit code in there to
*disable* our runtime debugging checks because we know about this bug but
don't know how to fix it.

I call that sucky.
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 10:53 ` Ingo Molnar
@ 2006-06-08 11:01 ` David Woodhouse
0 siblings, 0 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-08 11:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 12:53 +0200, Ingo Molnar quoted Andrew:
> Nonsense. mutex_lock() can sleep. Sleeping will enable interrupts.
> Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
> that's it.
>
> But now, because some broken piece of hardware is coming out of
> reset/firmware asserting an interrupt we need to change the rules to be
> "mutex_lock() must preserve local interrupts if the lock is uncontended".
> Ditto down(), down_read() and down_write().
>
> And why does this bizarre restriction upon the implementation of our
> locking primitives exist? Because of your broken PIC and because of our
> inability to sort out the early boot code. And because the early boot code
> has this implicit knowledge that the locks will be uncontended, else we're
> toast.
>
> We're doing mutex_lock(), down(), down_read() and down_write() with local
> interrupts disabled, which is a bug. We have explicit code in there to
> *disable* our runtime debugging checks because we know about this bug but
> don't know how to fix it.
>
> I call that sucky.

OK, if you put it like that, and you're going to be consistent by
declaring the disabling of __might_sleep() warnings to be sucky too,
then I suppose we can buy that argument.

Yes, we need to sort out the early boot code. It isn't so much that
we're unable as that nobody's really tried very hard. People seem scared
of it -- they even invent pointless special cases like the
'earlyconsole' crap instead of just registering the damn consoles
earlier, for example. Register_console() has _always_ worked right from
the beginning of setup_arch(), and I've often put it there.

Let's make a concerted effort to reorder the startup code so that we
_can_ enable interrupts and have slab working quite early. Ben's plans
for this look sane enough to me.

--
dwmw2
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 0:31 ` Benjamin Herrenschmidt
2006-06-08 10:49 ` David Woodhouse
@ 2006-06-08 11:17 ` Roman Zippel
2006-06-08 13:38 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-08 11:17 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

Hi,

On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:

> > a better solution would be to install boot-time IRQ vectors that just do
> > nothing but return. They dont mask, they dont ACK nor EOI - they just
> > return. The only thing that could break this is a screaming interrupt,
> > and even that one probably just slows things down a tiny bit until we
> > get so far in the init sequence to set up the PIC.
>
> Changing vectors on the fly is hard on some platforms.... We could
> change our toplevel ppc_md.get_irq() on powerpc, but we still need to do
> something about decrementer interrupts.

On ppc it should not be that difficult to even modify the exception entry
code. Instead of calling do_IRQ use do_early_IRQ and only install the real
handler later.

> A screaming level interrupt will lock up the machine at least on some
> platforms.

I guess that's even deadly on most platforms.

> The problem with all those approaches is that they require changes to
> all archs interrupt handling to make the situation safe vs. mutexes...

Only those archs that want to delay interrupt initialization and they at
least have to provide minimal support to survive enabled interrupts.
init_IRQ() stays the same for all other archs and we add another hook to
allow the delayed initialization.

> I still don't think where is the suckage in just not hard-enabling in
> the mutex debug code...

If you want to have full services, then irqs are part of it. :)

bye, Roman
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 11:17 ` Roman Zippel
@ 2006-06-08 13:38 ` Benjamin Herrenschmidt
2006-06-08 14:02 ` Roman Zippel
0 siblings, 1 reply; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-08 13:38 UTC (permalink / raw)
To: Roman Zippel; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 13:17 +0200, Roman Zippel wrote:
> Hi,
>
> On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:
>
> > > a better solution would be to install boot-time IRQ vectors that just do
> > > nothing but return. They dont mask, they dont ACK nor EOI - they just
> > > return. The only thing that could break this is a screaming interrupt,
> > > and even that one probably just slows things down a tiny bit until we
> > > get so far in the init sequence to set up the PIC.
> >
> > Changing vectors on the fly is hard on some platforms.... We could
> > change our toplevel ppc_md.get_irq() on powerpc, but we still need to do
> > something about decrementer interrupts.
>
> On ppc it should not be that difficult to even modify the exception entry
> code. Instead of calling do_IRQ use do_early_IRQ and only install the real
> handler later.

Yes, it's possible, but will add overhead to the common IRQ path just
to handle an early boot special case.

> > A screaming level interrupt will lock up the machine at least on some
> > platforms.
>
> I guess that's even deadly on most platforms.

Yup.

> > The problem with all those approaches is that they require changes to
> > all archs interrupt handling to make the situation safe vs. mutexes...
>
> Only those archs that want to delay interrupt initialization and they at
> least have to provide minimal support to survive enabled interrupts.
> init_IRQ() stays the same for all other archs and we add another hook to
> allow the delayed initialization.

I'm taking a broader point of view here. More than just interrupt init,
it's in general having basic kernel services such as memory allocator,
which shouldn't need any special hardware initialization outside of the
mmu, be setup before we start banging hardware.

> > I still don't think where is the suckage in just not hard-enabling in
> > the mutex debug code...
>
> If you want to have full services, then irqs are part of it. :)

No. There is, imho, a very clear difference between services that do
rely on hw devices, ioremap, and such major infrastructure as page
tables, and services like slab which essentially, can be initialized
with the CPU itself being ready and nothing else.

Cheers,
Ben.
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 13:38 ` Benjamin Herrenschmidt
@ 2006-06-08 14:02 ` Roman Zippel
2006-06-08 23:40 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-08 14:02 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

Hi,

On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:

> > On ppc it should not be that difficult to even modify the exception entry
> > code. Instead of calling do_IRQ use do_early_IRQ and only install the real
> > handler later.
>
> Yes, it's possible, but will add overhead to the common IRQ path just
> to handle an early boot special case.

What I mean is to directly patch the exception entry code, so after the
initialization is complete you'll have no additional overhead.
In the EXC_XFER_TEMPLATE() macro the handler is stored at i##n. You can
either export that address or you can use a special transfer handler,
which automatically patches the values once some flag is set.

bye, Roman
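The point about patching the entry once, so that the steady state pays
nothing extra, can be sketched with a function pointer standing in for
the patched handler slot. This is a userspace illustration under
assumptions: irq_entry and the model_* names and counters are
hypothetical, and swapping a C function pointer is only an analogy for
patching the stored handler address that the message describes.

```c
/* Hypothetical counters so the effect of the swap can be observed. */
static int early_hits;
static int real_hits;

/* Stand-in for an early handler that only has to survive the interrupt. */
static void model_do_early_irq(void)
{
	early_hits++;
}

/* Stand-in for the real handler, installed once interrupt setup is done. */
static void model_do_irq(void)
{
	real_hits++;
}

/* The single slot the modelled exception entry jumps through. */
static void (*irq_entry)(void) = model_do_early_irq;

/* One-time switch; afterwards there is no per-interrupt "early boot?" test. */
static void model_install_real_handler(void)
{
	irq_entry = model_do_irq;
}

/* Model of taking one interrupt through the entry slot. */
static void model_take_interrupt(void)
{
	irq_entry();
}
```

Interrupts taken before model_install_real_handler() land in the early
handler and later ones in the real handler, without a flag check on
every interrupt. Whether that is practical on a given architecture is
exactly what the thread goes on to argue about.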
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-08 14:02 ` Roman Zippel
@ 2006-06-08 23:40 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-08 23:40 UTC (permalink / raw)
To: Roman Zippel; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 16:02 +0200, Roman Zippel wrote:
> Hi,
>
> On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:
>
> > > On ppc it should not be that difficult to even modify the exception entry
> > > code. Instead of calling do_IRQ use do_early_IRQ and only install the real
> > > handler later.
> >
> > Yes, it's possible, but will add overhead to the common IRQ path just
> > to handle an early boot special case.
>
> What I mean is to directly patch the exception entry code, so after the
> initialization is complete you'll have no additional overhead.
> In the EXC_XFER_TEMPLATE() macro the handler is stored at i##n. You can
> either export that address or you can use a special transfer handler,
> which automatically patches the values once some flag is set.

That is a possibility. Also totally PPC specific for a problem that will
hit every arch once I start moving things around in the init code. I
still think that the best way is to fix the mutex code. You remember
what you can read on most public toilets about leaving them in the state
you found them ? Sounds like a pretty good rule to me here as well.

Ben.
* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
2006-06-07 13:21 ` Ingo Molnar
2006-06-08 0:31 ` Benjamin Herrenschmidt
@ 2006-06-08 22:59 ` Paul Mackerras
1 sibling, 0 replies; 166+ messages in thread
From: Paul Mackerras @ 2006-06-08 22:59 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel

Ingo Molnar writes:

> a better solution would be to install boot-time IRQ vectors that just do
> nothing but return. They dont mask, they dont ACK nor EOI - they just
> return.

How would that help? We'd just end up taking the interrupt over and
over again. We have to either poke the PIC to tell it to shut up
somehow (which we can't do before ioremap is available) or arrange for
interrupts to be disabled after the return (which means that
might_sleep() will scream at us).

Paul.
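Paul's objection can be made concrete with a tiny userspace model of a
level-triggered source that stays asserted. This is a sketch under
assumptions, not a description of any real PIC: struct model_cpu, the
two handlers and model_run_until_quiet() are hypothetical names, and the
limit argument exists only so that the model terminates.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU and one level-triggered interrupt line. */
struct model_cpu {
	bool irqs_enabled;	/* CPU interrupt-enable flag */
	bool line_asserted;	/* source keeps asserting until it is quieted */
};

typedef void (*model_handler_t)(struct model_cpu *cpu);

/* Handler that "just returns": no mask, no ACK, no EOI. */
static void model_handler_noop(struct model_cpu *cpu)
{
	(void)cpu;
}

/* Handler that arranges for interrupts to stay disabled after return. */
static void model_handler_disable(struct model_cpu *cpu)
{
	cpu->irqs_enabled = false;
}

/*
 * Keep taking the interrupt while the CPU accepts it and the line is
 * still asserted. Returns how many times it was taken, capped at limit.
 */
static int model_run_until_quiet(struct model_cpu *cpu, model_handler_t h,
				 int limit)
{
	int taken = 0;

	while (cpu->irqs_enabled && cpu->line_asserted && taken < limit) {
		h(cpu);
		taken++;
	}
	return taken;
}
```

With the no-op handler the interrupt is retaken until the cap is hit,
which matches the "over and over again" described above. With the
disabling handler it is taken once, at the price of returning with
interrupts off.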
* Re: 2.6.18 -mm merge plans
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
` (19 preceding siblings ...)
2006-06-07 3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt
@ 2006-06-10 10:22 ` Christoph Hellwig
2006-06-14 15:18 ` Michael Halcrow
20 siblings, 1 reply; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-10 10:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

> ecryptfs-crypto-functions.patch
> ecryptfs-debug-functions.patch
> ecryptfs-alpha-build-fix.patch
> ecryptfs-convert-assert-to-bug_on.patch
> ecryptfs-remove-unnecessary-null-checks.patch
> ecryptfs-rewrite-ecryptfs_fsync.patch
> ecryptfs-overhaul-file-locking.patch
>
> Christoph has half-reviewed this and all the issues arising from that
> have, I believe, been addressed. With the exception of the "we should
> have a generic stacking layer" issue. Which is true. Michael's take is
> "yes, but that's not my job". Which also is true.

It's far from ready. There's various things that simply can't be done
properly in a lowlevel fs or absolutely shouldn't. And I think a few
unique gems in there.

Most urgent thing of course is that we somehow need to deal with the
idiocy of the nameidata passed into most namespace methods.
* Re: 2.6.18 -mm merge plans
2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig
@ 2006-06-14 15:18 ` Michael Halcrow
0 siblings, 0 replies; 166+ messages in thread
From: Michael Halcrow @ 2006-06-14 15:18 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton, linux-kernel

On Sat, Jun 10, 2006 at 11:22:11AM +0100, Christoph Hellwig wrote:
> > ecryptfs-crypto-functions.patch
> > ecryptfs-debug-functions.patch
> > ecryptfs-alpha-build-fix.patch
> > ecryptfs-convert-assert-to-bug_on.patch
> > ecryptfs-remove-unnecessary-null-checks.patch
> > ecryptfs-rewrite-ecryptfs_fsync.patch
> > ecryptfs-overhaul-file-locking.patch
> >
> > Christoph has half-reviewed this and all the issues arising from
> > that have, I believe, been addressed. With the exception of the
> > "we should have a generic stacking layer" issue. Which is true.
> > Michael's take is "yes, but that's not my job". Which also is
> > true.

We are looking into how this can be best accomplished, given the
requirements of the various stackable filesystems out there.

> It's far from ready. There's various things that simply can't be
> done properly in a lowlevel fs or absolutely shouldn't. And I
> think a few unique gems in there.

We will work on fixes for any such issues brought to our attention.
Up to this point, we have provided fixes for all but two of the items
Christoph brought up in his initial analysis of eCryptfs. The
setlk/getlk code is redundant with the existing VFS implementations,
and so we are working on a fix for that.

> Most urgent thing of course is that we somehow need to deal with the
> idiocy of the nameidata passed into most namespace methods.

We would appreciate any suggestions folks in the community could give
on this. Until then, is this patch a reasonable approach to address
the problem of just replacing the vfsmount and dentry in the existing
nameidata struct?

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>

---

Don't muck with the existing nameidata structures.

---

 fs/ecryptfs/dentry.c |   18 +++++++-----------
 fs/ecryptfs/inode.c  |   14 +++++---------
 2 files changed, 12 insertions(+), 20 deletions(-)

5f0a8c57f8b51ba87cc950d1d2bac6873f73a8b3
diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index 7b1018a..6f19fc4 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -41,23 +41,19 @@ #include "ecryptfs_kernel.h"
  */
 static int ecryptfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
-	int err = 1;
+	int rc = 1;
 	struct dentry *lower_dentry;
-	struct dentry *saved_dentry;
-	struct vfsmount *saved_vfsmount;
+	struct nameidata lower_nd;
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	if (!lower_dentry->d_op || !lower_dentry->d_op->d_revalidate)
 		goto out;
-	saved_dentry = nd->dentry;
-	saved_vfsmount = nd->mnt;
-	nd->dentry = lower_dentry;
-	nd->mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt;
-	err = lower_dentry->d_op->d_revalidate(lower_dentry, nd);
-	nd->dentry = saved_dentry;
-	nd->mnt = saved_vfsmount;
+	memcpy(&lower_nd, nd, sizeof(struct nameidata));
+	lower_nd.dentry = lower_dentry;
+	lower_nd.mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt;
+	rc = lower_dentry->d_op->d_revalidate(lower_dentry, &lower_nd);
 out:
-	return err;
+	return rc;
 }
 
 struct kmem_cache *ecryptfs_dentry_info_cache;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 47e4202..342b0fa 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -122,17 +122,13 @@ ecryptfs_create_underlying_file(struct i
 				struct nameidata *nd)
 {
 	int rc;
-	struct dentry *saved_dentry = NULL;
-	struct vfsmount *saved_vfsmount = NULL;
+	struct nameidata lower_nd;
 
-	saved_dentry = nd->dentry;
-	saved_vfsmount = nd->mnt;
-	nd->dentry = lower_dentry;
-	nd->mnt = ecryptfs_superblock_to_private(
+	memcpy(&lower_nd, nd, sizeof(struct nameidata));
+	lower_nd.dentry = lower_dentry;
+	lower_nd.mnt = ecryptfs_superblock_to_private(
 		ecryptfs_dentry->d_sb)->lower_mnt;
-	rc = vfs_create(lower_dir_inode, lower_dentry, mode, nd);
-	nd->dentry = saved_dentry;
-	nd->mnt = saved_vfsmount;
+	rc = vfs_create(lower_dir_inode, lower_dentry, mode, &lower_nd);
 	return rc;
 }
 
-- 
1.3.3
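The pattern the patch switches to (copy the caller's structure, override
two fields in the copy, pass the copy down) can be shown in isolation.
The sketch below is a userspace illustration with hypothetical names:
struct model_nameidata is a three-field stand-in, not the kernel's
struct nameidata, and model_lower_revalidate() merely records what the
lower layer was handed.

```c
#include <string.h>

/* Hypothetical, much reduced stand-in for a lookup-context structure. */
struct model_nameidata {
	const char *dentry;
	const char *mnt;
	unsigned int flags;
};

/* Records what the lower layer saw, so the effect can be inspected. */
static struct model_nameidata seen_by_lower;

/* Stand-in for the lower filesystem's operation. */
static int model_lower_revalidate(struct model_nameidata *nd)
{
	seen_by_lower = *nd;
	return 1;
}

/*
 * Stacked operation in the style of the patch: work on a private copy
 * so the caller's structure is never modified, not even temporarily.
 */
static int model_stacked_revalidate(struct model_nameidata *nd,
				    const char *lower_dentry,
				    const char *lower_mnt)
{
	struct model_nameidata lower_nd;

	memcpy(&lower_nd, nd, sizeof(lower_nd));
	lower_nd.dentry = lower_dentry;
	lower_nd.mnt = lower_mnt;
	return model_lower_revalidate(&lower_nd);
}
```

The remaining fields (flags here) reach the lower layer unchanged, and
the caller's dentry and mnt are untouched afterwards. Note that this is
a shallow copy: anything the structure points to is still shared with
the original, and changes the lower layer makes to the copy are not
propagated back to the caller.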
end of thread, other threads:[~2006-06-29 14:47 UTC | newest]
Thread overview: 166+ messages
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
2006-06-04 21:20 ` 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans) Bernhard Rosenkraenzer
2006-06-04 21:33 ` header cleanup and install David Woodhouse
2006-06-04 21:43 ` Andrew Morton
2006-06-05 10:52 ` Jens Axboe
2006-06-05 10:54 ` David Woodhouse
2006-06-05 10:59 ` Jens Axboe
2006-06-05 10:57 ` David Woodhouse
2006-06-05 11:03 ` Jens Axboe
2006-06-05 18:09 ` Andrew Morton
2006-06-05 19:19 ` David Woodhouse
2006-06-17 20:35 ` Alistair John Strachan
2006-06-17 21:20 ` David Woodhouse
2006-06-04 21:36 ` 2.6.18 -mm merge plans Alan Cox
2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
2006-06-04 21:54 ` David Woodhouse
2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
2006-06-05 18:09 ` Roman Zippel
2006-06-06 15:20 ` Pavel Machek
2006-06-06 20:56 ` Rafael J. Wysocki
2006-06-07 3:37 ` H. Peter Anvin
2006-06-07 4:00 ` Nigel Cunningham
2006-06-07 4:10 ` H. Peter Anvin
2006-06-07 4:25 ` Nigel Cunningham
2006-06-07 4:26 ` klibc H. Peter Anvin
2006-06-07 6:22 ` klibc Nigel Cunningham
2006-06-07 6:38 ` klibc H. Peter Anvin
2006-06-07 6:51 ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson
2006-06-07 21:12 ` H. Peter Anvin
2006-06-09 8:03 ` klibc Nix
2006-06-09 18:45 ` klibc H. Peter Anvin
[not found] ` <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com>
[not found] ` <871wty6rl9.fsf@hades.wkstn.nix>
2006-06-09 22:28 ` klibc Joshua Hudson
2006-06-09 22:48 ` klibc H. Peter Anvin
2006-06-09 23:13 ` klibc Joshua Hudson
2006-06-09 23:44 ` klibc H. Peter Anvin
2006-06-16 6:02 ` klibc Joshua Hudson
2006-06-16 19:19 ` klibc H. Peter Anvin
2006-06-07 8:44 ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek
2006-06-07 9:44 ` Rafael J. Wysocki
2006-06-04 23:50 ` clocksource Roman Zippel
2006-06-05 20:20 ` clocksource john stultz
2006-06-05 20:53 ` clocksource john stultz
2006-06-05 21:07 ` clocksource Roman Zippel
2006-06-06 19:42 ` clocksource john stultz
2006-06-07 0:41 ` clocksource Roman Zippel
2006-06-08 8:05 ` clocksource john stultz
2006-06-15 11:40 ` clocksource Roman Zippel
2006-06-16 3:21 ` clocksource john stultz
2006-06-16 3:35 ` clocksource john stultz
2006-06-16 15:33 ` clocksource Roman Zippel
2006-06-16 18:48 ` clocksource john stultz
2006-06-17 19:45 ` clocksource Roman Zippel
2006-06-17 17:04 ` clocksource Andrew Morton
2006-06-05 0:02 ` utsname/hostname Randy.Dunlap
2006-06-05 1:06 ` utsname/hostname Andrew Morton
2006-06-05 3:10 ` utsname/hostname Randy.Dunlap
[not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
2006-06-05 0:28 ` readahead benchmark Fengguang Wu
2006-06-05 1:02 ` Andrew Morton
2006-06-05 0:32 ` new SCSI drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
[not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
2006-06-05 1:05 ` statistics infrastructure Fengguang Wu
2006-06-05 16:30 ` Greg KH
2006-06-13 23:47 ` statistics infrastructure (in -mm tree) review Greg KH
2006-06-14 0:18 ` Randy.Dunlap
2006-06-14 16:45 ` Greg KH
2006-06-14 22:48 ` Martin Peschke
2006-06-19 22:12 ` Greg KH
2006-06-20 15:40 ` Martin Peschke
2006-06-20 16:50 ` Randy.Dunlap
2006-06-21 18:51 ` Martin Peschke
2006-06-21 19:38 ` Matthew Frost
2006-06-22 11:43 ` Martin Peschke
2006-06-14 5:04 ` Andi Kleen
2006-06-14 22:49 ` Martin Peschke
2006-06-16 20:40 ` Greg KH
2006-06-16 21:34 ` Martin Peschke
2006-06-17 6:51 ` Andi Kleen
2006-06-17 11:03 ` Martin Peschke
2006-06-17 10:30 ` Martin Peschke
2006-06-05 1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
2006-06-05 1:15 ` Andrew Morton
2006-06-05 8:33 ` Andreas Mohr
2006-06-05 8:45 ` Arjan van de Ven
2006-06-05 10:26 ` Alan Cox
2006-06-05 10:35 ` Arjan van de Ven
2006-06-05 10:59 ` Alan Cox
2006-06-10 6:58 ` Pavel Machek
2006-06-05 8:54 ` Christoph Hellwig
2006-06-05 12:33 ` Jeff Garzik
2006-06-05 12:48 ` Arjan van de Ven
2006-06-05 12:52 ` Jeff Garzik
2006-06-05 14:02 ` Linux kernel and laws Adrian Bunk
2006-06-05 14:21 ` linux-os (Dick Johnson)
2006-06-06 5:33 ` Evgeniy Polyakov
2006-06-05 13:27 ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
2006-06-05 13:31 ` Christoph Hellwig
2006-06-05 13:42 ` Arjan van de Ven
2006-06-05 16:24 ` Alan Cox
2006-06-29 14:26 ` ACX100 (softmac-based) driver ready to merge, but is it legal? -- " John W. Linville
[not found] ` <20060629144233.GB24463@tuxdriver.com>
2006-06-29 14:47 ` [Acx100-users] Denis Vlasenko, where are you? (mail bounced) Andreas Mohr
2006-06-05 1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
2006-06-05 1:47 ` Andrew Morton
2006-06-05 8:59 ` Christoph Hellwig
2006-06-05 9:10 ` Andrew Morton
2006-06-05 9:16 ` Arjan van de Ven
2006-06-05 11:10 ` Ivan Novick
2006-06-05 11:26 ` Adrian Bunk
2006-06-05 6:58 ` Francois Romieu
2006-06-05 10:32 ` Alan Cox
2006-06-05 10:36 ` Arjan van de Ven
2006-06-06 2:02 ` Chris Wright
2006-06-06 7:01 ` Andi Kleen
2006-06-06 13:04 ` Steven Rostedt
2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
2006-06-05 14:10 ` Russell King
2006-06-05 15:01 ` Steven Whitehouse
2006-06-07 7:12 ` Steven Whitehouse
2006-06-05 14:08 ` 2.6.18 -mm merge plans Oleg Nesterov
2006-06-05 14:43 ` Serge E. Hallyn
2006-06-08 19:56 ` Eric W. Biederman
2006-06-09 13:02 ` Serge E. Hallyn
2006-06-09 23:25 ` Serge E. Hallyn
2006-06-10 0:39 ` Eric W. Biederman
2006-06-10 1:23 ` Serge E. Hallyn
2006-06-10 7:52 ` Eric W. Biederman
2006-06-10 8:09 ` Eric W. Biederman
2006-06-10 9:53 ` Christoph Hellwig
2006-06-06 0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh
2006-06-06 22:28 ` Shailabh Nagar
2006-06-06 22:40 ` Andrew Morton
2006-06-08 14:27 ` Shailabh Nagar
2006-06-08 17:42 ` Andrew Morton
2006-06-08 18:36 ` Shailabh Nagar
2006-06-08 19:33 ` Balbir Singh
2006-06-06 22:52 ` Jay Lan
2006-06-06 22:55 ` Shailabh Nagar
2006-06-12 12:02 ` Martin Peschke
2006-06-12 13:28 ` Shailabh Nagar
2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
2006-06-06 13:34 ` Roman Zippel
2006-06-06 13:44 ` Steven Rostedt
2006-06-06 14:42 ` genirq Ingo Molnar
2006-06-06 16:56 ` genirq Daniel Walker
2006-06-07 8:42 ` genirq Ingo Molnar
2006-06-07 3:46 ` genirq Benjamin Herrenschmidt
2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar
2006-06-06 16:02 ` Andrew Morton
2006-06-06 16:35 ` Arjan van de Ven
2006-06-06 20:47 ` lock validator [2.6.18 -mm merge plans] Ingo Molnar
2006-06-07 3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt
2006-06-07 4:29 ` Andrew Morton
2006-06-07 5:04 ` Benjamin Herrenschmidt
2006-06-07 5:29 ` Andrew Morton
2006-06-07 6:44 ` Benjamin Herrenschmidt
2006-06-07 7:03 ` Andrew Morton
2006-06-07 13:21 ` Ingo Molnar
2006-06-08 0:31 ` Benjamin Herrenschmidt
2006-06-08 10:49 ` David Woodhouse
2006-06-08 10:53 ` Ingo Molnar
2006-06-08 11:01 ` David Woodhouse
2006-06-08 11:17 ` Roman Zippel
2006-06-08 13:38 ` Benjamin Herrenschmidt
2006-06-08 14:02 ` Roman Zippel
2006-06-08 23:40 ` Benjamin Herrenschmidt
2006-06-08 22:59 ` Paul Mackerras
2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig
2006-06-14 15:18 ` Michael Halcrow