2.6.18 -mm merge plans

From: Andrew Morton @ 2006-06-04 20:50 UTC (permalink / raw)
  To: linux-kernel


It's time to take a look at the -mm queue for 2.6.18.

There is an unusually large amount of difficult material here.  If you were
bcc'ed, please take the time to think about what we should do.

I have an Asia trip June 10-17 which will probably be during the 2.6.18
merge window.  It'll take some time to get all this material sorted out in
a decent fashion so I might end up having to ask Linus to delay -rc1 by a
week or so.  We'll see.


When replying to this email pleeeeeeze rewrite the Subject: to something
appropriate so we do not all go mad.  Thanks.




The list:

git-hdrcleanup.patch
git-hdrinstall.patch

 This is Dave Woodhouse's work cleaning up the kernel headers and adding a
 `make headerinstall' target which automates the exporting of kernel
 headers as a userspace-usable package.

 All I can say about this is that it doesn't appear to break anything and
 is ready to merge from that point of view.  It's not an area in which I
 have much interest or knowledge.

 That being said, it's relatively costly to carry such extensive patches
 in -mm for long periods, so I'd ask Linus and the distro people to work
 out what we want to do here promptly, please.

git-klibc.patch

 Similar.  This all appears to work sufficiently well for a 2.6.18 merge. 
 But it's been so long since klibc was a hot topic that I've forgotten who
 wanted it, and what for.

 Can whoever has an interest in this work please pipe up and let's get our
 direction sorted out quickly.

fix-hpet-operation-on-32-bit-nvidia-platforms.patch
fix-hpet-operation-on-32-bit-nvidia-platforms-build-fix.patch
fix-hpet-operation-on-64-bit-nvidia-platforms.patch

 These are bugfixes and are a marginal call for 2.6.17.  But they're
 playing in fragile areas, they're quite new and I fixed a bug in here just
 a couple of hours ago.  So I'll hold these off until 2.6.18-rc1 and will
 tag them for a 2.6.17.x backport.

acpi-update-asus_acpi-driver-registration-fix.patch
acpi-memory-hotplug-cannot-manage-_crs-with-plural-resoureces.patch
catch-notification-of-memory-add-event-of-acpi-via-container-driver-register-start-func-for-memory-device.patch
catch-notification-of-memory-add-event-of-acpi-via-container-driveravoid-redundant-call-add_memory.patch
kevent-add-new-uevent.patch
acpi-dock-driver.patch
acpiphp-use-new-dock-driver.patch
acpiphp-prevent-duplicate-slot-numbers-when-no-_sun.patch
asus_acpi-w3000-support.patch
acpi-atlas-acpi-driver.patch
acpi-atlas-acpi-driver-fix.patch
remove-acpi_os_create_lock-acpi_os_delete_lock.patch
asus_acpi-invert-read-of-wled-proc-file-to-show-correct.patch
2.6-sony_acpi4.patch
acpi-remove-__init-__exit-from-sony-add-remove-methods.patch
sony_apci-resume.patch
git-agpgart.patch
uninorth-agp-warning-fixes.patch
alpha-agp-warning-fix.patch
git-alsa.patch
fix-drivers-mfd-ucb1x00-corec-irq-probing-bug.patch
kauditd_thread-warning-fix.patch
blk_start_queue-must-be-called-with-irq-disabled-add-warning.patch
blktrace_apih-endian-annotations.patch
powernow-k8-crash-workaround.patch
dprintk-adjustments-to-cpufreq-nforce2.patch
dprintk-adjustments-to-cpufreq-speedstep-centrino.patch
cpufreq-dprintk-adjustments.patch
create-sys-hypervisor-when-needed.patch
trivial-videodev2h-patch.patch
scx200_acb-use-pci-i-o-resource-when-appropriate.patch
i2c-pca954x-i2c-mux-driver.patch
i2c-mpc-fix-up-error-handling.patch
opencores-i2c-bus-driver.patch
i2c-pca954x-fix-initial-access-to-first-mux-switch-port.patch
ieee1394-video1394-be-quiet.patch
ieee1394-ohci1394c-function-calls-without.patch
ieee1394-sbp2-make-tsb42aa9-workaround-specific.patch
ieee1394-semaphore-to-mutex-conversion.patch
ieee1394-raw1394-fix-whitespace-after-x86_64.patch
ieee1394-ieee1394-ohci1394-cycletoolong.patch
ieee1394-ieee1394-support-for-slow-links-or-slow.patch
ieee1394-ieee1394-save-ram-by-using-a-single.patch
ieee1394-sbp2-remove-manipulation-of-inquiry.patch
ieee1394-sbp2-log-number-of-supported-concurrent.patch
ieee1394-ieee1394-extend-lowlevel-api-for.patch
ieee1394-ohci1394-set-address-range-properties.patch
ieee1394-ohci1394-make-phys_dma-parameter.patch
ieee1394-sbp2-sbp2-remove-ohci1394-specific.patch
ieee1394-sbp2-fix-s800-transfers-if-phys_dma-is.patch
ieee1394-update-feature-removal-of-obsolete.patch
ieee1394-sbp2-provide-helptext-for.patch
ieee1394-sbp2-kconfig-fix.patch
ieee1394-sbp2-use-__attribute__packed-for.patch
ieee1394-speed-up-of-dma_region_sync_for_cpu.patch
ieee1394-sbp2-fix-deregistration-of-status-fifo-address-space.patch
ieee1394-add-preprocessor-constant-for-invalid-csr.patch
fix-broken-suspend-resume-in-ohci1394-was-acpi-suspend.patch
ieee1394_core-switch-to-kthread-api.patch
eth1394-endian-fixes.patch
input-keyboard_tasklet-dont-touch-leds-of-already-grabed-device.patch
remove-silly-messages-from-input-layer.patch
via-pmu-add-input-device.patch
input-powermac-cleanup-of-mac_hid-and-support-for-ctrlclick-and-commandclick.patch
mm-constify-drivers-char-keyboardc.patch
input-move-fixp-arithh-to-drivers-input.patch
input-fix-accuracy-of-fixp-arithh.patch
input-new-force-feedback-interface.patch
input-adapt-hid-force-feedback-drivers-for-the-new-interface.patch
input-adapt-uinput-for-the-new-force-feedback-interface.patch
input-adapt-iforce-driver-for-the-new-force-feedback-interface.patch
input-force-feedback-driver-for-pid-devices.patch
input-force-feedback-driver-for-zeroplus-devices.patch
input-update-documentation-of-force-feedback.patch
input-drop-the-remains-of-the-old-ff-interface.patch
input-drop-the-old-pid-driver.patch
input-use-enospc-instead-of-enomem-in-iforce-when-device-full.patch
add-dependency-on-kernelrelease-to-the-package-targets.patch
kconfig-improve-config-load-save-output.patch
kconfig-fix-config-dependencies.patch
kconfig-remove-symbol_yesmodno.patch
kconfig-allow-multiple-default-values-per-symbol.patch
kconfig-allow-loading-multiple-configurations.patch
kconfig-integrate-split-config-into-silentoldconfig.patch
kconfig-integrate-split-config-into-silentoldconfig-fix.patch
kconfig-move-kernelrelease.patch
kconfig-add-symbol-option-config-syntax.patch
kconfig-add-defconfig_list-module-option.patch
kconfig-add-search-option-for-xconfig.patch
kconfig-finer-customization-via-popup-menus.patch
kconfig-create-links-in-info-window.patch
kconfig-jump-to-linked-menu-prompt.patch
kconfig-warn-about-leading-whitespace-for-menu-prompts.patch
kconfig-remove-leading-whitespace-in-menu-prompts.patch
config-exit-if-no-beginning-filename.patch
make-kernelrelease-speedup.patch
kconfig-kconfig_overwriteconfig.patch
sane-menuconfig-colours.patch
kbuild-export-type-enhancement-to-modpostc.patch
kbuild-export-type-enhancement-to-modpostc-fix.patch
kbuild-prevent-building-modules-that-wont-load.patch
kbuild-export-symbol-usage-report-generator.patch
kbuild-obj-dirs-is-calculated-incorrectly-if-hostprogs-y-is-defined.patch
fix-make-rpm-for-powerpc.patch
revert-sata_sil24-sii3124-sata-driver-endian-problem.patch
libata-add-missing-data_xfer-for-pata_pdc2027x-and-pdc_adma.patch
libata-add-missing-data_xfer-for-pata_pdc2027x-and-pdc_adma-fix.patch
libata-reduce-timeouts.patch
libata-debug.patch
2.6.17-rc4-mm1-ich8-fix.patch
for_each_possible_cpu-mips.patch
sdhci-truncated-pointer-fix.patch
prevent-au1xmmcc-breakage-on-non-au1200-alchemy.patch
myri10ge-alpha-build-fix.patch
smc911x-Kconfig-fix.patch
tulip-natsemi-dp83840a-phy-fix.patch
natsemi-add-support-for-using-mii-port-with-no-phy.patch
pci-error-recovery-e1000-network-device-driver.patch
pci-error-recovery-e100-network-device-driver.patch
e1000-prevent-statistics-from-getting-garbled-during-reset.patch
e100-disable-interrupts-at-boot.patch
drivers-char-hw_randomc-remove-asserts.patch
forcedeth-config-ring-sizes.patch
forcedeth-config-flow-control.patch
forcedeth-config-phy.patch
forcedeth-config-wol.patch
forcedeth-config-csum.patch
forcedeth-config-statistics.patch
forcedeth-config-diagnostics.patch
forcedeth-config-module-parameters.patch
forcedeth-config-version.patch
forcedeth-new-device-ids.patch
forcedeth-typecast-cleanup.patch
add-a-pci-vendor-id-definition-for-aculab.patch
natsemi-add-quirks-for-aculab-e1-t1-pmxc-cpci-carrier-cards.patch
tulip-fix-for-64-bit-mips.patch
drivers-net-ns83820c-add-paramter-to-disable-auto.patch
fix-phy-id-for-lxt971a-lxt972a.patch
clean-up-initcall-warning-for-netconsole.patch
remove-dead-entry-in-net-wan-kconfig.patch
eliminate-unused-proc-sys-net-ethernet.patch
ppp_async-hang-fix.patch
selinux-add-security-class-for-appletalk-sockets.patch
neighbourc-pneigh_get_next-skips-published-entry.patch
secmark-add-new-flask-definitions-to-selinux.patch
secmark-add-selinux-exports.patch
secmark-add-secmark-support-to-core-networking.patch
secmark-add-xtables-secmark-target.patch
secmark-add-secmark-support-to-conntrack.patch
secmark-add-connsecmark-xtables-target.patch
secmark-add-new-packet-controls-to-selinux.patch
irda-missing-allocation-result-check-in-irlap_change_speed.patch
pppoe-missing-result-check-in-__pppoe_xmit.patch
lock-validator-netlinkc-netlink_table_grab-fix.patch
recent-match-fix-sleeping-function-called-from-invalid-context.patch
recent-match-missing-refcnt-initialization.patch
client-side-nfsacl-caching-fix.patch
nfs-really-return-status-from-decode_recall_args.patch
powerpc-kbuild-warning-fix.patch
serial-fix-uart_bug_txen-test.patch
revert-gregkh-pci-pci-test-that-drivers-properly-call-pci_set_master.patch
gregkh-pci-kconfigurable-resources-arch-dependent-changes-arm-fix.patch
gregkh-pci-pci-64-bit-resources-core-changes-mips-fix.patch
fix-pciehp-driver-on-non-acpi-systems.patch
gregkh-pci-acpiphp-configure-_prt-v3-cleanup.patch
kconfigurable-resources-mtd-fixes.patch
drivers-scsi-fix-proc_scsi_write-to-return-length-on.patch
drivers-scsi-sdc-fix-uninitialized-variable-in-handling-medium-errors.patch
drivers-scsi-aic7xxx-possible-cleanups.patch
drivers-scsi-small-cleanups.patch
drivers-scsi-megaraidc-add-a-dummy-mega_create_proc_entry-for-proc_fs=y.patch
drivers-scsi-qla2xxx-make-some-functions-static.patch
drivers-scsi-aic7xxx-aic79xx_corec-make-ahd_done_with_status-static.patch
small-whitespace-cleanup-for-qlogic-driver.patch
remove-drivers-scsi-constantscscsi_print_req_sense.patch
drivers-scsi-aic7xxx-aic79xx_corec-make-ahd_match_scb-static.patch
aic7xxx-deinline-large-functions-save-80k-of-text.patch
aic7xxx-s-__inline-inline.patch
drivers-scsi-aic7xxx-possible-cleanups-2.patch
scsi-remove-documentation-scsi-cpqfctxt.patch
mpt-fusion-driver-initialization-failure-fix.patch
drivers-scsi-use-array_size-macro.patch
lpfc-sparse-null-warnings.patch
mpt_interrupt-should-return-irq_none-when.patch
aic7-cleanup-module_parm_desc-strings.patch
random-remove-redundant-sa_sample_random-from-ninjascsi.patch
megaraid-gcc-41-warning-fix.patch
buslogic-gcc-41-warning-fixes.patch
add-scsi_add_host-failure-handling-for-nsp32.patch
qla1280-fix-section-mismatch-warnings.patch
bogus-disk-geometry-on-large-disks.patch
megaraid_sas-switch-fw_outstanding-to-an-atomic_t.patch
megaraid_sas-add-support-for-zcr-controller.patch
megaraid_sas-add-support-for-zcr-controller-fix.patch
gdth-add-execute-firmware-command-abstraction.patch
drivers-scsi-gdthc-make-__gdth_execute-static.patch
areca-raid-linux-scsi-driver.patch
scsi-clean-up-warnings-in-advansys-driver.patch
git-scsi-target-warning-fix.patch
touchkit-ps-2-touchscreen-driver.patch
fix-sco-on-some-bluetooth-adapters-2.patch
fall-back-to-old-style-call-trace-if-no-unwinding.patch
allow-unwinder-to-build-without-module-support.patch
x86_64-mm-moving-phys_proc_id-and-cpu_core_id-to-cpuinfo_x86-warning-fix.patch
add-abilty-to-enable-disable-nmi-watchdog-from-procfs.patch
x86_64-unexport-ia32_sys_call_table.patch
x86_64-msi-apic-build-fix.patch
x86_64-dont-warn-for-overflow-in-nommu-case-when-dma_mask-is-32bit-fix.patch
lock-validator-lockdep-small-xfs-init_rwsem-cleanup.patch

  That's over 200 patches which need to be handled by subsystem
  maintainers.  I continue to have some difficulty getting this material
  processed.

  I'll try to make Thursdays be my unload-stuff-on-maintainers day. 
  Hopefully the boredom of seeing the same patches over and over will
  motivate some merging, nacking and fixing.

  I'm going to start sending the Areca driver to James, too.  The vendor
  has worked hard and the hardware is becoming more important - let's help
  them get it in.

  I'll henceforth include the HighPoint RocketRAID controller driver
  (hptiop-highpoint-rocketraid-3xxx-controller-driver.patch) as well.

s390_hypfs-filesystem.patch

 Will merge.

mm-vm_bug_on.patch
mm-thrash-detect-process-thrashing-against-itself.patch
zone-init-check-and-report-unaligned-zone-boundaries.patch
x86-align-highmem-zone-boundaries-with-numa.patch
zone-allow-unaligned-zone-boundaries.patch
zone-allow-unaligned-zone-boundaries-x86-add-zone-alignment-qualifier.patch
page-migration-make-do_swap_page-redo-the-fault.patch
slab-extract-cache_free_alien-from-__cache_free.patch
pg_uncached-is-ia64-only.patch
slab-page-mapping-cleanup.patch
migration-remove-unnecessary-pageswapcache-checks.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-change-name-of-wait_table_size.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-change-to-meminit-for-build_zonelist.patch
wait_table-and-zonelist-initializing-for-memory-hotaddadd-return-code-for-init_current_empty_zone.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-wait_table-initialization.patch
wait_table-and-zonelist-initializing-for-memory-hotadd-update-zonelists.patch
squash-duplicate-page_to_pfn-and-pfn_to_page.patch
support-for-panic-at-oom.patch
mm-fix-typos-in-comments-in-mm-oom_killc.patch
reserve-space-for-swap-label.patch
tightening-hugetlb-strict-accounting.patch
slab-cleanup-kmem_getpages.patch
slab-stop-using-list_for_each.patch
swsusp-rework-memory-shrinker-rev-2.patch
unify-pxm_to_node-and-node_to_pxm.patch
pgdat-allocation-for-new-node-add-specify-node-id.patch
pgdat-allocation-for-new-node-add-get-node-id-by-acpi.patch
pgdat-allocation-for-new-node-add-generic-alloc-node_data.patch
pgdat-allocation-for-new-node-add-refresh-node_data.patch
pgdat-allocation-for-new-node-add-export-kswapd-start-func.patch
pgdat-allocation-for-new-node-add-call-pgdat-allocation.patch
register-hot-added-memory-to-iomem-resource.patch
catch-valid-mem-range-at-onlining-memory.patch
fix-compile-error-undefined-reference-for-sparc64.patch
register-sysfs-file-for-hotpluged-new-node.patch
pgdat-allocation-and-update-for-ia64-of-memory-hotplughold-pgdat-address-at-system-running.patch
pgdat-allocation-and-update-for-ia64-of-memory-hotplug-update-pgdat-address-array.patch
pgdat-allocation-and-update-for-ia64-of-memory-hotplugallocate-pgdat-and-per-node-data.patch
mm-introduce-remap_vmalloc_range.patch
change-gen_pool-allocator-to-not-touch-managed-memory.patch
radix-tree-direct-data.patch
radix-tree-small.patch
likely-cleanup-remove-unlikely-in-sys_mprotect.patch
slab-redzone-double-free-detection.patch
buglet-in-radix_tree_tag_set.patch
writeback-fix-range-handling.patch
page-migration-cleanup-rename-ignrefs-to-migration.patch
page-migration-cleanup-group-functions.patch
page-migration-cleanup-remove-useless-definitions.patch
page-migration-cleanup-drop-nr_refs-in-remove_references.patch
page-migration-cleanup-extract-try_to_unmap-from-migration-functions.patch
page-migration-cleanup-pass-mapping-to-migration-functions.patch
page-migration-cleanup-move-fallback-handling-into-special-function.patch
swapless-pm-add-r-w-migration-entries.patch
swapless-pm-add-r-w-migration-entries-fix-2.patch
swapless-page-migration-rip-out-swap-based-logic.patch
swapless-page-migration-modify-core-logic.patch
more-page-migration-do-not-inc-dec-rss-counters.patch
more-page-migration-use-migration-entries-for-file-pages.patch
page-migration-update-documentation.patch
aop_truncated_page-victims-in-read_pages-belong-in-the-lru.patch
flatmem-relax-requirement-for-memory-to-start-at-pfn-0.patch
slab-verify-pointers-before-free.patch
sparsemem-record-nid-during-memory-present.patch
mm-cleanup-swap-unused-warning.patch
node-hotplug-register-cpu-remove-node-struct.patch
node-hotplug-register-cpu-remove-node-struct-alpha-fix.patch
add-page_mkwrite-vm_operations-method.patch
mm-remove-vm_locked-before-remap_pfn_range-and-drop-vm_shm.patch
swapoff-atomic_inc_not_zero-on-mm_users.patch
remove-unused-o_flags-from-do_shmat.patch
fix-update_mmu_cache-in-fremapc.patch
fix-update_mmu_cache-in-fremapc-fix.patch
mm-slabc-fix-early-init-assumption.patch

  Memory management.  Will merge.

page-migration-simplify-migrate_pages.patch
page-migration-simplify-migrate_pages-tweaks.patch
page-migration-handle-freeing-of-pages-in-migrate_pages.patch
page-migration-use-allocator-function-for-migrate_pages.patch
page-migration-support-moving-of-individual-pages.patch
page-migration-detailed-status-for-moving-of-individual-pages.patch
page-migration-support-moving-of-individual-pages-fixes.patch
page-migration-support-moving-of-individual-pages-x86_64-support.patch
page-migration-support-moving-of-individual-pages-x86-support.patch
page-migration-support-moving-of-individual-pages-x86-support-fix.patch
page-migration-support-a-vma-migration-function.patch
allow-migration-of-mlocked-pages.patch

  Post-2.6.18.

acx1xx-wireless-driver.patch
fix-tiacx-on-alpha.patch
tiacx-fix-attribute-packed-warnings.patch
tiacx-pci-build-fix.patch
tiacx-ia64-fix.patch

  It is about time we did something with this large and presumably useful
  wireless driver.

lsm-add-task_setioprio-hook.patch
selinux-add-hooks-for-key-subsystem.patch
au1550-1200-add-missing-psc-defines-make-oss-driver-use.patch

  Will merge.

x86-cache-pollution-aware-__copy_from_user_ll.patch
x86-cpu_init-avoid-gfp_kernel-allocation-while-atomic.patch
arch-i386-kernel-apicc-make-modern_apic-static.patch
i386-apmc-optimization.patch
x86-dont-trigger-full-rebuild-via-config_mtrr.patch
fix-x86-microcode-driver-handling-of-multiple-matching.patch
i386-break-out-of-recursion-in-stackframe-walk.patch
dont-trigger-full-rebuild-via-config_x86_mce.patch
x86-increase-interrupt-vector-range.patch
x86-call-eisa_set_level_irq-in-pcibios_lookup_irq.patch
x86-kernel-irq-balancer-fix.patch
x86-kernel-irq-balancer-fix-tidy.patch
i386-let-usermode-execute-the-enter.patch
fix-broken-vm86-interrupt-signal-handling.patch
x86-re-enable-generic-numa.patch
x86-make-using_apic_timer-__read_mostly.patch
x86-cyrix-code-config_pci-fix--add-__initdata.patch
x86-constify-some-parts-of-arch-i386-kernel-cpu.patch
x86-make-i387-mxcsr_feature_mask-__read_mostly.patch
x86-make-acpi-errata-__read_mostly.patch
x86-constify-arch-i386-pci-irqc.patch
x86-use-proper-defines-for-i8259a-i-o.patch
i386-moving-phys_proc_id-and-cpu_core_id-to-cpuinfo_x86.patch
i386-moving-phys_proc_id-and-cpu_core_id-to-cpuinfo_x86-warning-fix.patch
i386-fix-get_segment_eip-with-vm86.patch
i386-dont-try-kprobes-for-v8086-mode.patch

 x86 queue.  Will mostly merge.  I have a note here that Zach Amsden had
 issues with x86-cpu_init-avoid-gfp_kernel-allocation-while-atomic.patch?

 x86-cache-pollution-aware-__copy_from_user_ll.patch has been in -mm for a
 very long time - it's never been clear that it's a net gain.  Will
 merge-and-see-what-happens I guess.

support-physical-cpu-hotplug-for-x86_64.patch

 I think this got nacked.  Will resend, see what happens.

vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-tidy.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-arch_vma_name-fix.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-vs-x86_64-mm-reliable-stack-trace-support-i386.patch
vdso-randomize-the-i386-vdso-by-moving-it-into-a-vma-vs-x86_64-mm-reliable-stack-trace-support-i386-2.patch

 Will merge.

powerpc-vdso-updates.patch

 Will send to Paul.

remove-duplicate-symbol-exports-on-alpha.patch
alpha-generic-hweight-build-fix.patch

 Will merge.

remove-empty-node-at-boot-time.patch

 Will send to Tony when the prerequisites are merged.

swsusp-add-architecture-special-saveable-pages-support.patch
swsusp-i386-mark-special-saveable-unsaveable-pages.patch
swsusp-x86_64-mark-special-saveable-unsaveable-pages.patch
swsusp-take-lowmem-reserves-into-account.patch
kernel-power-snapshotc-cleanups.patch
swsusp-use-less-memory-during-resume.patch
dont-use-flush_tlb_all-in-suspend-time.patch
swsusp-documentation-updates.patch

 Will merge.

m68k-completely-initialize-hw_regs_t-in-ide_setup_ports.patch
m68k-atyfb_base-compile-fix-for-config_pci=n.patch
m68k-cleanup-unistdh.patch
m68k-remove-some-unused-definitions-in-zorroh.patch
m68k-use-c99-initializer.patch
m68k-print-correct-stack-trace.patch
m68k-restore-amikbd-compatibility-with-24.patch
m68k-extra-delay.patch
m68k-use-proper-defines-for-zone-initialization.patch
m68k-adjust-to-changed-hardirq_mask.patch
m68k-m68k-mac-via2-fixes-and-cleanups.patch

 Will merge.

uml-make-copy__user-atomic.patch
uml-fix-not_dead_yet-when-directory-is-in-bad-state.patch
uml-rename-and-improve-actually_do_remove.patch

 These are marked "mm only".  I'm not sure if that's permanent?

xtensa-remove-verify_area-macros.patch
xtensa-remove-verify_area-macros-fix.patch

 Will merge.

remove-fs-jffs2-ioctlc.patch

 Will re-re-re-spam maintainer.

work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
kernel-kernel-cpuc-to-mutexes.patch

 ug.  We cannot convert the cpu.c semaphore into a mutex until we work out
 why power4 goes titsup if you enable local interrupts during boot.

fix-a-race-condition-between-i_mapping-and-iput.patch
insert-identical-resources-above-existing-resources.patch
make-sure-nobodys-leaking-resources.patch
remove-steal_locks.patch
avoid-tasklist_lock-at-getrusage-for-multithreaded-case-too.patch
add-prctl-to-change-endian-of-a-task.patch
#writeback-fix-range-handling.patch
fix-dcache-race-during-umount.patch
prune_one_dentry-tweaks.patch
vgacon-make-vga_map_mem-take-size-remove-extra-use.patch
zlib_inflate-upgrade-library-code-to-a-recent-version.patch
zlib_inflate-upgrade-library-code-to-a-recent-version-fix.patch
initramfs-cpio-unpacking-fix.patch
fix-cdrom-being-confused-on-using-kdump.patch
read_mapping_page-for-address-space.patch
locks-dont-unnecessarily-fail-posix-lock-operations.patch
locks-dont-do-unnecessary-allocations.patch
locks-clean-up-locks_remove_posix.patch
vfs-add-lock-owner-argument-to-flush-operation.patch
fs-locksc-make-posix_locks_deadlock-static.patch
moduleh-updated-comments-with-a-new.patch
remove-config_parport_arc-drivers-parport-parport_arcc.patch
add-poisonh-and-patch-primary-users.patch
update-2-drivers-for-poisonh.patch
mmput-might-sleep.patch
fs-fat-miscc-unexport-fat_sync_bhs.patch
poll-cleanups-microoptimizations.patch
ptrace-document-the-locking-rules.patch
cleanup-default-value-of-sched_smt.patch
cleanup-default-value-of-syscall_debug.patch
cleanup-default-value-of-usb_isp116x_hcd-usb_sl811_hcd-and-usb_sl811_cs.patch
cleanup-default-value-of-ip_dccp_ackvec.patch
cleanup-default-value-of-dvb_cinergyt2_enable_rc_input_device.patch
dup-fd-error.patch
rtc-framework-driver-for-ds1307-and-similar-rtc-chips.patch
cond-resched-might-sleep-fix.patch
enhancing-accessibility-of-lxdialog.patch
the-scheduled-unexport-of-insert_resource.patch
jbd-fix-bug-in-journal_commit_transaction.patch
jbd-fix-bug-in-journal_commit_transaction-fix.patch
rename-swapper-to-idle.patch
oss-cs46xx-cleanup-and-tiny-bugfix.patch
i4l-memory-leak-fix-for-sc_ioctl.patch
isdn-unsafe-interaction-between-isdn_write-and-isdn_writebuf_stub.patch
isdn-unsafe-interaction-between-isdn_write-and-isdn_writebuf_stub-fix.patch
invert-irq-migrationc-brach-prediction.patch
x86-powerpc-make-hardirq_ctx-and-softirq_ctx-__read_mostly.patch
jbd-avoid-kfree-null.patch
ext3_clear_inode-avoid-kfree-null.patch
make-noirqdebug-irqfixup-__read_mostly-add-unlikely.patch
leds-amstrad-delta-led-support.patch
leds-amstrad-delta-led-support-tidy.patch
update-devicestxt.patch
binfmt_elf-codingstyle-cleanup-and-remove-some-pointless-casts.patch
binfnt_elf-remove-more-casts.patch
fix-incorrect-sa_onstack-behaviour-for-64-bit-processes.patch
percpu-counters-add-percpu_counter_exceeds.patch
percpu-counter-data-type-changes-to-suppport.patch
remove-unlikely-in-might_sleep_if.patch
process-events-header-cleanup.patch
process-events-license-change.patch
strstrip-api.patch
ipmi-strstrip-conversion.patch
connector-exports.patch
config_net=n-build-fix.patch
remove-softlockup-from-invalidate_mapping_pages.patch
add-doc-submitchecklist.patch
kernel-sysc-doesnt-need-inith.patch
make-rcu-api-inaccessible-to-non-gpl-linux-kernel-modules.patch
doc-add-audit-acct-to-docbook.patch
ip2-fix-sections.patch
sgi-ioc4-detect-io-card-variant.patch
two-additions-to-linux-documentation-ioctl-numbertxt.patch
list-introduce-list_replace-helper.patch
list-use-list_replace_init-instead-of-list_splice_init.patch
when-config_base_samll=1-the-kernel-261611-cascade-in-kernel-timerc-may-enter-the-infinite-loop.patch
when-config_base_samll=1-the-kernel-261611-cascade-in-kernel-timerc-may-enter-the-infinite-loop-use-list_replace_init.patch
codingstyle-add-typedefs-chapter.patch
fs-bufferc-possible-cleanups.patch
rtc-rtc-dev-uie-emulation.patch
drivers-md-raid6algosc-fix-a-null-dereference.patch
adjust-handle_irr_event-return-type.patch
sparse-fixes-for-synclink_cs.patch
jbd-split-checkpoint-lists.patch
add-__iowrite64_copy.patch
mark-address_space_operations-const.patch
more-bug_on-conversion.patch
make-kernel-ignore-bogus-partitions.patch
drivers-block-loopc-dont-return-garbage-if-loop_set_status-not-called.patch
docs-update-sparsetxt-with-check_endian.patch
drivers-acorn-char-pcf8583-vs-rtc-subsystem.patch
rewritten-backlight-infrastructure-for-portable-apple-computers.patch
rewritten-backlight-infrastructure-for-portable-apple-computers-fix.patch
ensure-null-deref-cant-possibly-happen-in-is_exported.patch
bluetooth-fix-potential-null-ptr-deref-in-dtl1_cscdtl1_hci_send_frame.patch
bloat-o-meter-gcc-4-fix.patch
random-remove-sa_sample_random-from-floppy-driver.patch
random-make-cciss-use-add_disk_randomness.patch
random-change-cpqarray-to-use-add_disk_randomness.patch
random-remove-bogus-sa_sample_random-from-at91-compact-flash-driver.patch
random-remove-redundant-sa_sample_random-from-touchscreen-drivers.patch
define-__raw_get_cpu_var-and-use-it.patch
allow-for-per-cpu-data-being-in-tdata-and-tbss-sections.patch
allow-for-per-cpu-data-being-in-tdata-and-tbss-sections-fix.patch
allow-for-per-cpu-data-being-in-tdata-and-tbss-sections-tidy.patch
deprecate-smbfs-in-favour-of-cifs.patch
allow-raw_notifier-callouts-to-unregister-themselves.patch
hptiop-highpoint-rocketraid-3xxx-controller-driver.patch
fix-kbuild-dependencies-for-synclink-drivers.patch
fs-freevxfs-cleanup-of-spelling-errors.patch
pnp-card_probe-fix-memory-leak.patch
ufs-ufs_trunc_indirect-infinite-cycle.patch
ufs-right-block-allocation.patch
ufs-change-block-number-on-the-fly.patch
ufs-directory-and-page-cache-install-aops.patch
ufs-directory-and-page-cache-from-blocks-to-pages.patch
ufs-wrong-type-cast.patch
ufs-not-usual-amounts-of-fragments-per-block.patch
ufs-unmark-config_ufs_fs_write-as-broken-mm-tree.patch
ufs-easy-debug.patch
ufs-little-directory-lookup-optimization.patch
ufs-i_blocks-wrong-count.patch
ufs-unlock_super-without-lock.patch
ufs-zero-metadata.patch
ufs-printk-warning-fixes.patch
oprofile-fix-unnecessary-cleverness.patch
msnd-section-fix.patch
oprofile-convert-from-semaphores-to-mutexes.patch
drivers-char-applicomc-proper-module_initexit.patch
remove-dead-entry-in-net-wan-makefile.patch
openpromfs-fix-missing-nul.patch
openpromfs-remove-unnecessary-casts.patch
openpromfs-factorize-out.patch
openpromfs-factorize-out-tidy.patch
idetape-gcc-41-warning-fix.patch
add-driver-for-arm-amba-pl031-rtc.patch
rtc-subsystem-fix-capability-checks-in-kernel-interface.patch
rtc-subsystem-add-capability-checks.patch
add-export_unused_symbol-and-export_unused_symbol_gpl.patch
add-export_unused_symbol-and-export_unused_symbol_gpl-default.patch
make-printk-work-for-really-early-debugging.patch
kernel-sysc-cleanups.patch
kernel-sysc-cleanups-fix.patch
nbd-kill-obsolete-changelog-add-gpl.patch
fix-listh-kernel-doc.patch
listh-doc-change-counter-to-control.patch
fix-magic-sysrq-on-strange-keyboards.patch
ide-cd-end-of-media-error-fix.patch
add-a-sysfs-file-to-determine-if-a-kexec-kernel-is-loaded.patch
cpqarray-section-fix.patch
pdflush-handle-resume-wakeups.patch
edd-isnt-experimental-anymore.patch
kernel-doc-drop-leading-space-in-sections.patch
kernel-doc-script-cleanups.patch
schedule_on_each_cpu-reduce-kmalloc-size.patch
avoid-disk-sector_t-overflow-for-2tb-ext3-filesystem.patch
cleanup-dead-code-from-ext2-mount-code.patch
fix-memory-leak-when-the-ext3s-journal-file-is-corrupted.patch
remove-inconsistent-space-before-exclamation-point-in-ext3s-mount-code.patch
moxa-remove-pointless-casts.patch
moxa-remove-pointless-check-of-tty-argument-vs-null.patch
moxa-partial-codingstyle-cleanup-spelling-fixes.patch
updated-kdump-documentation.patch
cpuset-remove-extra-cpuset_zone_allowed-check-in-__alloc_pages.patch
spin-rwlock-init-cleanups.patch
make-debug_mutex_on-__read_mostly.patch
constify-parts-of-kernel-power.patch
constify-libcrc32c-table.patch
apple-motion-sensor-driver.patch
prepare-for-__copy_from_user_inatomic-to-not-zero-missed-bytes.patch
make-copy_from_user_inatomic-not-zero-the-tail-on-i386.patch
remove-unecessary-null-check-in-kernel-acctc.patch
ax88796-parallel-port-driver.patch
ax88796-parallel-port-driver-build-fix.patch
wd7000-fix-section-mismatch-warnings.patch
megaraid_mbox-fix-section-mismatch-warnings.patch
keys-fix-race-between-two-instantiators-of-a-key.patch
keys-fix-race-between-two-instantiators-of-a-key-tidy.patch
ext3_fsblk_t-filesystem-group-blocks-and-bug-fixes.patch
ext3_fsblk_t-the-rest-of-in-kernel-filesystem-blocks.patch
list_del-debug.patch
inotify-split-kernel-api-from-userspace-support.patch
inotify-add-names-inode-to-event-handler.patch
inotify-add-interfaces-to-kernel-api.patch
inotify-allow-watch-removal-from-event-handler.patch
inotify-update-kernel-documentation.patch
kernel-doc-mm-readhead-fixup.patch
make-procfs-obligatory-except-under-config_embedded.patch
lock-validator-introduce-warn_on_oncecond.patch
lock-validator-introduce-warn_on_oncecond-speedup.patch
make-sysctl-obligatory-except-under-config_embedded.patch
for_each_cpu_mask-warning-fix.patch
emu10k1-mark-midi_spinlock-as-used.patch
add-max6902-rtc-support.patch
add-max6902-rtc-support-update.patch
add-max6902-rtc-support-tidy.patch
rtc-small-documentation-update.patch
#big-kernel-lock-contention-in-do_open-and-blkdev_put.patch
make-ext2_debug-work-again.patch
nbd-endian-annotations.patch
epoll-use-unlocked-wqueue-operations.patch

 This is the misc-random-stuff-which-doesnt-have-a-subsystem-tree queue. 
 Will mostly merge, based upon re-review.

use-list_add_tail-instead-of-list_add.patch
arch-use-list_move.patch
core-use-list_move.patch
net-rxrpc-use-list_move.patch
drivers-use-list_move.patch
fs-use-list_move.patch

 Will merge.
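
 For readers who have not seen this kind of cleanup: the *-use-list_move
 patches above replace an open-coded list_del() followed by list_add() with
 a single list_move() call.  The following is only a userspace sketch of
 that equivalence, assuming a stripped-down list_head modelled on the
 kernel's; the helper init_list_head() and the demo function
 demo_list_move() are hypothetical names for illustration, and the real
 list.h differs in details such as poisoning deleted entries.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head.
 * This is an illustrative userspace sketch, not the kernel's list.h. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical helper: make head an empty list pointing at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head (stack order). */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Insert entry right before head, i.e. at the tail (queue order). */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink entry from whatever list it is on. */
static void list_del(struct list_head *entry)
{
    assert(entry->next != NULL && entry->prev != NULL); /* must be linked */
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

/* One call in place of the open-coded list_del() + list_add() pair. */
static void list_move(struct list_head *entry, struct list_head *head)
{
    list_del(entry);
    list_add(entry, head);
}

/* Hypothetical demo: move node a from src to dst and check the links.
 * Returns 1 if both lists end up consistent, 0 otherwise. */
static int demo_list_move(void)
{
    struct list_head src, dst, a, b;

    init_list_head(&src);
    init_list_head(&dst);
    list_add_tail(&a, &src);
    list_add_tail(&b, &src);    /* src: a, b   dst: empty */
    list_move(&a, &dst);        /* src: b      dst: a     */

    return src.next == &b && src.prev == &b &&
           dst.next == &a && dst.prev == &a &&
           a.next == &dst && a.prev == &dst;
}
```

 The gain is readability rather than behaviour: a caller says what it
 means in one line, and there is no window in which the entry sits
 unlinked between two separate calls.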

per-task-delay-accounting-setup.patch
per-task-delay-accounting-setup-fix-1.patch
per-task-delay-accounting-setup-fix-2.patch
per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
per-task-delay-accounting-utilities-for-genetlink-usage.patch
per-task-delay-accounting-taskstats-interface.patch
per-task-delay-accounting-taskstats-interface-fix-1.patch
per-task-delay-accounting-taskstats-interface-fix-2.patch
per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
per-task-delay-accounting-documentation.patch
per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch

 I just don't know.  There are a number of groups who pop up with various
 enhanced accounting requirements and patches (all quite different) but I
 haven't heard a lot of enthusiasm from any of them over this work, which
 attempts to provide an extensible framework for accumulation and querying
 of per-task metrics.

 But then again, we cannot just sit there and wait for everyone to be 100%
 happy.  So I'm 51% inclined to push this along.

 Anyone else who has an interest in this sort of thing needs to be aware
 that there will be an expectation that any future statistics submissions
 should use these interfaces.  So the time to pay attention is right now.

time-clocksource-infrastructure.patch
time-clocksource-infrastructure-dont-enable-irq-too-early.patch
time-use-clocksource-infrastructure-for-update_wall_time.patch
time-use-clocksource-infrastructure-for-update_wall_time-mark-few-functions-as-__init.patch
time-let-user-request-precision-from-current_tick_length.patch
time-use-clocksource-abstraction-for-ntp-adjustments.patch
time-use-clocksource-abstraction-for-ntp-adjustments-optimize-out-some-mults-since-gcc-cant-avoid-them.patch
time-introduce-arch-generic-time-accessors.patch
hangcheck-remove-monotomic_clock-on-x86.patch
time-i386-conversion-part-1-move-timer_pitc-to-i8253c.patch
time-i386-conversion-part-2-rework-tsc-support.patch
time-i386-conversion-part-3-enable-generic-timekeeping.patch
time-i386-conversion-part-4-remove-old-timer_opts-code.patch
time-i386-clocksource-drivers.patch
time-i386-clocksource-drivers-pm-timer-doesnt-use-workaround-if-chipset-is-not-buggy.patch
time-i386-clocksource-drivers-pm-timer-doesnt-use-workaround-if-chipset-is-not-buggy-acpi_pm-cleanup.patch
time-i386-clocksource-drivers-pm-timer-doesnt-use-workaround-if-chipset-is-not-buggy-acpi_pm-cleanup-fix-missing-to-rename-pmtmr_good-to-acpi_pm_good.patch
time-i386-clocksource-drivers-fix-spelling-typos.patch
time-rename-clocksource-functions.patch
make-pmtmr_ioport-__read_mostly.patch
generic-time-add-macro-to-simplify-hide-mask.patch
time-fix-time-going-backward-w-clock=pit.patch

 John's x86 time clocksource patches.   Will merge.  At last.

kprobe-boost-2byte-opcodes-on-i386.patch
kprobemulti-kprobe-posthandler-for-booster.patch
kprobemulti-kprobe-posthandler-for-booster-kprobes-bugfix-of-kprobe-booster-reenable-kprobe-booster.patch
notify-page-fault-call-chain-for-x86_64.patch
notify-page-fault-call-chain-for-i386.patch
notify-page-fault-call-chain-for-ia64.patch
notify-page-fault-call-chain-for-powerpc.patch
notify-page-fault-call-chain-for-sparc64.patch
kprobes-registers-for-notify-page-fault.patch
notify-page-fault-call-chain.patch

 Will merge.

kconfig-select-things-at-the-closest-tristate-instead-of-bool.patch

 <wonders what this is>

sched-fix-smt-nice-lock-contention-and-optimization.patch
sched-fix-smt-nice-lock-contention-and-optimization-tidy.patch

 Will merge.

sched-comment-bitmap-size-accounting.patch
sched-fix-interactive-ceiling-code.patch
unnecessary-long-index-i-in-sched.patch
sched-implement-smpnice.patch
sched-protect-calculation-of-max_pull-from-integer-wrap.patch
sched-store-weighted-load-on-up.patch
sched-add-discrete-weighted-cpu-load-function.patch
sched-prevent-high-load-weight-tasks-suppressing-balancing.patch
sched-improve-stability-of-smpnice-load-balancing.patch
sched-improve-smpnice-load-balancing-when-load-per-task.patch
smpnice-dont-consider-sched-groups-which-are-lightly-loaded-for-balancing.patch
smpnice-dont-consider-sched-groups-which-are-lightly-loaded-for-balancing-fix.patch
smpnice-dont-consider-sched-groups-which-are-lightly-loaded-for-balancing-fix-2patch.patch
sched-modify-move_tasks-to-improve-load-balancing-outcomes.patch
sched-avoid-unnecessarily-moving-highest-priority-task-move_tasks.patch
sched-avoid-unnecessarily-moving-highest-priority-task-move_tasks-fix-2.patch
sched_domain-handle-kmalloc-failure.patch
sched_domain-handle-kmalloc-failure-fix.patch
sched_domain-dont-use-gfp_atomic.patch
sched_domain-use-kmalloc_node.patch
sched_domain-allocate-sched_group-structures-dynamically.patch
sched2-sched-domain-sysctl.patch

 It's all been quiet on the sched performance regressions front lately. 
 I'll ping the usual suspects and see if we can get smpnice merged this
 time.

sched-add-above-background-load-function.patch
mm-implement-swap-prefetching.patch
mm-implement-swap-prefetching-fix.patch
mm-implement-swap-prefetching-sched-batch.patch
swap-prefetch-fix-lru_cache_add_tail.patch
swap-prefetch-fix-lru_cache_add_tail-tidy.patch
mm-swap-prefetch-fix-lowmem-reserve-calc.patch

 Swap prefetch.  I remain skeptical, but I have a lot of RAM.  Multiple
 people have sung its praises.  I guess I'll re-review and tentatively plan
 on sending them along for 2.6.18.  Opinions are sought.

pi-futex-futex-code-cleanups.patch
pi-futex-robust-futex-docs-fix.patch
pi-futex-introduce-debug_check_no_locks_freed.patch
pi-futex-introduce-warn_on_smp.patch
pi-futex-add-plist-implementation.patch
pi-futex-scheduler-support-for-pi.patch
pi-futex-rt-mutex-core.patch
pi-futex-rt-mutex-docs.patch
pi-futex-rt-mutex-docs-update.patch
pi-futex-rt-mutex-debug.patch
pi-futex-rt-mutex-tester.patch
pi-futex-rt-mutex-futex-api.patch
pi-futex-futex_lock_pi-futex_unlock_pi-support.patch
#
futex_requeue-optimization.patch

 Priority-inheriting futexes.  I don't have a clue how this code works,
 but it sure has a lot of trylocks for something which allegedly works. 
 Will merge.

proc-fix-the-inode-number-on-proc-pid-fd.patch
proc-remove-useless-bkl-in-proc_pid_readlink.patch
proc-remove-unnecessary-and-misleading-assignments.patch
proc-simplify-the-ownership-rules-for-proc.patch
proc-replace-proc_inodetype-with-proc_inodefd.patch
proc-remove-bogus-proc_task_permission.patch
proc-kill-proc_mem_inode_operations.patch
proc-properly-filter-out-files-that-are-not-visible.patch
proc-fix-the-link-count-for-proc-pid-task.patch
proc-move-proc_maps_operations-into-task_mmuc.patch
proc-rewrite-the-proc-dentry-flush-on-exit.patch
proc-close-the-race-of-a-process-dying-durning.patch
proc-refactor-reading-directories-of-tasks.patch
proc-remove-tasklist_lock-from-proc_pid_readdir.patch
proc-remove-tasklist_lock-from-proc_pid_lookup-and.patch
proc-remove-tasklist_lock-from-proc_pid_readdir-simply-fix-first_tgid.patch
proc-make-proc_numbuf-the-buffer-size-for-holding-a.patch
proc-dont-lock-task_structs-indefinitely.patch
proc-dont-lock-task_structs-indefinitely-task_mmu-small-fixes.patch
proc-use-struct-pid-not-struct-task_ref.patch
proc-optimize-proc_check_dentry_visible.patch
proc-use-sane-permission-checks-on-the-proc-pid-fd.patch
proc-cleanup-proc_fd_access_allowed.patch
proc-remove-tasklist_lock-from-proc_task_readdir.patch
simplify-fix-first_tid.patch
cleanup-next_tid.patch

 /proc/pid revamp.  Will merge.

de_thread-fix-lockless-do_each_thread.patch
coredump-optimize-mm-users-traversal.patch
coredump-speedup-sigkill-sending.patch
coredump-kill-ptrace-related-stuff.patch
coredump-kill-ptrace-related-stuff-fix.patch
coredump-dont-take-tasklist_lock.patch
coredump-some-code-relocations.patch
coredump-shutdown-current-process-first.patch
coredump-copy_process-dont-check-signal_group_exit.patch

 Will merge.  I have a note here that Roland had issues with
 coredump-kill-ptrace-related-stuff.patch?

ecryptfs-fs-makefile-and-fs-kconfig.patch
ecryptfs-fs-makefile-and-fs-kconfig-remove-ecrypt_debug-from-fs-kconfig.patch
ecryptfs-documentation.patch
ecryptfs-makefile.patch
ecryptfs-main-module-functions.patch
ecryptfs-main-module-functions-uint16_t-u16.patch
ecryptfs-header-declarations.patch
ecryptfs-header-declarations-update.patch
ecryptfs-header-declarations-update-convert-signed-data-types-to-unsigned-data-types.patch
ecryptfs-header-declarations-remove-unnecessary-ifndefs.patch
ecryptfs-superblock-operations.patch
ecryptfs-dentry-operations.patch
ecryptfs-file-operations.patch
ecryptfs-file-operations-remove-null-==-syntax.patch
ecryptfs-file-operations-remove-extraneous-read-of-inode-size-from-header.patch
#ecryptfs-vs-streamline-generic_file_-interfaces-and-filemap.patch
#ecryptfs-vs-streamline-generic_file_-interfaces-and-filemap-fix.patch
ecryptfs-file-operations-fix.patch
ecryptfs-file-operations-fix-premature-release-of-file_info-memory.patch
ecryptfs-inode-operations.patch
ecryptfs-mmap-operations.patch
mark-address_space_operations-const-vs-ecryptfs-mmap-operations.patch
ecryptfs-keystore.patch
ecryptfs-crypto-functions.patch
ecryptfs-debug-functions.patch
ecryptfs-alpha-build-fix.patch
ecryptfs-convert-assert-to-bug_on.patch
ecryptfs-remove-unnecessary-null-checks.patch
ecryptfs-rewrite-ecryptfs_fsync.patch
ecryptfs-overhaul-file-locking.patch

 Christoph has half-reviewed this and all the issues arising from that
 have, I believe, been addressed.  With the exception of the "we should
 have a generic stacking layer" issue.  Which is true.  Michael's take is
 "yes, but that's not my job".  Which also is true.

 Don't know.

proc-sysctl-add-_proc_do_string-helper.patch
namespaces-add-nsproxy.patch
namespaces-add-nsproxy-dont-include-compileh.patch
namespaces-incorporate-fs-namespace-into-nsproxy.patch
namespaces-utsname-introduce-temporary-helpers.patch
namespaces-utsname-switch-to-using-uts-namespaces.patch
namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch
namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch
namespaces-utsname-use-init_utsname-when-appropriate.patch
namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch
namespaces-utsname-implement-utsname-namespaces.patch
namespaces-utsname-implement-utsname-namespaces-export.patch
namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch
namespaces-utsname-sysctl-hack.patch
namespaces-utsname-sysctl-hack-cleanup.patch
namespaces-utsname-sysctl-hack-cleanup-2.patch
namespaces-utsname-sysctl-hack-cleanup-2-fix.patch
namespaces-utsname-remove-system_utsname.patch
namespaces-utsname-implement-clone_newuts-flag.patch
uts-copy-nsproxy-only-when-needed.patch
# needed if git-klibc isn't there:
#namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch
#namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch
#namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch

 utsname virtualisation.  This doesn't seem very pointful as a standalone
 thing.  That's a general problem with infrastructural work for a very
 large new feature.

 So probably I'll continue to babysit these patches, unless someone can
 identify a decent reason why mainline needs this work.

 I don't want to carry an ever-growing stream of OS-virtualisation
 groundwork patches for ever and ever so if we're going to do this thing...
 faster, please.

readahead-kconfig-options.patch
radixtree-introduce-radix_tree_scan_hole.patch
mm-introduce-probe_page.patch
mm-introduce-pg_readahead.patch
readahead-add-look-ahead-support-to-__do_page_cache_readahead.patch
readahead-delay-page-release-in-do_generic_mapping_read.patch
readahead-insert-cond_resched-calls.patch
readahead-minmax_ra_pages.patch
readahead-events-accounting.patch
readahead-rescue_pages.patch
readahead-sysctl-parameters.patch
readahead-sysctl-parameters-fix.patch
readahead-min-max-sizes.patch
readahead-state-based-method-aging-accounting.patch
readahead-state-based-method-routines.patch
readahead-state-based-method.patch
readahead-state-based-method-readahead-state-based-method-stand-alone-size-limit-code.patch
readahead-context-based-method.patch
readahead-context-based-method-apply-stream_shift-size-limits-to-contexta-method.patch
readahead-context-based-method-fix-remain-counting.patch
readahead-initial-method-guiding-sizes.patch
readahead-initial-method-thrashing-guard-size.patch
readahead-initial-method-expected-read-size.patch
readahead-initial-method-user-recommended-size.patch
readahead-initial-method.patch
readahead-backward-prefetching-method.patch
readahead-backward-prefetching-method-add-use-case-comment.patch
readahead-seeking-reads-method.patch
readahead-thrashing-recovery-method.patch
readahead-call-scheme.patch
readahead-laptop-mode.patch
readahead-loop-case.patch
readahead-nfsd-case.patch
readahead-turn-on-by-default.patch
readahead-debug-radix-tree-new-functions.patch
readahead-debug-traces-showing-accessed-file-names.patch
readahead-debug-traces-showing-read-patterns.patch

 It's early days yet - needs heaps more performance testing.  The results
 from "Linux Portal" <linportal@gmail.com> were discouraging.

reiser4-export-handle_ra_miss.patch
reiser4-sb_sync_inodes.patch
reiser4-export-remove_from_page_cache.patch
reiser4-export-radix_tree_preload.patch
reiser4-export-find_get_pages.patch
make-copy_from_user_inatomic-not-zero-the-tail-on-i386-vs-reiser4.patch
reiser4.patch
reiser4-hardirq-include-fix.patch
reiser4-fix-trivial-tyops-which-were-hard-to-hit.patch
reiser4-run-truncate_inode_pages-in-reiser4_delete_inode.patch

 We need to do something about this.  It does need an intensive review and
 there aren't many people who have the experience to do that right, and
 there are fewer who have the time.  Uptake by a vendor or two would be
 good.

ide-pdc202xx_oldc-remove-unneeded-tuneproc-call.patch
ide-claim-extra-dma-ports-regardless-of-channel.patch
ide-remove-dma_base2-field-form-ide_hwif_t.patch
ide-always-release-dma-engine.patch
fix-ide-locking-error.patch
ide-error-handling-fixes.patch
ide-hpt3xxn-clocking-fixes.patch
ide-io-increase-timeout-value-to-allow-for-slave-wakeup.patch
ide-actually-honor-drives-minimum-pio-dma-cycle-times.patch
ide-fix-hpt37x-timing-tables.patch
ide-optimize-hpt37x-timing-tables.patch
ide-fix-hpt3xx-hotswap-support.patch
ide-fix-the-case-of-multiple-hpt3xx-chips-present.patch
ide-hpt3xx-fix-pci-clock-detection.patch
ide-hpt3xx-fix-pci-clock-detection-fix-2.patch
ide-pdc202xx_old-remove-the-obsolete-busproc.patch
piix-fix-82371mx-enablebits.patch
piix-remove-check-for-broken-mw-dma-mode-0.patch
piix-slc90e66-pio-mode-fallback-fix.patch
make-number-of-ide-interfaces-configurable.patch
ide_dma_speed-fixes.patch
ide_dma_speed-fixes-warning-fix.patch
ide_dma_speed-fixes-tidy.patch
hpt3xx-rework-rate-filtering.patch
hpt3xx-rework-rate-filtering-tidy.patch
hpt3xx-print-the-real-chip-name-at-startup.patch
hpt3xx-switch-to-using-pci_get_slot.patch
hpt3xx-cache-channels-mcr-address.patch

 Will merge, subject to maintainer-poking.

radeonfb-powerdrain-issue-on-ibm-thinkpads-and-suspend-to-d2.patch
savagefb-allocate-space-for-current-and-saved-register.patch
savagefb-add-state-save-and_restore-hooks.patch
savagefb-add-state-save-and_restore-hooks-tidy.patch
savagefb-add-state-save-and_restore-hooks-fix.patch
backlight-locomo-backlight-driver-updates.patch
fbdev-cleanup-the-config_video_select-mess.patch
fbdev-remove-duplicate-includes.patch
fbdev-more-accurate-sync-range-extrapolation.patch
nvidiafb-revise-pci_device_id-table.patch
atyfb-fix-hardware-cursor-handling.patch
atyfb-remove-unneeded-calls-to-wait_for_idle.patch
atyfb-set-correct-acceleration-flags.patch
epson1355fb-update-platform-code.patch
vesafb-update-platform-code.patch
vfb-update-platform-code.patch
vga16fb-update-platform-code.patch
fbdev-static-pseudocolor-with-depth-less-than-4-does.patch
savagefb-whitespace-cleanup.patch
fbdev-firmware-edid-fixes.patch
fbdev-firmware-edid-fixes-fix.patch
nvidiafb-add-support-for-geforce-6100-and-related-chipsets.patch
fbdev-add-1366x768-wxga-mode-to-mode-database.patch
vesafb-fix-return-code-of-vesafb_setcolreg.patch
vesafb-prefer-vga-registers-over-pmi.patch
vt-delay-the-update-of-the-visible-console.patch
atyfb-fix-dead-code.patch
fbdev-coverity-bug-85.patch
fbdev-coverity-bug-90.patch
fbdev-remove-unused-exports.patch
s3c2410fb-fix-resume.patch
backlight-fix-kconfig-dependency.patch
au1100fb-add-power-management-support.patch
au1100fb-add-power-management-support-tidy.patch
skeletonfb-remove-duplicate-module-init-exit-license-lines.patch
neofb-fix-unblank-logic-interfering-with-lid-toggled-backlight.patch

 Will merge.

dm-snapshot-unify-chunk_size.patch
lib-add-idr_replace.patch
lib-add-idr_replace-tidy.patch
dm-fix-idr-minor-allocation.patch
dm-move-idr_pre_get.patch
dm-change-minor_lock-to-spinlock.patch
dm-add-dmf_freeing.patch
dm-fix-mapped-device-ref-counting.patch
dm-add-module-ref-counting.patch
dm-fix-block-device-initialisation.patch
dm-mirror-sector-offset-fix.patch
dm-table-get_target-fix-last-index.patch

 Will merge.

md-reformat-code-in-raid1_end_write_request-to-avoid-goto.patch
md-remove-arbitrary-limit-on-chunk-size.patch
md-remove-useless-ioctl-warning.patch
md-increase-the-delay-before-marking-metadata-clean-and-make-it-configurable.patch
md-merge-raid5-and-raid6-code.patch
md-remove-nuisance-message-at-shutdown.patch
md-allow-checkpoint-of-recovery-with-version-1-superblock.patch
md-allow-checkpoint-of-recovery-with-version-1-superblock-fix.patch
md-allow-a-linear-array-to-have-drives-added-while-active.patch
md-support-stripe-offset-mode-in-raid10.patch
md-make-md_print_devices-static.patch
md-split-reshape-portion-of-raid5-sync_request-into-a-separate-function.patch
#
md-bitmap-fix-online-removal-of-file-backed-bitmaps.patch
md-bitmap-remove-bitmap-writeback-daemon.patch
md-bitmap-cleaner-separation-of-page-attribute-handlers-in-md-bitmap.patch
md-bitmap-use-set_bit-etc-for-bitmap-page-attributes.patch
md-bitmap-remove-unnecessary-page-reference-manipulations-from-md-bitmap-code.patch
md-bitmap-remove-dead-code-from-md-bitmap.patch
md-bitmap-tidy-up-i_writecount-handling-in-md-bitmap.patch
md-bitmap-change-md-bitmap-file-handling-to-use-bmap-to-file-blocks.patch
md-change-md-bitmap-file-handling-to-use-bmap-to-file-blocks-fix.patch
md-calculate-correct-array-size-for-raid10-in-new-offset-mode.patch
#
md-md-kconfig-speeling-feex.patch
md-fix-kconfig-error.patch
md-fix-bug-that-stops-raid5-resync-from-happening.patch
md-allow-re-add-to-work-on-array-without-bitmaps.patch
md-dont-write-dirty-clean-update-to-spares-leave-them-alone.patch
md-set-get-state-of-array-via-sysfs.patch
md-allow-rdev-state-to-be-set-via-sysfs.patch
md-allow-raid-layout-to-be-read-and-set-via-sysfs.patch
md-allow-resync_start-to-be-set-and-queried-via-sysfs.patch
md-allow-the-write_mostly-flag-to-be-set-via-sysfs.patch

 Will merge.

statistics-infrastructure-prerequisite-list.patch
statistics-infrastructure-prerequisite-parser.patch
statistics-infrastructure-prerequisite-timestamp.patch
statistics-infrastructure-prerequisite-timestamp-fix.patch
statistics-infrastructure-make-printk_clock-a-generic-kernel-wide-nsec-resolution.patch
statistics-infrastructure-documentation.patch
statistics-infrastructure.patch
statistics-infrastructure-update-1.patch
statistics-infrastructure-update-2.patch
statistics-infrastructure-update-3.patch
statistics-infrastructure-exploitation-zfcp.patch

 Another tough one.  It offers generic infrastructure for non-task-related
 instrumentation and it would really be good if someone who has an interest
 in this for something other than the zfcp driver could stand up and say
 "this works for us".

genirq-rename-desc-handler-to-desc-chip.patch
genirq-rename-desc-handler-to-desc-chip-power-fix.patch
genirq-rename-desc-handler-to-desc-chip-ia64-fix.patch
genirq-rename-desc-handler-to-desc-chip-ia64-fix-2.patch
genirq-sem2mutex-probe_sem-probing_active.patch
genirq-cleanup-merge-irq_affinity-into-irq_desc.patch
genirq-cleanup-remove-irq_descp.patch
genirq-cleanup-remove-irq_descp-fix.patch
genirq-cleanup-remove-fastcall.patch
genirq-cleanup-misc-code-cleanups.patch
genirq-cleanup-reduce-irq_desc_t-use-mark-it-obsolete.patch
genirq-cleanup-include-linux-irqh.patch
genirq-cleanup-merge-irq_dir-smp_affinity_entry-into-irq_desc.patch
genirq-cleanup-merge-pending_irq_cpumask-into-irq_desc.patch
genirq-cleanup-turn-arch_has_irq_per_cpu-into-config_irq_per_cpu.patch
genirq-debug-better-debug-printout-in-enable_irq.patch
genirq-add-retrigger-irq-op-to-consolidate-hw_irq_resend.patch
genirq-doc-comment-include-linux-irqh-structures.patch
genirq-doc-handle_irq_event-and-__do_irq-comments.patch
genirq-cleanup-no_irq_type-cleanups.patch
genirq-doc-add-design-documentation.patch
genirq-add-genirq-sw-irq-retrigger.patch
genirq-add-irq_noprobe-support.patch
genirq-add-irq_norequest-support.patch
genirq-add-irq_noautoen-support.patch
genirq-update-copyrights.patch
genirq-core.patch
genirq-msi-fixes-2.patch
genirq-add-irq-chip-support.patch
genirq-add-irq-chip-support-fix.patch
genirq-add-handle_bad_irq.patch
genirq-add-irq-wake-power-management-support.patch
genirq-add-sa_trigger-support.patch
genirq-cleanup-no_irq_type-no_irq_chip-rename.patch
genirq-convert-the-x86_64-architecture-to-irq-chips.patch
genirq-convert-the-i386-architecture-to-irq-chips.patch
genirq-convert-the-i386-architecture-to-irq-chips-fix-2.patch
genirq-more-verbose-debugging-on-unexpected-irq-vectors.patch
genirq-add-chip-eoi-fastack-fasteoi.patch
genirq-add-chip-eoi-fastack-fasteoi-fix.patch

 Still stabilising.  It's looking more like 2.6.19 material.  Needs more
 review from arch maintainers too.

lock-validator-floppyc-irq-release-fix.patch
lock-validator-floppyc-irq-release-fix-fix.patch
lock-validator-forcedethc-fix.patch
lock-validator-mutex-section-binutils-workaround.patch
lock-validator-add-__module_address-method.patch
lock-validator-better-lock-debugging.patch
lock-validator-locking-api-self-tests.patch
lock-validator-locking-api-self-tests-self-test-fix.patch
lock-validator-locking-init-debugging-improvement.patch
lock-validator-beautify-x86_64-stacktraces.patch
lock-validator-beautify-x86_64-stacktraces-fix.patch
lock-validator-beautify-x86_64-stacktraces-fix-2.patch
lock-validator-beautify-x86_64-stacktraces-fix-3.patch
lock-validator-beautify-x86_64-stacktraces-fix-4.patch
lock-validator-x86_64-document-stack-frame-internals.patch
lock-validator-stacktrace.patch
lock-validator-stacktrace-build-fix.patch
lock-validator-stacktrace-warning-fix.patch
lock-validator-stacktrace-fix-on-x86_64.patch
lock-validator-fown-locking-workaround.patch
lock-validator-sk_callback_lock-workaround.patch
lock-validator-irqtrace-core.patch
lock-validator-irqtrace-core-powerpc-fix-1.patch
lock-validator-irqtrace-core-non-x86-fix.patch
lock-validator-irqtrace-core-non-x86-fix-2.patch
lock-validator-irqtrace-core-non-x86-fix-3.patch
lock-validator-irqtrace-entrys-fix.patch
lock-validator-irqtrace-core-remove-softirqc-warn_on.patch
lock-validator-irqtrace-cleanup-include-asm-i386-irqflagsh.patch
lock-validator-irqtrace-cleanup-include-asm-x86_64-irqflagsh.patch
lock-validator-x86_64-irqflags-trace-entrys-fix.patch
lock-validator-lockdep-add-local_irq_enable_in_hardirq-api.patch
lock-validator-add-per_cpu_offset.patch
lock-validator-add-per_cpu_offset-fix.patch
lock-validator-core.patch
lock-validator-core-early_boot_irqs_-build-fix.patch
lock-validator-core-fix-compiler-warning.patch
lock-validator-procfs.patch
lock-validator-core-multichar-fix.patch
lock-validator-core-count_matching_names-fix.patch
lock-validator-design-docs.patch
lock-validator-prove-rwsem-locking-correctness.patch
lock-validator-prove-rwsem-locking-correctness-fix.patch
lock-validator-prove-rwsem-locking-correctness-powerpc-fix.patch
lock-validator-prove-spinlock-rwlock-locking-correctness.patch
lock-validator-prove-mutex-locking-correctness.patch
lock-validator-prove-mutex-locking-correctness-fix-null-type-name-bug.patch
lock-validator-print-all-lock-types-on-sysrq-d.patch
lock-validator-x86_64-early-init.patch
lock-validator-smp-alternatives-workaround.patch
lock-validator-do-not-recurse-in-printk.patch
lock-validator-disable-nmi-watchdog-if-config_lockdep.patch
lock-validator-disable-nmi-watchdog-if-config_lockdep-i386.patch
lock-validator-disable-nmi-watchdog-if-config_lockdep-x86_64.patch
lock-validator-special-locking-bdev.patch
lock-validator-special-locking-direct-io.patch
lock-validator-special-locking-serial.patch
lock-validator-special-locking-serial-fix.patch
lock-validator-special-locking-dcache.patch
lock-validator-special-locking-i_mutex.patch
lock-validator-special-locking-s_lock.patch
lock-validator-special-locking-futex.patch
lock-validator-special-locking-genirq.patch
lock-validator-special-locking-completions.patch
lock-validator-special-locking-waitqueues.patch
lock-validator-special-locking-mm.patch
lock-validator-special-locking-serio.patch
lock-validator-special-locking-slab.patch
lock-validator-special-locking-skb_queue_head_init.patch
lock-validator-special-locking-net-ipv4-igmpcpatch.patch
lock-validator-special-locking-net-ipv4-igmpc-2.patch
lock-validator-special-locking-timerc.patch
lock-validator-special-locking-schedc.patch
lock-validator-special-locking-hrtimerc.patch
lock-validator-special-locking-sock_lock_init.patch
lock-validator-special-locking-af_unix.patch
lock-validator-special-locking-bh_lock_sock.patch
lock-validator-special-locking-mmap_sem.patch
lock-validator-special-locking-sb-s_umount.patch
lock-validator-special-locking-sb-s_umount-fix.patch
lock-validator-special-locking-sb-s_umount-2.patch
lock-validator-special-locking-sb-s_umount-2-fix.patch
lockdep-annotate-rpc_populate-for.patch
lock-validator-special-locking-jbd.patch
lock-validator-special-locking-posix-timers.patch
lock-validator-special-locking-sch_genericc.patch
lock-validator-special-locking-xfrm.patch
lockdep-add-i_mutex-ordering-annotations-to-the-sunrpc.patch
lockdep-add-parent-child-annotations-to-usbfs.patch
lock-validator-special-locking-sound-core-seq-seq_portsc.patch
lock-validator-special-locking-sound-core-seq-seq_devicec.patch
lock-validator-special-locking-sound-core-seq-seq_devicec-fix.patch
lock-validator-fix-rt_hash_lock_sz.patch
lock-validator-introduce-irq__lockdep.patch
locking-validator-special-rule-8390c-disable_irq.patch
locking-validator-special-rule-3c59xc-disable_irq.patch
lock-validator-enable-lock-validator-in-kconfig.patch
lock-validator-enable-lock-validator-in-kconfig-require-trace_irqflags_support.patch
lock-validator-enable-lock-validator-in-kconfig-not-yet.patch
lockdep-one-stacktrace-column-if-config_lockdep=y.patch
i386-remove-multi-entry-backtraces.patch
lockdep-further-improve-stacktrace-output.patch
lock-validator-irqtrace-support-non-x86-architectures.patch
lock-validator-disable-oprofile-if-lockdep=y.patch
lock-validator-select-kallsyms_all.patch

 I'm not really sure that this has as good a bugfixes/effort ratio as would,
 say, working on our ever-growing bugzilla list.

 But given that it exists, and that it'll fix (or rather prevent) future
 bugs at a constant-but-low rate for a long time, I guess it's something we
 want.

 I think it's more like 2.6.19 material.  The number of
 teach-lockdep-about-this-unusual-but-correct-locking-code patches
 continues to grow and I don't think we fully have a handle on how it'll
 all end up looking.




* 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
@ 2006-06-04 21:20 ` Bernhard Rosenkraenzer
  2006-06-04 21:33 ` header cleanup and install David Woodhouse
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 166+ messages in thread
From: Bernhard Rosenkraenzer @ 2006-06-04 21:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sunday, 4. June 2006 22:50, Andrew Morton wrote:
> git-hdrcleanup.patch
> git-hdrinstall.patch
>
>  This is Dave Woodhouse's work cleaning up the kernel headers and adding a
>  `make headerinstall' target which automates the exporting of kernel
>  headers as a userspace-usable package.
>
>  All I can say about this is that it doesn't appear to break anything and
>  is ready to merge from that point of view.  It's not an area in which I
>  have much interest or knowledge.

I've played with it and rebuilt all of Ark Linux (around 5000 packages) with 
glibc-kernheaders replaced with make headerinstall-ed headers, no problems at 
all (except some stupid apps thinking BITS_PER_LONG is supposed to be 
defined, but they were probably broken with the last couple of 
glibc-kernheaders releases as well).

So from a user's perspective, it's ready.


* Re: header cleanup and install
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
  2006-06-04 21:20 ` 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans) Bernhard Rosenkraenzer
@ 2006-06-04 21:33 ` David Woodhouse
  2006-06-04 21:43   ` Andrew Morton
  2006-06-05 10:52   ` Jens Axboe
  2006-06-04 21:36 ` 2.6.18 -mm merge plans Alan Cox
                   ` (18 subsequent siblings)
  20 siblings, 2 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-04 21:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> git-hdrcleanup.patch
> git-hdrinstall.patch
> 
>  This is Dave Woodhouse's work cleaning up the kernel headers and adding a
>  `make headerinstall' target which automates the exporting of kernel
>  headers as a userspace-usable package.

More specifically:

git-hdrcleanup is simple and boring janitorial stuff in headers --
nothing particularly new and exciting. Mostly it's just moving stuff
that shouldn't be user-visible inside existing instances of #ifdef
__KERNEL__ -- it doesn't even add many new ifdefs. A large chunk of it
is just removing the superfluous #include <linux/config.h> from every
file.

The only bit that's even vaguely interesting, if you're _desperate_ to
find something exciting in it, is the fact that I hid the broken
_syscallX macros from asm-*/unistd.h inside #ifdef __KERNEL__. They're
broken for 64-bit syscall arguments on architectures like MIPS, and they
were even broken for PIC code on i386. Not only were they broken, but
also the kernel headers are _not_ a library of random crap for userspace
to use. Glibc doesn't use them, klibc doesn't use them, and dietlibc
folks were working on not using them last time I checked.
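
To illustrate the alternative for userspace: rather than expanding a
_syscallX macro from asm-*/unistd.h, a program can go through the C
library's generic syscall() wrapper. The following is only a sketch,
assuming a Linux system with glibc or a compatible libc; raw_getpid() and
check_raw_getpid() are hypothetical names made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/*
 * Hypothetical helper: issue the getpid system call through libc's
 * syscall() wrapper. Argument marshalling is left to the C library
 * instead of a macro copied out of the kernel headers.
 */
static pid_t raw_getpid(void)
{
	return (pid_t)syscall(SYS_getpid);
}

/* Sanity check: the raw call must agree with the regular libc wrapper. */
static void check_raw_getpid(void)
{
	assert(raw_getpid() == getpid());
}
```

Calling check_raw_getpid() from main() is enough to exercise it. The point
of this shape is that the calling convention lives in the C library, not
in a header exported by the kernel.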

git-hdrinstall is just the 'make headers_install' thing, based on an
original implementation by Arnd Bergmann. It takes the set of headers
which are at all suitable for userspace and exports them with unifdef.
The idea is that distributions can have a _consistent_ set of headers to
build stuff like glibc and system tools against, rather than the horrid
mess we have now. Those files can also be diffed from one release to the
next, and we have a decent chance of actually _seeing_ what changed,
without all the noise. Having done that diff on my last few updates, it
does actually seem to work like that in practice.
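
As a rough illustration of what that export step relies on, here is a
made-up header fragment. Everything named foo_* is hypothetical and does
not exist in the tree. The assumption is simply that kernel-internal
declarations sit inside #ifdef __KERNEL__, so that unifdef can drop them
and leave only the userspace-visible part.

```c
#include <assert.h>
#include <stdint.h>

/* ---- hypothetical header as it might sit in the kernel tree ---- */

/* Userspace-visible ABI: this part would survive the export. */
struct foo_request {
	uint32_t id;
	uint32_t flags;
};
#define FOO_FLAG_SYNC 0x1u

#ifdef __KERNEL__
/* Kernel-internal: an export with unifdef would strip this block. */
struct foo_state {
	struct foo_request req;
	void *private_data;
};
int foo_submit(struct foo_state *state);
#endif /* __KERNEL__ */

/* ---- end of hypothetical header ---- */

/*
 * Userspace code built without __KERNEL__ defined sees only the ABI
 * part above, which is the same view the exported copy would give it.
 */
static struct foo_request foo_request_make(uint32_t id, int sync)
{
	struct foo_request req = { .id = id, .flags = sync ? FOO_FLAG_SYNC : 0u };

	/* Only the exported flag bit may be set from userspace. */
	assert((req.flags & ~FOO_FLAG_SYNC) == 0);
	return req;
}
```

Compiled as an ordinary userspace file, the block between #ifdef
__KERNEL__ and #endif is never seen, which is the same result the exported
copy of the header would give without the guard being present at all.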

>  That being said, it's relatively costly to carry such extensive patches
>  in -mm for long periods, so I'd ask Linus and the distro people to work
>  out what we want to do here promptly, please. 

The result of this is already shipping in Fedora rawhide, and it's a
godsend. I haven't heard much from the relevant package maintainers in
other distros recently, but they were generally in agreement last time I
heard. There's not a lot of 'working out' to be done -- we just need
Linus to take it.

Btw, no mention of the rbtree shrinkage. I plan to send that Linuswards
as soon as 2.6.17 is out too, OK? And the mtd tree too but that's just a
normal maintainer tree so I _expected_ you to omit that.

-- 
dwmw2


* Re: header cleanup and install
  2006-06-04 21:33 ` header cleanup and install David Woodhouse
@ 2006-06-04 21:43   ` Andrew Morton
  2006-06-05 10:52   ` Jens Axboe
  1 sibling, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-04 21:43 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-kernel

On Sun, 04 Jun 2006 22:33:13 +0100
David Woodhouse <dwmw2@infradead.org> wrote:

> Btw, no mention of the rbtree shrinkage. I plan to send that Linuswards
> as soon as 2.6.17 is out too, OK?

yup.


* Re: header cleanup and install
  2006-06-04 21:33 ` header cleanup and install David Woodhouse
  2006-06-04 21:43   ` Andrew Morton
@ 2006-06-05 10:52   ` Jens Axboe
  2006-06-05 10:54     ` David Woodhouse
  1 sibling, 1 reply; 166+ messages in thread
From: Jens Axboe @ 2006-06-05 10:52 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Morton, linux-kernel

On Sun, Jun 04 2006, David Woodhouse wrote:
> Btw, no mention of the rbtree shrinkage. I plan to send that Linuswards
> as soon as 2.6.17 is out too, OK? And the mtd tree too but that's just a
> normal maintainer tree so I _expected_ you to omit that.

I guess the color -> colour transformation is clouding the inclusion :-)

-- 
Jens Axboe



* Re: header cleanup and install
  2006-06-05 10:52   ` Jens Axboe
@ 2006-06-05 10:54     ` David Woodhouse
  2006-06-05 10:59       ` Jens Axboe
  0 siblings, 1 reply; 166+ messages in thread
From: David Woodhouse @ 2006-06-05 10:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 12:52 +0200, Jens Axboe wrote:
> I guess the color -> colour transformation is clouding the
> inclusion :-)

Heh. Well, mostly I've just _removed_ the references to colo{u,}r, since
callers shouldn't be poking at it anyway in general. We have
rb_set_black() and rb_set_red() now, but even those are only really for
the rbtree code itself to be using.

-- 
dwmw2


* Re: header cleanup and install
  2006-06-05 10:54     ` David Woodhouse
@ 2006-06-05 10:59       ` Jens Axboe
  2006-06-05 10:57         ` David Woodhouse
  0 siblings, 1 reply; 166+ messages in thread
From: Jens Axboe @ 2006-06-05 10:59 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Morton, linux-kernel

On Mon, Jun 05 2006, David Woodhouse wrote:
> On Mon, 2006-06-05 at 12:52 +0200, Jens Axboe wrote:
> > I guess the color -> colour transformation is clouding the
> > inclusion :-)
> 
> Heh. Well, mostly I've just _removed_ the references to colo{u,}r, since
> callers shouldn't be poking at it anyway in general. We have
> rb_set_black() and rb_set_red() now, but even those are only really for
> the rbtree code itself to be using.

Yeah I'm just kidding, I just noticed that your British fingers could
not leave the "color" alone! The patches are fine with me, rb usage is
quite widespread and shrinking the nodes is definitely a good thing.

-- 
Jens Axboe



* Re: header cleanup and install
  2006-06-05 10:59       ` Jens Axboe
@ 2006-06-05 10:57         ` David Woodhouse
  2006-06-05 11:03           ` Jens Axboe
  0 siblings, 1 reply; 166+ messages in thread
From: David Woodhouse @ 2006-06-05 10:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 12:59 +0200, Jens Axboe wrote:
> Yeah I'm just kidding, I just noticed that your British fingers could
> not leave the "color" alone! The patches are fine with me, rb usage is
> quite widespread and shrinking the nodes is definitely a good thing.

Hey... I left rb_insert_color() as it was, didn't I? :)

-- 
dwmw2



* Re: header cleanup and install
  2006-06-05 10:57         ` David Woodhouse
@ 2006-06-05 11:03           ` Jens Axboe
  2006-06-05 18:09             ` Andrew Morton
  0 siblings, 1 reply; 166+ messages in thread
From: Jens Axboe @ 2006-06-05 11:03 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Morton, linux-kernel

On Mon, Jun 05 2006, David Woodhouse wrote:
> On Mon, 2006-06-05 at 12:59 +0200, Jens Axboe wrote:
> > Yeah I'm just kidding, I just noticed that your British fingers could
> > not leave the "color" alone! The patches are fine with me, rb usage is
> > quite widespread and shrinking the nodes is definitely a good thing.
> 
> Hey... I left rb_insert_color() as it was, didn't I? :)

You did - and snuck in the renaming in the headers :-)
But don't label me as a colour racist.

-- 
Jens Axboe



* Re: header cleanup and install
  2006-06-05 11:03           ` Jens Axboe
@ 2006-06-05 18:09             ` Andrew Morton
  2006-06-05 19:19               ` David Woodhouse
  0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05 18:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: dwmw2, linux-kernel

On Mon, 5 Jun 2006 13:03:31 +0200
Jens Axboe <axboe@suse.de> wrote:

> On Mon, Jun 05 2006, David Woodhouse wrote:
> > On Mon, 2006-06-05 at 12:59 +0200, Jens Axboe wrote:
> > > Yeah I'm just kidding, I just noticed that your British fingers could
> > > not leave the "color" alone! The patches are fine with me, rb usage is
> > > quite widespread and shrinking the nodes is definitely a good thing.
> > 
> > Hey... I left rb_insert_color() as it was, didn't I? :)
> 
> You did - and snuck in the renaming in the headers :-)
> But don't label me as a colour racist.
> 

I'm not as shy.

David, we now have a mixture of "color" and "colour" in the same piece of
code.  That's just dumb.



* Re: header cleanup and install
  2006-06-05 18:09             ` Andrew Morton
@ 2006-06-05 19:19               ` David Woodhouse
  2006-06-17 20:35                 ` Alistair John Strachan
  0 siblings, 1 reply; 166+ messages in thread
From: David Woodhouse @ 2006-06-05 19:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jens Axboe, linux-kernel

On Mon, 2006-06-05 at 11:09 -0700, Andrew Morton wrote:
> I'm not as shy.
> 
> David, we now have a mixture of "color" and "colour" in the same piece of
> code.  That's just dumb. 

I blame them damn Frenchies. Fixed in the git tree.

-- 
dwmw2



* Re: header cleanup and install
  2006-06-05 19:19               ` David Woodhouse
@ 2006-06-17 20:35                 ` Alistair John Strachan
  2006-06-17 21:20                   ` David Woodhouse
  0 siblings, 1 reply; 166+ messages in thread
From: Alistair John Strachan @ 2006-06-17 20:35 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Morton, Jens Axboe, linux-kernel

On Monday 05 June 2006 20:19, David Woodhouse wrote:
> On Mon, 2006-06-05 at 11:09 -0700, Andrew Morton wrote:
> > I'm not as shy.
> >
> > David, we now have a mixture of "color" and "colour" in the same piece of
> > code.  That's just dumb.
>
> I blame them damn Frenchies. Fixed in the git tree.

To colour, I assume. ;-)

-- 
Cheers,
Alistair.

Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


* Re: header cleanup and install
  2006-06-17 20:35                 ` Alistair John Strachan
@ 2006-06-17 21:20                   ` David Woodhouse
  0 siblings, 0 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-17 21:20 UTC (permalink / raw)
  To: Alistair John Strachan; +Cc: Andrew Morton, Jens Axboe, linux-kernel

On Sat, 2006-06-17 at 21:35 +0100, Alistair John Strachan wrote:
> > > David, we now have a mixture of "color" and "colour" in the same piece of
> > > code.  That's just dumb.
> >
> > I blame them damn Frenchies. Fixed in the git tree.
> 
> To colour, I assume. ;-) 

No, to 'color' since rb_insert_color() was the public API and hasn't
changed, while 'rb_parent_colour' is a new, internal field (now renamed
to 'rb_parent_color').
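
For readers wondering what the shrinkage amounts to: the field name
suggests that the parent pointer and the node color share a single word.
A minimal sketch of that idea follows, assuming nodes are aligned so that
the low bit of a node address is always zero. The names below (sketch_node
and the sketch_* helpers) are made up for this example and are not the
actual definitions in the rbtree code.

```c
#include <stdint.h>

#define SKETCH_RED   0u
#define SKETCH_BLACK 1u

struct sketch_node {
	uintptr_t parent_color;      /* parent pointer, color kept in bit 0 */
	struct sketch_node *left;
	struct sketch_node *right;
};

/* Mask off the color bit to recover the parent pointer. */
static struct sketch_node *sketch_parent(const struct sketch_node *n)
{
	return (struct sketch_node *)(n->parent_color & ~(uintptr_t)1);
}

/* The color lives in the lowest bit of the combined word. */
static unsigned int sketch_color(const struct sketch_node *n)
{
	return (unsigned int)(n->parent_color & 1);
}

/* Replace the parent while preserving the current color bit. */
static void sketch_set_parent(struct sketch_node *n, struct sketch_node *p)
{
	n->parent_color = (uintptr_t)p | (n->parent_color & 1);
}

/* Replace the color while preserving the current parent pointer. */
static void sketch_set_color(struct sketch_node *n, unsigned int color)
{
	n->parent_color = (n->parent_color & ~(uintptr_t)1) | (color & 1);
}
```

Folding the color into the pointer saves one word per node compared with
keeping a separate color field, at the cost of going through accessors.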

-- 
dwmw2



* Re: 2.6.18 -mm merge plans
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
  2006-06-04 21:20 ` 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans) Bernhard Rosenkraenzer
  2006-06-04 21:33 ` header cleanup and install David Woodhouse
@ 2006-06-04 21:36 ` Alan Cox
  2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 166+ messages in thread
From: Alan Cox @ 2006-06-04 21:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Ar Sul, 2006-06-04 am 13:50 -0700, ysgrifennodd Andrew Morton:
>  All I can say about this is that it doesn't appear to break anything and
>  is ready to merge from that point of view.  It's not an area in which I
>  have much interest or knowledge.

With my distro hat on I'd say it's essential work and it either needs
doing now or resolving at the kernel summit.

Alan



* kbuild, kconfig and hrdinstall stuff
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (2 preceding siblings ...)
  2006-06-04 21:36 ` 2.6.18 -mm merge plans Alan Cox
@ 2006-06-04 21:41 ` Sam Ravnborg
  2006-06-04 21:54   ` David Woodhouse
  2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 166+ messages in thread
From: Sam Ravnborg @ 2006-06-04 21:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

> git-hdrcleanup.patch
> git-hdrinstall.patch
> 
>  This is Dave Woodhouse's work cleaning up the kernel headers and adding a
>  `make headerinstall' target which automates the exporting of kernel
>  headers as a userspace-usable package.
> 
>  All I can say about this is that it doesn't appear to break anything and
>  is ready to merge from that point of view.  It's not an area in which I
>  have much interest or knowledge.

Dave Woodhouse asked me to review the hdrinstall part and I will do so.
At first glance only a few tidbits need fixing, and then I would like to
include unifdef in the kernel. It is rather unusual to have it installed
(gentoo at least does not have it in Portage).

I just lack a bit of time. Work and my newcomer (2 months old now)
take a bit of time at the moment.

If you do not beat me to it, hdrinstall will be part of the kbuild tree
soon, whereas the hdrcleanup part will not.

The following will be in the kbuild tree soon too.
> add-dependency-on-kernelrelease-to-the-package-targets.patch
> kconfig-improve-config-load-save-output.patch
> kconfig-fix-config-dependencies.patch
> kconfig-remove-symbol_yesmodno.patch
> kconfig-allow-multiple-default-values-per-symbol.patch
> kconfig-allow-loading-multiple-configurations.patch
> kconfig-integrate-split-config-into-silentoldconfig.patch
> kconfig-integrate-split-config-into-silentoldconfig-fix.patch
> kconfig-move-kernelrelease.patch
> kconfig-add-symbol-option-config-syntax.patch
> kconfig-add-defconfig_list-module-option.patch
> kconfig-add-search-option-for-xconfig.patch
> kconfig-finer-customization-via-popup-menus.patch
> kconfig-create-links-in-info-window.patch
> kconfig-jump-to-linked-menu-prompt.patch
> kconfig-warn-about-leading-whitespace-for-menu-prompts.patch
> kconfig-remove-leading-whitespace-in-menu-prompts.patch
> config-exit-if-no-beginning-filename.patch
> make-kernelrelease-speedup.patch
> kconfig-kconfig_overwriteconfig.patch
Not this one >>> sane-menuconfig-colours.patch
Randy Dunlap has a patch that makes it configurable - but I would like it
to be selectable in menuconfig - something I have not yet done.

> kbuild-export-type-enhancement-to-modpostc.patch
> kbuild-export-type-enhancement-to-modpostc-fix.patch
> kbuild-prevent-building-modules-that-wont-load.patch
> kbuild-export-symbol-usage-report-generator.patch
> kbuild-obj-dirs-is-calculated-incorrectly-if-hostprogs-y-is-defined.patch
> fix-make-rpm-for-powerpc.patch
If review is good => kbuild-tree.

> powerpc-kbuild-warning-fix.patch
I need to check up on this.


> kernel-doc-drop-leading-space-in-sections.patch
> kernel-doc-script-cleanups.patch
I thought we had a kernel-doc maintainer these days?

	Sam


* Re: kbuild, kconfig and hrdinstall stuff
  2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
@ 2006-06-04 21:54   ` David Woodhouse
  0 siblings, 0 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-04 21:54 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Andrew Morton, linux-kernel

On Sun, 2006-06-04 at 23:41 +0200, Sam Ravnborg wrote:
> Dave Woodhouse asked me to review the hdrinstall part and I will do
> so.
> At first glance only a few tidbits need fixing, and then I would like to
> include unifdef in the kernel. It is rather unusual to have it installed
> (gentoo at least does not have it in Portage).

I'm happy enough to include unifdef -- I'll do it right now if there's
consensus. I just didn't want to do it and then have its inclusion be a
_problem_ -- I wanted the tree to be as simple as possible.

-- 
dwmw2



* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (3 preceding siblings ...)
  2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
@ 2006-06-04 23:04 ` H. Peter Anvin
  2006-06-05 18:09   ` Roman Zippel
  2006-06-06 15:20   ` Pavel Machek
  2006-06-04 23:50 ` clocksource Roman Zippel
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-04 23:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> 
> git-klibc.patch
> 
>  Similar.  This all appears to work sufficiently well for a 2.6.18 merge. 
>  But it's been so long since klibc was a hot topic that I've forgotten who
>  wanted it, and what for.
> 
>  Can whoever has an interest in this work please pipe up and let's get our
>  direction sorted out quickly.
> 

klibc (early userspace) in its current form is fundamentally a cleanup.
What it does is unload code from the kernel which has no fundamental
reason to be kernel code (written under kernel rules, with all the
problems it entails.)  The initial code to have removed is the
root-mounting code, with all the various ugly mutations of that (ramdisk
loading, NFS root, initrd...)

The original idea was due to Al Viro; obviously, the implementation is
mostly mine.

It is of course my hope that this will be used for more than just plain
initialization code, but that in itself is a significant step, and one
has to start somewhere.

Part of the reason it has taken as long as it has is just to try to make
it as drop-in as at all possible.

	-hpa


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
@ 2006-06-05 18:09   ` Roman Zippel
  2006-06-06 15:20   ` Pavel Machek
  1 sibling, 0 replies; 166+ messages in thread
From: Roman Zippel @ 2006-06-05 18:09 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Morton, linux-kernel

Hi,

On Sun, 4 Jun 2006, H. Peter Anvin wrote:

> Andrew Morton wrote:
> > 
> > git-klibc.patch
> > 
> >  Similar.  This all appears to work sufficiently well for a 2.6.18 merge.
> > But it's been so long since klibc was a hot topic that I've forgotten who
> >  wanted it, and what for.
> > 
> >  Can whoever has an interest in this work please pipe up and let's get our
> >  direction sorted out quickly.
> > 
> 
> klibc (early userspace) in its current form is fundamentally a cleanup.  What
> it does is unload code from the kernel which has no fundamental reason to be
> kernel code (written under kernel rules, with all the problems it entails.)
> The initial code to have removed is the root-mounting code, with all the
> various ugly mutations of that (ramdisk loading, NFS root, initrd...)

For a cleanup it adds quite a lot of code, and I'm not really sure it
should all be distributed with the kernel. I'm really surprised there
hasn't been any larger discussion about it, or maybe I missed something?

It adds various utilities (dash, gzip, ...) to the kernel source, which
are not kernel specific at all. Why do we need this duplication? IMO code
duplication like this is not a desirable thing, as it increases the
maintenance overhead. Sometimes this is necessary, but IMO it should have
a good reason and should be temporary. Where does this duplication end
(e.g. udev, module tools)? Are we going to pull everything into the kernel
source which might be needed for an initramfs?

I think the most questionable duplication is the libgcc copy. Why do you 
even provide your own copy of this? This is a private library of the 
compiler and the last thing I would duplicate.

How is this going to integrate with the rest of the system, especially on
the distribution side? Distributions have their own mechanisms to produce
an initrd. What happens if NFS root requires a network module, which
requires a firmware file? How is this going to work?
How can e.g. embedded users control what's going into the initramfs? They
certainly don't want any duplication here (e.g. initrd on top of
initramfs?)

I'm not really comfortable with merging the whole thing at once; it's a
huge patch (or thousands of little commits) with no real documentation,
which makes it very hard to review. IMO it would be preferable to
distribute the non-kernel-specific parts separately and make this more
modular, so that users can exchange any part they like. The patch also
already removes a lot of the old setup stuff, which makes it an
all-or-nothing approach. It doesn't leave us with the possibility to make
the new setup optional, gradually convert the various boot
initialization to it, and experiment a little with it at first (not just
kernel developers but also everyone else).

Considering the impact all these changes will have, I would be really
interested in more opinions on this, or does everyone just hope it will
somehow work out by itself?

bye, Roman


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
  2006-06-05 18:09   ` Roman Zippel
@ 2006-06-06 15:20   ` Pavel Machek
  2006-06-06 20:56     ` Rafael J. Wysocki
  1 sibling, 1 reply; 166+ messages in thread
From: Pavel Machek @ 2006-06-06 15:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Morton, linux-kernel

Hi!

> >git-klibc.patch
> >
> > Similar.  This all appears to work sufficiently well 
> > for a 2.6.18 merge. But it's been so long since klibc 
> > was a hot topic that I've forgotten who
> > wanted it, and what for.
> >
> > Can whoever has an interest in this work please pipe 
> > up and let's get our
> > direction sorted out quickly.
> >
> 
> klibc (early userspace) in its current form is 
> fundamentally a cleanup. What it does is unload code 
>  from the kernel which has no fundamental reason to be 
> kernel code (written under kernel rules, with all the
> problems it entails.)  The initial code to have removed 
> is the root-mounting code, with all the various ugly 
> mutations of that (ramdisk loading, NFS root, initrd...)
> 
> The original idea was due to Al Viro; obviously, the
> implementation is mostly mine.
> 
> It is of course my hope that this will be used for more 
> than just plain initialization code, but that in itself 
> is a significant step, and one has to start somewhere.

It allows me to unify swsusp & uswsusp into one in future, for
example, reducing code duplication. klibc looks like a good thing.

-- 
Thanks for all the (sleeping) penguins.


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-06 15:20   ` Pavel Machek
@ 2006-06-06 20:56     ` Rafael J. Wysocki
  2006-06-07  3:37       ` H. Peter Anvin
                         ` (2 more replies)
  0 siblings, 3 replies; 166+ messages in thread
From: Rafael J. Wysocki @ 2006-06-06 20:56 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, Andrew Morton, linux-kernel

Hi,

On Tuesday 06 June 2006 17:20, Pavel Machek wrote:
> > >git-klibc.patch
> > >
> > > Similar.  This all appears to work sufficiently well 
> > > for a 2.6.18 merge. But it's been so long since klibc 
> > > was a hot topic that I've forgotten who
> > > wanted it, and what for.
> > >
> > > Can whoever has an interest in this work please pipe 
> > > up and let's get our
> > > direction sorted out quickly.
> > >
> > 
> > klibc (early userspace) in its current form is 
> > fundamentally a cleanup. What it does is unload code 
> >  from the kernel which has no fundamental reason to be 
> > kernel code (written under kernel rules, with all the
> > problems it entails.)  The initial code to have removed 
> > is the root-mounting code, with all the various ugly 
> > mutations of that (ramdisk loading, NFS root, initrd...)
> > 
> > The original idea was due to Al Viro; obviously, the
> > implementation is mostly mine.
> > 
> > It is of course my hope that this will be used for more 
> > than just plain initialization code, but that in itself 
> > is a significant step, and one has to start somewhere.
> 
> It allows me to unify swsusp & uswsusp into one in future, for
> example, reducing code duplication.

[cough] How distant is the future you're referring to?

Rafael


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-06 20:56     ` Rafael J. Wysocki
@ 2006-06-07  3:37       ` H. Peter Anvin
  2006-06-07  4:00       ` Nigel Cunningham
  2006-06-07  8:44       ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek
  2 siblings, 0 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-07  3:37 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <200606062256.55472.rjw@sisk.pl>
By author:    "Rafael J. Wysocki" <rjw@sisk.pl>
In newsgroup: linux.dev.kernel
> > 
> > It allows me to unify swsusp & uswsusp into one in future, for
> > example, reducing code duplication.
> 
> [cough] How distant is the future you're referring to?
> 

Shouldn't be far, since most of the code is already written.

One major advantage of klibc is that it allows most of the
initialization code to both be re-used as standalone programs as well
as be tested in normal userspace.  The former lets distributions
stitch it together any way they want, and the latter should reduce
bugs (especially since it's combined with what is a decent-sized
subset of the POSIX programming model, as opposed to the much more
difficult kernel programming model.)

	-hpa


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-06 20:56     ` Rafael J. Wysocki
  2006-06-07  3:37       ` H. Peter Anvin
@ 2006-06-07  4:00       ` Nigel Cunningham
  2006-06-07  4:10         ` H. Peter Anvin
  2006-06-07  8:44       ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek
  2 siblings, 1 reply; 166+ messages in thread
From: Nigel Cunningham @ 2006-06-07  4:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rafael J. Wysocki, Pavel Machek, H. Peter Anvin, Andrew Morton

Hi.

Sorry for coming in late. I've only just resubscribed after my move.

Not sure who originally said this...

> > > problems it entails.)  The initial code to have removed
> > > is the root-mounting code, with all the various ugly
> > > mutations of that (ramdisk loading, NFS root, initrd...)

Could I get more explanation of what this means and its implications? I'm 
thinking in particular about the implications for suspending to disk. Will it 
imply that everyone will _have_ to have an initramfs with some userspace 
program that sets up device nodes and so on, even if at the moment all you 
have is root=/dev/hda1 resume2=swap:/dev/hda2?

Along similar lines, I had been considering eventually including support for 
putting an image in place of the initrd (for embedded).

Regards,

Nigel


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-07  4:00       ` Nigel Cunningham
@ 2006-06-07  4:10         ` H. Peter Anvin
  2006-06-07  4:25           ` Nigel Cunningham
  0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-07  4:10 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <200606071400.49980.ncunningham@linuxmail.org>
By author:    Nigel Cunningham <ncunningham@linuxmail.org>
In newsgroup: linux.dev.kernel
> 
> Hi.
> 
> Sorry for coming in late. I've only just resubscribed after my move.
> 
> Not sure who originally said this...
> 
> > > > problems it entails.)  The initial code to have removed
> > > > is the root-mounting code, with all the various ugly
> > > > mutations of that (ramdisk loading, NFS root, initrd...)
> 
> Could I get more explanation of what this means and its implications? I'm 
> thinking in particular about the implications for suspending to disk. Will it 
> imply that everyone will _have_ to have an initramfs with some userspace 
> program that sets up device nodes and so on, even if at the moment all you 
> have is root=/dev/hda1 resume2=swap:/dev/hda2?
> 

Yes.  That initramfs is embedded in the kernel image.

> Along similar lines, I had been considering eventually including support for 
> putting an image in place of the initrd (for embedded).

You can still override the default built-in initramfs.  Then you get
the benefit of not carrying a bunch of code with you that can never be
executed.

	-hpa




* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-07  4:10         ` H. Peter Anvin
@ 2006-06-07  4:25           ` Nigel Cunningham
  2006-06-07  4:26             ` klibc H. Peter Anvin
  2006-06-07  6:51             ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson
  0 siblings, 2 replies; 166+ messages in thread
From: Nigel Cunningham @ 2006-06-07  4:25 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

Hi.

On Wednesday 07 June 2006 14:10, H. Peter Anvin wrote:
> Followup to:  <200606071400.49980.ncunningham@linuxmail.org>
> By author:    Nigel Cunningham <ncunningham@linuxmail.org>
> In newsgroup: linux.dev.kernel
>
> > Hi.
> >
> > Sorry for coming in late. I've only just resubscribed after my move.
> >
> > Not sure who originally said this...
> >
> > > > > problems it entails.)  The initial code to have removed
> > > > > is the root-mounting code, with all the various ugly
> > > > > mutations of that (ramdisk loading, NFS root, initrd...)
> >
> > Could I get more explanation of what this means and its implications? I'm
> > thinking in particular about the implications for suspending to disk.
> > Will it imply that everyone will _have_ to have an initramfs with some
> > userspace program that sets up device nodes and so on, even if at the
> > moment all you have is root=/dev/hda1 resume2=swap:/dev/hda2?
>
> Yes.  That initramfs is embedded in the kernel image.
>
> > Along similar lines, I had been considering eventually including support
> > for putting an image in place of the initrd (for embedded).
>
> You can still override the default built-in initramfs.  Then you get
> the benefit of not carrying a bunch of code with you that can never be
> executed.

Ok. Ta. I guess I should put some time into learning this prior to 2.6.18 
then, so I can help others through the transition.

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia


* Re: klibc
  2006-06-07  4:25           ` Nigel Cunningham
@ 2006-06-07  4:26             ` H. Peter Anvin
  2006-06-07  6:22               ` klibc Nigel Cunningham
  2006-06-07  6:51             ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson
  1 sibling, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-07  4:26 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: linux-kernel

Nigel Cunningham wrote:
> 
> Ok. Ta. I guess I should put some time into learning this prior to 2.6.18 
> then, so I can help others through the transition.
> 

I've been meaning to write up some proper documentation, but obviously 
my #1 priority has been to fix problems as they crop up.  If you would 
be willing to help in that area, I'd be more than willing to spend some 
time giving you a mind dump.

	-hpa


* Re: klibc
  2006-06-07  4:26             ` klibc H. Peter Anvin
@ 2006-06-07  6:22               ` Nigel Cunningham
  2006-06-07  6:38                 ` klibc H. Peter Anvin
  0 siblings, 1 reply; 166+ messages in thread
From: Nigel Cunningham @ 2006-06-07  6:22 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

Hi.

On Wednesday 07 June 2006 14:26, H. Peter Anvin wrote:
> Nigel Cunningham wrote:
> > Ok. Ta. I guess I should put some time into learning this prior to 2.6.18
> > then, so I can help others through the transition.
>
> I've been meaning to write up some proper documentation, but obviously
> my #1 priority has been to fix problems as they crop up.  If you would
> be willing to help in that area, I'd be more than willing to spend some
> time giving you a mind dump.

I'm sorry, but I probably wouldn't have time to help. I'm not even making 
progress on Suspend2 at the moment.

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia


* Re: klibc
  2006-06-07  6:22               ` klibc Nigel Cunningham
@ 2006-06-07  6:38                 ` H. Peter Anvin
  0 siblings, 0 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-07  6:38 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: linux-kernel

Nigel Cunningham wrote:
> Hi.
> 
> On Wednesday 07 June 2006 14:26, H. Peter Anvin wrote:
>> Nigel Cunningham wrote:
>>> Ok. Ta. I guess I should put some time into learning this prior to 2.6.18
>>> then, so I can help others through the transition.
>> I've been meaning to write up some proper documentation, but obviously
>> my #1 priority has been to fix problems as they crop up.  If you would
>> be willing to help in that area, I'd be more than willing to spend some
>> time giving you a mind dump.
> 
> I'm sorry, but I probably wouldn't have time to help. I'm not even making 
> progress on Suspend2 at the moment.
> 

Anyway, swsusp code is already in klibc.

	-hpa



* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-07  4:25           ` Nigel Cunningham
  2006-06-07  4:26             ` klibc H. Peter Anvin
@ 2006-06-07  6:51             ` Joshua Hudson
  2006-06-07 21:12               ` H. Peter Anvin
  1 sibling, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-07  6:51 UTC (permalink / raw)
  To: linux-kernel

Did anybody ever fix the problem that you can't pivot_root() the rootfs
filesystem, and hence can't use it on a loopback system backed by NTFS?


* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-07  6:51             ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson
@ 2006-06-07 21:12               ` H. Peter Anvin
  2006-06-09  8:03                 ` klibc Nix
  0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-07 21:12 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <bda6d13a0606062351i5c94414fpa03ee2ce3dd180ae@mail.gmail.com>
By author:    "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
>
> Did anybody ever fix the problem that you can't pivot_root() the rootfs
> filesystem, and hence can't use it on a loopback system backed by NTFS?
> 

You shouldn't pivot_root the rootfs filesystem.  Use the run-init
utility or something similar instead (which does a mount with
MS_MOVE.)
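
The MS_MOVE approach can be sketched in C roughly as follows. This is a
minimal illustration, not the run-init source: the helper name
switch_root_move() is hypothetical, the sketch does not delete the old
initramfs contents, and it assumes the caller is PID 1 with
CAP_SYS_ADMIN and will exec the real init afterwards.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper: make new_root the visible root of the mount tree
 * without calling pivot_root() on rootfs.
 * Returns 0 on success, -1 with errno set on failure.
 */
int switch_root_move(const char *new_root)
{
	/* Enter the already-mounted new root filesystem. */
	if (chdir(new_root) < 0)
		return -1;

	/* Move that mount on top of "/", overmounting rootfs. */
	if (mount(".", "/", NULL, MS_MOVE, NULL) < 0)
		return -1;

	/* Make the moved mount this process's root directory. */
	if (chroot(".") < 0)
		return -1;

	return chdir("/");
}
```

A caller would typically follow a successful return with execv() of the
real init, for example after switch_root_move("/new-root").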

	-hpa

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-07 21:12               ` H. Peter Anvin
@ 2006-06-09  8:03                 ` Nix
  2006-06-09 18:45                   ` klibc H. Peter Anvin
       [not found]                   ` <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com>
  0 siblings, 2 replies; 166+ messages in thread
From: Nix @ 2006-06-09  8:03 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On 7 Jun 2006, H. Peter Anvin noted:
> Followup to:  <bda6d13a0606062351i5c94414fpa03ee2ce3dd180ae@mail.gmail.com>
> By author:    "Joshua Hudson" <joshudson@gmail.com>
> In newsgroup: linux.dev.kernel
>>
>> Did anybody ever fix the problem that you can't pivot_root() the rootfs
>> filesystem, and hence can't use it on a loopback system backed by NTFS?
> 
> You shouldn't pivot_root the rootfs filesystem.

What happens if you do? I mean, it doesn't make even conceptual sense,
really. The rootfs is always there: that's its entire purpose.

>                                                 Use the run-init
> utility or something similar instead (which does a mount with
> MS_MOVE.)

busybox has a switch_root tool which (conceptually) rm -rf's everything
on the root filesystem and then does such a mount. (After all whatever
is on that filesystem is inaccessible after the overmount, so keeping
it around is just a waste of memory.)

-- 
`Voting for any American political party is fundamentally
 incomprehensible.' --- Vadik

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-09  8:03                 ` klibc Nix
@ 2006-06-09 18:45                   ` H. Peter Anvin
       [not found]                   ` <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com>
  1 sibling, 0 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-09 18:45 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <8764ja7o2d.fsf@hades.wkstn.nix>
By author:    Nix <nix@esperi.org.uk>
In newsgroup: linux.dev.kernel
> > 
> > You shouldn't pivot_root the rootfs filesystem.
> 
> What happens if you do? I mean, it doesn't make even conceptual sense,
> really. The rootfs is always there: that's its entire purpose.
> 

"What happens if you do"... well, it may work, it might not, it may
break some functionality for you or break in a future kernel version.
It's undefined behaviour.

> >                                                 Use the run-init
> > utility or something similar instead (which does a mount with
> > MS_MOVE.)
> 
> busybox has a switch_root tool which (conceptually) rm -rf's everything
> on the root filesystem and then does such a mount. (After all whatever
> is on that filesystem is inaccessible after the overmount, so keeping
> it around is just a waste of memory.)

What busybox calls switch_root is the same as the run-init tool from
the klibc distribution.

	-hpa

^ permalink raw reply	[flat|nested] 166+ messages in thread

[parent not found: <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com>]

[parent not found: <871wty6rl9.fsf@hades.wkstn.nix>]

* Re: klibc
       [not found]                     ` <871wty6rl9.fsf@hades.wkstn.nix>
@ 2006-06-09 22:28                       ` Joshua Hudson
  2006-06-09 22:48                         ` klibc H. Peter Anvin
  0 siblings, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-09 22:28 UTC (permalink / raw)
  To: linux-kernel

On 6/9/06, Nix <nix@esperi.org.uk> wrote:
> On Fri, 9 Jun 2006, Joshua Hudson whispered secretively:
> > On 6/9/06, Nix <nix@esperi.org.uk> wrote:
> >> What happens if you do? I mean, it doesn't make even conceptual sense,
> >> really. The rootfs is always there: that's its entire purpose.
> >
> > I just need it accessible somewhere else in the tree so that the system
> > init runs from that rather than the root filesystem, and so can unmount
> > the root filesystem. Obviously, after a mount /, it is not.
>
> You cannot unmount rootfs: it's the first filesystem mounted, the
> ultimate parent of all attached mounts, the fallback used if you umount
> everything else, and is explicitly checked for at mount and pivot_root
> time.
>
> You also don't often want to leave anything in it after you've booted:
> unlike tmpfs, it's not swap-backed, so stuff in there stays in
> nonswappable memory, pinned in the page cache. This is generally
> undesirable. Yes, it stays around empty: but if you boot without an
> initramfs, it stays around empty *in any case*: the kernel builds an
> empty one and uses it automatically, then falls back to code which
> mounts a root filesystem for you (code which HPA's klibc patch removes
> in favour of doing everything it did from an initramfs).
>
>
> The end of my initramfs script (busybox / uclibc-based) reads
>
> # Unmount everything and switch root filesystems for good:
> # exec the real init and begin the real boot process.
> /bin/umount -l /proc
> /bin/umount -l /sys
> /bin/umount -l /dev
>
> exec switch_root /new-root $init $INIT_ARGS
>
> where switch_root is the aforementioned busybox `rm -rf everything on
> this filesystem and mount --move us into the new root'. (At the time
> it runs, it's PID 1 and there are no other non-kernel threads running:
> it execs init.)
>
>
> What are you trying to accomplish?

Once again. Loopback mount requires a clean unmount of root and
host filesystem. After remounting root read-only, host is still read-write
and cannot be remounted read-only.

It is necessary to provide access to the rootfs tree somewhere else
or use pivot_root, like the initrd solution below:

initrd: /linuxrc
#!/bin/sh
mount /dev/hda1 -o rw -t ntfs /host
mount /host/linux/root.img -o loop,ro -t ext3 /root
pivot_root /root /root/initrd
exec /initrd/bin/init

root:/etc/rc.d/rc.halt:
#!/bin/sh
pivot_root /initrd /initrd/root
cd /
exec /stop $RUNLEVEL

initrd:/stop
#!/bin/sh
kill -SIGUSR1 1
umount /root
umount /host
case $1 in
0) poweroff -f ;;
*) reboot -f ;;
esac


This requires static binaries of init, sh, mount, umount, an extant /etc, and a
few nodes in /dev.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-09 22:28                       ` klibc Joshua Hudson
@ 2006-06-09 22:48                         ` H. Peter Anvin
  2006-06-09 23:13                           ` klibc Joshua Hudson
  0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-09 22:48 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <bda6d13a0606091528h4e85265du8651818c73827b7d@mail.gmail.com>
By author:    "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
> 
> Once again. Loopback mount requires a clean unmount of root and
> host filesystem. After remounting root read-only, host is still read-write
> and cannot be remounted read-only.
> 
> It is necessary to provide access to the rootfs tree somewhere else
> or use pivot_root, like the initrd solution below:
> 
> initrd: /linuxrc
> #!/bin/sh
> mount /dev/hda1 -o rw -t ntfs /host
> mount /host/linux/root.img -o loop,ro -t ext3 /root
> pivot_root /root /root/initrd
> exec /initrd/bin/init
> 
> root:/etc/rc.d/rc.halt:
> #!/bin/sh
> pivot_root /initrd /initrd/root
> cd /
> exec /stop $RUNLEVEL
> 
> initrd:/stop
> #!/bin/sh
> kill -SIGUSR1 1
> umount /root
> umount /host
> case $1 in
> 0) poweroff -f ;;
> *) reboot -f ;;
> esac
> 
> This requires static binaries of init, sh, mount, umount, an extant /etc, and a
> few nodes in /dev.

Another solution is to leave a process with its cwd parked in the
rootfs.  Look at run_linuxrc() in usr/kinit/initrd.c of any klibc tree
to see how this can be used.  (That is there to support old-style
/linuxrc, but should be applicable here, too.)

	-hpa

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-09 22:48                         ` klibc H. Peter Anvin
@ 2006-06-09 23:13                           ` Joshua Hudson
  2006-06-09 23:44                             ` klibc H. Peter Anvin
  0 siblings, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-09 23:13 UTC (permalink / raw)
  To: linux-kernel

On 6/9/06, H. Peter Anvin <hpa@zytor.com> wrote:
> Followup to:  <bda6d13a0606091528h4e85265du8651818c73827b7d@mail.gmail.com>
> By author:    "Joshua Hudson" <joshudson@gmail.com>
> In newsgroup: linux.dev.kernel
> >
> > Once again. Loopback mount requires a clean unmount of root and
> > host filesystem. After remounting root read-only, host is still read-write
> > and cannot be remounted read-only.
> >
> > It is necessary to provide access to the rootfs tree somewhere else
> > or use pivot_root, like the initrd solution below:
> >
> > initrd: /linuxrc
> > #!/bin/sh
> > mount /dev/hda1 -o rw -t ntfs /host
> > mount /host/linux/root.img -o loop,ro -t ext3 /root
> > pivot_root /root /root/initrd
> > exec /initrd/bin/init
> >
> > root:/etc/rc.d/rc.halt:
> > #!/bin/sh
> > pivot_root /initrd /initrd/root
> > cd /
> > exec /stop $RUNLEVEL
> >
> > initrd:/stop
> > #!/bin/sh
> > kill -SIGUSR1 1
> > umount /root
> > umount /host
> > case $1 in
> > 0) poweroff -f ;;
> > *) reboot -f ;;
> > esac
> >
> > This requires static binaries of init, sh, mount, umount, an extant /etc, and a
> > few nodes in /dev.
>
> Another solution is to leave a process with its cwd parked in the
> rootfs.  Look at run_linuxrc() in usr/kinit/initrd.c of any klibc tree
> to see how this can be used.  (That is there to support old-style
> /linuxrc, but should be applicable here, too.)
>
>         -hpa
Should work if the following is true:
   if pwd is /, mount / followed by ls . returns the contents of initramfs.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-09 23:13                           ` klibc Joshua Hudson
@ 2006-06-09 23:44                             ` H. Peter Anvin
  2006-06-16  6:02                               ` klibc Joshua Hudson
  0 siblings, 1 reply; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-09 23:44 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <bda6d13a0606091613h3334facbrcb86dbb2de01b412@mail.gmail.com>
By author:    "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
> Should work if the following is true:
>    if pwd is /, mount / followed by ls . returns the contents of initramfs.

It does, and it does work as described.  Again, see the referenced code.

You can also fchdir() to the rootfs if you have a file descriptor to
any directory therein.
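
A minimal sketch of the file-descriptor route, assuming the descriptor
is opened while "/" still refers to the rootfs. The helper names
save_root_fd() and reenter_saved_root() are hypothetical and only
illustrate the open()/fchdir() pairing; they are not taken from klibc.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep a handle on whatever "/" is right now.
 * Call this before the real root is moved over it.
 * Returns a directory descriptor, or -1 on failure.
 */
int save_root_fd(void)
{
	return open("/", O_RDONLY | O_DIRECTORY);
}

/*
 * Hypothetical helper: make the saved directory the current working
 * directory again, even if the path "/" now resolves somewhere else.
 * Returns 0 on success, -1 on failure.
 */
int reenter_saved_root(int saved_fd)
{
	return fchdir(saved_fd);
}
```

The descriptor only helps for as long as some process keeps it open,
which is the same constraint as parking a process's cwd in the rootfs.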

	-hpa

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-09 23:44                             ` klibc H. Peter Anvin
@ 2006-06-16  6:02                               ` Joshua Hudson
  2006-06-16 19:19                                 ` klibc H. Peter Anvin
  0 siblings, 1 reply; 166+ messages in thread
From: Joshua Hudson @ 2006-06-16  6:02 UTC (permalink / raw)
  To: linux-kernel

I've come to the conclusion that there is no good way to return to the
initramfs at all after init moves to the real root device. What I have
found is that the only way is for another process to keep a cwd or open
file handle on the initramfs, which plays very badly with killall.

Anybody got a way to make a user process other than init invulnerable
to kill -9? <g>

It would be dirt-simple if I could mount --rbind / /root/initrd where
/ is the initramfs and /root is a mounted filesystem, but that creates
cycles and so breaks other things.

Oh, and mount / followed by ls / returns the contents of the initramfs. Weird.
umount -l / has the extremely bizarre effect of leaving the process stranded in
/ unless it currently has its pwd or an open directory handle elsewhere.

Anybody want a patch that dumps the executor of umount / in the
initramfs, then does a lazy unmount? That, however, carries risks of its
own, so it might not be the best move.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc
  2006-06-16  6:02                               ` klibc Joshua Hudson
@ 2006-06-16 19:19                                 ` H. Peter Anvin
  0 siblings, 0 replies; 166+ messages in thread
From: H. Peter Anvin @ 2006-06-16 19:19 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <bda6d13a0606152302v6598ce84sf4c7066705c3284f@mail.gmail.com>
By author:    "Joshua Hudson" <joshudson@gmail.com>
In newsgroup: linux.dev.kernel
>
> I've come to the conclusion that there is no good way to return to the
> initramfs at all after init moves to the real root device. What I have
> found is that the only way is for another process to keep a cwd or open
> file handle on the initramfs, which plays very badly with killall.
> 
> Anybody got a way to make a user process other than init invulnerable
> to kill -9? <g>
> 

Actually, does init close all its file descriptors?  Otherwise you
could simply pass a file descriptor to init when init is executed.

If init explicitly closes file descriptors then that's not possible,
but perhaps that could be fixed in init.
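
Whether a descriptor survives the exec of init depends on its
close-on-exec flag (and on init not closing it explicitly afterwards).
A minimal sketch of clearing that flag before the exec; the helper name
keep_across_exec() is hypothetical:

```c
#include <fcntl.h>

/*
 * Hypothetical helper: clear FD_CLOEXEC on fd so that the descriptor
 * stays open across a later execve().
 * Returns 0 on success, -1 on failure.
 */
int keep_across_exec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;

	if (fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0)
		return -1;

	return 0;
}
```

This does nothing about an init that walks its descriptor table and
closes everything on startup; that case would still need a change in
init itself.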

On the other hand, if you killall -9 arbitrary processes as root, then
perhaps getting a dirty filesystem on reboot is ugly but not
catastrophic.

	-hpa


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-06 20:56     ` Rafael J. Wysocki
  2006-06-07  3:37       ` H. Peter Anvin
  2006-06-07  4:00       ` Nigel Cunningham
@ 2006-06-07  8:44       ` Pavel Machek
  2006-06-07  9:44         ` Rafael J. Wysocki
  2 siblings, 1 reply; 166+ messages in thread
From: Pavel Machek @ 2006-06-07  8:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: H. Peter Anvin, Andrew Morton, linux-kernel


Hi!

> > > The original idea was due Al Viro; obviously, the 
> > > implementation is mostly mine.
> > > 
> > > It is of course my hope that this will be used for more 
> > > than just plain initialization code, but that in itself 
> > > is a significant step, and one has to start somewhere.
> > 
> > It allows me to unify swsusp & uswsusp into one in future, for
> > example, reducing code duplication.
> 
> [cough] How distant is the future you're referring to?

A year or two, I believe. Actually it is not so much "unify swsusp &
uswsusp" as "drop kernel/power/swap.c" and possibly put parts of
uswsusp into initial userland so that users do not notice it was
dropped from the kernel.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: klibc (was: 2.6.18 -mm merge plans)
  2006-06-07  8:44       ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek
@ 2006-06-07  9:44         ` Rafael J. Wysocki
  0 siblings, 0 replies; 166+ messages in thread
From: Rafael J. Wysocki @ 2006-06-07  9:44 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, Andrew Morton, linux-kernel

On Wednesday 07 June 2006 10:44, Pavel Machek wrote:
> > > > The original idea was due Al Viro; obviously, the 
> > > > implementation is mostly mine.
> > > > 
> > > > It is of course my hope that this will be used for more 
> > > > than just plain initialization code, but that in itself 
> > > > is a significant step, and one has to start somewhere.
> > > 
> > > It allows me to unify swsusp & uswsusp into one in future, for
> > > example, reducing code duplication.
> > 
> > [cough] How distant is the future you're referring to?
> 
> A year or two, I believe. Actually it is not so much "unify swsusp &
> uswsusp" as "drop kernel/power/swap.c" and possibly put parts of
> uswsusp into initial userland so that users do not notice it was
> dropped from the kernel.

OK

Rafael

^ permalink raw reply	[flat|nested] 166+ messages in thread

* clocksource
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (4 preceding siblings ...)
  2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
@ 2006-06-04 23:50 ` Roman Zippel
  2006-06-05 20:20   ` clocksource john stultz
  2006-06-05  0:02 ` utsname/hostname Randy.Dunlap
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-04 23:50 UTC (permalink / raw)
  To: Andrew Morton, johnstul; +Cc: linux-kernel

Hi,

On Sun, 4 Jun 2006, Andrew Morton wrote:

> time-use-clocksource-infrastructure-for-update_wall_time.patch

I still disagree with the update_wall_time() changes; they should be kept
separate from this. The error algorithm is a somewhat old version
and can cause oscillation and thus a confused clock.

> time-let-user-request-precision-from-current_tick_length.patch

This is broken, as it simply throws away resolution depending on the 
clock.

These are two key problems, the rest can be fixed incrementally.

bye, Roman

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: clocksource
  2006-06-04 23:50 ` clocksource Roman Zippel
@ 2006-06-05 20:20   ` john stultz
  2006-06-05 20:53     ` clocksource john stultz
  2006-06-05 21:07     ` clocksource Roman Zippel
  0 siblings, 2 replies; 166+ messages in thread
From: john stultz @ 2006-06-05 20:20 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> Hi,
> 
> On Sun, 4 Jun 2006, Andrew Morton wrote:
> 
> > time-use-clocksource-infrastructure-for-update_wall_time.patch
> 
> I still disagree with the update_wall_time() changes; they should be kept
> separate from this.

Is this directly related to the next item (if so, how?), or just
preference? I'd really like to avoid having multiple code paths for the
timekeeping core, so I'd like to see this unified. I'm willing to
optimize out bits w/ constants and whatnot, but I worry it will be a
nightmare to maintain if we have multiple generic update_wall_time
implementations.

> The error algorithm is a somewhat old version 
> and can cause oscillation and thus a confused clock.

Would you mind elaborating on this? Which aspect of the error algorithm
is off? How does the clock become confused? Could you point to the line
numbers, etc?  I assume your last patchset contains the current version?

> > time-let-user-request-precision-from-current_tick_length.patch
> 
> This is broken, as it simply throws away resolution depending on the 
> clock.

So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
true, and currently that's only the jiffies case.
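
The resolution loss can be illustrated with a small sketch. It assumes
the tick length is held with 12 fractional bits (SHIFT_SCALE - 10) and
is rescaled to whatever shift the caller asks for; the helper name
rescale_tick_length() is hypothetical and this is not the code from the
patch itself.

```c
#include <stdint.h>

/* SHIFT_SCALE - 10, with SHIFT_SCALE taken to be 22 as implied above. */
#define NTP_SHIFT 12

/*
 * Hypothetical helper: convert a tick length held with NTP_SHIFT
 * fractional bits into one with `shift` fractional bits.
 */
uint64_t rescale_tick_length(uint64_t len_ntp, unsigned int shift)
{
	if (shift >= NTP_SHIFT)
		return len_ntp << (shift - NTP_SHIFT);	/* lossless */

	/* A smaller shift discards the low (NTP_SHIFT - shift) bits. */
	return len_ntp >> (NTP_SHIFT - shift);
}
```

With a shift of 8, for instance, the low four fractional bits are
dropped and cannot be recovered by shifting back up, whereas any shift
of 12 or more keeps the value intact.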

Just to be clear, are you then suggesting that the accumulation in
update_wall_time should be done in a fixed shifted nanosecond unit
regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
your mind for this?

That seems not too difficult to do, and can be done w/ an incremental
patch. I'll try to crank that out today.

> These are two key problems, the rest can be fixed incrementally.

If these are really blockers, I want to get them resolved in the next
day or so (I'd really like to avoid having Andrew carry them for yet
another cycle). So I'd appreciate your help in correcting these issues.

thanks
-john

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: clocksource
  2006-06-05 20:20   ` clocksource john stultz
@ 2006-06-05 20:53     ` john stultz
  2006-06-05 21:07     ` clocksource Roman Zippel
  1 sibling, 0 replies; 166+ messages in thread
From: john stultz @ 2006-06-05 20:53 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 13:20 -0700, john stultz wrote:
> On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> > > time-let-user-request-precision-from-current_tick_length.patch
> > 
> > This is broken, as it simply throws away resolution depending on the 
> > clock.
> 
> So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
> true, and currently that's only the jiffies case.
> 
> Just to be clear, are you then suggesting that the accumulation in
> update_wall_time should be done in a fixed shifted nanosecond unit
> regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
> your mind for this?
> 
> That seems not too difficult to do, and can be done w/ an incremental
> patch. I'll try to crank that out today.

Here is a patch, just to quickly get some feedback on this. It is
currently untested (I'm working on that part now; Andrew, I'll send it
to you once it clears), but it builds.

Roman: Your thoughts? Does it cover your concern?

thanks
-john


diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4bc9428..884980a 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -146,14 +146,17 @@ static inline s64 cyc2ns(struct clocksou
 	return ret;
 }
 
+
+#define CLOCKSOURCE_INTERVAL_SHIFT (SHIFT_SCALE - 10)
+
 /**
  * clocksource_calculate_interval - Calculates a clocksource interval struct
  *
  * @c:		Pointer to clocksource.
  * @length_nsec: Desired interval length in nanoseconds.
  *
- * Calculates a fixed cycle/nsec interval for a given clocksource/adjustment
- * pair and interval request.
+ * Calculates a fixed cycle/nsec interval (in CLOCKSOURCE_INTERVAL_SHIFT units)
+ * for a given clocksource/adjustment pair and interval request.
  *
  * Unless you're the timekeeping code, you should not be using this!
  */
@@ -164,7 +167,7 @@ static inline void clocksource_calculate
 
 	/* XXX - All of this could use a whole lot of optimization */
 	tmp = length_nsec;
-	tmp <<= c->shift;
+	tmp <<= CLOCKSOURCE_INTERVAL_SHIFT;
 	tmp += c->mult/2;
 	do_div(tmp, c->mult);
 
@@ -215,8 +218,8 @@ static inline int error_aproximation(u64
  * @cycles_delta:	Current unacounted cycle delta
  * @error:		Pointer to current error value
  *
- * Returns clock shifted nanosecond adjustment to be applied against
- * the accumulated time value (ie: xtime).
+ * Returns CLOCKSOURCE_INTERVAL_SHIFT shifted nanosecond adjustment to be
+ * applied against the accumulated time value (ie: xtime).
  *
  * If the error value is large enough, this function calulates the
  * (power of two) adjustment value, and adjusts the clock's mult and
diff --git a/kernel/timer.c b/kernel/timer.c
index 0569d40..588bfcd 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1029,8 +1029,8 @@ static void update_wall_time(void)
 	s64 snsecs_per_sec;
 	cycle_t now, offset;
 
-	snsecs_per_sec = (s64)NSEC_PER_SEC << clock->shift;
-	remainder_snsecs += (s64)xtime.tv_nsec << clock->shift;
+	snsecs_per_sec = (s64)NSEC_PER_SEC << CLOCKSOURCE_INTERVAL_SHIFT;
+	remainder_snsecs += (s64)xtime.tv_nsec << CLOCKSOURCE_INTERVAL_SHIFT;
 
 	now = clocksource_read(clock);
 	offset = (now - last_clock_cycle)&clock->mask;
@@ -1039,8 +1039,11 @@ static void update_wall_time(void)
 	 * case of lost or late ticks, it will accumulate correctly.
 	 */
 	while (offset > clock->interval_cycles) {
-		/* get the ntp interval in clock shifted nanoseconds */
-		s64 ntp_snsecs	= current_tick_length(clock->shift);
+		/* get the ntp interval in CLOCKSOURCE_INTERVAL_SHIFT 
+		 * shifted nanoseconds:
+		 */
+		s64 ntp_snsecs =
+			current_tick_length(CLOCKSOURCE_INTERVAL_SHIFT);
 
 		/* accumulate one interval */
 		remainder_snsecs += clock->interval_snsecs;
@@ -1049,7 +1052,7 @@ static void update_wall_time(void)
 
 		/* interpolator bits */
 		time_interpolator_update(clock->interval_snsecs
-						>> clock->shift);
+						>> CLOCKSOURCE_INTERVAL_SHIFT);
 		/* increment the NTP state machine */
 		update_ntp_one_tick();
 
@@ -1066,8 +1069,8 @@ static void update_wall_time(void)
 		}
 	}
 	/* store full nanoseconds into xtime */
-	xtime.tv_nsec = remainder_snsecs >> clock->shift;
-	remainder_snsecs -= (s64)xtime.tv_nsec << clock->shift;
+	xtime.tv_nsec = remainder_snsecs >> CLOCKSOURCE_INTERVAL_SHIFT;
+	remainder_snsecs -= (s64)xtime.tv_nsec << CLOCKSOURCE_INTERVAL_SHIFT;
 
 	/* check to see if there is a new clocksource to use */
 	if (change_clocksource()) {



^ permalink raw reply related	[flat|nested] 166+ messages in thread

* Re: clocksource
  2006-06-05 20:20   ` clocksource john stultz
  2006-06-05 20:53     ` clocksource john stultz
@ 2006-06-05 21:07     ` Roman Zippel
  2006-06-06 19:42       ` clocksource john stultz
  1 sibling, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-05 21:07 UTC (permalink / raw)
  To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Mon, 5 Jun 2006, john stultz wrote:

> On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> > Hi,
> > 
> > On Sun, 4 Jun 2006, Andrew Morton wrote:
> > 
> > > time-use-clocksource-infrastructure-for-update_wall_time.patch
> > 
> > I still disagree with the update_wall_time() changes; they should be kept
> > separate from this.
> 
> Is this directly related to the next item (if so, how?), or just
> preference? I'd really like to avoid having multiple code paths for the
> timekeeping core, so I'd like to see this unified. I'm willing to
> optimize out bits w/ constants and whatnot, but I worry it will be a
> nightmare to maintain if we have multiple generic update_wall_time
> implementations.

One "unified" version will only be worse. Keeping the new path separate 
from the old path will only make things clearer and more flexible.
Right now you have a mixture of old code, interpolator code and new 
timekeeping code, which makes it a big mess. _Please_ don't do this, it 
makes your code very hard to read, personally I cannot guarantee that this 
thing does the right thing, with the separate function we at least have a 
backup plan. John, this is very sensitive code, I beg you not to fuck 
around with it. :-(

> > The error algorithm is a somewhat old version 
> > and can cause oscillation and thus a confused clock.
> 
> Would you mind elaborating on this? Which aspect of the error algorithm
> is off? How does the clock become confused? Could you point to the line
> numbers, etc?  I assume your last patchset contains the current version?

With large clock offsets the lookahead doesn't work correctly, basically
because it's already too late and it can cause overadjustment. Because of
this I do an extra lookahead in clocksource_bigadjust().

> > > time-let-user-request-precision-from-current_tick_length.patch
> > 
> > This is broken, as it simply throws away resolution depending on the 
> > clock.
> 
> So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
> true, and currently that's only the jiffies case.
> 
> Just to be clear, are you then suggesting that the accumulation in
> update_wall_time should be done in a fixed shifted nanosecond unit
> regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
> your mind for this?
> 
> That seems not too difficult to do, and can be done w/ an incremental
> patch. I'll try to crank that out today.

I'd prefer you'd just take the update function from my patch; it's nicely
optimized and I'll try to address any concern you have about it.
For this I also posted a userspace test program, so that I know how it
behaves. Do you have something similar for yours?

bye, Roman

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: clocksource
  2006-06-05 21:07     ` clocksource Roman Zippel
@ 2006-06-06 19:42       ` john stultz
  2006-06-07  0:41         ` clocksource Roman Zippel
  0 siblings, 1 reply; 166+ messages in thread
From: john stultz @ 2006-06-06 19:42 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 23:07 +0200, Roman Zippel wrote:
> Hi,
> 
> On Mon, 5 Jun 2006, john stultz wrote:
> 
> > On Mon, 2006-06-05 at 01:50 +0200, Roman Zippel wrote:
> > > Hi,
> > > 
> > > On Sun, 4 Jun 2006, Andrew Morton wrote:
> > > 
> > > > time-use-clocksource-infrastructure-for-update_wall_time.patch
> > > 
> > > I still disagree with the update_wall_time() changes; they should be kept
> > > separate from this.
> > 
> > Is this directly related to the next item (if so, how?), or just
> > preference? I'd really like to avoid having multiple code paths for the
> > timekeeping core, so I'd like to see this unified. I'm willing to
> > optimize out bits w/ constants and whatnot, but I worry it will be a
> > nightmare to maintain if we have multiple generic update_wall_time
> > implementations.
> 
> One "unified" version will only be worse. Keeping the new path separate 
> from the old path will only make things clearer and more flexible.
> Right now you have a mixture of old code, interpolator code and new 
> timekeeping code, which makes it a big mess.

Eh? Are we looking at the same code? Right now there is *no* mixture of
old code. The update_wall_time() function is the same for *all* arches,
and jiffies is the common clocksource. This allows us to use alternate
clocksources to move to continuous timekeeping, while allowing the
accumulation function to stay the same. I don't see what part of this
you consider messy.

It's true the interpolator code is still there, but not for long, as the
interpolators will be easy to convert since the clocksource structure is
very similar. And that's just *one* line of code in the function in question.

> > > The error algorithm is a somewhat old version 
> > > and can cause oscillation and thus a confused clock.
> > 
> > Would you mind elaborating on this? Which aspect of the error algorithm
> > is off? How does the clock become confused? Could you point to the line
> > numbers, etc?  I assume your last patchset contains the current version?
> 
> With large clock offsets the lookahead doesn't work correctly, basically
> because it's already too late and it can cause overadjustment. Because of
> this I do an extra lookahead in clocksource_bigadjust().

Do you have a hard example for this with numbers? I don't mean to be a
pain, but I don't see this right off.

With the current code in -mm I can run a test app that disables
interrupts for 2 seconds at a time over and over and I'm still keeping
synched w/ an NTP server within 30 microseconds.

> > > > time-let-user-request-precision-from-current_tick_length.patch
> > > 
> > > This is broken, as it simply throws away resolution depending on the 
> > > clock.
> > 
> > So if the clock shift value is less than 12 (SHIFT_SCALE - 10), this is
> > true, and currently that's only the jiffies case.
> > 
> > Just to be clear, are you then suggesting that the accumulation in
> > update_wall_time should be done in a fixed shifted nanosecond unit
> > regardless of the clock shift value? Is SHIFT_SCALE-10, good enough in
> > your mind for this?
> > 
> > That seems not too difficult to do, and can be done w/ an incremental
> > patch. I'll try to crank that out today.
>
> I'd prefer you'd just take the update function from my patch, it's nicely 
> optimized and I'll try to address any concern you have about it.

Even though, as mentioned above, I'm still in the dark on the need for it,
I spent a few hours last night trying to convert the make_ntp_adj() in -mm
to use your bigadjust implementation. Unfortunately my attempts refuse to
boot. I'm not sure if this is the same problem that was keeping your
patchset from booting as well, or possibly just an implementation error
on my part.

I'll send a patch for your review shortly and maybe you can catch the
issue?

> For this I also posted a userspace test program, so that I know how it
> behaves. Do you have something similar for yours?

At your prodding awhile back I wrote a userspace simulator, but you
never commented on it.

Ok, so back to the critical issues:

1) You don't like the unified update_wall_time
2) Issue w/ the current make_ntp_adj and lost ticks.
3) NTP error may be limited to clock->shift resolution w/ the jiffies
clocksource (other clocksources have finer resolution than NTP's).

#1 I disagree with. I really think the unified approach is the way to
go, but I do understand the need to be conservative. This has gotten a
fair amount of testing in -mm with no issues reported. But it's not too
hard to re-add the old update_wall_time(), so I could be convinced to
push the unification off w/ a second opinion. Again, this would be via
an incremental patch.

So for #2 I'm working on trying to get your implementation functioning,
but I still don't quite understand the reason. I think this can be
worked out w/ an incremental patch. I'll send you some separate mail
shortly on this.

I posted an incremental patch for #3 yesterday, but it needs more work,
and changes in #2 could affect it, so I'm focusing on #2 at the moment.
Even so, NTP's resolution is 0.2 picoseconds, and the jiffies
clocksource keeps an error resolution of 4 picoseconds, so I'm not sure
if this is really a blocker.  Further if we do re-add the old
update_wall_time for issue #1, this would keep the jiffies clocksource
from being used commonly, so this wouldn't be an issue.
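
As a rough check on those two figures, here is a minimal sketch of the
arithmetic. It assumes NTP values are kept in microseconds scaled by
2^SHIFT_SCALE with SHIFT_SCALE = 22 (implied by the "12 (SHIFT_SCALE - 10)"
remark above) and that the jiffies clocksource uses a shift of 8 on
nanoseconds (an assumption chosen to match the 4 picosecond figure).
unit_fs() and the FS_PER_* constants are illustrative names, not kernel
symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Femtoseconds per base unit (1 ps = 1000 fs). Illustrative constants. */
#define FS_PER_USEC UINT64_C(1000000000)
#define FS_PER_NSEC UINT64_C(1000000)

/*
 * Size of one scaled unit, in femtoseconds, when a base unit is
 * divided into 2^shift steps. The result is rounded down.
 */
static uint64_t unit_fs(uint64_t base_fs, unsigned int shift)
{
	assert(shift < 64);
	return base_fs >> shift;
}
```

Under these assumptions unit_fs(FS_PER_USEC, 22) gives about 238 fs
(roughly 0.2 ps) and unit_fs(FS_PER_NSEC, 8) gives about 3906 fs
(roughly 4 ps).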

Your thoughts?
-john


* Re: clocksource
  2006-06-06 19:42       ` clocksource john stultz
@ 2006-06-07  0:41         ` Roman Zippel
  2006-06-08  8:05           ` clocksource john stultz
  0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-07  0:41 UTC (permalink / raw)
  To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Tue, 6 Jun 2006, john stultz wrote:

> > One "unified" version will only be worse. Keeping the new path separate 
> > from the old path will only make things clearer and more flexible.
> > Right now you have a mixture of old code, interpolator code and new 
> > timekeeping code, which makes it a big mess.
> 
> Eh? Are we looking at the same code? Right now there is *no* mixture of
> old code. The update_wall_time() function is the same for *all* arches,
> and jiffies is the common clocksource.

Why is this so important??? Please, John, you're driving me crazy with this.
You can start with the jiffies clocksource everywhere else, why here?

> This allows us to use alternate
> clocksources to move to continuous timekeeping, while allowing the
> accumulation function to stay the same. I don't see what part of this
> you consider messy.

_Please_ look at the update functions in my patch and try to understand 
its design, read the design document. The core part is:

	while (cycle_offset >= cycle_update) {
		cycle_offset -= cycle_update;
		clocksource_update_tick();
	}
	clocksource_adjust(cycle_offset);

For a tick-based design I can easily change this to:

	clocksource_update_tick();
	clocksource_adjust(0);

I can also rather easily exchange parts of it with 32 bit based 
calculations (without losing resolution). I know how to optimize this, but 
with your version I can't do this.
John, I don't need a unified design, I want a _flexible_ design and your 
unified mess doesn't give me this. You're taking away any control over 
this, which would make my life easier from an arch perspective. As archs 
convert to the new timekeeping code they can easily switch to the new 
generic function or they can rather easily adapt it to their needs, you 
make this impossible.
Later we can still unify things, but without having the majority of archs 
converted, without having really the complete picture, we need foremost 
flexibility. Doing a preemptive unification only because we can is the 
worst thing we can do at this time.
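
To make the two update variants above concrete, here is a minimal
userspace sketch. It only models the tick accumulation; the
clocksource_adjust() step is left out. The struct sim_clock layout and the
sim_update*() names are hypothetical and are not taken from either
patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the clock state; not the kernel structure. */
struct sim_clock {
	uint64_t cycle_update;   /* cycles per tick */
	uint64_t xtime_interval; /* shifted nanoseconds added per tick */
	uint64_t xtime_nsec;     /* accumulated shifted nanoseconds */
};

/*
 * Continuous variant: consume as many whole ticks as the cycle offset
 * covers and return the leftover cycles, which the adjustment step
 * would then see.
 */
static uint64_t sim_update(struct sim_clock *c, uint64_t cycle_offset)
{
	assert(c->cycle_update > 0);
	while (cycle_offset >= c->cycle_update) {
		cycle_offset -= c->cycle_update;
		c->xtime_nsec += c->xtime_interval;
	}
	return cycle_offset;
}

/* Tick-based variant: exactly one tick per call, no leftover cycles. */
static void sim_update_tick_based(struct sim_clock *c)
{
	c->xtime_nsec += c->xtime_interval;
}
```

With 100 cycles per tick, an offset of 350 cycles accumulates three ticks
and leaves 50 cycles for the adjustment step; the tick-based variant
always accumulates one tick and passes 0.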

> > With large clock offsets the lookahead doesn't work correctly, basically 
> > because it's already too late and it can cause overadjustment. Because of
> > this I do an extra lookahead in clocksource_bigadjust().
> 
> Do you have a hard example for this with numbers? I don't mean to be a
> pain, but I don't see this right off.
> 
> With the current code in -mm I can run a test app that disables
> interrupts for 2 seconds at a time over and over and I'm still keeping
> synched w/ an NTP server within 30 microseconds.

You need a clock source which doesn't generate its own interrupts, so
interrupts and clock updates can run asynchronously. The key part above is
"large clock offsets". In my test program disable the extra lookahead and 
run it with large offsets.
This code gets only limited testing in -mm, it needs to run for weeks 
or months, which I don't expect from the average -mm kernel. This makes 
userspace simulations so damn important and if you don't do this, you're 
playing a very risky game with a kernel which is supposed to be stable.

> > I'd prefer you'd just take the update function from my patch, it's nicely 
> > optimized and I'll try to address any concern you have about it.
> 
> Even though as mentioned above I'm still in the dark on the need for it,
> I spent a few hours last trying to convert the make_ntp_adj() in -mm to
> use your bigadjust implementation. Unfortunately my attempts refuse to
> boot. I'm not sure if this is the same problem that was keeping your
> patchset from booting as well, or possibly just an implementation error
> on my part.

John, why don't you just take my function with as few modifications as
possible? Please don't take offense, but as long as you only take
small pieces of my code without understanding how they fit into the big
picture, I'm not surprised by such problems, and it will take another
year before we get to something usable. In the meantime Andrew is
threatening to merge this anyway, and I definitely wouldn't trust it
with anything which needs a stable time source.
Please try to understand me, usually I can be quite patient and I know I 
have a hard time to make myself understandable at times, but I simply 
can't do this under such pressure...

> > For this I also posted a userspace test program, so that I know how it
> > behaves, do you have something similar for yours?
> 
> At your prodding awhile back I wrote a userspace simulator, but you
> never commented on it.

It was very hard to get running at all and in the meantime you had updated 
patches and the whole thing didn't work anymore. Sorry, that I didn't 
comment on it more. Note that I have different test programs to test 
various aspects - the NTP part and the clock part. At that time you were
still poking very deeply into the NTP guts, whereas now the clock parts are
more important.
Anyway, I did all this testing already, why are you simply throwing this 
away?

> 1) You don't like the unified update_wall_time
> 2) Issue w/ the current make_ntp_adj and lost ticks.
> 3) NTP error may be limited to clock->shift resolution w/ the jiffies
> clocksource (other clocksources have finer resolution than NTP's).
> 
> 
> #1 I disagree with. I really think the unified approach is the way to
> go, but I do understand the need to be conservative. This has gotten a
> fair amount of testing in -mm with no issues reported. But it's not too
> hard to re-add the old update_wall_time(), so I could be convinced to
> push the unification off w/ a second opinion. Again, this would be via
> an incremental patch.

It wouldn't be an incremental patch, if this gets merged as is, creating 
something actually flexible would come close to another rewrite with the 
update function as the key part. As we discussed previously there are other
parts I don't agree with, but these can be done in more incremental steps, 
but not what you've done with update_wall_time().

> So for #2 I'm working on trying to get your implementation functioning,
> but I still don't quite understand the reason. I think this can be
> worked out w/ an incremental patch.

Because it's not about lost ticks.

> I posted an incremental patch for #3 yesterday, but it needs more work,
> and changes in #2 could affect it, so I'm focusing on #2 at the moment.
> Even so, NTP's resolution is 0.2 picoseconds, and the jiffies
> clocksource keeps an error resolution of 4 picoseconds, so I'm not sure
> if this is really a blocker.  Further if we do re-add the old
> update_wall_time for issue #1, this would keep the jiffies clocksource
> from being used commonly, so this wouldn't be an issue.

You can use the jiffies clocksource everywhere else, why do you have to 
start in the interrupt function? In general management code this is fine, 
but I don't see a compelling reason to start in the most sensitive part.

bye, Roman


* Re: clocksource
  2006-06-07  0:41         ` clocksource Roman Zippel
@ 2006-06-08  8:05           ` john stultz
  2006-06-15 11:40             ` clocksource Roman Zippel
  0 siblings, 1 reply; 166+ messages in thread
From: john stultz @ 2006-06-08  8:05 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Wed, 2006-06-07 at 02:41 +0200, Roman Zippel wrote:
> On Tue, 6 Jun 2006, john stultz wrote:
> > > With large clock offsets the lookahead doesn't work correctly, basically 
> > > because it's already too late and it can cause overadjustment. Because of
> > > this I do an extra lookahead in clocksource_bigadjust().
> > 
> > Do you have a hard example for this with numbers? I don't mean to be a
> > pain, but I don't see this right off.
> > 
> > With the current code in -mm I can run a test app that disables
> > interrupts for 2 seconds at a time over and over and I'm still keeping
> > synched w/ an NTP server within 30 microseconds.
> 
> You need a clock source which doesn't generate its own interrupts, so
> interrupts and clock updates can run asynchronously. The key part above is
> "large clock offsets". In my test program disable the extra lookahead and 
> run it with large offsets.

I'm not sure I'm following you here. Almost all clocksources on i386
(specifically, in the case above, I was using acpi_pm) don't
generate interrupts and run asynchronously from the timer interrupt
source.

I did re-review your documentation, and while it does go over the mult
adjustment code in nice understandable terms, the "why" of this
additional look-ahead isn't quite obvious.

> This code gets only limited testing in -mm, it needs to run for weeks 
> or months, which I don't expect from the average -mm kernel. This makes 
> userspace simulations so damn important and if you don't do this, you're 
> playing a very risky game with a kernel which is supposed to be stable.

Agreed, simulation is nice. Thus, I've revived the old simulator which
builds using the existing code in -mm. It's a bit fast/dirty and isn't
exactly like your sim, but maybe you can take a look at it and send
patches to improve it?

You can find it at:
http://sr71.net/~jstultz/tod/simulator_C2.tar.bz2


I'm currently using it in testing my attempts to get your bigadjust code
working, so hopefully it will help there.

> > > For this I also posted a userspace test program, so that I know how it
> > > behaves, do you have something similar for yours?
> > 
> > At your prodding awhile back I wrote a userspace simulator, but you
> > never commented on it.
> 
> It was very hard to get running at all and in the meantime you had updated 
> patches and the whole thing didn't work anymore. Sorry, that I didn't 
> comment on it more. 

Please let me know if you still have difficulty getting this new one
running.

thanks
-john



* Re: clocksource
  2006-06-08  8:05           ` clocksource john stultz
@ 2006-06-15 11:40             ` Roman Zippel
  2006-06-16  3:21               ` clocksource john stultz
  0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-15 11:40 UTC (permalink / raw)
  To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Thu, 8 Jun 2006, john stultz wrote:

> > This code gets only limited testing in -mm, it needs to run for weeks 
> > or months, which I don't expect from the average -mm kernel. This makes 
> > userspace simulations so damn important and if you don't do this, you're 
> > playing a very risky game with a kernel which is supposed to be stable.
> 
> Agreed, simulation is nice. Thus, I've revived the old simulator which
> builds using the existing code in -mm. It's a bit fast/dirty and isn't
> exactly like your sim, but maybe you can take a look at it and send
> patches to improve it?
> 
> You can find it at:
> http://sr71.net/~jstultz/tod/simulator_C2.tar.bz2

At http://www.xs4all.nl/~zippel/ntp/simulator_C2+patches.tar.bz2 is my 
version where I added a number of patches (all p? patches) to get it into 
an acceptable state. You have a number of bugs which actually didn't let 
the clock oscillate that much but instead added random jitter (and in some 
cases a lot of it).

I disabled the lost interrupt simulation, so the effect of adjustments is
more visible; the error should return to near zero after it. Look for
the "ppm" prints and watch the time difference.
In the series file you can enable some debug patches (d?) to add extra 
prints or simulate large update delays to see the effect on the error 
difference.

bye, Roman


* Re: clocksource
  2006-06-15 11:40             ` clocksource Roman Zippel
@ 2006-06-16  3:21               ` john stultz
  2006-06-16  3:35                 ` clocksource john stultz
                                   ` (2 more replies)
  0 siblings, 3 replies; 166+ messages in thread
From: john stultz @ 2006-06-16  3:21 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Thu, 2006-06-15 at 13:40 +0200, Roman Zippel wrote:
> On Thu, 8 Jun 2006, john stultz wrote:
> 
> > > This code gets only limited testing in -mm, it needs to run for weeks 
> > > or months, which I don't expect from the average -mm kernel. This makes 
> > > userspace simulations so damn important and if you don't do this, you're 
> > > playing a very risky game with a kernel which is supposed to be stable.
> > 
> > Agreed, simulation is nice. Thus, I've revived the old simulator which
> > builds using the existing code in -mm. It's a bit fast/dirty and isn't
> > exactly like your sim, but maybe you can take a look at it and send
> > patches to improve it?
> > 
> > You can find it at:
> > http://sr71.net/~jstultz/tod/simulator_C2.tar.bz2
> 
> At http://www.xs4all.nl/~zippel/ntp/simulator_C2+patches.tar.bz2 is my 
> version where I added a number of patches (all p? patches) to get it into 
> an acceptable state. You have a number of bugs which actually didn't let 
> the clock oscillate that much but instead added random jitter (and in some 
> cases a lot of it).

I've been working on the simulator as well, and you're right, it caught
a few problems. I appreciate your prodding me to get it running again.

My current version is here:
http://sr71.net/~jstultz/tod/gtod-sim_C2.1.tar.bz2


Some of the improvements :
o I've added random offsets so increment_simulator_time doesn't always
increment to an INTERVAL boundary.

o Improved the random "tick dropping" (it's a bad name, but it changes
the frequency at which update_wall_time is called) so you can specify
the frequency.

o Added a seed argument so the random results can be reproduced.

o Added the PPM randomization as suggested in your patch (I had to
implement it differently as it collided w/ the random offset code).



Just quickly so you don't have to read the README:

./todsim <drift> <seed> <droptick>

Where:
o drift is the ppm drift. If not specified or zero, it will be randomly
changed as the test runs.

o seed seeds the random function. If not specified or zero time()
will be used.

o droptick is the frequency that ticks are taken. update_wall_time will
be called randomly w/ 1/<droptick> frequency. Thus if droptick is 1000,
we will on average only call update_wall_time once per 1000 ticks. If
not specified, it will be set to one.
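
The droptick behaviour described above can be sketched roughly as
follows. This is not the todsim source: take_tick() and count_updates()
are hypothetical helpers, the use of rand() is an assumption, and the
time() fallback for a zero seed is left out so that runs stay
reproducible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: decide whether this tick triggers an update. */
static int take_tick(unsigned int droptick)
{
	assert(droptick >= 1);
	/* true with probability of about 1/droptick; always true for 1 */
	return (unsigned int)rand() % droptick == 0;
}

/* Count how many of `ticks` simulated ticks lead to an update call. */
static unsigned long count_updates(unsigned int seed, unsigned int droptick,
				   unsigned long ticks)
{
	unsigned long i, updates = 0;

	srand(seed);	/* a fixed seed makes the run reproducible */
	for (i = 0; i < ticks; i++)
		if (take_tick(droptick))
			updates++;
	return updates;
}
```

With droptick set to one every tick is taken; with larger values the
same seed reproduces the same sequence of taken ticks.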



> I disabled the lost interrupt simulation, so the effect of adjustments is
> more visible; the error should return to near zero after it. Look for
> the "ppm" prints and watch the time difference.
> In the series file you can enable some debug patches (d?) to add extra 
> prints or simulate large update delays to see the effect on the error 
> difference.

Very cool. I appreciate the small incremental patches. I've looked over
them and am trying to see which ones make sense in light of the
following info.

I've also been working on improving the adjustment algorithm. Paul
McKenney pointed me to the established concepts in control theory, and I
started reading up on PID control (see:
http://en.wikipedia.org/wiki/PID_controller ). While I have understood
the basic concept, it was useful to read up on it. I've tried to rework
the adjustment code accordingly.

The method I came up with is really just P-D (proportional-derivative)
control, but that should be ok since the adjustments are all linear so I
don't think the integral control is necessary (control theorists can
pipe in here).


The basic algorithm is as follows:

	update_error = 0;
	interval_cycs = 0;
	while (offset >= clock->interval_cycles) {
		/* accumulate one interval */
		remainder_snsecs += clock->interval_snsecs;
		...
		/* accumulate error between NTP and clock interval */
		update_error += current_tick_length(clock->shift);
		update_error -= clock->interval_snsecs;
		interval_cycs += clock->interval_cycles;

		...
	}
	/* add error accumulated since last interrupt */
	total_error += update_error;

	if (total_error > (s64)clock->interval_cycles
			|| total_error < -((s64)clock->interval_cycles)) {

		/* derivative control: fix the slope */
		freqadj = update_error/((s64)interval_cycs);

		/* proportional control: converge to zero */
		offadj = total_error/(s64)interval_cycs;

		/* limiter to avoid oscillation */
		if (offadj > MAXOFFADJ)
			offadj = MAXOFFADJ;
		else if (offadj < -MAXOFFADJ)
			offadj = -MAXOFFADJ;

		/* make the adjustment */	
		multadj = freqadj + offadj;
		clock->mult += multadj;
		clock->interval_snsecs = clock->mult * clock->interval_cycles;
		remainder_snsecs -= multadj * (s64)offset;
		total_error += multadj * offset;
	}
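
The adjustment step of the outline above can be tried in isolation with
a small userspace sketch like the one below. It uses plain divides rather
than the binary approximation, and pd_mult_adjust() and the
SKETCH_MAXOFFADJ value of 16 are hypothetical choices made only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clamp for the proportional term (illustrative value). */
#define SKETCH_MAXOFFADJ 16

/*
 * Compute the multiplier adjustment from the error accumulated in the
 * last update (update_error), the total error (total_error) and the
 * cycles covered by the last update (interval_cycs).
 */
static int64_t pd_mult_adjust(int64_t update_error, int64_t total_error,
			      int64_t interval_cycs)
{
	int64_t freqadj, offadj;

	assert(interval_cycs > 0);

	/* derivative term: cancel the error gained per cycle */
	freqadj = update_error / interval_cycs;

	/* proportional term: steer the total error back towards zero */
	offadj = total_error / interval_cycs;

	/* limit the proportional term to avoid oscillation */
	if (offadj > SKETCH_MAXOFFADJ)
		offadj = SKETCH_MAXOFFADJ;
	else if (offadj < -SKETCH_MAXOFFADJ)
		offadj = -SKETCH_MAXOFFADJ;

	return freqadj + offadj;
}
```

For example, an update error of 3000 and a total error of 50000 over
1000 cycles give a frequency term of 3 and a clamped offset term of 16.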

Then using the same method in your bigadjust function, I can approximate
the divides and the rest is very similar to your suggestions.
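
As a standalone illustration of approximating the divide, the sketch
below mirrors the loop of the error_aproximation() helper in the patch in
this mail, lifted into userspace. approx_shift() is a name used only
here; the returned value is a shift, so the adjustment applied is
1 << shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a shift such that (1 << shift) roughly approximates
 * error / unit from below. A non-zero max caps the returned shift.
 */
static int approx_shift(uint64_t error, uint64_t unit, int max)
{
	int adj = 0;

	for (;;) {
		error >>= 1;
		if (error <= unit)
			return adj;	/* error eventually reaches 0, so this ends */
		if (!max || adj < max)
			adj++;
	}
}
```

For error = 1000 and unit = 10 the exact quotient is 100 and the sketch
returns 6 (an adjustment of 64); with max = 4 the result is capped at 4.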

The full patch for -mm is attached below (Andrew, please don't take this
just yet, I'm doing more testing and I'd like Roman's feedback first).

After applying this to -mm and generating the simulator from the result,
I've found this to be *very* robust (it keeps proportionally close with
the frequency of lost ticks, and doesn't have issue w/ ppm changes).
Please take a look at it and let me know what you think.

thanks
-john


diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4bc9428..2993521 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -181,46 +181,57 @@ static inline void clocksource_calculate
  *
  * @error:	Error value (unsigned)
  * @unit:	Adjustment unit
+ * @max:	Limit on adjustment unit
  *
  * For a given error value, this function takes the adjustment unit
  * and uses binary approximation to return a power of two adjustment value.
  *
- * This function is only for use by the the make_ntp_adj() function
- * and you must hold a write on the xtime_lock when calling.
  */
-static inline int error_aproximation(u64 error, u64 unit)
+static int error_aproximation(u64 error, u64 unit, int max)
 {
-	static int saved_adj = 0;
-	u64 adjusted_unit = unit << saved_adj;
-
-	if (error > (adjusted_unit * 2)) {
-		/* large error, so increment the adjustment factor */
-		saved_adj++;
-	} else if (error > adjusted_unit) {
-		/* just right, don't touch it */
-	} else if (saved_adj) {
-		/* small error, so drop the adjustment factor */
-		saved_adj--;
-		return 0;
+	int adj = 0;
+	while (1) {
+		error >>= 1;
+		if (error <= unit)
+			return adj;
+		if (!max || adj < max)
+			adj++;
 	}
-
-	return saved_adj;
 }
 
 
+#define MAXOFFADJ 4 /* vary max oscillation vs convergence speed */
+
 /**
- * make_ntp_adj - Adjusts the specified clocksource for a given error
+ * clocksource_adj - Adjusts the specified clocksource for a given error
  *
  * @clock:		Pointer to clock to be adjusted
- * @cycles_delta:	Current unacounted cycle delta
- * @error:		Pointer to current error value
+ * @cycles_delta:	Current unaccumulated cycle delta
+ * @total_error:	Pointer to current total error value
+ * @interval_error:	Error accumulated since the last sample
+ * @interval_cycs:      Accumulated cycles since the last sample
  *
  * Returns clock shifted nanosecond adjustment to be applied against
  * the accumulated time value (ie: xtime).
  *
- * If the error value is large enough, this function calulates the
- * (power of two) adjustment value, and adjusts the clock's mult and
- * interval_snsecs values accordingly.
+ * If the error value is large enough, this function approximates
+ * the frequency and offset adjustment, and applies it to the 
+ * clock's mult and interval_snsecs values accordingly.
+ *
+ * This method of adjustment is similar to PID control.
+ * See http://en.wikipedia.org/wiki/PID_controller for more info.
+ * However we are really just doing P-D control: since our adjustments
+ * are linear, there is no need for the integral component of PID.
+ * The P-D control is done in two steps:
+ *    1) Proportional control: (offset adjustment)
+ *         This makes adjustment based on the current error from NTP.
+ *         This adjustment is limited to avoid oscillation from missed
+ *         ticks.
+ *    2) Derivative control:
+ *         This makes adjustments based on the error accumulated in 
+ *         the last period (in other words, the difference in error from
+ *         the last period). This provides a frequency correction so no
+ *         additional error should be accumulated in the next period.
  *
  * However, since there may be some unaccumulated cycles, to avoid
  * time inconsistencies we must adjust the accumulation value
@@ -238,35 +249,89 @@ static inline int error_aproximation(u64
  *
  * Where mult_delta is the adjustment value made to mult
  *
+ * An additional complication: Since we are adjusting the base value,
+ * we must also adjust the total_error value, as it is the distance
+ * of the base time from the NTP time. Thus we adjust the total_error
+ * by the negative amount we adjusted the base.
  */
-static inline s64 make_ntp_adj(struct clocksource *clock,
-				cycles_t cycles_delta, s64* error)
+static inline s64 clocksource_adj(struct clocksource *clock,
+				cycle_t cycles_delta, s64* total_error,
+				s64 interval_error, s64 interval_cycs)
 {
 	s64 ret = 0;
-	if (*error  > ((s64)clock->interval_cycles+1)/2) {
-		/* calculate adjustment value */
-		int adjustment = error_aproximation(*error,
-						clock->interval_cycles);
-		/* adjust clock */
-		clock->mult += 1 << adjustment;
-		clock->interval_snsecs += clock->interval_cycles << adjustment;
-
-		/* adjust the base and error for the adjustment */
-		ret =  -(cycles_delta << adjustment);
-		*error -= clock->interval_cycles << adjustment;
-		/* XXX adj error for cycle_delta offset? */
-	} else if ((-(*error))  > ((s64)clock->interval_cycles+1)/2) {
-		/* calculate adjustment value */
-		int adjustment = error_aproximation(-(*error),
-						clock->interval_cycles);
-		/* adjust clock */
-		clock->mult -= 1 << adjustment;
-		clock->interval_snsecs -= clock->interval_cycles << adjustment;
-
-		/* adjust the base and error for the adjustment */
-		ret =  cycles_delta << adjustment;
-		*error += clock->interval_cycles << adjustment;
-		/* XXX adj error for cycle_delta offset? */
+	s64 error = *total_error;
+		
+	if ((error > (s64)clock->interval_cycles)
+		||(error < -((s64)clock->interval_cycles)) ) {
+
+		int adj, multadj = 0;
+		s64 offset_update = 0, snsec_update = 0;
+
+		/* First do the frequency adjustment:
+		 *   The idea here is to look at the error 
+		 *   accumulated since the last call to 
+		 *   update_wall_time to determine the 
+		 *   frequency adjustment needed so no new
+		 *   error will be incurred in the next
+		 *   interval.
+		 *
+		 *   This is basically derivative control
+		 *   using the PID terminology (we're calculating
+		 *   the derivative of the slope and correcting it).
+		 *
+		 *   The math is basically:
+		 *      multadj = interval_error/interval_cycles
+		 *   Which we fudge using binary approximation.
+		 */
+		if(interval_error >= 0) {
+			adj = error_aproximation(interval_error,
+						 interval_cycs, 0);
+			multadj += 1 << adj;
+			snsec_update += clock->interval_cycles << adj;
+			offset_update += cycles_delta << adj;
+		} else {
+			adj = error_aproximation(-interval_error, 
+						interval_cycs, 0);
+			multadj -= 1 << adj;
+			snsec_update -= clock->interval_cycles << adj;
+			offset_update -= cycles_delta << adj;
+		}
+		/* Now do the offset adjustment:
+		 *   Now that the frequency is fixed, we
+		 *   want to look at the total error accumulated
+		 *   to move us back in sync using the same method.
+		 *   However, we must be careful as if we make too 
+		 *   sudden an adjustment we might overshoot. So we 
+		 *   limit the amount of change to spread the 
+		 *   adjustment (using MAXOFFADJ) over a longer 
+		 *   period of time.
+		 *
+		 *   This is basically proportional control
+		 *   using the PID terminology.
+		 *
+		 *   We use interval_cycs here as the divisor, which
+		 *   hopes that the next sample will be similar in 
+		 *   distance from the last.
+		 */
+		if(error >= 0) {
+			adj = error_aproximation(error, 
+						interval_cycs, MAXOFFADJ);
+			multadj += 1<<adj;
+			snsec_update += clock->interval_cycles <<adj;
+			offset_update += cycles_delta << adj;
+		} else {
+			adj = error_aproximation(-error,
+						interval_cycs, MAXOFFADJ);
+			multadj -= 1<<adj;
+			snsec_update -= clock->interval_cycles <<adj;
+			offset_update -= cycles_delta << adj;
+		}
+
+		clock->mult += multadj;
+		clock->interval_snsecs += snsec_update;
+		ret -= offset_update;
+		*total_error += offset_update;
+
 	}
 	return ret;
 }
diff --git a/kernel/timer.c b/kernel/timer.c
index 0569d40..1345759 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1025,9 +1025,9 @@ device_initcall(timekeeping_init_device)
  */
 static void update_wall_time(void)
 {
-	static s64 remainder_snsecs, error;
-	s64 snsecs_per_sec;
-	cycle_t now, offset;
+	static s64 remainder_snsecs, total_error;
+	s64 snsecs_per_sec, interval_error = 0;
+	cycle_t now, offset, interval_cycs = 0; 
 
 	snsecs_per_sec = (s64)NSEC_PER_SEC << clock->shift;
 	remainder_snsecs += (s64)xtime.tv_nsec << clock->shift;
@@ -1038,7 +1038,7 @@ static void update_wall_time(void)
 	/* normally this loop will run just once, however in the
 	 * case of lost or late ticks, it will accumulate correctly.
 	 */
-	while (offset > clock->interval_cycles) {
+	while (offset >= clock->interval_cycles) {
 		/* get the ntp interval in clock shifted nanoseconds */
 		s64 ntp_snsecs	= current_tick_length(clock->shift);
 
@@ -1054,10 +1054,8 @@ static void update_wall_time(void)
 		update_ntp_one_tick();
 
 		/* accumulate error between NTP and clock interval */
-		error += (ntp_snsecs - (s64)clock->interval_snsecs);
-
-		/* correct the clock when NTP error is too big */
-		remainder_snsecs += make_ntp_adj(clock, offset, &error);
+		interval_error += (ntp_snsecs - (s64)clock->interval_snsecs);
+		interval_cycs += clock->interval_cycles;
 
 		if (remainder_snsecs >= snsecs_per_sec) {
 			remainder_snsecs -= snsecs_per_sec;
@@ -1065,13 +1063,20 @@ static void update_wall_time(void)
 			second_overflow();
 		}
 	}
+
+	total_error += interval_error;
+
+	/* correct the clock when NTP error is too big */
+	remainder_snsecs += clocksource_adj(clock, offset, &total_error,
+					interval_error, interval_cycs);
+
 	/* store full nanoseconds into xtime */
 	xtime.tv_nsec = remainder_snsecs >> clock->shift;
 	remainder_snsecs -= (s64)xtime.tv_nsec << clock->shift;
 
 	/* check to see if there is a new clocksource to use */
 	if (change_clocksource()) {
-		error = 0;
+		total_error = 0;
 		remainder_snsecs = 0;
 		clocksource_calculate_interval(clock, tick_nsec);
 	}




* Re: clocksource
  2006-06-16  3:21               ` clocksource john stultz
@ 2006-06-16  3:35                 ` john stultz
  2006-06-16 15:33                 ` clocksource Roman Zippel
  2006-06-17 17:04                 ` clocksource Andrew Morton
  2 siblings, 0 replies; 166+ messages in thread
From: john stultz @ 2006-06-16  3:35 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Thu, 2006-06-15 at 20:21 -0700, john stultz wrote:
> Very cool. I appreciate the small incremental patches. I've looked over
> them and am trying to see which ones make sense in light of the
> following info.

Ah, forgot to mention that the patch included the changes from your p1
fix (some of which I had already made, but items like the
clocksource_adj() name change are a clear improvement).

And just to be sure you know I'm not brushing your patches off: I'm
looking at patches p2, p4, p5 and p7 as the next steps, and after I get
some feedback on the patch I just sent we can discuss p3, p6, and p8.

Sound good?

thanks
-john


* Re: clocksource
  2006-06-16  3:21               ` clocksource john stultz
  2006-06-16  3:35                 ` clocksource john stultz
@ 2006-06-16 15:33                 ` Roman Zippel
  2006-06-16 18:48                   ` clocksource john stultz
  2006-06-17 17:04                 ` clocksource Andrew Morton
  2 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-16 15:33 UTC (permalink / raw)
  To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Thu, 15 Jun 2006, john stultz wrote:

> I've also been working on improving the adjustment algorithm. Paul
> McKenney enlightened me to the established concepts in control theory, I
> started reading up on PID control (see:
> http://en.wikipedia.org/wiki/PID_controller ). While I have understood
> the basic concept, it was useful to read up on it. I've tried to rework
> the adjustment code accordingly.
> 
> The method I came up with is really just P-D (proportional-derivative)
> control, but that should be ok since the adjustments are all linear so I
> don't think the integral control is necessary (control theorists can
> pipe in here).

This makes it more complex than necessary. AFAICT this controller 
calculates the adjustment solely based on the current error, but we have 
more information than this, which makes the current error rather
uninteresting.
We know the clock frequency and the NTP frequency so we can easily 
precalculate what the error will look like at the next few ticks. Based on
this we can calculate how we have to adjust the clock frequency to reduce 
the error. Overshooting is also not a real problem as long as the absolute 
error gets smaller.

An important point about the last patch is not just robustness but also 
speed, it tries to keep the fast path small, which is basically:

	interval = clock->cycle_interval;
	if (error > interval / 2) {
		adj = 1;
		if (unlikely(error > interval * 2)) {
			...
		}
	} else if (error < -interval / 2) {
		adj = -1;
		interval = -interval;
		offset = -offset;
		if (unlikely(error < interval * 2)) {
			...
		}
	} else
		return;

	clock->mult += adj;
	clock->xtime_interval += interval;
	clock->xtime_nsec -= offset;
	clock->error -= interval - offset;

You'll need a very good reason to do anything more than this for small 
errors and I would suggest you start from something like this, as this is 
the very core of the error adjustment.
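
For experimenting with this fast path outside the kernel, a minimal
sketch could look like the following. It covers only the small-error
case; the elided large-error branches are left out. The struct
sketch_clock layout and small_adjust() are hypothetical, with field names
borrowed from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the clock fields used above. */
struct sketch_clock {
	uint32_t mult;
	int64_t cycle_interval;
	int64_t xtime_interval;
	int64_t xtime_nsec;
	int64_t error;
};

/*
 * Nudge mult by one step when the error exceeds half an interval.
 * offset is the number of cycles not yet accumulated.
 * Returns the step applied: -1, 0 or +1.
 */
static int small_adjust(struct sketch_clock *clock, int64_t offset)
{
	int64_t interval = clock->cycle_interval;
	int adj;

	if (clock->error > interval / 2) {
		adj = 1;
	} else if (clock->error < -interval / 2) {
		adj = -1;
		interval = -interval;
		offset = -offset;
	} else {
		return 0;
	}

	clock->mult += adj;
	clock->xtime_interval += interval;
	clock->xtime_nsec -= offset;	/* keep the current time continuous */
	clock->error -= interval - offset;
	return adj;
}
```

With cycle_interval = 100, error = 80 and offset = 30, one call raises
mult by one and leaves an error of 10, which is inside the dead band, so
a second call does nothing.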

bye, Roman


* Re: clocksource
  2006-06-16 15:33                 ` clocksource Roman Zippel
@ 2006-06-16 18:48                   ` john stultz
  2006-06-17 19:45                     ` clocksource Roman Zippel
  0 siblings, 1 reply; 166+ messages in thread
From: john stultz @ 2006-06-16 18:48 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel

On Fri, 2006-06-16 at 17:33 +0200, Roman Zippel wrote:
> On Thu, 15 Jun 2006, john stultz wrote:
> > I've also been working on improving the adjustment algorithm. Paul
> > Mckenney enlightened me to the established concepts in control theory, I
> > started reading up on PID control (see:
> > http://en.wikipedia.org/wiki/PID_controller ). While I have understood
> > the basic concept, it was useful to read up on it. I've tried to rework
> > the adjustment code accordingly.
> > 
> > The method I came up with is really just P-D (proportional-derivative)
> > control, but that should be ok since the adjustments are all linear so I
> > don't think the integral control is necessary (control theorists can
> > pipe in here).
> 
> This makes it more complex than necessary. AFAICT this controller
> calculates the adjustment solely based on the current error, but we have
> more information than this, which makes the current error rather
> uninteresting.

Indeed it is the current error, but it's also taking the change in error
into account.

> We know the clock frequency and the NTP frequency, so we can easily
> precalculate what the error will look like at the next few ticks. Based on
> this we can calculate how we have to adjust the clock frequency to reduce 
> the error. Overshooting is also not a real problem as long as the absolute 
> error gets smaller.

I'm not sure I agree here. Using your patch series, if I re-enable the
code that drops calls to update_wall_time (simulating lost ticks) the
clock does not appear very stable. Robustness and features like
dynamic/no_idle_hz are going to require that we can handle taking
something close to only one tick per second, so overshoot is a big
concern in my mind. Maybe I'm misunderstanding you?

However, I need to forward port your patchset to the new simulator to
really do a fair comparison as I know there were some issues w/ the
simulator that I addressed in order to get the new features working. If
you have already done this, let me know.

> An important point about the last patch is not just robustness but also 
> speed, it tries to keep the fast path small, which is basically:
> 
> 	interval = clock->cycle_interval;
> 	if (error > interval / 2) {
> 		adj = 1;
> 		if (unlikely(error > interval * 2)) {
> 			...
> 		}
> 	} else if (error < -interval / 2) {
> 		adj = -1;
> 		interval = -interval;
> 		offset = -offset;
> 		if (unlikely(error < interval * 2)) {
> 			...
> 		}
> 	} else
> 		return;
> 
> 	clock->mult += adj;
> 	clock->xtime_interval += interval;
> 	clock->xtime_nsec -= offset;
> 	clock->error -= interval - offset;
> 
> You'll need a very good reason to do anything more than this for small 
> errors and I would suggest you start from something like this, as this is 
> the very core of the error adjustment.

I agree that the patch I sent could use some optimizations, and likely
even some tweaking (supposedly I can get rid of the proportional
adjustment limiter by using a gain value, but I need to test this a bit)
to improve it further.  

Now trying to compare it to your code:

Looking at your description of the code above from your documentation
email:
1)	mult_adj = error / cycle_update;
2)	mult += mult_adj;
3)	xtime -= cycle_offset * mult_adj;
4)	error -= (cycle_update - cycle_offset) * mult_adj;

Lines 1 & 2 calculate the proportional error adjustment for the error
at the next interval.

Line 3 is also well understood, as it corrects the base for the new
adjustment value if there is an offset value.

So I see the proportional adjustment, but I don't see how the derivative
is included. I suspect the density of the error adjustment bit is what
makes this so opaque to me. Breaking line 4 apart for a moment:

4a) error += cycle_offset * mult_adj;
4b) error -= cycle_update * mult_adj;

Line 4a is also clear, since if the base had been changed in line 3, the
error between the base and ntp has changed as well, so it must be
changed by the negative amount the base was changed to stay in sync.

Line 4b is a bit foggy. Just assuming cycle_offset is zero, we can
ignore line 3 and 4a. So we're reducing the error by the change in
length of the next interval. I see how this would in effect dampen the
next adjustment, but I'm not sure how that then maps the error value to
the actual distance from ntp_time.
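
To make the bookkeeping concrete, the sketch below applies steps 1-4 to a
toy state and checks that splitting line 4 into 4a and 4b gives the same
result. The struct toy_state and both function names are hypothetical, and
the numbers in the usage note carry no particular units.

```c
#include <assert.h>

/* Hypothetical container for the three values the steps modify. */
struct toy_state {
	long long mult;
	long long xtime;
	long long error;
};

/* Steps 1-4 as listed above, with line 4 in its combined form. */
static void apply_combined(struct toy_state *s, long long cycle_update,
			   long long cycle_offset)
{
	long long mult_adj;

	assert(cycle_update != 0);
	mult_adj = s->error / cycle_update;			/* 1 */
	s->mult += mult_adj;					/* 2 */
	s->xtime -= cycle_offset * mult_adj;			/* 3 */
	s->error -= (cycle_update - cycle_offset) * mult_adj;	/* 4 */
}

/* Same steps, with line 4 broken into 4a and 4b. */
static void apply_split(struct toy_state *s, long long cycle_update,
			long long cycle_offset)
{
	long long mult_adj;

	assert(cycle_update != 0);
	mult_adj = s->error / cycle_update;			/* 1 */
	s->mult += mult_adj;					/* 2 */
	s->xtime -= cycle_offset * mult_adj;			/* 3 */
	s->error += cycle_offset * mult_adj;			/* 4a */
	s->error -= cycle_update * mult_adj;			/* 4b */
}
```

With mult = 100, xtime = 1000000, error = 3000, cycle_update = 1000 and
cycle_offset = 250, both variants give mult_adj = 3 and end at mult = 103,
xtime = 999250 and error = 750.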

Abstractly I understand how looking at the next tick is good for when
the NTP adjustment value changes, but I'm not sure I see how looking
ahead makes the clock more stable when the NTP adjustment isn't
changing.

Is there a way you can map the math above to the terms of PID control
(or maybe some other established concept that I can dig deeper on?).

thanks
-john


* Re: clocksource
  2006-06-16 18:48                   ` clocksource john stultz
@ 2006-06-17 19:45                     ` Roman Zippel
  0 siblings, 0 replies; 166+ messages in thread
From: Roman Zippel @ 2006-06-17 19:45 UTC (permalink / raw)
  To: john stultz; +Cc: Andrew Morton, linux-kernel

Hi,

On Fri, 16 Jun 2006, john stultz wrote:

> > This makes it more complex than necessary. AFAICT this controller
> > calculates the adjustment solely based on the current error, but we have
> > more information than this, which makes the current error rather
> > uninteresting.
> 
> Indeed it is the current error, but it's also taking the change in error
> into account.

Which is not really important anymore, as soon as you already _know_ the 
future error.

> > We know the clock frequency and the NTP frequency, so we can easily
> > precalculate what the error will look like at the next few ticks. Based on
> > this we can calculate how we have to adjust the clock frequency to reduce 
> > the error. Overshooting is also not a real problem as long as the absolute 
> > error gets smaller.
> 
> I'm not sure I agree here. Using your patch series, if I re-enable the
> code that drops calls to update_wall_time (simulating lost ticks) the
> clock does not appear very stable.

That's because the tick length is not applied in the same way in 
sim-main.c and time.c, if the drop happens around a call to 
second_overflow(), one gets the old value and the other gets the new 
value. If you prevent the drop around a ppm change, it will work just 
fine, it's a bug in the simulator.

> Robustness and features like
> dynamic/no_idle_hz are going to require that we can handle taking
> something close to only one tick per second, so overshoot is a big
> concern in my mind. Maybe I'm misunderstanding you?

Overshooting is not your real problem in this case, as large offsets and
lost interrupts are less of a problem. You're making it way too complex...

> However, I need to forward port your patchset to the new simulator to
> really do a fair comparison as I know there were some issues w/ the
> simulator that I addressed in order to get the new features working. If
> you have already done this, let me know.

The new simulator is far too complex and I don't want to spend the time
verifying that it does a really correct simulation. I don't think it's a
good idea to make the simulator more complex than the subject, otherwise
you need a simulator to test the simulator...
You are also using rather smallish adjustments; changes of up to 1ms/s are
more realistic (NTP allows a MAXPHASE offset which is spread over a period
of time). Your simulator is also fixed to 1GHz and 1000Hz, which are
rather convenient values.

> Line 4b is a bit foggy. Just assuming cycle_offset is zero, we can
> ignore line 3 and 4a. So we're reducing the error by the change in
> length of the next interval. I see how this would in effect dampen the
> next adjustment, but I'm not sure how that then maps the error value to
> the actual distance from ntp_time.

As I said it looks ahead to the next error, if we adjust the multiplier at 
the current tick, it also changes the error at the next tick.

> Abstractly I understand how looking at the next tick is good for when
> the NTP adjustment value changes, but I'm not sure I see how looking
> ahead makes the clock more stable when the NTP adjustment isn't
> changing.

Huh? If the NTP adjustment isn't changing, clock and NTP are hopefully in 
sync and there is no (significant) error to adjust.

> Is there a way you can map the math above to the terms of PID control

Please forget this one, it doesn't really apply here. We know more about 
the model, which allows for simpler calculations. You have to throw away 
information to fit it into the PID model.

> (or maybe some other established concept that I can dig deeper on?).

I guess it's basically a FLL.

Below is another patch, which better takes the update delays into account.

bye, Roman



take the offset better into account by applying the current adjustment
to the error and anticipate a possible large adjustment at the next
update due to a large offset.
---
 time.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: simulator_C2/time.c
===================================================================
--- simulator_C2.orig/time.c
+++ simulator_C2/time.c
@@ -227,7 +227,7 @@ device_initcall(timekeeping_init_device)
  * If the error is already larger, we look ahead another tick,
  * to compensate for late or lost adjustments.
  */
-static __always_inline int clocksource_bigadjust(int sign, s64 error, s64 interval)
+static __always_inline int clocksource_bigadjust(int sign, s64 error, s64 interval, s64 offset)
 {
 	int adj = 0;
 
@@ -236,8 +236,12 @@ static __always_inline int clocksource_b
 
 	while (1) {
 		error >>= 1;
-		if (sign > 0 ? error <= interval : error >= interval)
+		if (sign > 0 ? error <= interval : error >= interval) {
+			error = (error << 1) - interval + offset;
+			if (sign > 0 ? error > interval : error < interval)
+				adj++;
 			return adj;
+		}
 		adj++;
 	}
 }
@@ -246,7 +250,8 @@ static __always_inline int clocksource_b
 	int adj = sign;							\
 	error >>= 2;							\
 	if (unlikely(sign > 0 ? error > interval : error < interval)) {	\
-		adj = clocksource_bigadjust(sign, error, interval);	\
+		adj = clocksource_bigadjust(sign, error,		\
+					    interval, offset);		\
 		interval <<= adj;					\
 		offset <<= adj;						\
 		adj = sign << adj;					\


* Re: clocksource
  2006-06-16  3:21               ` clocksource john stultz
  2006-06-16  3:35                 ` clocksource john stultz
  2006-06-16 15:33                 ` clocksource Roman Zippel
@ 2006-06-17 17:04                 ` Andrew Morton
  2 siblings, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-17 17:04 UTC (permalink / raw)
  To: john stultz; +Cc: zippel, linux-kernel

On Thu, 15 Jun 2006 20:21:24 -0700
john stultz <johnstul@us.ibm.com> wrote:

> The method I came up with is really just P-D (proportional-derivative)
> control, but that should be ok since the adjustments are all linear so I
> don't think the integral control is necessary (control theorists can
> pipe in here).

Boy, that takes me back.  If you don't feed back the integral you'll end up
with an output which has a steady-state offset error against the control
point (unless the forward gain is infinite, and it never is).  I don't know
if that matters here, but it cannot be good.

If you feed back the integral of the error then it introduces the
possibility of instability.  Probably in this application you can just
overdamp the thing to avoid that.  It'll make it slower to respond to
changes in the setpoint.
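
The steady-state offset can be seen in a few lines of C. This is only a
sketch under stated assumptions: the plant dy/dt = -y + u, the forward-Euler
integration and the function name final_error are made up for illustration
and have nothing to do with the timekeeping code itself.

```c
#include <assert.h>

/*
 * Simulate an assumed first-order plant dy/dt = -y + u under P or PI
 * control (ki = 0 gives pure P) and return the remaining error
 * setpoint - y after 'steps' Euler steps of length 'dt'.
 */
static double final_error(double kp, double ki, double setpoint,
			  double dt, int steps)
{
	double y = 0.0;
	double integral = 0.0;
	int i;

	assert(dt > 0.0 && steps > 0);
	for (i = 0; i < steps; i++) {
		double err = setpoint - y;
		double u;

		integral += err * dt;		/* integral of the error */
		u = kp * err + ki * integral;	/* controller output */
		y += dt * (-y + u);		/* plant update */
	}
	return setpoint - y;
}
```

For this plant, proportional-only control settles with an error of
setpoint / (1 + kp), so final_error(4.0, 0.0, 1.0, 0.01, 5000) stays at
about 0.2 and only a larger forward gain shrinks it. Feeding back the
integral, e.g. final_error(4.0, 2.0, 1.0, 0.01, 5000), drives the error
towards zero.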


* Re: utsname/hostname
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (5 preceding siblings ...)
  2006-06-04 23:50 ` clocksource Roman Zippel
@ 2006-06-05  0:02 ` Randy.Dunlap
  2006-06-05  1:06   ` utsname/hostname Andrew Morton
       [not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 166+ messages in thread
From: Randy.Dunlap @ 2006-06-05  0:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sun, 4 Jun 2006 13:50:11 -0700 Andrew Morton wrote:

> 
> It's time to take a look at the -mm queue for 2.6.18.
> 
> 
> When replying to this email pleeeeeeze rewrite the Subject: to something
> appropriate so we do not all go mad.  Thanks.
> 
> 
> proc-sysctl-add-_proc_do_string-helper.patch
> namespaces-add-nsproxy.patch
> namespaces-add-nsproxy-dont-include-compileh.patch
> namespaces-incorporate-fs-namespace-into-nsproxy.patch
> namespaces-utsname-introduce-temporary-helpers.patch
> namespaces-utsname-switch-to-using-uts-namespaces.patch
> namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch
> namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch
> namespaces-utsname-use-init_utsname-when-appropriate.patch
> namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch
> namespaces-utsname-implement-utsname-namespaces.patch
> namespaces-utsname-implement-utsname-namespaces-export.patch
> namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch
> namespaces-utsname-sysctl-hack.patch
> namespaces-utsname-sysctl-hack-cleanup.patch
> namespaces-utsname-sysctl-hack-cleanup-2.patch
> namespaces-utsname-sysctl-hack-cleanup-2-fix.patch
> namespaces-utsname-remove-system_utsname.patch
> namespaces-utsname-implement-clone_newuts-flag.patch
> uts-copy-nsproxy-only-when-needed.patch
> # needed if git-klibc isn't there:
> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch
> #namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch
> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch
> 
>  utsname virtualisation.  This doesn't seem very pointful as a standalone
>  thing.  That's a general problem with infrastructural work for a very
>  large new feature.
> 
>  So probably I'll continue to babysit these patches, unless someone can
>  identify a decent reason why mainline needs this work.

Not a strong argument for mainline, but I have a patch to make
<hostname> larger (up to 255 bytes, per POSIX).
  http://www.xenotime.net/linux/patches/hostname-2617-rc5b.patch

I can either update my hostname patch against mm/utsname.. or not.
But I don't really want to see some/any patch blocked due to a patch
in -mm being borderline "pointful," so how do we deal with this?

>  I don't want to carry an ever-growing stream of OS-virtualisation
>  groundwork patches for ever and ever so if we're going to do this thing...
>  faster, please.


---
~Randy


* Re: utsname/hostname
  2006-06-05  0:02 ` utsname/hostname Randy.Dunlap
@ 2006-06-05  1:06   ` Andrew Morton
  2006-06-05  3:10     ` utsname/hostname Randy.Dunlap
  0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05  1:06 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: linux-kernel

On Sun, 4 Jun 2006 17:02:18 -0700
"Randy.Dunlap" <rdunlap@xenotime.net> wrote:

> >  utsname virtualisation.  This doesn't seem very pointful as a standalone
> >  thing.  That's a general problem with infrastructural work for a very
> >  large new feature.
> > 
> >  So probably I'll continue to babysit these patches, unless someone can
> >  identify a decent reason why mainline needs this work.
> 
> Not a strong argument for mainline, but I have a patch to make
> <hostname> larger (up to 255 bytes, per POSIX).
>   http://www.xenotime.net/linux/patches/hostname-2617-rc5b.patch

My immediate reaction to that was to tell posix to go take a hike.  I mean,
sheesh.

> I can either update my hostname patch against mm/utsname.. or not.
> But I don't really want to see some/any patch blocked due to a patch
> in -mm being borderline "pointful," so how do we deal with this?

Well first we need to work out if there's any vague reason why we need to
mucky up our kernel by implementing this dopey spec.  If there is such a
reason then I guess I drop all the utsname patches and ask that they be
redone.  They're a bit straggly and a refactoring/rechangelogging wouldn't
hurt.



* Re: utsname/hostname
  2006-06-05  1:06   ` utsname/hostname Andrew Morton
@ 2006-06-05  3:10     ` Randy.Dunlap
  0 siblings, 0 replies; 166+ messages in thread
From: Randy.Dunlap @ 2006-06-05  3:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sun, 4 Jun 2006 18:06:18 -0700 Andrew Morton wrote:

> On Sun, 4 Jun 2006 17:02:18 -0700
> "Randy.Dunlap" <rdunlap@xenotime.net> wrote:
> 
> > >  utsname virtualisation.  This doesn't seem very pointful as a standalone
> > >  thing.  That's a general problem with infrastructural work for a very
> > >  large new feature.
> > > 
> > >  So probably I'll continue to babysit these patches, unless someone can
> > >  identify a decent reason why mainline needs this work.
> > 
> > Not a strong argument for mainline, but I have a patch to make
> > <hostname> larger (up to 255 bytes, per POSIX).
> >   http://www.xenotime.net/linux/patches/hostname-2617-rc5b.patch
> 
> My immediate reaction to that was to tell posix to go take a hike.  I mean,
> sheesh.

well thanks for finally replying then.
That's my reaction to some other patches (in -mm) as well
(not that it matters).

> > I can either update my hostname patch against mm/utsname.. or not.
> > But I don't really want to see some/any patch blocked due to a patch
> > in -mm being borderline "pointful," so how do we deal with this?
> 
> Well first we need to work out if there's any vague reason why we need to
> mucky up our kernel by implementing this dopey spec.  If there is such a
> reason then I guess I drop all the utsname patches and ask that they be
> redone.  They're a bit straggly and a refactoring/rechangelogging wouldn't
> hurt.

Fixing the changelog is easy.  What refactoring do you mean?

---
~Randy


[parent not found: <20060605002807.GA4919@mail.ustc.edu.cn>]

* readahead benchmark
       [not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
@ 2006-06-05  0:28   ` Fengguang Wu
  2006-06-05  1:02     ` Andrew Morton
  0 siblings, 1 reply; 166+ messages in thread
From: Fengguang Wu @ 2006-06-05  0:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> readahead-kconfig-options.patch
[...] 
>  It's early days yet - needs heaps more performance testing.  The results
>  from "Linux Portal" <linportal@gmail.com> were discouraging.

I found this mail from the lkml archive, did you happen to have more
results?

------
Date:	Mon, 29 May 2006 17:22:50 +0200
From:	"Linux Portal" <linportal@gmail.com>
To:	linux-kernel@vger.kernel.org
Subject: The adaptive readahead patch benchmark

There is an interesting (although simple) benchmark of Wu's adaptive
readahead patchset (v12) together with graphs here:

  http://linux.inet.hr/adaptive_readahead_benchmark.html

In that simple test it definitely looks promising (3x speedup).


* Re: readahead benchmark
  2006-06-05  0:28   ` readahead benchmark Fengguang Wu
@ 2006-06-05  1:02     ` Andrew Morton
  0 siblings, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-05  1:02 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: linux-kernel

On Mon, 5 Jun 2006 08:28:07 +0800
Fengguang Wu <fengguang.wu@gmail.com> wrote:

> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > readahead-kconfig-options.patch
> [...] 
> >  It's early days yet - needs heaps more performance testing.  The results
> >  from "Linux Portal" <linportal@gmail.com> were discouraging.
> 
> I found this mail from the lkml archive, did you happen to have more
> results?
> 

Sorry, I had the wrong tester.  Voluspa <lista1@comhem.se>: "Conclusion: On
_this_ machine, with _these_ operations, Adaptive Readahead in its current
incarnation and default settings is a _loss_."

> 
> There is an interesting (although simple) benchmark of Wu's adaptive
> readahead patchset (v12) together with graphs here:
> 
>   http://linux.inet.hr/adaptive_readahead_benchmark.html
> 
> In that simple test it definitely looks promising (3x speedup).

That's postgresql again.

We know there's a problem at present with postgresql.  Has anyone tried to
fix it, without going and rewriting everything?


* new SCSI drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (7 preceding siblings ...)
       [not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
@ 2006-06-05  0:32 ` Jeff Garzik
       [not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05  0:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-scsi

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> areca-raid-linux-scsi-driver.patch

>   I'm going to start sending the Areca driver to James, too.  The vendor
>   has worked hard and the hardware is becoming more important - let's help
>   them get it in.

The driver gets my ACK.

Also, I have the Promise 'stex' (previously 'shasta') SCSI RAID driver
in jgarzik/misc-2.6.git#stex that wants merging.

It's been sent to linux-scsi and linux-kernel several times, but
never seemed to make it into a SCSI tree.  I kept it alive in #stex,
and AFAICS it's been ready to merge for a while now.

I'll send it to linux-scsi one more time, sometime this week.

	Jeff


[parent not found: <20060605010501.GA4931@mail.ustc.edu.cn>]

* statistics infrastructure
       [not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
@ 2006-06-05  1:05   ` Fengguang Wu
  2006-06-05 16:30   ` Greg KH
  1 sibling, 0 replies; 166+ messages in thread
From: Fengguang Wu @ 2006-06-05  1:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Martin Peschke

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> statistics-infrastructure.patch
 
>  Another tough one.  It offers generic infrastructure for non-task-related
>  instrumentation and it would really be good if someone who has an interest
>  in this for something other than the zfcp driver could stand up and say
>  "this works for us".

I'm having a try of it. Looks good for my case, except some fixable
issues/bugs. Here is a sample session for querying the readahead statistics:

root ~# echo state=on > /debug/statistics/readahead/definition
root ~# cat /debug/statistics/readahead/definition
name=io_block state=on units=/ type=counter_inc data=[  108.272356] started=[  218.797000] stopped=[  201.533118]
name=read_random state=on units=pages/requests type=utilisation data=[  108.272459] started=[  218.797004] stopped=[  201.533146]
name=readahead-fadvise state=on units=pages/requests type=utilisation data=[  108.272472] started=[  218.797005] stopped=[  201.533149]
name=readahead-stock state=on units=pages/requests type=utilisation data=[  108.272476] started=[  218.797005] stopped=[  201.533152]
name=readaround-mmap state=on units=pages/requests type=utilisation data=[  108.272479] started=[  218.797006] stopped=[  201.533155]
name=readahead-mmap state=on units=pages/requests type=utilisation data=[  108.272482] started=[  218.797006] stopped=[  201.533158]
name=readahead-initial state=on units=pages/requests type=utilisation data=[  108.272485] started=[  218.797007] stopped=[  201.533161]
name=readahead-state state=on units=pages/requests type=utilisation data=[  108.272488] started=[  218.797007] stopped=[  201.533164]
name=readahead-context state=on units=pages/requests type=utilisation data=[  108.272491] started=[  218.797008] stopped=[  201.533166]
name=readahead-contexta state=on units=pages/requests type=utilisation data=[  108.272494] started=[  218.797008] stopped=[  201.533169]
name=readahead-backward state=on units=pages/requests type=utilisation data=[  108.272497] started=[  218.797009] stopped=[  201.533172]
name=readahead-onthrash state=on units=pages/requests type=utilisation data=[  108.272500] started=[  218.797009] stopped=[  201.533175]
name=readahead-onseek state=on units=pages/requests type=utilisation data=[  108.272503] started=[  218.797010] stopped=[  201.533178]
name=rescue state=on units=pages/chunks type=utilisation data=[  108.272506] started=[  218.797011] stopped=[  201.533181]
name=size_drop state=on units=from-pages/delta-pages type=counter_inc data=[  108.272509] started=[  218.797011] stopped=[  201.533184]
root ~# cat /debug/statistics/readahead/data
io_block 40
read_random 0 0 0.000 0
readahead-fadvise 0 0 0.000 0
readahead-stock 0 0 0.000 0
readaround-mmap 7 1 28.286 88
readahead-mmap 0 0 0.000 0
readahead-initial 2 5 5.000 5
readahead-state 1331 256 256.010 269
readahead-context 0 0 0.000 0
readahead-contexta 0 0 0.000 0
readahead-backward 0 0 0.000 0
readahead-onthrash 0 0 0.000 0
readahead-onseek 0 0 0.000 0
rescue 0 0 0.000 0
size_drop 13
root ~#
root ~# echo  name=readahead-initial type=histogram_lin entries=32 range_min=8 base_interval=8 > /debug/statistics/readahead/definition
root ~# cat /debug/statistics/readahead/data
io_block 53
read_random 0 0 0.000 0
readahead-fadvise 0 0 0.000 0
readahead-stock 0 0 0.000 0
readaround-mmap 7 1 28.286 88
readahead-mmap 0 0 0.000 0
readahead-initial <=8 0
readahead-initial <=16 0
readahead-initial <=24 0
readahead-initial <=32 0
readahead-initial <=40 0
readahead-initial <=48 0
readahead-initial <=56 0
readahead-initial <=64 0
readahead-initial <=72 0
readahead-initial <=80 0
readahead-initial <=88 0
readahead-initial <=96 0
readahead-initial <=104 0
readahead-initial <=112 0
readahead-initial <=120 0
readahead-initial <=128 0
readahead-initial <=136 0
readahead-initial <=144 0
readahead-initial <=152 0
readahead-initial <=160 0
readahead-initial <=168 0
readahead-initial <=176 0
readahead-initial <=184 0
readahead-initial <=192 0
readahead-initial <=200 0
readahead-initial <=208 0
readahead-initial <=216 0
readahead-initial <=224 0
readahead-initial <=232 0
readahead-initial <=240 0
readahead-initial <=248 0
readahead-initial >248 0
readahead-state 1331 256 256.010 269
readahead-context 0 0 0.000 0
readahead-contexta 0 0 0.000 0
readahead-backward 0 0 0.000 0
readahead-onthrash 0 0 0.000 0
readahead-onseek 0 0 0.000 0
rescue 0 0 0.000 0
size_drop 13
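
Judging purely from the bucket labels printed above, a linear histogram
with entries=32, range_min=8 and base_interval=8 seems to sort a value into
"<=8", "<=16", ... "<=248" and a final ">248" bucket. A minimal sketch of
that mapping follows; the helper lin_bucket is hypothetical and inferred
from the output, not taken from the statistics patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: index of the linear-histogram bucket for 'value'.
 * Bucket i (0 <= i < entries - 1) holds values up to and including
 * range_min + i * base_interval; the last bucket holds everything above.
 */
static int lin_bucket(long long value, long long range_min,
		      long long base_interval, int entries)
{
	long long idx;

	assert(base_interval > 0 && entries > 1);
	if (value <= range_min)
		return 0;
	/* Round up so that an exact upper bound stays in its own bucket. */
	idx = (value - range_min + base_interval - 1) / base_interval;
	if (idx > entries - 1)
		idx = entries - 1;
	return (int)idx;
}
```

With the parameters from the session, lin_bucket(9, 8, 8, 32) and
lin_bucket(16, 8, 8, 32) both give 1 (the "<=16" line), 248 lands in
bucket 30 ("<=248") and 249 in bucket 31 (">248").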


* Re: statistics infrastructure
       [not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
  2006-06-05  1:05   ` statistics infrastructure Fengguang Wu
@ 2006-06-05 16:30   ` Greg KH
  2006-06-13 23:47     ` statistics infrastructure (in -mm tree) review Greg KH
  1 sibling, 1 reply; 166+ messages in thread
From: Greg KH @ 2006-06-05 16:30 UTC (permalink / raw)
  To: Fengguang Wu, Andrew Morton, linux-kernel, Martin Peschke

On Mon, Jun 05, 2006 at 09:05:01AM +0800, Fengguang Wu wrote:
> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > statistics-infrastructure.patch
>  
> >  Another tough one.  It offers generic infrastructure for non-task-related
> >  instrumentation and it would really be good if someone who has an interest
> >  in this for something other than the zfcp driver could stand up and say
> >  "this works for us".
> 
> I'm having a try of it. Looks good for my case, except some fixable
> issues/bugs. Here is a sample session for querying the readahead statistics:

The last I looked at this, it seemed way too complex for what was
needed.  A lot of the filtering and other parsing stuff should be done
by a userspace tool, not the kernel.  I'll take a second look at it and
see what I can comment on.

But I don't think it's 2.6.18 material yet...

thanks,

greg k-h


* statistics infrastructure (in -mm tree) review
  2006-06-05 16:30   ` Greg KH
@ 2006-06-13 23:47     ` Greg KH
  2006-06-14  0:18       ` Randy.Dunlap
                         ` (2 more replies)
  0 siblings, 3 replies; 166+ messages in thread
From: Greg KH @ 2006-06-13 23:47 UTC (permalink / raw)
  To: Martin Peschke; +Cc: akpm, linux-kernel

First cut at reviewing this code.

Initial impression is, "damn, that's a complex interface".  I'd really
like to see some other, real-world usages of this.  Like perhaps the
io-scheduler statistics?  Some other /proc stats that have nothing to do
with processes?

And what does this mean for relayfs?  Those developers tuned that code
to the nth degree to get speed and other goodness, and here you go just
ignoring that stuff and adding yet another way to get stats out of the
kernel.  Why should I use this instead of my own code with relayfs?

And is the need for the in-kernel parser really necessary?  I know it
makes the userspace tools simpler (cat and echo), but should we be
telling the kernel how to filter and adjust the data?  Shouldn't we just
dump it all to userspace and use tools there to manipulate it?

Oh, and use C99 structure initializers when creating the statistic
structures in the example code (and real code), it makes it much easier
to understand, and future proof when the api changes.
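
For reference, C99 designated initializers look roughly like this. The
struct below is only a hypothetical stand-in whose field names follow the
kernel-doc comment for struct statistic_info quoted further down; the real
field types, the TOY_* names and the lookup helper are assumptions, and the
example values are borrowed from the readahead session earlier in the
thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in; the real field types may differ. */
struct toy_statistic_info {
	const char *name;
	const char *x_unit;
	const char *y_unit;
	int flags;
	const char *defaults;
};

enum { TOY_STAT_READ_RANDOM, TOY_STAT_RESCUE, TOY_STAT_COUNT };

/*
 * Each entry names the fields it sets, so reordering or adding fields in
 * the struct does not silently shift values. Omitted fields (flags here)
 * are zero-initialized.
 */
static const struct toy_statistic_info toy_stat_info[TOY_STAT_COUNT] = {
	[TOY_STAT_READ_RANDOM] = {
		.name		= "read_random",
		.x_unit		= "pages",
		.y_unit		= "requests",
		.defaults	= "type=utilisation",
	},
	[TOY_STAT_RESCUE] = {
		.name		= "rescue",
		.x_unit		= "pages",
		.y_unit		= "chunks",
		.defaults	= "type=utilisation",
	},
};

/* Bounds-checked accessor for the table above. */
static const struct toy_statistic_info *toy_stat_lookup(int id)
{
	assert(id >= 0 && id < TOY_STAT_COUNT);
	return &toy_stat_info[id];
}
```

Compared with positional initializers, an entry stays readable on its own
and keeps compiling correctly if a field is later inserted in the middle of
the struct.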

Code comments now:


> diff -puN arch/s390/Kconfig~statistics-infrastructure arch/s390/Kconfig
> --- devel/arch/s390/Kconfig~statistics-infrastructure	2006-06-09 15:22:58.000000000 -0700
> +++ devel-akpm/arch/s390/Kconfig	2006-06-09 15:22:58.000000000 -0700
> @@ -490,8 +490,14 @@ source "drivers/net/Kconfig"
>  
>  source "fs/Kconfig"
>  
> +menu "Instrumentation Support"
> +
>  source "arch/s390/oprofile/Kconfig"
>  
> +source "lib/Kconfig.statistic"
> +
> +endmenu
> +
>  source "arch/s390/Kconfig.debug"
>  
>  source "security/Kconfig"
> diff -puN arch/s390/oprofile/Kconfig~statistics-infrastructure arch/s390/oprofile/Kconfig
> --- devel/arch/s390/oprofile/Kconfig~statistics-infrastructure	2006-06-09 15:22:58.000000000 -0700
> +++ devel-akpm/arch/s390/oprofile/Kconfig	2006-06-09 15:22:58.000000000 -0700
> @@ -1,6 +1,3 @@
> -
> -menu "Profiling support"
> -
>  config PROFILING
>  	bool "Profiling support"
>  	help
> @@ -18,5 +15,3 @@ config OPROFILE
>  
>  	  If unsure, say N.
>  
> -endmenu
> -

These two patches should probably go somewhere else, they don't have
much to do with this one.  (well, adding "lib/Kconfig.statistic" does,
but the other wording doesn't.)

> diff -puN /dev/null include/linux/statistic.h
> --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> +++ devel-akpm/include/linux/statistic.h	2006-06-09 15:22:58.000000000 -0700
> @@ -0,0 +1,348 @@
> +/*
> + * include/linux/statistic.h
> + *
> + * Statistics facility
> + *
> + * (C) Copyright IBM Corp. 2005, 2006
> + *
> + * Author(s): Martin Peschke <mpeschke@de.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.

Are you sure "any later version"?

> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Two not-needed paragraphs.

> +#ifndef STATISTIC_H
> +#define STATISTIC_H
> +
> +#include <linux/fs.h>
> +#include <linux/types.h>
> +#include <linux/percpu.h>
> +
> +#define STATISTIC_ROOT_DIR	"statistics"
> +
> +#define STATISTIC_FILENAME_DATA	"data"
> +#define STATISTIC_FILENAME_DEF	"definition"
> +
> +#define STATISTIC_NEED_BARRIER	1

Meta-comment about this file: does most of the stuff in this file
really belong here?  At first glance, this should only hold the public
interface to the statistic code, not everything else needed by the
internal workings of that code.  It looks like it could be made a lot
smaller.

> +enum statistic_state {
> +	STATISTIC_STATE_INVALID,
> +	STATISTIC_STATE_UNCONFIGURED,
> +	STATISTIC_STATE_RELEASED,
> +	STATISTIC_STATE_OFF,
> +	STATISTIC_STATE_ON
> +};
> +
> +enum statistic_type {
> +	STATISTIC_TYPE_COUNTER_INC,
> +	STATISTIC_TYPE_COUNTER_PROD,
> +	STATISTIC_TYPE_UTIL,
> +	STATISTIC_TYPE_HISTOGRAM_LIN,
> +	STATISTIC_TYPE_HISTOGRAM_LOG2,
> +	STATISTIC_TYPE_SPARSE,
> +	STATISTIC_TYPE_NONE
> +};

Make these bit-safe so sparse can catch mistakes?

> +#define STATISTIC_FLAGS_NOINCR	0x01

What's this for?

> +/**
> + * struct statistic_info - description of a class of statistics
> + * @name: pointer to name name string
> + * @x_unit: pointer to string describing unit of X of (X, Y) data pair
> + * @y_unit: pointer to string describing unit of Y of (X, Y) data pair
> + * @flags: only flag so far (distinction of incremental and other statistic)
> + * @defaults: pointer to string describing defaults setting for attributes
> + *
> + * Exploiters must setup an array of struct statistic_info for a
> + * corresponding array of struct statistic, which are then pointed to
> + * by struct statistic_interface.
> + *
> + * Struct statistic_info and all members and addressed strings must stay for
> + * the lifetime of corresponding statistics created with statistic_create().
> + *
> + * Except for the name string, all other members may be left blank.
> + * It would be nice of exploiters to fill it out completely, though.
> + */
> +struct statistic_info {
> +/* public: */
> +	char *name;
> +	char *x_unit;
> +	char *y_unit;
> +	int  flags;
> +	char *defaults;
> +};

The whole "public:" and "private:" thing in these structures is not
needed.  Just document it in the kernel-doc comments and you should be
fine.  This isn't C++ :)

> +struct sgrb_seg {
> +	struct list_head list;
> +	char *address;
> +	int offset;
> +	int size;
> +};
> +
> +struct statistic_file_private {
> +	struct list_head read_seg_lh;
> +	struct list_head write_seg_lh;
> +	size_t write_seg_total_size;
> +};
> +
> +struct statistic_merge_private {
> +	struct statistic *stat;
> +	spinlock_t lock;
> +	void *dst;
> +};

I'm guessing these three structures aren't needed here.  Otherwise,
please document them.

> +#ifdef CONFIG_STATISTICS

Why ifdef now, so late?

> +extern int statistic_create(struct statistic_interface *, const char *);
> +extern int statistic_remove(struct statistic_interface *);
> +
> +/**
> + * statistic_add - update statistic with incremental data in (X, Y) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + * @incr: Y
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * the definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant takes care of protecting per-cpu data. It is preferred whenever
> + * exploiters don't update several statistics of the same entity in one go.
> + */
> +static inline void statistic_add(struct statistic *stat, int i,
> +				 s64 value, u64 incr)
> +{
> +	unsigned long flags;
> +	local_irq_save(flags);
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> +	local_irq_restore(flags);
> +}

These are all inline, which I guess is acceptable.  But see the current
inline-or-not comments on lkml which may make you rethink this.

> +/**
> + * statistic_add_nolock - update statistic with incremental data in (X, Y) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + * @incr: Y
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant leaves protecting per-cpu data to exploiters. It is preferred
> + * whenever exploiters update several statistics of the same entity in one go.
> + */
> +static inline void statistic_add_nolock(struct statistic *stat, int i,
> +					s64 value, u64 incr)
> +{
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> +}
> +
> +/**
> + * statistic_inc - update statistic with incremental data in (X, 1) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant takes care of protecting per-cpu data. It is preferred whenever
> + * exploiters don't update several statistics of the same entity in one go.
> + */
> +static inline void statistic_inc(struct statistic *stat, int i, s64 value)
> +{
> +	unsigned long flags;
> +	local_irq_save(flags);
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, 1);
> +	local_irq_restore(flags);
> +}

Shouldn't this just call statistic_add() with an incr of 1?

> +
> +/**
> + * statistic_inc_nolock - update statistic with incremental data in (X, 1) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant leaves protecting per-cpu data to exploiters. It is preferred
> + * whenever exploiters update several statistics of the same entity in one go.
> + */
> +static inline void statistic_inc_nolock(struct statistic *stat, int i,
> +					s64 value)
> +{
> +	if (stat[i].state == STATISTIC_STATE_ON)
> +		stat[i].add(&stat[i], smp_processor_id(), value, 1);
> +}

Shouldn't this just call statistic_add_nolock() with an incr of 1?
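
A minimal userspace sketch of the two wrapper suggestions above, under
stated assumptions: struct statistic here is a cut-down stand-in (the
real definition is not quoted in this mail), the smp_processor_id()
argument is dropped, and the local_irq_save()/local_irq_restore() calls
are reduced to comments.  It only illustrates the layering, not the
patch's actual implementation.

```c
#include <stdint.h>

/* Cut-down stand-ins; only the two states the update path checks. */
enum statistic_state { STATISTIC_STATE_OFF, STATISTIC_STATE_ON };

struct statistic {
	enum statistic_state state;
	uint64_t total;		/* hypothetical storage for a plain counter */
	void (*add)(struct statistic *stat, int64_t value, uint64_t incr);
};

/* Hypothetical counter discipline: ignores X, accumulates Y. */
static void counter_add(struct statistic *stat, int64_t value, uint64_t incr)
{
	(void)value;
	stat->total += incr;
}

/* Single place that checks the state and dispatches the update. */
static inline void statistic_add_nolock(struct statistic *stat, int i,
					int64_t value, uint64_t incr)
{
	if (stat[i].state == STATISTIC_STATE_ON)
		stat[i].add(&stat[i], value, incr);
}

static inline void statistic_add(struct statistic *stat, int i,
				 int64_t value, uint64_t incr)
{
	/* kernel code would call local_irq_save(flags) here */
	statistic_add_nolock(stat, i, value, incr);
	/* kernel code would call local_irq_restore(flags) here */
}

/* The inc variants become one-line wrappers with an incr of 1. */
static inline void statistic_inc(struct statistic *stat, int i, int64_t value)
{
	statistic_add(stat, i, value, 1);
}

static inline void statistic_inc_nolock(struct statistic *stat, int i,
					int64_t value)
{
	statistic_add_nolock(stat, i, value, 1);
}
```

With this layering the state check and the indirect call live in one
function, so a later change to the update path touches a single place.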

> diff -puN /dev/null lib/Kconfig.statistic
> --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> +++ devel-akpm/lib/Kconfig.statistic	2006-06-09 15:22:58.000000000 -0700
> @@ -0,0 +1,11 @@
> +config STATISTICS
> +	bool "Statistics infrastructure"
> +	depends on DEBUG_FS
> +	help
> +	  The statistics infrastructure provides a debug-fs based user interface

No "-" in debugfs :)

> +	  for statistics of kernel components, that is, usually device drivers.

Why mention drivers?  Other things might use this (see original comments
at the start of the message.)

> --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> +++ devel-akpm/lib/statistic.c	2006-06-09 15:22:58.000000000 -0700
> @@ -0,0 +1,1459 @@
> +/*
> + *  lib/statistic.c
> + *    statistics facility
> + *
> + *    Copyright (C) 2005, 2006
> + *		IBM Deutschland Entwicklung GmbH,
> + *		IBM Corporation
> + *
> + *    Author(s): Martin Peschke (mpeschke@de.ibm.com),
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Again with the verbose license :)


> +static void _statistic_barrier(void *unused)
> +{
> +}
> +
> +static inline int statistic_stop(struct statistic *stat)
> +{
> +	stat->stopped = sched_clock();
> +	stat->state = STATISTIC_STATE_OFF;
> +	/* ensures that all CPUs have ceased updating statistics */
> +	smp_mb();
> +	on_each_cpu(_statistic_barrier, NULL, 0, 1);
> +	return 0;
> +}

Isn't there a way to use rcu for this instead?  Just a suggestion, it
might be totally wrong...


> +
> +static int statistic_transition(struct statistic *stat,
> +				struct statistic_info *info,
> +				enum statistic_state requested_state)
> +{
> +	int z = (requested_state < stat->state ? 1 : 0);
> +	int retval = -EINVAL;

	int retval = 0;

> +
> +	while (stat->state != requested_state) {
> +		switch (stat->state) {
> +		case STATISTIC_STATE_INVALID:
> +			retval = ( z ? -EINVAL : statistic_initialise(stat) );
> +			break;
> +		case STATISTIC_STATE_UNCONFIGURED:
> +			retval = ( z ? statistic_uninitialise(stat)
> +				     : statistic_define(stat) );
> +			break;
> +		case STATISTIC_STATE_RELEASED:
> +			retval = ( z ? statistic_initialise(stat)
> +				     : statistic_alloc(stat, info) );
> +			break;
> +		case STATISTIC_STATE_OFF:
> +			retval = ( z ? statistic_free(stat, info)
> +				     : statistic_start(stat) );
> +			break;
> +		case STATISTIC_STATE_ON:
> +			retval = ( z ? statistic_stop(stat) : -EINVAL );
> +			break;
> +		}
> +		if (unlikely(retval))
> +			return retval;

delete these two lines.

> +	}
> +	return 0;

	return retval;

> +static match_table_t statistic_match_type = {
> +	{1, "type=%s"},
> +	{9, NULL}
> +};

named field initializers please.


> +static match_table_t statistic_match_common = {
> +	{STATISTIC_STATE_UNCONFIGURED, "state=unconfigured"},
> +	{STATISTIC_STATE_RELEASED, "state=released"},
> +	{STATISTIC_STATE_OFF, "state=off"},
> +	{STATISTIC_STATE_ON, "state=on"},
> +	{1001, "name=%s"},
> +	{1002, "data=reset"},
> +	{1003, "defaults"},
> +	{9999, NULL}
> +};

Same here.

And why do you have numbers and a mix of enums here?  Shouldn't you
define the name=, data= and defaults too?

Also, just null terminate the list, is 9999 really needed?
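
A rough sketch of what the three comments above could look like when
combined.  Assumptions, none of which come from the patch: the
STATISTIC_OPT_* names are made up, struct match_token is redeclared as
a userspace stand-in for the one in <linux/parser.h>, and
lookup_token() is a simplified exact-compare helper, not the kernel's
match_token() (it does not handle "%s").  As far as I recall,
match_token() returns the token of the terminating NULL-pattern entry
when nothing matches, so that entry still wants some value; a named one
reads better than 9999.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct match_token from <linux/parser.h>. */
struct match_token {
	int token;
	const char *pattern;
};

enum statistic_state {
	STATISTIC_STATE_INVALID,
	STATISTIC_STATE_UNCONFIGURED,
	STATISTIC_STATE_RELEASED,
	STATISTIC_STATE_OFF,
	STATISTIC_STATE_ON
};

/* Hypothetical names replacing the bare 1001, 1002, 1003 and 9999. */
enum statistic_opt {
	STATISTIC_OPT_NAME = STATISTIC_STATE_ON + 1,
	STATISTIC_OPT_DATA_RESET,
	STATISTIC_OPT_DEFAULTS,
	STATISTIC_OPT_ERR
};

static const struct match_token statistic_match_common[] = {
	{ .token = STATISTIC_STATE_UNCONFIGURED, .pattern = "state=unconfigured" },
	{ .token = STATISTIC_STATE_RELEASED,     .pattern = "state=released" },
	{ .token = STATISTIC_STATE_OFF,          .pattern = "state=off" },
	{ .token = STATISTIC_STATE_ON,           .pattern = "state=on" },
	{ .token = STATISTIC_OPT_NAME,           .pattern = "name=%s" },
	{ .token = STATISTIC_OPT_DATA_RESET,     .pattern = "data=reset" },
	{ .token = STATISTIC_OPT_DEFAULTS,       .pattern = "defaults" },
	{ .token = STATISTIC_OPT_ERR,            .pattern = NULL }
};

/*
 * Simplified stand-in for match_token(): exact string compare only.
 * Falls through to the terminating entry and returns its token when
 * no pattern matches.
 */
static int lookup_token(const struct match_token *table, const char *s)
{
	const struct match_token *p;

	for (p = table; p->pattern; p++)
		if (strcmp(p->pattern, s) == 0)
			break;
	return p->token;
}
```

Starting the option tokens right after STATISTIC_STATE_ON keeps the
state tokens usable directly as enum statistic_state values, which
seems to be what the mixed table was after.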

> +static struct statistic_discipline statistic_discs[] = {
> +	{ /* STATISTIC_TYPE_COUNTER_INC */
> +	  NULL,
> +	  statistic_alloc_generic,
> +	  NULL,
> +	  statistic_reset_counter,
> +	  statistic_merge_counter,
> +	  statistic_fdata_counter,
> +	  NULL,
> +	  statistic_add_counter_inc,
> +	  statistic_set_counter_inc,
> +	  "counter_inc", sizeof(u64)
> +	},

named initializers please.  That will let you not have to specify the
NULL fields, making it much easier to read overall.
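
For illustration, a small userspace sketch of the named-initializer
form.  The member names (parse, alloc, release, reset, add, name, size)
and the function signatures are guesses made up for this example; the
real struct statistic_discipline is not quoted here and has more
members.  The point is only that unset members default to NULL and that
an array index designator can replace the
/* STATISTIC_TYPE_COUNTER_INC */ comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum statistic_type {
	STATISTIC_TYPE_COUNTER_INC,
	STATISTIC_TYPE_NONE
};

/* Hypothetical, shortened layout; member names are assumptions. */
struct statistic_discipline {
	int (*parse)(const char *def);
	void *(*alloc)(size_t size);
	void (*release)(void *data);
	void (*reset)(void *data);
	void (*add)(void *data, int64_t value, uint64_t incr);
	const char *name;
	size_t size;
};

static void *statistic_alloc_generic(size_t size)
{
	return calloc(1, size);
}

static void statistic_reset_counter(void *data)
{
	*(uint64_t *)data = 0;
}

static void statistic_add_counter_inc(void *data, int64_t value, uint64_t incr)
{
	(void)value;
	*(uint64_t *)data += incr;
}

static const struct statistic_discipline statistic_discs[] = {
	[STATISTIC_TYPE_COUNTER_INC] = {
		/* .parse and .release are left out and therefore NULL */
		.alloc = statistic_alloc_generic,
		.reset = statistic_reset_counter,
		.add   = statistic_add_counter_inc,
		.name  = "counter_inc",
		.size  = sizeof(uint64_t),
	},
};
```

Adding or reordering members later then no longer requires touching
every entry in the table.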

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: statistics infrastructure (in -mm tree) review
  2006-06-13 23:47     ` statistics infrastructure (in -mm tree) review Greg KH
@ 2006-06-14  0:18       ` Randy.Dunlap
  2006-06-14 16:45         ` Greg KH
  2006-06-14 22:48         ` Martin Peschke
  2006-06-14  5:04       ` Andi Kleen
  2006-06-17 10:30       ` Martin Peschke
  2 siblings, 2 replies; 166+ messages in thread
From: Randy.Dunlap @ 2006-06-14  0:18 UTC (permalink / raw)
  To: Greg KH; +Cc: mp3, akpm, linux-kernel

On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote:

> First cut at reviewing this code.
> 
> Initial impression is, "damm, that's a complex interface".  I'd really
> like to see some other, real-world usages of this.  Like perhaps the
> io-scheduler statistics?  Some other /proc stats that have nothing to do
> with processes?

Agreed with complexity.

> And what does this mean for relayfs?  Those developers tuned that code
> to the nth degree to get speed and other goodness, and here you go just
> ignoring that stuff and add yet another way to get stats out of the
> kernel.  Why should I use this instead of my own code with relayfs?

Good questions.

> And is the need for the in-kernel parser really necessary?  I know it
> makes the userspace tools simpler (cat and echo), but should we be
> telling the kernel how to filter and adjust the data?  Shouldn't we just
> dump it all to userspace and use tools there to manipulate it?

I agree again.

> Code comments now:
> 
> 
> > diff -puN /dev/null include/linux/statistic.h
> > --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> > +++ devel-akpm/include/linux/statistic.h	2006-06-09 15:22:58.000000000 -0700
> > @@ -0,0 +1,348 @@
> > +/*
> > + * include/linux/statistic.h
> > + *
> > + * Statistics facility

> > +/**
> > + * struct statistic_info - description of a class of statistics
> > + * @name: pointer to name name string
> > + * @x_unit: pointer to string describing unit of X of (X, Y) data pair
> > + * @y_unit: pointer to string describing unit of Y of (X, Y) data pair
> > + * @flags: only flag so far (distinction of incremental and other statistic)
> > + * @defaults: pointer to string describing defaults setting for attributes
> > + *
> > + * Exploiters must setup an array of struct statistic_info for a
> > + * corresponding array of struct statistic, which are then pointed to
> > + * by struct statistic_interface.
> > + *
> > + * Struct statistic_info and all members and addressed strings must stay for
> > + * the lifetime of corresponding statistics created with statistic_create().
> > + *
> > + * Except for the name string, all other members may be left blank.
> > + * It would be nice of exploiters to fill it out completely, though.
> > + */
> > +struct statistic_info {
> > +/* public: */
> > +	char *name;
> > +	char *x_unit;
> > +	char *y_unit;
> > +	int  flags;
> > +	char *defaults;
> > +};
> 
> The whole "public:" and "private:" thing in these structures is not
> needed.  Just document it in the kernel-doc comments and you should be
> fine.  This isn't C++ :)

but public: and private: are kernel-doc comments...
Using "private:" causes those fields to be omitted from the
generated documentation because those fields are for internal/private
use of the (statistics) infrastructure code, not to be used by
its clients (er, ugh, exploiters) etc.
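
A small made-up example of the markers (struct statistic_example and
its members are hypothetical, not taken from the patch).  Members after
"private:" need no @member line and are left out of the generated
documentation; "public:" switches back.

```c
#include <stddef.h>

/**
 * struct statistic_example - hypothetical structure showing the markers
 * @name: name string, documented and visible to clients
 * @flags: client-visible flags
 *
 * Only the members in the public part are described here.
 */
struct statistic_example {
/* public: */
	const char *name;
	int flags;
/* private: */
	void *pdata;		/* internal bookkeeping, omitted from docs */
	unsigned long started;	/* internal timestamp, omitted from docs */
};
```

To the compiler these are ordinary comments; only the kernel-doc
script gives them meaning.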

> > --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> > +++ devel-akpm/lib/statistic.c	2006-06-09 15:22:58.000000000 -0700
> > @@ -0,0 +1,1459 @@
> > +/*
> > + *  lib/statistic.c
> > + *    statistics facility
> > + *

> Again with the verbose license :)

Well it's not uncommon in kernel source files.
Where do we document how licenses should be written?


---
~Randy

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: statistics infrastructure (in -mm tree) review
  2006-06-14  0:18       ` Randy.Dunlap
@ 2006-06-14 16:45         ` Greg KH
  2006-06-14 22:48         ` Martin Peschke
  1 sibling, 0 replies; 166+ messages in thread
From: Greg KH @ 2006-06-14 16:45 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: mp3, akpm, linux-kernel

On Tue, Jun 13, 2006 at 05:18:27PM -0700, Randy.Dunlap wrote:
> On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote:
> > > +/**
> > > + * struct statistic_info - description of a class of statistics
> > > + * @name: pointer to name name string
> > > + * @x_unit: pointer to string describing unit of X of (X, Y) data pair
> > > + * @y_unit: pointer to string describing unit of Y of (X, Y) data pair
> > > + * @flags: only flag so far (distinction of incremental and other statistic)
> > > + * @defaults: pointer to string describing defaults setting for attributes
> > > + *
> > > + * Exploiters must setup an array of struct statistic_info for a
> > > + * corresponding array of struct statistic, which are then pointed to
> > > + * by struct statistic_interface.
> > > + *
> > > + * Struct statistic_info and all members and addressed strings must stay for
> > > + * the lifetime of corresponding statistics created with statistic_create().
> > > + *
> > > + * Except for the name string, all other members may be left blank.
> > > + * It would be nice of exploiters to fill it out completely, though.
> > > + */
> > > +struct statistic_info {
> > > +/* public: */
> > > +	char *name;
> > > +	char *x_unit;
> > > +	char *y_unit;
> > > +	int  flags;
> > > +	char *defaults;
> > > +};
> > 
> > The whole "public:" and "private:" thing in these structures is not
> > needed.  Just document it in the kernel-doc comments and you should be
> > fine.  This isn't C++ :)
> 
> but public: and private: are kernel-doc comments...
> Using "private:" causes those fields to be omitted from the
> generated documentation because those fields are for internal/private
> use of the (statistics) infrastructure code, not to be used by
> its clients (er, ugh, exploiters) etc.

Oh, I didn't realize that kerneldoc could do that now, nice.  And look,
it's even documented that it can support that, I'll shut up now :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: statistics infrastructure (in -mm tree) review
  2006-06-14  0:18       ` Randy.Dunlap
  2006-06-14 16:45         ` Greg KH
@ 2006-06-14 22:48         ` Martin Peschke
  2006-06-19 22:12           ` Greg KH
  1 sibling, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-14 22:48 UTC (permalink / raw)
  To: Randy.Dunlap, Greg KH, akpm, Andi Kleen; +Cc: linux-kernel

Randy.Dunlap wrote:
> On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote:
> 
>> First cut at reviewing this code.
>>
>> Initial impression is, "damm, that's a complex interface".  I'd really
>> like to see some other, real-world usages of this.  Like perhaps the
>> io-scheduler statistics?  Some other /proc stats that have nothing to do
>> with processes?
> 
> Agreed with complexity.

Well, roughly 1500 lines of code is sort of complex, even though it has
already been reviewed and cleaned up several times.

Please, let's try to break it down into design details that each add
their measure of complexity. A flat "too complex" doesn't help us move on.

Could you ACK / NACK the following assumptions, so that we can
figure out how far our (dis)agreement goes and how to continue?

1) There are various kernel components that gather statistical data.
(kernel/profile.c, network stack, genhd, memory management, taskstats,
S390 DASD driver, zfcp driver, ...). Requirements for other statistics
aren't unusual.

2) Basically, they all implement similar things (smp-safe and efficient
data structures used for data gathering, implement algorithms for
on-the-fly data preprocessing, delivery of data through some user
interface, sometimes a switch for turning statistics on/off)

3) They all introduce their own macros / functions, resulting in code
duplication (bad), while they usually have their unique way to show
data to users (bad, too).

4) Possible ways to aggregate statistics data include plain counters,
histograms, a utilisation indicator (min, max, average etc.), and
potentially other algorithms people might come up with.

5) Statistics counters should be maintained in kernel. That's cheapest.
No bursts of zillions of incremental updates relayed to user space.
(please see also other comment at bottom of message)

6) Some library routines would suffice to take over data gathering
and preprocessing. Avoids further code duplication, avoids bugs,
speeds up development and test.

7) With regard to the delivery of statistic data to user land,
a library maintaining statistic counters, histograms or whatever
on behalf of exploiters doesn't need any help from the exploiter.
We can avoid the usual callbacks and code bloat in exploiters
this way.

8) If some library functions are responsible for showing data, and the
exploiter is not, we can achieve a common format for statistics data.
For example, a histogram about block I/O has the same format as
a histogram about network I/O.
This provides ease of use and minimises the effort of writing
scripts that could do further processing (e.g. formatting as
spreadsheets or bar charts, comparison and summarisation of
statistics, ...)

9) For performance reasons, per-cpu data and minimal locking
(local_irq_save/restore) should be used.
Adds to complexity, though.

10) If data is per-cpu, we want to be very careful with regard to
memory footprint. That is why memory is only allocated for online
cpus (requires cpu hot(un)plug handling, which adds to complexity).

11) At least for data processing modes more expensive than plain
counters, like histograms, an on/off state makes sense.

12) In order to minimise the memory footprint, a released/allocated
state makes sense.

13) Unconfigured/released/off/on states should be handled by a tiny
state machine and a single check on statistic updates.

14) Kernel code delivering statistics data through library routines
can, at best, guess whether a user wants incremental updates be
aggregated in a single counter, a set of counters (histograms), or
in the form of other results. Users might want to change how much
detail is retained in aggregated statistic results.
Adds to complexity.

15) Nonetheless, exploiters are kindly requested to provide some
default settings that are a good starting point for general
purpose use.

16) Aggregated statistic results, in many cases, don't need to be
pushed to user space through a high-speed, high-volume interface.
Debugfs, for example, is fine for this purpose.

17) If the requirement for pushing data comes up anyway, we could,
for example, add relay-entries in debugfs anytime.
(For example, we could implement forwarding of incremental
updates to user space. Just another conceivable data processing
mode that fits into the current design.)

18) The programming interface of a statistics library can be roughly as
simple as statistic_create(), statistic_remove(), statistic_add().

19) Statistic_add() should come in different flavours:
statistic_add/inc() (just for convenience), and
statistic_*_nolock() (more efficient locking for a bundle of updates)

20) Statistic_add() takes a (X, Y) pair, with X being the main
characteristics of the statistics (e.g. a request size) and with
Y quantifying the update reported for a particular X (e.g. number
of observed requests of a particular request size).

21) Processing of (X, Y) according to abstract rules imposed by
counters, histograms etc. doesn't require any knowledge about the
semantics of X or Y.

22) There might be statistic counters that exploiters want to use and
maintain on their own, and which users still may want to have a look at
along with other statistics. Statistic_set() fits in here nicely.
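
To make points 20) and 21) concrete, here is a minimal userspace sketch
of two processing modes consuming the same (X, Y) updates.  It is not
the code from the patch: the structure names, the fixed bucket count
and the clamping of out-of-range X into the first and last bucket are
assumptions chosen for brevity.

```c
#include <stdint.h>

#define HIST_BUCKETS 4	/* hypothetical fixed bucket count */

struct counter {
	uint64_t total;
};

struct histogram_lin {
	int64_t base;			/* lower bound of bucket 0 */
	int64_t width;			/* width of each bucket, must be > 0 */
	uint64_t bucket[HIST_BUCKETS];
};

/* A plain counter ignores X and accumulates Y. */
static void counter_add(struct counter *c, int64_t x, uint64_t y)
{
	(void)x;
	c->total += y;
}

/* A linear histogram uses X to pick a bucket and adds Y to it. */
static void histogram_lin_add(struct histogram_lin *h, int64_t x, uint64_t y)
{
	int64_t idx;

	if (x < h->base) {
		idx = 0;			/* clamp low values */
	} else {
		idx = (x - h->base) / h->width;
		if (idx >= HIST_BUCKETS)
			idx = HIST_BUCKETS - 1;	/* clamp high values */
	}
	h->bucket[idx] += y;
}
```

Neither function needs to know whether X is a request size, a latency
or something else, which is the point of 21): the exploiter reports
(X, Y) and the selected mode decides how much detail is kept.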

>> And what does this mean for relayfs?  Those developers tuned that code
>> to the nth degree to get speed and other goodness, and here you go just
>> ignoring that stuff and add yet another way to get stats out of the
>> kernel.  Why should I use this instead of my own code with relayfs?
>
> Good questions.

Relayfs is a nice feature, but not appropriate here.

For example, during a performance measurement I have seen
SCSI I/O related statistics being updated millions of times while
I was just having a short lunch break. Some of them just increased
a counter, which is pretty fast if done immediately in the kernel.
If all these updates had to be relayed to user space
just to increase a counter maintained in user space... urgh, surely
more expensive and not the way to go.

And what if user space isn't interested at all? Would we keep
pumping zillions of unused updates into buffers instead of
discarding them right away?

Profile.c, taskstats, genhd and all the other statistics listed
above... they all maintain their counters in the kernel and
show aggregated statistics to users.

>> And is the need for the in-kernel parser really necessary?  I know it
>> makes the userspace tools simpler (cat and echo), but should we be
>> telling the kernel how to filter and adjust the data?  Shouldn't we just
>> dump it all to userspace and use tools there to manipulate it?
> 
> I agree again.

Assuming we can agree on in-kernel counters, histograms etc.,
allowing for attributes being adjusted by users makes sense.

The parser stuff required for these attributes is implemented
using match_token() & friends, which should be acceptable.
But, I think that the standard way of using match_token() and
strsep() needs improvement (strsep is destructive to strings
parsed, which is painful).
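
A short userspace illustration of that pain point, using the libc
strsep() rather than the kernel's; the function names and the
"type=... state=..." option string are made up for the example.
Parsing in place overwrites each delimiter with a NUL byte, so the
caller's string is truncated afterwards; the usual workaround is to
parse a private copy (kernel code would presumably use kstrdup() and
kfree() where this sketch uses strdup() and free()).

```c
#define _DEFAULT_SOURCE		/* for strsep() and strdup() on glibc */
#include <stdlib.h>
#include <string.h>

/* Counts space-separated options, destroying the caller's string. */
static int count_options_in_place(char *s)
{
	char *cursor = s;
	char *tok;
	int n = 0;

	while ((tok = strsep(&cursor, " ")) != NULL)
		if (*tok)	/* skip empty tokens from repeated spaces */
			n++;
	return n;
}

/* Same count, but parses a private copy and leaves def untouched. */
static int count_options_copy(const char *def)
{
	char *copy = strdup(def);
	int n;

	if (!copy)
		return -1;
	n = count_options_in_place(copy);
	free(copy);
	return n;
}
```

After count_options_in_place() the buffer only reads as its first
option, which is why a definition string that has to be shown to the
user again later cannot be fed to strsep() directly.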

Thanks, Martin

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: statistics infrastructure (in -mm tree) review
  2006-06-14 22:48         ` Martin Peschke
@ 2006-06-19 22:12           ` Greg KH
  2006-06-20 15:40             ` Martin Peschke
  0 siblings, 1 reply; 166+ messages in thread
From: Greg KH @ 2006-06-19 22:12 UTC (permalink / raw)
  To: Martin Peschke; +Cc: Randy.Dunlap, akpm, Andi Kleen, linux-kernel

On Thu, Jun 15, 2006 at 12:48:29AM +0200, Martin Peschke wrote:
> Randy.Dunlap wrote:
> >On Tue, 13 Jun 2006 16:47:39 -0700 Greg KH wrote:
> >
> >>First cut at reviewing this code.
> >>
> >>Initial impression is, "damm, that's a complex interface".  I'd really
> >>like to see some other, real-world usages of this.  Like perhaps the
> >>io-scheduler statistics?  Some other /proc stats that have nothing to do
> >>with processes?
> >
> >Agreed with complexity.
> 
> Well, roughly 1500 lines of code is sort of complex, even though it has
> already been reviewed and cleaned up several times.
> 
> Please, let's try to break it down into design details that each add
> their measure of complexity. A flat "too complex" doesn't help us move on.
> 
> Could you ACK / NACK the following assumptions, so that we can
> figure out how far our (dis)agreement goes and how to continue?

Sure.

> 1) There are various kernel components that gather statistical data.
> (kernel/profile.c, network stack, genhd, memory management, taskstats,
> S390 DASD driver, zfcp driver, ...). Requirements for other statistics
> aren't unusual.

Agreed.

> 2) Basically, they all implement similar things (smp-safe and efficient
> data structures used for data gathering, implement algorithms for
> on-the-fly data preprocessing, delivery of data through some user
> interface, sometimes a switch for turning statistics on/off)

Agreed.

> 3) They all introduce their own macros / functions, resulting in code
> duplication (bad), while they usually have their unique way to show
> data to users (bad, too).

Agreed.

> 4) Possible ways to aggregate statistics data include plain counters,
> histograms, a utilisation indicator (min, max, average etc.), and
> potentially other algorithms people might come up with.

agreed.

> 5) Statistics counters should be maintained in kernel. That's cheapest.
> No bursts of zillions of incremental updates relayed to user space.
> (please see also other comment at bottom of message)

agreed.

> 6) Some library routines would suffice to take over data gathering
> and preprocessing. Avoids further code duplication, avoids bugs,
> speeds up development and test.

As long as the library functions do not cause any speed degradations,
which I think your current ones do with the pointer dereference (which
is very slow and measurable on some archs).

> 7) With regard to the delivery of statistic data to user land,
> a library maintaining statistic counters, histograms or whatever
> on behalf of exploiters doesn't need any help from the exploiter.
> We can avoid the usual callbacks and code bloat in exploiters
> this way.

I don't really understand what you are stating here.

> 8) If some library functions are responsible for showing data, and the
> exploiter is not, we can achieve a common format for statistics data.
> For example, a histogram about block I/O has the same format as
> a histogram about network I/O.
> This provides ease of use and minimises the effort of writing
> scripts that could do further processing (e.g. formatting as
> spreadsheets or bar charts, comparison and summarisation of
> statistics, ...)

Common functionality and formats would be wonderful.  But I'm not sure
you can guarantee that we really want the network io and block io
statistics in the same format, as they are fundamentally different
things.

Also, you will have to live with the existing interfaces, as we can't
break them, so porting them will not happen.

> 9) For performance reasons, per-cpu data and minimal locking
> (local_irq_save/restore) should be used.
> Adds to complexity, though.

If necessary.  Is this really necessary?

> 10) If data is per-cpu, we want to be very careful with regard to
> memory footprint. That is why memory is only allocated for online
> cpus (requires cpu hot(un)plug handling, which adds to complexity).

Agreed.

> 11) At least for data processing modes more expensive than plain
> counters, like histograms, an on/off state makes sense.

So that userspace can tell the kernel to go faster?  I don't know why
this is really necessary :)

> 12) In order to minimise the memory footprint, a released/allocated
> state makes sense.

Again, telling userspace when to tell the kernel to free up memory can
cause problems.

> 13) Unconfigured/released/off/on states should be handled by a tiny
> state machine and a single check on statistic updates.

Ok, but you are now getting into implementation issues, like a few of
the above ones...

> 14) Kernel code delivering statistics data through library routines
> can, at best, guess whether a user wants incremental updates be
> aggregated in a single counter, a set of counters (histograms), or
> in the form of other results. Users might want to change how much
> detail is retained in aggregated statistic results.
> Adds to complexity.

Complexity where?  Userspace or in the kernel?

> 15) Nonetheless, exploiters are kindly requested to provide some
> default settings that are a good starting point for general
> purpose use.
> 
> 16) Aggregated statistic results, in many cases, don't need to be
> pushed to user space through a high-speed, high-volume interface.
> Debugfs, for example, is fine for this purpose.
> 
> 17) If the requirement for pushing data comes up anyway, we could,
> for example, add relay-entries in debugfs anytime.
> (For example, we could implement forwarding of incremental
> updates to user space. Just another conceivable data processing
> mode that fits into the current design.)
> 
> 18) The programming interface of a statistics library can be roughly as
> simple as statistic_create(), statistic_remove(), statistic_add().
> 
> 19) Statistic_add() should come in different flavours:
> statistic_add/inc() (just for convenience), and
> statistic_*_nolock() (more efficient locking for a bundle of updates)
> 
> 20) Statistic_add() takes a (X, Y) pair, with X being the main
> characteristics of the statistics (e.g. a request size) and with
> Y quantifying the update reported for a particular X (e.g. number
> of observed requests of a particular request size).
> 
> 21) Processing of (X, Y) according to abstract rules imposed by
> counters, histograms etc. doesn't require any knowledge about the
> semantics of X or Y.
> 
> 22) There might be statistic counters that exploiters want to use and
> maintain on their own, and which users still may want to have a look at
> along with other statistics. Statistic_set() fits in here nicely.


Ok, these are all implementation details.

Can you please step back a bit?  What are the requirements that you are
trying to achieve here?  A kernel-wide statistic gathering library?  If
so, why?  What has caused this to be needed?  And if it's needed, would
putting the stuff in debugfs for _all_ statistics really be a good idea
(hint, I would say no...)

> >>And what does this mean for relayfs?  Those developers tuned that code
> >>to the nth degree to get speed and other goodness, and here you go just
> >>ignoring that stuff and add yet another way to get stats out of the
> >>kernel.  Why should I use this instead of my own code with relayfs?
> >
> > Good questions.
> 
> Relayfs is a nice feature, but not appropriate here.
> 
> For example, during a performance measurement I have seen
> SCSI I/O related statistics being updated millions of times while
> I was just having a short lunch break. Some of them just increased
> a counter, which is pretty fast if done immediately in the kernel.
> If all these updates had to be relayed to user space
> to just increase a counter maintained in user space.. urgh, surely
> more expensive and not the way to go.
> 
> And what if user space isn't interested at all? Would we keep
> pumping zillions of unused updates into buffers instead of
> discarding them right away?

Yes, for simple counters, relayfs is overkill.  But so is an indirect
function call through a pointer for every simple counter update :)

> Profile.c, taskstats, genhd and all the other statistics listed
> above... they all maintain their counters in the kernel and
> show aggregated statistics to users.

Yes, but will you be allowed to port the existing users over to your new
framework without breaking any userspace stuff?  I don't see that
happening :(

> >>And is the need for the in-kernel parser really necessary?  I know it
> >>makes the userspace tools simpler (cat and echo), but should we be
> >>telling the kernel how to filter and adjust the data?  Shouldn't we just
> >>dump it all to userspace and use tools there to manipulate it?
> >
> >I agree again.
> 
> Assuming we can agree on in-kernel counters, histograms etc.,
> allowing for attributes being adjusted by users makes sense.
> 
> The parser stuff required for these attributes is implemented
> using match_token() & friends, which should be acceptable.
> But, I think that the standard way of using match_token() and
> strsep() needs improvement (strsep is destructive to strings
> parsed, which is painful).

Yeah, the parser isn't as bad as I originally thought it was.  But
overall, I'm still not sold on the real need for this kind of
subsystem/library.

thanks,

greg k-h


* Re: statistics infrastructure (in -mm tree) review
  2006-06-19 22:12           ` Greg KH
@ 2006-06-20 15:40             ` Martin Peschke
  2006-06-20 16:50               ` Randy.Dunlap
  0 siblings, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-20 15:40 UTC (permalink / raw)
  To: Greg KH; +Cc: Randy.Dunlap, akpm, Andi Kleen, linux-kernel

Greg KH wrote:
>> 6) Some library routines would suffice to take over data gathering
>> and preprocessing. Avoids further code duplication, avoids bugs,
>> speeds up development and test.
> 
> As long as the library functions do not cause any speed degradations,
> which I think your current ones do with the pointer dereference (which
> is very slow and measurable on some archs).

Implementation detail ;-)

I will post another statistic_add() derivate that requires the caller
to specify the way data aggregation should be done, and which,
consequently, won't have an indirect function call.

Then callers / clients / programmers can choose between higher
flexibility and best performance, depending on the requirements
of their statistics.
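
To illustrate the trade-off, here is a minimal user-space sketch. It is
not the code from the patches: struct stat_sketch and all sketch_* names
are made up for illustration. The flexible flavour dispatches through a
function pointer, so the aggregation mode can be changed at run time; the
fast flavour names the aggregation routine directly, so the compiler can
inline it and no indirect call remains.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the statistic structure. */
struct stat_sketch {
	uint64_t counter;	/* used by the counter mode */
	uint64_t buckets[4];	/* used by the linear histogram mode */
	int64_t base, width;	/* histogram: first bound and bucket width */
	void (*add)(struct stat_sketch *s, int64_t value, uint64_t incr);
};

/* Counter mode: ignore X (value), accumulate Y (incr). */
static void sketch_add_counter(struct stat_sketch *s, int64_t value,
			       uint64_t incr)
{
	(void)value;
	s->counter += incr;
}

/*
 * Linear histogram mode with bounds base, base + width, base + 2 * width;
 * the last bucket collects everything above the highest bound.
 */
static void sketch_add_histogram(struct stat_sketch *s, int64_t value,
				 uint64_t incr)
{
	int64_t i = 0;

	assert(s->width > 0);
	if (value > s->base)
		i = (value - s->base + s->width - 1) / s->width;
	if (i > 3)
		i = 3;
	s->buckets[i] += incr;
}

/* Flexible flavour: aggregation mode selected at run time. */
static inline void sketch_add(struct stat_sketch *s, int64_t value,
			      uint64_t incr)
{
	s->add(s, value, incr);
}

/* Fast flavour: the caller fixes the aggregation mode in the source. */
static inline void sketch_add_as_counter(struct stat_sketch *s,
					 int64_t value, uint64_t incr)
{
	sketch_add_counter(s, value, incr);
}
```

A client that only ever needs a plain counter would call the second
flavour and give up the ability to switch to a histogram later.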

>> 7) With regard to the delivery of statistic data to user land,
>> a library maintaining statistic counters, histograms or whatever
>> on behalf of exploiters doesn't need any help from the exploiter.
>> We can avoid the usual callbacks and code bloat in exploiters
>> this way.
> 
> I don't really understand what you are stating here.

Sorry.
1,$s/exploiter/client/g

Any device driver or whatever gathering statistics data currently
has code dealing with showing the data. Usually, they have some
callbacks for procfs, sysfs or whatever.

My point is that, if a library keeps track of statistics on behalf
of its clients, no client needs to be called back in order to
merge, format, copy, etc. data being shown to users. The library
can handle this as a background operation without disturbing clients.

>> 8) If some library functions are responsible for showing data, and the
>> exploiter is not, we can achieve a common format for statistics data.
>> For example, a histogram about block I/O has the same format as
>> a histogram about network I/O.
>> This provides ease of use and minimises the effort of writing
>> scripts that could do further processing (e.g. formatting as
>> spreadsheets or bar charts, comparison and summarisation of
>> statistics, ...)
> 
> Common functionality and formats would be wonderful.  But I'm not sure
> you can guarantee that we really want the network io and block io
> statistics in the same format, as they are fundamentally different
> things.

Subsystems are free to gather as many/few statistics as required.
And I am not trying to enforce semantics.

All I am saying is that, if two statistics are aggregated using similar
algorithms, then the results should be presented or formatted in a
similar way.

My assumption is that the format of results doesn't depend on the
semantics of the data feeding a statistic. But it depends on the
way we aggregate data.

For example, there is no reason why statistic A of subsystem 1
aggregated in the form of a histogram should have a different format
than statistic B of subsystem 2 also being aggregated in the form
of a histogram.

A <=0 0
A <=1 0
A <=2 3
A <=4 7
A <=8 29
A <=16 285
A <=32 295
A <=64 96
A <=128 52
A <=256 3
A >256 1

B <=10 1
B <=20 3
B <=30 92
B <=40 251
...
B <=490 34462
B <=500 23434
B >500 0

Semantics are different; statistic names are different;
number of buckets, "diameter" of buckets, scale etc. might be different;
basic format of results is identical - as long as both statistics are
aggregated the same way (as histograms, in this case).

A library can provide a common format, because semantics just don't
matter. Its statistic_add() function (or whatever we want to call it)
has no idea about the actual semantics of the incremental statistic data
it accepts and processes according to abstract rules.

And I think a library should provide a common format, because it
makes it fun poking in the aggregated data, and writing a script that
does further processing of that data.
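
As a rough sketch of how little the formatting code needs to know, the
following user-space helper renders any histogram in the layout of the
sample output above. The function name, its signature and the buffer
handling are assumptions made for this example and are not taken from
the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one histogram as "<name> <=bound count"
 * lines, followed by a "<name> >bound count" line for the overflow
 * bucket. bounds[] holds n upper bounds; counts[] holds n + 1 counters.
 * Returns the number of characters written (excluding the trailing
 * NUL), or -1 if buf is too small.
 */
static int sketch_format_histogram(char *buf, size_t size, const char *name,
				   const int64_t *bounds,
				   const uint64_t *counts, size_t n)
{
	size_t off = 0;
	size_t i;
	int len;

	assert(n > 0);
	for (i = 0; i <= n; i++) {
		if (i < n)
			len = snprintf(buf + off, size - off,
				       "%s <=%lld %llu\n", name,
				       (long long)bounds[i],
				       (unsigned long long)counts[i]);
		else
			len = snprintf(buf + off, size - off,
				       "%s >%lld %llu\n", name,
				       (long long)bounds[n - 1],
				       (unsigned long long)counts[n]);
		if (len < 0 || (size_t)len >= size - off)
			return -1;
		off += (size_t)len;
	}
	return (int)off;
}
```

The helper sees only a label, bounds and counters; whether the values
are request sizes or latencies makes no difference to it.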

> Also, you will have to live with the existing interfaces, as we can't
> break them, so porting them will not happen.

Okay.
A library could help to avoid a further proliferation of interfaces.

>> 9) For performance reasons, per-cpu data and minimal locking
>> (local_irq_save/restore) should be used.
>> Adds to complexity, though.
> 
> If necessary.  Is this really necessary?

I would think so.

My initial patch was criticised for not using per-cpu data and,
therewith, requiring more expensive locking.

Besides, all other serious statistic implementations use per-cpu data
(kernel/profile.c, include/linux/genhd.h, ...)
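
The idea can be sketched in user space as follows. This is an
illustration only: the fixed SKETCH_NR_CPUS, the struct and the function
names are assumptions, and real kernel code would use the per-cpu
allocator and disable local interrupts around the update instead of
passing a cpu number around.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* assumption: fixed CPU count for illustration */

/* One slot per CPU; an update touches only the slot of the local CPU. */
struct sketch_percpu_counter {
	uint64_t slot[SKETCH_NR_CPUS];
};

/* Update path: no shared cache line is written, so no global lock. */
static void sketch_percpu_add(struct sketch_percpu_counter *c, int cpu,
			      uint64_t incr)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	c->slot[cpu] += incr;
}

/* Read path: merge all slots; slower, but reads are rare. */
static uint64_t sketch_percpu_read(const struct sketch_percpu_counter *c)
{
	uint64_t sum = 0;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->slot[cpu];
	return sum;
}
```

The cost moves from the frequent update to the infrequent read, which is
where the merge work happens.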

>> 10) If data is per-cpu, we want to be very careful with regard to
>> memory footprint. That is why, memory is only allocated for online
>> cpus (requires cpu hot(un)plug handling, which adds to complexity),
> 
> Agreed.
> 
>> 11) At least for data processing modes more expensive than plain
>> counters, like histograms, an on/off state makes sense.
> 
> So that userspace can tell the kernel to go faster?  I don't know why
> this is really necessary :)

Okay, here are two functions implementing two different ways
data can be aggregated:

   static void statistic_add_counter_inc(struct statistic *stat, int cpu,
                                         s64 value, u64 incr)
   {
           *(u64*)stat->pdata->ptrs[cpu] += incr;
   }

   static void statistic_add_histogram_log2(struct statistic *stat, int cpu,
                                            s64 value, u64 incr)
   {
           int i = statistic_histogram_calc_index_log2(stat, value);
           ((u64*)stat->pdata->ptrs[cpu])[i] += incr;
   }

with statistic_histogram_calc_index_log2 expanding to:

   static int statistic_histogram_calc_index_log2(struct statistic *stat,
                                                  s64 value)
   {
           unsigned long long i;
           for (i = 0;
                i < stat->u.histogram.last_index &&
                value > statistic_histogram_calc_value_log2(stat, i);
                i++);
           return i;
   }

While incrementing a counter might be cheap, updating a histogram
is more expensive. First, we need to identify the counter out of
a set of counters that is to be incremented. For logarithmic scale,
this requires a loop.

Checking whether data gathering has been enabled at all might look
expensive in the context of a plain counter. It certainly saves
cycles for a histogram that users aren't interested in and that
hasn't been switched on.
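
A simplified user-space sketch of that gate, with made-up names and a
fixed set of power-of-two bounds (1, 2, 4, ..., 64, plus an overflow
bucket) rather than the configurable bounds of the patches:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BUCKETS 8	/* bounds 1, 2, 4, ..., 64, then overflow */

/* Hypothetical histogram with an on/off state. */
struct sketch_log2_hist {
	int enabled;
	uint64_t buckets[SKETCH_BUCKETS];
};

/* Linear search for the first bucket whose upper bound covers value. */
static int sketch_log2_index(int64_t value)
{
	int i;

	for (i = 0;
	     i < SKETCH_BUCKETS - 1 && value > ((int64_t)1 << i);
	     i++)
		;
	assert(i >= 0 && i < SKETCH_BUCKETS);
	return i;
}

/* One cheap test up front skips the bucket search when switched off. */
static void sketch_log2_add(struct sketch_log2_hist *h, int64_t value,
			    uint64_t incr)
{
	if (!h->enabled)
		return;
	h->buckets[sketch_log2_index(value)] += incr;
}
```

For a disabled histogram, each update costs a single load and branch
instead of the loop.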

>> 12) In order to minimise the memory footprint, a released/allocated
>> state makes sense.
> 
> Again, telling userspace when to tell the kernel to free up memory can
> cause problems.

We have to make sure that released memory isn't used anymore.
That's what _statistic_barrier() is for.

Do you see other issues?

>> 14) Kernel code delivering statistics data through library routines
>> can, at best, guess whether a user wants incremental updates to be
>> aggregated in a single counter, a set of counters (histograms), or
>> in the form of other results. Users might want to change how much
>> detail is retained in aggregated statistic results.
>> Adds to complexity.
> 
> Complexity where?  Userspace or in the kernel?

Complexity in the kernel. Sorry.

When a statistics library allows users to choose from about half a
dozen ways of aggregating data, then this adds to the complexity
of that library to some degree.

>> 15) Nonetheless, exploiters are kindly requested to provide some
>> default settings that are a good starting point for general
>> purpose use.
>>
>> 16) Aggregated statistic results, in many cases, don't need to be
>> pushed to user space through a high-speed, high-volume interface.
>> Debugfs, for example, is fine for this purpose.
>>
>> 17) If the requirement for pushing data comes up anyway, we could,
>> for example, add relay-entries in debugfs anytime.
>> (For example, we could implement forwarding of incremental
>> updates to user space. Just another conceivable data processing
>> mode that fits into the current design.)
>>
>> 18) The programming interface of a statistics library can be roughly as
>> simple as statistic_create(), statistics_remove(), statistic_add().
>>
>> 19) Statistic_add() should come in different flavours:
>> statistic_add/inc() (just for convenience), and
>> statistic_*_nolock() (more efficient locking for a bundle of updates)
>>
>> 20) Statistic_add() takes a (X, Y) pair, with X being the main
>> characteristics of the statistics (e.g. a request size) and with
>> Y quantifying the update reported for a particular X (e.g. number
>> of observed requests of a particular request size).
>>
>> 21) Processing of (X, Y) according to abstract rules imposed by
>> counters, histograms etc. doesn't require any knowledge about the
>> semantics of X or Y.
>>
>> 22) There might be statistic counters that exploiters want to use and
>> maintain on their own, and which users still may want to have a look at
>> along with other statistics. Statistic_set() fits in here nicely.
> 
> 
> Ok, these are all implementation details.

Maybe. But at least 21) is fundamental, as it provides a base for
writing such a library: The library deals with a defined form of
data, regardless of the semantics of the data.

> Can you please step back a bit?  What are the requirements that you are
> trying to achieve here?

Our customers have serious concerns that Linux has no means
to gather SCSI performance data. Making sure we can get data from
subsystems, we both provide for better service and give customers
a good feeling.

Statistics, and SCSI statistics in particular, are seen here as one
of the more urgent things and real inhibitors on enterprise level.

> A kernel-wide statistic gathering library?

Yes, as a by-product of the specific SCSI requirement, so to speak.
And, why not :)

> If so, why?  What has caused this to be needed?

A clear distinction between code measuring statistics data and
code handling statistics data makes for better code.
There is no point in intermixing algorithms for processing
statistics data and the semantics of statistics data.

So what would you do if you got to write the N-th set of statistic
functions?

To me it looks like the next logical step to fully abstract
statistics code out of a device driver.

> And if it's needed, would
> putting the stuff in debugfs for _all_ statistics really be a good idea
> (hint, I would say no...)

May I ask why you think so?

Well, so far I don't see a serious limitation in using debugfs.
I think relayfs entries could be used to cover other requirements,
if they pop up.

And as I have explained, replacing debugfs by something else
shouldn't be too difficult.
But, I don't see a clear direction regarding this discussion.

Or do you suggest that it would make sense to modularise that
part of the code, so as to allow for other user interface code
being "plugged in" and statistics data being shown through
debugfs, procfs, netlink or whatever?

>>>> And what does this mean for relayfs?  Those developers tuned that code
>>>> to the nth degree to get speed and other goodness, and here you go just
>>>> ignoring that stuff and add yet another way to get stats out of the
>>>> kernel.  Why should I use this instead of my own code with relayfs?
>>> Good questions.
>> Relayfs is a nice feature, but not appropriate here.
>>
>> For example, during a performance measurement I have seen
>> SCSI I/O related statistics being updated millions of times while
>> I was just having a short lunch break. Some of them just increased
>> a counter, which is pretty fast if done immediately in the kernel.
>> If all these updates had to be relayed to user space
>> to just increase a counter maintained in user space.. urgh, surely
>> more expensive and not the way to go.
>>
>> And what if user space isn't interested at all? Would we keep
>> pumping zillions of unused updates into buffers instead of
>> discarding them right away?
> 
> Yes, for simple counters, relayfs is overkill.  But so is an indirect
> function call through a pointer for every simple counter update :)

Got it.

>> Profile.c, taskstats, genhd and all the other statistics listed
>> above... they all maintain their counters in the kernel and
>> show aggregated statistics to users.
> 
> Yes, but will you be allowed to port the existing users over to your new
> framework without breaking any userspace stuff?  I don't see that
> happening :(

Would it be me porting...? ;-)

I see this library as an offering to anybody who is looking
for a comfortable and established way to dump statistic data,
including me.

>>>> And is the need for the in-kernel parser really necessary?  I know it
>>>> makes the userspace tools simpler (cat and echo), but should we be
>>>> telling the kernel how to filter and adjust the data?  Shouldn't we just
>>>> dump it all to userspace and use tools there to manipulate it?
>>> I agree again.
>> Assuming we can agree on in-kernel counters, histograms etc.,
>> allowing for attributes being adjusted by users makes sense.
>>
>> The parser stuff required for these attributes is implemented
>> using match_token() & friends, which should be acceptable.
>> But, I think that the standard way of using match_token() and
>> strsep() needs improvement (strsep is destructive to strings
>> parsed, which is painful).
> 
> Yeah, the parser isn't as bad as I originally thought it was.  But
> overall, I'm still not sold on the real need for this kind of
> subsystem/library.

In my eyes, there are several indications that a library makes sense:

We want statistics for various components.
Many of the reinvent-the-wheel statistics have similar programming interfaces
(e.g. compare disk_stat_add(), dasd_profile_counter(), profile_hit()).
There is unnecessary code duplication.
There is no need to have statistics user interface code spread throughout
the kernel.
A library can achieve a common output format, simplifying user space.
A defined programming interface makes it much easier to get a general
idea of the statistics being around. An API gives more control and
might help to avoid introducing redundant statistics or statistics of
lesser importance.

I am not saying that such a library has to look exactly like the
proposed patches. I think that these patches contain some concepts
worth considering.

Thanks, Martin


* Re: statistics infrastructure (in -mm tree) review
  2006-06-20 15:40             ` Martin Peschke
@ 2006-06-20 16:50               ` Randy.Dunlap
  2006-06-21 18:51                 ` Martin Peschke
  0 siblings, 1 reply; 166+ messages in thread
From: Randy.Dunlap @ 2006-06-20 16:50 UTC (permalink / raw)
  To: Martin Peschke; +Cc: greg, akpm, ak, linux-kernel

On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote:

(I haven't forgotten that I owe you some review/feedback.
It's on my long todo list.)

> Greg KH wrote:
> 
> >> 7) With regard to the delivery of statistic data to user land,
> >> a library maintaining statistic counters, histograms or whatever
> >> on behalf of exploiters doesn't need any help from the exploiter.
> >> We can avoid the usual callbacks and code bloat in exploiters
> >> this way.
> > 
> > I don't really understand what you are stating here.
> 
> Sorry.
> 1,$s/exploiter/client/g
> 
> Any device driver or whatever gathering statistics data currently
> has code dealing with showing the data. Usually, they have some
> callbacks for procfs, sysfs or whatever.
> 
> My point is that, if a library keeps track of statistics on behalf
> of its clients, no client needs to be called back in order to
> merge, format, copy, etc. data being shown to users. The library
> can handle this as a background operation without disturbing clients.

That could be a good thing.  OTOH, it means that the library
has to be either all-ways flexible or willing to change to
accommodate clients since you can't predict the universe of all
clients' requirements.

> >> 8) If some library functions are responsible for showing data, and the
> >> exploiter is not, we can achieve a common format for statistics data.
> >> For example, a histogram about block I/O has the same format as
> >> a histogram about network I/O.
> >> This provides ease of use and minimises the effort of writing
> >> scripts that could do further processing (e.g. formatting as
> >> spreadsheets or bar charts, comparison and summarisation of
> >> statistics, ...)
> > 
> > Common functionality and formats would be wonderful.  But I'm not sure
> > you can guarantee that we really want the network io and block io
> > statistics in the same format, as they are fundamentally different
> > things.
> 
> Subsystems are free to gather as many/few statistics as required.
> And I am not trying to enforce semantics.
> 
> All I am saying is that, if two statistics are aggregated using similar
> algorithms, then the results should be presented or formatted in a
> similar way.

Am I reading this correctly?  Are you trying to put presentation
format in the statistics library in the kernel???


> My assumption is that the format of results doesn't depend on the
> semantics of the data feeding a statistic. But it depends on the
> way we aggregate data.
> 
> For example, there is no reason why statistic A of subsystem 1
> aggregated in the form of a histogram should have a different format
> than statistic B of subsystem 2 also being aggregated in the form
> of a histogram.
> 
> A <=0 0
> A <=1 0
> A <=2 3
> A <=4 7
> A <=8 29
> A <=16 285
> A <=32 295
> A <=64 96
> A <=128 52
> A <=256 3
> A >256 1
> 
> 
> B <=10 1
> B <=20 3
> B <=30 92
> B <=40 251
> ...
> B <=490 34462
> B <=500 23434
> B >500 0
> 
> Semantics are different; statistic names are different;
> number of buckets, "diameter" of buckets, scale etc. might be different;
> basic format of results is identical - as long as both statistics are
> aggregated the same way (as histograms, in this case).
> 
> A library can provide a common format, because semantics just don't
> matter. Its statistic_add() function (or whatever we want to call it)
> has no idea about the actual semantics of the incremental statistic data
> it accepts and processes according to abstract rules.
> 
> And I think a library should provide a common format, because it
> makes it fun poking in the aggregated data, and writing a script that
> does further processing of that data.

Do you mean a userspace library here?  The statements still apply
to a userspace library.

> > Also, you will have to live with the existing interfaces, as we can't
> > break them, so porting them will not happen.
> 
> Okay.
> A library could help to avoid a further proliferation of interfaces.
> 
> >> 9) For performance reasons, per-cpu data and minimal locking
> >> (local_irq_save/restore) should be used.
> >> Adds to complexity, though.
> > 
> > If necessary.  Is this really necessary?
> 
> I would think so.

Do your converted clients use all of the stat. infrastructure
interfaces or are some of them added just to round out the
full API?


> >> 14) Kernel code delivering statistics data through library routines
> >> can, at best, guess whether a user wants incremental updates to be
> >> aggregated in a single counter, a set of counters (histograms), or
> >> in the form of other results. Users might want to change how much
> >> detail is retained in aggregated statistic results.
> >> Adds to complexity.
> > 
> > Complexity where?  Userspace or in the kernel?
> 
> Complexity in the kernel. Sorry.
> 
> When a statistics library allows users to choose from about half a
> dozen ways of aggregating data, then this adds to the complexity
> of that library to some degree.


> >> 21) Processing of (X, Y) according to abstract rules imposed by
> >> counters, histograms etc. doesn't require any knowledge about the
> >> semantics of X or Y.
> >>
> >> 22) There might be statistic counters that exploiters want to use and
> >> maintain on their own, and which users still may want to have a look at
> >> along with other statistics. Statistic_set() fits in here nicely.
> > 
> > 
> > Ok, these are all implementation details.
> 
> Maybe. But at least 21) is fundamental, as it provides a base for
> writing such a library: The library deals with a defined form of
> data, regardless of the semantics of the data.

Does 22) make the library somewhat extensible?  If not, does
anything do that?

> > Can you please step back a bit?  What are the requirements that you are
> > trying to achieve here?
> 
> Our customers have serious concerns that Linux has no means
> to gather SCSI performance data. Making sure we can get data from
> subsystems, we both provide for better service and give customers
> a good feeling.
> 
> Statistics, and SCSI statistics in particular, are seen here as one
> of the more urgent things and real inhibitors on enterprise level.
> 
>  > A kernel-wide statistic gathering library?
> 
> Yes, as a by-product of the specific SCSI requirement, so to speak.
> And, why not :)
> 
> > If so, why?  What has caused this to be needed?
> 
> A clear distinction between code measuring statistics data and
> code handling statistics data makes for better code.
> There is no point in intermixing algorithms for processing
> statistics data and the semantics of statistics data.
> 
> So what would you do if you got to write the N-th set of statistic
> functions?
> 
> To me it looks like the next logical step to fully abstract
> statistics code out of a device driver.
> 
>  > And if it's needed, would
> > putting the stuff in debugfs for _all_ statistics really be a good idea
> > (hint, I would say no...)
> 
> May I ask why you think so?
> 
> Well, so far I don't see a serious limitation in using debugfs.
> I think relayfs entries could be used to cover other requirements,
> if they pop up.
> 
> And as I have explained, replacing debugfs by something else
> shouldn't be too difficult.
> But, I don't see a clear direction regarding this discussion.
> 
> Or do you suggest that it would make sense to modularise that
> part of the code, so as to allow for other user interface code
> being "plugged in" and statistics data being shown through
> debugfs, procfs, netlink or whatever?
> 
> >>>> And what does this mean for relayfs?  Those developers tuned that code
> >>>> to the nth degree to get speed and other goodness, and here you go just
> >>>> ignoring that stuff and add yet another way to get stats out of the
> >>>> kernel.  Why should I use this instead of my own code with relayfs?
> >>> Good questions.
> >> Relayfs is a nice feature, but not appropriate here.
> >>
> >> For example, during a performance measurement I have seen
> >> SCSI I/O related statistics being updated millions of times while
> >> I was just having a short lunch break. Some of them just increased
> >> a counter, which is pretty fast if done immediately in the kernel.
> >> If all these updates had to be relayed to user space
> >> to just increase a counter maintained in user space.. urgh, surely
> >> more expensive and not the way to go.

Oh really, I wouldn't expect such a poor design (of pushing each
counter update to userspace) to be considered seriously.
It should be more like a procfs^W sysfs entry at least, or something
similar to a MIB, or what iostat does.  Does iostat not even
come close to what you want for SCSI I/O statistics?


> >> And what if user space isn't interested at all? Would we keep
> >> pumping zillions of unused updates into buffers instead of
> >> discarding them right away?
> > 
> > Yes, for simple counters, relayfs is overkill.  But so is an indirect
> > function call through a pointer for every simple counter update :)
> 
> Got it.
> 
> >> Profile.c, taskstats, genhd and all the other statistics listed
> >> above... they all maintain their counters in the kernel and
> >> show aggregated statistics to users.
> > 
> > Yes, but will you be allowed to port the existing users over to your new
> > framework without breaking any userspace stuff?  I don't see that
> > happening :(
> 
> Would it be me porting...? ;-)
> 
> I see this library as an offering to anybody who is looking
> for a comfortable and established way to dump statistic data,
> including me.
> 
> >>>> And is the need for the in-kernel parser really necessary?  I know it
> >>>> makes the userspace tools simpler (cat and echo), but should we be
> >>>> telling the kernel how to filter and adjust the data?  Shouldn't we just
> >>>> dump it all to userspace and use tools there to manipulate it?
> >>> I agree again.
> >> Assuming we can agree on in-kernel counters, histograms etc.,
> >> allowing for attributes being adjusted by users makes sense.
> >>
> >> The parser stuff required for these attributes is implemented
> >> using match_token() & friends, which should be acceptable.
> >> But, I think that the standard way of using match_token() and
> >> strsep() needs improvement (strsep is destructive to strings
> >> parsed, which is painful).
> > 
> > Yeah, the parser isn't as bad as I originally thought it was.  But
> > overall, I'm still not sold on the real need for this kind of
> > subsystem/library.
> 
> In my eyes, there are several indications that a library makes sense:
> 
> We want statistics for various components.
> Many of the reinvent-the-wheel statistics have similar programming interfaces
> (e.g. compare disk_stat_add(), dasd_profile_counter(), profile_hit()).
> There is unnecessary code duplication.
> There is no need to have statistics user interface code spread throughout
> the kernel.
> A library can achieve a common output format, simplifying user space.
> A defined programming interface makes it much easier to get a general
> idea of the statistics being around. An API gives more control and
> might help to avoid introducing redundant statistics or statistics of
> lesser importance.
> 
> I am not saying that such a library has to look exactly like the
> proposed patches. I think that these patches contain some concepts
> worth considering.

Thanks.
---
~Randy


* Re: statistics infrastructure (in -mm tree) review
  2006-06-20 16:50               ` Randy.Dunlap
@ 2006-06-21 18:51                 ` Martin Peschke
  2006-06-21 19:38                   ` Matthew Frost
  0 siblings, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-21 18:51 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: greg, akpm, ak, linux-kernel

On Tue, 2006-06-20 at 09:50 -0700, Randy.Dunlap wrote:
> On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote:
> > Greg KH wrote:
> > >> 7) With regard to the delivery of statistic data to user land,
> > >> a library maintaining statistic counters, histograms or whatever
> > >> on behalf of exploiters doesn't need any help from the exploiter.
> > >> We can avoid the usual callbacks and code bloat in exploiters
> > >> this way.
> > > 
> > > I don't really understand what you are stating here.
> > 
> > Sorry.
> > 1,$s/exploiter/client/g
> > 
> > Any device driver or whatever gathering statistics data currently
> > has code dealing with showing the data. Usually, they have some
> > callbacks for procfs, sysfs or whatever.
> > 
> > My point is that, if a library keeps track of statistics on behalf
> > of its clients, no client needs to be called back in order to
> > merge, format, copy, etc. data being shown to users. The library
> > can handle this as a background operation without disturbing clients.
> 
> That could be a good thing.  OTOH, it means that the library
> has to be either all-ways flexible or willing to change to
> accommodate clients since you can't predict the universe of all
> clients' requirements.

Right. I have made provisions for that to some degree.


First, I could imagine that the statistics data of a client requires
a new way its data should be aggregated and, therewith, requires
a new form of statistic result being shown to users.

I have scanned through the kernel sources for ways of aggregating
and showing statistics data. The usual constructs appear to be:

- counter
- histogram (for intervals), linear scale
- histogram (for intervals), logarithmic scale
- "histogram" for discrete and sparse values
- "utilisation indicator" or "fill level indicator" (num-min-avg-max)

These are implemented in my patches. I would expect these to cover most
requirements of possible new clients.

If another construct would be needed anyway, it can be added to the
statistics library by implementing about half a dozen routines
described by struct statistic_discipline. I might be wrong, but I don't
think we would see an inflationary growth there.
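
To give an idea of what such a set of routines could look like, here is
a user-space sketch of the "utilisation indicator" expressed as a table
of function pointers. struct sketch_discipline, its three members and
the one-line "label num min avg max" output are assumptions made for
this example; they do not reproduce the actual struct
statistic_discipline from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical private data of a num-min-avg-max statistic. */
struct sketch_util {
	uint64_t num;
	int64_t min, max, sum;
};

/* Hypothetical discipline: the routines one aggregation mode provides. */
struct sketch_discipline {
	const char *name;
	void (*reset)(void *data);
	void (*add)(void *data, int64_t value, uint64_t incr);
	int (*format)(const void *data, const char *label,
		      char *buf, size_t size);
};

static void sketch_util_reset(void *data)
{
	struct sketch_util *u = data;

	u->num = 0;
	u->sum = 0;
	u->min = INT64_MAX;
	u->max = INT64_MIN;
}

static void sketch_util_add(void *data, int64_t value, uint64_t incr)
{
	struct sketch_util *u = data;

	if (incr == 0)
		return;
	u->num += incr;
	u->sum += value * (int64_t)incr;
	if (value < u->min)
		u->min = value;
	if (value > u->max)
		u->max = value;
}

/* Integer average; an empty statistic is shown as all zeroes. */
static int sketch_util_format(const void *data, const char *label,
			      char *buf, size_t size)
{
	const struct sketch_util *u = data;

	if (u->num == 0)
		return snprintf(buf, size, "%s 0 0 0 0\n", label);
	return snprintf(buf, size, "%s %llu %lld %lld %lld\n", label,
			(unsigned long long)u->num, (long long)u->min,
			(long long)(u->sum / (int64_t)u->num),
			(long long)u->max);
}

static const struct sketch_discipline sketch_util_discipline = {
	.name = "utilisation",
	.reset = sketch_util_reset,
	.add = sketch_util_add,
	.format = sketch_util_format,
};
```

Adding another construct would then mean filling in one more such table,
without touching the clients that report data.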


Second, if a client needs to know anyway when users read statistics
data, e.g. because it wants to update some statistic then, it can
register an optional callback with the statistic infrastructure. This
callback is described in struct statistic_interface.


Third, if a client preferred its data being exported to user land
through a transport other than debugfs ... okay, then I will need to
enhance the statistics library. Moderate effort, I guess. Actually, I
already had a private patch that made the library use the evil procfs
instead of debugfs.


Fourth, if a client would like to take advantage of the library's
existing aggregation code, e.g. the library compiles a histogram on
behalf of the client, _but_ the client doesn't like the way the result
is shown, e.g. the client wants a sysfs file for each bucket instead of
a single debugfs file containing all data... well, that would defeat the
purpose of the library, if this kind of requirement gets out of hand.

OTOH, I don't see a real need for allowing that. Data can be reformatted
and rearranged in any possible way in user space.

> > >> 8) If some library functions are responsible for showing data, and the
> > >> exploiter is not, we can achieve a common format for statistics data.
> > >> For example, a histogram about block I/O has the same format as
> > >> a histogram about network I/O.
> > >> This provides ease of use and minimises the effort of writing
> > >> scripts that could do further processing (e.g. formatting as
> > >> spreadsheets or bar charts, comparison and summarisation of
> > >> statistics, ...)
> > > 
> > > Common functionality and formats would be wonderful.  But I'm not sure
> > > you can guarantee that we really want the network io and block io
> > > statistics in the same format, as they are fundamentally different
> > > things.
> > 
> > Subsystems are free to gather as many/few statistics as required.
> > And I am not trying to enforce semantics.
> > 
> > All I am saying is that, if two statistics are aggregated using similar
> > algorithms, then the results should be presented or formatted in a
> > similar way.
> 
> Am I reading this correctly?  Are you trying to put presentation
> format in the statistics library in the kernel???

Aehm, no.

What's needed is simply an understanding between kernel and user space
on how statistics data reads.

If the interface were ioctl-based, I would need to define some
structures containing the data.
If the interface were netlink-based, I would need to define some packet
headers and fields (see taskstats, for example).
And so on...

Since my proposed interface uses debugfs, I have defined a minimal set
of rules describing the ASCII output (compare sample output below):

- A file contains all statistics of the measured entity.
  So, each output line is labeled with the name of the statistic
  it belongs to.

- Each statistic may consist of several pieces, that is, output lines.
  So, each line of a multi-line statistic has another label,
  e.g. ">256" marking a histogram's bucket for values >256.

These rules merely strive to keep the file content unambiguous.
As a side effect, readability isn't that bad, either.
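
Under those two rules, each line of a multi-line statistic can be split
into name, label and value by a trivial reader. A minimal userspace
sketch, assuming whitespace-separated fields as in the sample output
quoted below; struct stat_line and parse_stat_line() are hypothetical
names, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One output line of a multi-line statistic, e.g. "A <=16 285". */
struct stat_line {
	char name[32];		/* name of the statistic, e.g. "A" */
	char label[32];		/* label of this piece, e.g. "<=16" */
	unsigned long long value;
};

/* Returns 0 on success, -1 if the line does not have three fields. */
static int parse_stat_line(const char *line, struct stat_line *out)
{
	if (sscanf(line, "%31s %31s %llu",
		   out->name, out->label, &out->value) != 3)
		return -1;
	return 0;
}
```

The same reader works for statistic A and statistic B, because it only
depends on how the data was aggregated, not on what it means.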

> > My assumption is that the format of results doesn't depend on the
> > the semantics of the data feeding a statistic. But it depends on the
> > way we aggregate data.
> > 
> > For example, there is no reason why statistic A of subsystem 1
> > aggregated in the form of a histogram should have a different format
> > than statistic B of subsystem 2 also being aggregated in the form
> > of a histogram.
> > 
> > A <=0 0
> > A <=1 0
> > A <=2 3
> > A <=4 7
> > A <=8 29
> > A <=16 285
> > A <=32 295
> > A <=64 96
> > A <=128 52
> > A <=256 3
> > A >256 1
> > 
> > 
> > B <=10 1 
> > B <=20 3
> > B <=30 92
> > B <=40 251
> > ...
> > B <=490 34462
> > B <=500 23434
> > B >500 0
> > 
> > Semantics are different; statistic names are different;
> > number of buckets, "diameter" of buckets, scale etc. might be different;
> > basic format of results is identical - as long as both statistics are
> > aggregated the same way (as histograms, in this case).
> > 
> > A library can provide a common format, because semantics just don't
> > matter. Its statistic_add() function (or whatever we want to call it)
> > has no idea about the actual semantics of the incremental statistic data
> > it accepts and processes according to abstract rules.
> > 
> > And I think a library should provide a common format, because it
> > makes it fun poking in the aggregated data, and writing a script that
> > does further processing of that data.
> 
> Do you mean a userspace library here?  The statements still apply
> to a userspace library.

No, I am still talking about the kernel's statistic library functions.

> > > Also, you will have to live with the existing interfaces, as we can't
> > > break them, so porting them will not happen.
> > 
> > Okay.
> > A library could help to avoid a further proliferation of interfaces.
> > 
> > >> 9) For performance reasons, per-cpu data and minimal locking
> > >> (local_irq_save/restore) should be used.
> > >> Adds to complexity, though.
> > > 
> > > If necessary.  Is this really necessary?
> > 
> > I would think so.
> 
> Do your converted clients use all of the stat. infrastructure
> interfaces or are some of them added just to round out the
> full API?

My client patches for zfcp and scsi together use all of the
statistic infrastructure's features and interfaces, except for
the optional callback in statistic_interface and the related
statistic_set().

> > >> 21) Processing of (X, Y) according to abstract rules imposed by
> > >> counters, histograms etc. doesn't require any knowledge about the
> > >> semantics of X or Y.
> > >>
> > >> 22) There might be statistic counters that exploiters want to use and
> > >> maintain on their own, and which users still may want to have a look at
> > >> along with other statistics. Statistic_set() fits in here nicely.
> > > 
> > > 
> > > Ok, these are all implementation details.
> > 
> > Maybe. But at least 21) is fundamental, as it provides a base for
> > writing such a library: The library deals with a defined form of
> > data, regardless of the semantics of the data.
> 
> Does 22) make the library somewhat extensible?  If not, does
> anything do that?

21) or 22) ??

I don't understand what you are asking. Extensible in what regard?
 
> > >>>> And what does this mean for relayfs?  Those developers tuned that code
> > >>>> to the nth degree to get speed and other goodness, and here you go just
> > >>>> ignoring that stuff and add yet another way to get stats out of the
> > >>>> kernel.  Why should I use this instead of my own code with relayfs?
> > >>> Good questions.
> > >> Relayfs is a nice feature, but not appropriate here.
> > >>
> > >> For example, during a performance measurement I have seen
> > >> SCSI I/O related statistics being updated millions of times while
> > >> I was just having a short lunch break. Some of them just increased
> > >> a counter, which is pretty fast if done immediately in the kernel.
> > >> If all these updates would have to be relayed to user space
> > >> to just increase a counter maintained in user space.. urgh, surely
> > >> more expensive and not the way to go.
> 
> Oh really, I wouldn't expect such a poor design (of pushing each
> counter update to userspace) to be considered seriously.
> It should be more like a procfs^W sysfs entry at least, or something
> similar to a MIB, or what iostat does.

We are in agreement.

> Does iostat not even
> come close to what you want for SCSI I/O statistics?

Not really.

First, it measures from __make_request to end_that_request_last.
So it's not very close to the point in time when I/Os hit the wire.

Second, it merely provides sums and average values for all requests of a
device. It doesn't provide much detail about the actual traffic pattern,
like histograms for request latencies and request sizes.
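
For illustration, the kind of detail meant here: a base-2 logarithmic
histogram of request latencies, as a minimal userspace sketch. The bucket
layout (<=0, <=1, <=2, ... <=256, >256) mirrors the sample output for
statistic A earlier in this mail; the names and the fixed bucket count are
assumptions, not the library's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_BUCKETS 11	/* <=0, <=1, <=2, ..., <=256, >256 */

struct hist_sketch {
	uint64_t bucket[SKETCH_BUCKETS];
};

/* Map a value to its bucket index; upper bounds double from 1 to 256. */
static int hist_bucket(int64_t value)
{
	int64_t bound = 1;
	int i;

	if (value <= 0)
		return 0;
	for (i = 1; i < SKETCH_BUCKETS - 1; i++) {
		if (value <= bound)
			return i;
		bound <<= 1;
	}
	return SKETCH_BUCKETS - 1;	/* overflow bucket, ">256" */
}

/* Account for incr occurrences of value, e.g. one request's latency. */
static void hist_add(struct hist_sketch *h, int64_t value, uint64_t incr)
{
	h->bucket[hist_bucket(value)] += incr;
}

/* Print one "name label count" line per bucket. */
static void hist_print(const struct hist_sketch *h, const char *name)
{
	int64_t bound = 0;
	int i;

	for (i = 0; i < SKETCH_BUCKETS - 1; i++) {
		printf("%s <=%lld %llu\n", name, (long long)bound,
		       (unsigned long long)h->bucket[i]);
		bound = bound ? bound << 1 : 1;
	}
	/* bound has been doubled once past the last upper limit */
	printf("%s >%lld %llu\n", name, (long long)(bound >> 1),
	       (unsigned long long)h->bucket[SKETCH_BUCKETS - 1]);
}
```

Sums and averages as shown by iostat cannot be turned back into such a
distribution, which is why the aggregation has to happen at update time.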

Thanks, Martin



* Re: statistics infrastructure (in -mm tree) review
  2006-06-21 18:51                 ` Martin Peschke
@ 2006-06-21 19:38                   ` Matthew Frost
  2006-06-22 11:43                     ` Martin Peschke
  0 siblings, 1 reply; 166+ messages in thread
From: Matthew Frost @ 2006-06-21 19:38 UTC (permalink / raw)
  To: Martin Peschke; +Cc: Randy.Dunlap, greg, akpm, ak, linux-kernel

Martin Peschke wrote:
> On Tue, 2006-06-20 at 09:50 -0700, Randy.Dunlap wrote:
>> On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote:
>>> Greg KH wrote:
>>>>> 7) With regard to the delivery of statistic data to user land,
>>>>> a library maintaining statistic counters, histograms or whatever
>>>>> on behalf of exploiters doesn't need any help from the exploiter.
>>>>> We can avoid the usual callbacks and code bloat in exploiters
>>>>> this way.
>>>> I don't really understand what you are stating here.
>>> Sorry.
>>> 1,$s/exploiter/client/g
>>>
>>> Any device driver or whatever gathering statistics data currently
>>> has code dealing with showing the data. Usually, they have some
>>> callbacks for procfs, sysfs or whatever.
>>>
>>> My point is that, if a library keeps track of statistics on behalf
>>> of its clients, no client needs to be called back in order to
>>> merge, format, copy, etc. data being shown to users. The library
>>> can handle as a background operation without disturbing clients.
>> That could be a good thing.  OTOH, it means that the library
>> has to be either all-ways flexible or willing to change to
>> accommodate clients since you can't predict the universe of all
>> clients' requirements.
> 
> Right. I have made provisions for that to some degree.
> 
> 
> First, I could imagine that the statistics data of a client requires
> a new way its data should be aggregated and, therewith, requires
> a new form of statistic result being shown to users.
> 
> I have scanned through the kernel sources for ways of aggregating
> and showing statistics data. The usual constructs appear to be:
> 
> - counter
> - histogram (for intervals), linear scale
> - histogram (for intervals), logarithmic scale
> - "histogram" for discrete and sparse values
> - "utilisation indicator" or "fill level indicator" (num-min-avg-max)
> 
> These are implemented in my patches. I would expect these to cover most
> requirements of possible new clients.

So you're saying, as regards "putting presentation format in ... the 
kernel", that we already have presentation formats specified pell-mell 
in the kernel.  That should then be a non-issue, because you aren't 
introducing anything new, just centralizing an existing kernel behavior. 
  Do I have you right?

> 
> If another construct would be needed anyway, it can be added to the
> statistics library by implementing about half a dozen routines
> described by struct statistic_discipline. I might be wrong, but I don't
> think we would see an inflationary growth there.
> 
> 
-- elision --
> 
> OTOH, I don't see a real need for allowing that. Data can be reformatted
> and rearranged in any possible way in user space.

Because you're just providing a range of basic output formats, 
standardized.  So anybody can ask for statistics from the kernel in a 
preferred output to then massage as needed in userland.  ACK?  Am I 
oversimplifying?

Matt




* Re: statistics infrastructure (in -mm tree) review
  2006-06-21 19:38                   ` Matthew Frost
@ 2006-06-22 11:43                     ` Martin Peschke
  0 siblings, 0 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-22 11:43 UTC (permalink / raw)
  To: artusemrys; +Cc: Randy.Dunlap, greg, akpm, ak, linux-kernel

Matthew Frost wrote:
> Martin Peschke wrote:
>> On Tue, 2006-06-20 at 09:50 -0700, Randy.Dunlap wrote:
>>> On Tue, 20 Jun 2006 17:40:01 +0200 Martin Peschke wrote:
>>>> Greg KH wrote:
>>>>>> 7) With regard to the delivery of statistic data to user land,
>>>>>> a library maintaining statistic counters, histograms or whatever
>>>>>> on behalf of exploiters doesn't need any help from the exploiter.
>>>>>> We can avoid the usual callbacks and code bloat in exploiters
>>>>>> this way.
>>>>> I don't really understand what you are stating here.
>>>> Sorry.
>>>> 1,$s/exploiter/client/g
>>>>
>>>> Any device driver or whatever gathering statistics data currently
>>>> has code dealing with showing the data. Usually, they have some
>>>> callbacks for procfs, sysfs or whatever.
>>>>
>>>> My point is that, if a library keeps track of statistics on behalf
>>>> of its clients, no client needs to be called back in order to
>>>> merge, format, copy, etc. data being shown to users. The library
>>>> can handle as a background operation without disturbing clients.
>>> That could be a good thing.  OTOH, it means that the library
>>> has to be either all-ways flexible or willing to change to
>>> accommodate clients since you can't predict the universe of all
>>> clients' requirements.
>>
>> Right. I have made provisions for that to some degree.
>>
>>
>> First, I could imagine that the statistics data of a client requires
>> a new way its data should be aggregated and, therewith, requires
>> a new form of statistic result being shown to users.
>>
>> I have scanned through the kernel sources for ways of aggregating
>> and showing statistics data. The usual constructs appear to be:
>>
>> - counter
>> - histogram (for intervals), linear scale
>> - histogram (for intervals), logarithmic scale
>> - "histogram" for discrete and sparse values
>> - "utilisation indicator" or "fill level indicator" (num-min-avg-max)
>>
>> These are implemented in my patches. I would expect these to cover most
>> requirements of possible new clients.
> 
> So you're saying, as regards "putting presentation format in ... the 
> kernel", that we already have presentation formats specified pell-mell 
> in the kernel.  That should then be a non-issue, because you aren't 
> introducing anything new, just centralizing an existing kernel behavior. 
>  Do I have you right?

Yes, there seem to be as many formats as statistics. See examples below.
My patches can help to improve usability by providing some common basic
formats.

IMO, it would not make sense to "enhance" the statistics library in an
attempt to emulate all the preexisting statistic output formats.

[root@t2930041 ~]# cat /proc/dasd/statistics
31 dasd I/O requests
with 392 sectors(512B each)
    __<4    ___8    __16    __32    __64    _128    _256    _512    __1k
    __2k    __4k    __8k    _16k    _32k    _64k    128k
    _256    _512    __1M    __2M    __4M    __8M    _16M    _32M    _64M
    128M    256M    512M    __1G    __2G    __4G    _>4G
Histogram of sizes (512B secs)
       0       0      21       7       3       0       0       0       0
       0       0       0       0       0       0       0
       0       0       0       0       0       0       0       0       0
       0       0       0       0       0       0       0
Histogram of I/O times (microseconds)
       0       0       0       0       0       0       0       3       6
       1       5       6      10       0       0       0
       0       0       0       0       0       0       0       0       0
       0       0       0       0       0       0       0
<snip>

[root@t2930041 ~]# cat /proc/diskstats
<snip>
   94    0 dasda 67389 1478 1142520 281080 78260 461181 4326752 10280570
  0 6392030 10565980
   94    1 dasda1 68849 1142272 540838 4326704
   94    4 dasdb 27 29 448 0 0 0 0 0 0 0 0
   94    5 dasdb1 27 216 0 0
   94    8 dasdc 28 29 456 40 0 0 0 0 0 30 40
   94    9 dasdc1 28 224 0 0
    9    0 md0 0 0 0 0 0 0 0 0 0 0 0
    8    0 sda 35423 12268 4340826 284540 8605 275966 2276792 980810
  0 219260 1265370
    8   16 sdb 36741 12626 4588754 293140 10090 277678 2302400 440010
  0 221990 733140
    8   32 sdc 36621 11748 4548722 298170 10394 272680 2264736 303580
  0 223110 601730

[root@t2930041 ~]# cat /proc/net/stat/arp_cache
entries  allocs destroys hash_grows  lookups hits  res_failed
rcv_probes_mcast rcv_probes_ucast  periodic_gc_runs forced_gc_runs
00000002  0000002c 000000cd 00000000  00000082 00000056
00000000  00000000 00000000  00017306 00000000
00000002  00000031 00000000 00000000  0000007d 0000004c
00000000  00000000 00000000  00000000 00000000
00000002  0000002f 00000000 00000001  00000074 00000045
00000000  00000000 00000000  00000000 00000000
00000002  00000043 00000000 00000000  00000084 00000041
00000000  00000000 00000000  00000000 00000000

>> If another construct would be needed anyway, it can be added to the
>> statistics library by implementing about half a dozen routines
>> described by struct statistic_discipline. I might be wrong, but I don't
>> think we would see an inflationary growth there.
>>
>>
> -- elision --
>>
>> OTOH, I don't see a real need for allowing that. Data can be reformatted
>> and rearranged in any possible way in user space.
> 
> Because you're just providing a range of basic output formats, 
> standardized.  So anybody can ask for statistics from the kernel in a 
> preferred output to then massage as needed in userland.  ACK?  Am I 
> oversimplifying?

Sounds reasonable.

Thanks, Martin



* Re: statistics infrastructure (in -mm tree) review
  2006-06-13 23:47     ` statistics infrastructure (in -mm tree) review Greg KH
  2006-06-14  0:18       ` Randy.Dunlap
@ 2006-06-14  5:04       ` Andi Kleen
  2006-06-14 22:49         ` Martin Peschke
  2006-06-17 10:30       ` Martin Peschke
  2 siblings, 1 reply; 166+ messages in thread
From: Andi Kleen @ 2006-06-14  5:04 UTC (permalink / raw)
  To: Greg KH, mp3; +Cc: akpm, linux-kernel

Greg KH <greg@kroah.com> writes:
> > + * exploiters don't update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_add(struct statistic *stat, int i,
> > +				 s64 value, u64 incr)
> > +{
> > +	unsigned long flags;
> > +	local_irq_save(flags);
> > +	if (stat[i].state == STATISTIC_STATE_ON)
> > +		stat[i].add(&stat[i], smp_processor_id(), value, incr);


Indirect call in statistics hotpath?  You know how slow this is 
on IA64 and even on other architectures it tends to disrupt 
the pipeline.

Also on i386 the u64s generate quite bad code.

That looks like a really bad implementation that shouldn't be used
anywhere.

-Andi


* Re: statistics infrastructure (in -mm tree) review
  2006-06-14  5:04       ` Andi Kleen
@ 2006-06-14 22:49         ` Martin Peschke
  2006-06-16 20:40           ` Greg KH
  2006-06-17  6:51           ` Andi Kleen
  0 siblings, 2 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-14 22:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Greg KH, akpm, linux-kernel, rdunlap

Andi Kleen wrote:
> Greg KH <greg@kroah.com> writes:
>>> + * exploiters don't update several statistics of the same entity in one go.
>>> + */
>>> +static inline void statistic_add(struct statistic *stat, int i,
>>> +				 s64 value, u64 incr)
>>> +{
>>> +	unsigned long flags;
>>> +	local_irq_save(flags);
>>> +	if (stat[i].state == STATISTIC_STATE_ON)
>>> +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> 
> 
> Indirect call in statistics hotpath?  You know how slow this is 
> on IA64 and even on other architectures it tends to disrupt 
> the pipeline.

Okay, let's try to improve it then. The options here are:

a) Replace the indirect function call by a switch statement which directly
    calls the add function of the data processing mode chosen by user.
    (e.g. simple counter, histogram, utilisation indicator etc.).

    No loss in functionality, slightly uglier code, acceptable performance(?).
    This would be my choice.

b) Export statistic_add_counter(), statistic_add_histogram() and the like
    as part of the programming API (maybe in addition to the flexible
    statistic_add()) for those exploiters that definitely can't afford
    branching into a function.

    Loss in functionality (exploiting kernel code dictates how users see the
    data), a bit faster than option a).
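
Option a) could look roughly like the sketch below: the type of the
statistic selects the add routine through a switch, so no function pointer
is involved. This is a userspace illustration only. The sketch_* names and
the two data processing modes shown are assumptions, and the way the
utilisation indicator treats incr (as a repeat count for value) is a guess,
not the patch's definition.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_type {
	SKETCH_TYPE_COUNTER,
	SKETCH_TYPE_UTIL
};

struct sketch_stat {
	enum sketch_type type;
	uint64_t num;	/* counter: total; util: number of samples */
	int64_t min;	/* util only */
	int64_t max;	/* util only */
	int64_t sum;	/* util only, avg = sum / num */
};

static void sketch_add_counter(struct sketch_stat *s, uint64_t incr)
{
	s->num += incr;
}

static void sketch_add_util(struct sketch_stat *s, int64_t value,
			    uint64_t incr)
{
	if (incr == 0)
		return;
	if (s->num == 0 || value < s->min)
		s->min = value;
	if (s->num == 0 || value > s->max)
		s->max = value;
	s->num += incr;
	s->sum += value * (int64_t)incr;
}

/* Direct calls selected by a switch instead of an indirect stat->add(). */
static void sketch_add(struct sketch_stat *s, int64_t value, uint64_t incr)
{
	switch (s->type) {
	case SKETCH_TYPE_COUNTER:
		sketch_add_counter(s, incr);
		break;
	case SKETCH_TYPE_UTIL:
		sketch_add_util(s, value, incr);
		break;
	}
}
```

The user can still switch a statistic from one mode to another at run
time by changing s->type, which is the functionality option b) gives up.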

What do you think? Did I miss an option?

Thanks, Martin




* Re: statistics infrastructure (in -mm tree) review
  2006-06-14 22:49         ` Martin Peschke
@ 2006-06-16 20:40           ` Greg KH
  2006-06-16 21:34             ` Martin Peschke
  2006-06-17  6:51           ` Andi Kleen
  1 sibling, 1 reply; 166+ messages in thread
From: Greg KH @ 2006-06-16 20:40 UTC (permalink / raw)
  To: Martin Peschke; +Cc: Andi Kleen, akpm, linux-kernel, rdunlap

On Thu, Jun 15, 2006 at 12:49:54AM +0200, Martin Peschke wrote:
> Andi Kleen wrote:
> >Greg KH <greg@kroah.com> writes:
> >>>+ * exploiters don't update several statistics of the same entity in one 
> >>>go.
> >>>+ */
> >>>+static inline void statistic_add(struct statistic *stat, int i,
> >>>+				 s64 value, u64 incr)
> >>>+{
> >>>+	unsigned long flags;
> >>>+	local_irq_save(flags);
> >>>+	if (stat[i].state == STATISTIC_STATE_ON)
> >>>+		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> >
> >
> >Indirect call in statistics hotpath?  You know how slow this is 
> >on IA64 and even on other architectures it tends to disrupt 
> >the pipeline.
> 
> Okay, let's try to improve it then. The options here are:
> 
> a) Replace the indirect function call by a switch statement which directly
>    calls the add function of the data processing mode chosen by user.
>    (e.g. simple counter, histogram, utilisation indicator etc.).
> 
>    No loss in functionality, slightly uglier code, acceptable 
>    performance(?).
>    This would be my choice.

Probably best.  Just don't make it an inline function :)

thanks,

greg k-h


* Re: statistics infrastructure (in -mm tree) review
  2006-06-16 20:40           ` Greg KH
@ 2006-06-16 21:34             ` Martin Peschke
  0 siblings, 0 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-16 21:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Greg KH, akpm, linux-kernel, rdunlap

Greg KH wrote:
> On Thu, Jun 15, 2006 at 12:49:54AM +0200, Martin Peschke wrote:
>> Andi Kleen wrote:
>>> Greg KH <greg@kroah.com> writes:
>>>>> + * exploiters don't update several statistics of the same entity in one 
>>>>> go.
>>>>> + */
>>>>> +static inline void statistic_add(struct statistic *stat, int i,
>>>>> +				 s64 value, u64 incr)
>>>>> +{
>>>>> +	unsigned long flags;
>>>>> +	local_irq_save(flags);
>>>>> +	if (stat[i].state == STATISTIC_STATE_ON)
>>>>> +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
>>>
>>> Indirect call in statistics hotpath?  You know how slow this is 
>>> on IA64 and even on other architectures it tends to disrupt 
>>> the pipeline.
>> Okay, let's try to improve it then. The options here are:
>>
>> a) Replace the indirect function call by a switch statement which directly
>>    calls the add function of the data processing mode chosen by user.
>>    (e.g. simple counter, histogram, utilisation indicator etc.).
>>
>>    No loss in functionality, slightly uglier code, acceptable 
>>    performance(?).
>>    This would be my choice.
> 
> Probably best.  Just don't make it an inline function :)

Andi, would this be fine with you?

Thanks, Martin



* Re: statistics infrastructure (in -mm tree) review
  2006-06-14 22:49         ` Martin Peschke
  2006-06-16 20:40           ` Greg KH
@ 2006-06-17  6:51           ` Andi Kleen
  2006-06-17 11:03             ` Martin Peschke
  1 sibling, 1 reply; 166+ messages in thread
From: Andi Kleen @ 2006-06-17  6:51 UTC (permalink / raw)
  To: Martin Peschke; +Cc: Greg KH, akpm, linux-kernel, rdunlap


> b) Export statistic_add_counter(), statistic_add_histogram() and the like
>     as part of the programming API (maybe in addition to the flexible
>     statistic_add()) for those exploiters that definitely can't afford
>     branching into a function.
>
>     Loss in functionality (exploiting kernel code dictates how users see
> the data), a bit faster than option a).

(b) if anything. But do we really need all these weird options anyways? 
For me it seems you're far overdesigning.

> What do you think? Did I miss an option?

I think your whole approach is about 10x too complicated.
The normal approach in Linux is to start simple and add complexity later as 
needed.
You seem to try to do it the other way around.

-Andi



* Re: statistics infrastructure (in -mm tree) review
  2006-06-17  6:51           ` Andi Kleen
@ 2006-06-17 11:03             ` Martin Peschke
  0 siblings, 0 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-17 11:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Greg KH, akpm, linux-kernel, rdunlap

On Sat, 2006-06-17 at 08:51 +0200, Andi Kleen wrote:
> > b) Export statistic_add_counter(), statistic_add_histogram() and the like
> >     as part of the programming API (maybe in addition to the flexible
> >     statistic_add()) for those exploiters that definitely can't afford
> >     branching into a function.
> >
> >     Loss in functionality (exploiting kernel code dictates how users see
> > the data), a bit faster than option a).
> 
> (b) if anything.

Yes, I have anticipated this choice. I am looking into this option.

> But do we really need all these weird options anyways? 

Which options?

Assuming you refer to the distinction of counter, histogram, utilisation
indicator etc. ... well, that's what I found when I was looking into
existing approaches: counters everywhere, histograms for example in the
s390 DASD driver, some with linear scale, others with logarithmic scale,
counters that only make sense if seen in combination (which made me come
up with this utilisation indicator thing), ...

I have just been trying to find a simple concept to reconcile various
ways of preprocessing statistics data. This is reflected by
struct statistic_discipline.

> For me it seems you're far overdesigning.
> I think your whole approach is about 10x too complicated.

I disagree.
The programming interface is simple.
The modularisation of data processing modes is straightforward.

I have tried to break down my design into a dozen and a half
assumptions in my other mail.
I am happy to discuss which of them make sense, which of them
might be overkill, which might be deferred, etc.

But please understand that it is hard for me to guess which
10th part of my design is okay for you, if you don't go into details.

A fair share of complexity is caused by performance considerations
(per-cpu data). Which should be fine.
And, in that regard, my code isn't quite as complex yet as
lib/profile.c.
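
What "per-cpu data" buys, as a minimal userspace sketch: every CPU updates
only its own slot, so the hot path needs no shared lock, and the slots are
summed only when a user reads the result. SKETCH_NR_CPUS and the pcs_*
names are invented for the example; in the kernel this would sit on top of
the per-cpu allocator plus local_irq_save/restore rather than a plain
array.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* assumption: fixed CPU count for the sketch */

struct pcs_counter {
	uint64_t slot[SKETCH_NR_CPUS];	/* one private slot per CPU */
};

/* Hot path: touch only the slot of the CPU doing the update. */
static void pcs_add(struct pcs_counter *c, int cpu, uint64_t incr)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	c->slot[cpu] += incr;
}

/* Slow path: merge all slots when the result is shown to a user. */
static uint64_t pcs_read(const struct pcs_counter *c)
{
	uint64_t total = 0;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		total += c->slot[cpu];
	return total;
}
```

The extra merge step on read is the price; it is paid rarely, while
updates may happen millions of times.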

Thanks, Martin


* Re: statistics infrastructure (in -mm tree) review
  2006-06-13 23:47     ` statistics infrastructure (in -mm tree) review Greg KH
  2006-06-14  0:18       ` Randy.Dunlap
  2006-06-14  5:04       ` Andi Kleen
@ 2006-06-17 10:30       ` Martin Peschke
  2 siblings, 0 replies; 166+ messages in thread
From: Martin Peschke @ 2006-06-17 10:30 UTC (permalink / raw)
  To: Greg KH; +Cc: Wu Fengguang, akpm, linux-kernel

On Tue, 2006-06-13 at 16:47 -0700, Greg KH wrote:
> ... I'd really
> like to see some other, real-world usages of this.  Like perhaps the
> io-scheduler statistics?  Some other /proc stats that have nothing to do
> with processes?

Wu is trying it out for readahead statistics:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114946958531310&w=2
I am working on SCSI I/O statistics:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114780190921567&w=2

The zfcp driver (FCP HBA driver for s390) in -mm exports
statistics through this infrastructure.

I could imagine that this code might be exploited by other s390 device
drivers, once we are forced to find a replacement for
homegrown statistics in procfs.

> Oh, and use C99 structure initializers when creating the statistic
> structures in the example code (and real code), it makes it much easier
> to understand, and future proof when the api changes.

good point - done
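
For reference, a minimal sketch of what C99 designated initializers look
like for a table of statistic descriptions. struct stat_info_sketch, its
fields and the two entries are hypothetical and only show the style; they
are not copied from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical description of one statistic. */
struct stat_info_sketch {
	const char *name;
	const char *x_unit;
	const char *y_unit;
	int flags;
};

enum {
	SKETCH_STAT_LATENCY,
	SKETCH_STAT_SIZE,
	SKETCH_STAT_COUNT	/* number of entries, keep last */
};

/*
 * Each entry names its index and its fields, so reordering the enum or
 * adding a field to the struct does not silently shift values around.
 * Fields that are not mentioned (flags here) are zero-initialized.
 */
static const struct stat_info_sketch sketch_info[SKETCH_STAT_COUNT] = {
	[SKETCH_STAT_LATENCY] = {
		.name	= "latency",
		.x_unit	= "microsec",
		.y_unit	= "request",
	},
	[SKETCH_STAT_SIZE] = {
		.name	= "size",
		.x_unit	= "bytes",
		.y_unit	= "request",
	},
};
```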

> Code comments now:
> 
> 
> > diff -puN arch/s390/Kconfig~statistics-infrastructure arch/s390/Kconfig
> > --- devel/arch/s390/Kconfig~statistics-infrastructure	2006-06-09 15:22:58.000000000 -0700
> > +++ devel-akpm/arch/s390/Kconfig	2006-06-09 15:22:58.000000000 -0700
> > @@ -490,8 +490,14 @@ source "drivers/net/Kconfig"
> >  
> >  source "fs/Kconfig"
> >  
> > +menu "Instrumentation Support"
> > +
> >  source "arch/s390/oprofile/Kconfig"
> >  
> > +source "lib/Kconfig.statistic"
> > +
> > +endmenu
> > +
> >  source "arch/s390/Kconfig.debug"
> >  
> >  source "security/Kconfig"
> > diff -puN arch/s390/oprofile/Kconfig~statistics-infrastructure arch/s390/oprofile/Kconfig
> > --- devel/arch/s390/oprofile/Kconfig~statistics-infrastructure	2006-06-09 15:22:58.000000000 -0700
> > +++ devel-akpm/arch/s390/oprofile/Kconfig	2006-06-09 15:22:58.000000000 -0700
> > @@ -1,6 +1,3 @@
> > -
> > -menu "Profiling support"
> > -
> >  config PROFILING
> >  	bool "Profiling support"
> >  	help
> > @@ -18,5 +15,3 @@ config OPROFILE
> >  
> >  	  If unsure, say N.
> >  
> > -endmenu
> > -
> 
> These two patches should probably go somewhere else, they don't have
> much to do with this one.  (well, adding Kconfig.statistic" does, but
> the other wording doesn't.)

sorry, not sure what you mean

> > diff -puN /dev/null include/linux/statistic.h
> > --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> > +++ devel-akpm/include/linux/statistic.h	2006-06-09 15:22:58.000000000 -0700
> > @@ -0,0 +1,348 @@
> > +/*
> > + * include/linux/statistic.h
> > + *
> > + * Statistics facility
> > + *
> > + * (C) Copyright IBM Corp. 2005, 2006
> > + *
> > + * Author(s): Martin Peschke <mpeschke@de.ibm.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2, or (at your option)
> > + * any later version.
> 
> Are you sure "any later version"?

well, let me get back to an IBM lawyer first...
;-)

> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> 
> Two not-needed paragraphs.

ditto.

> > +#ifndef STATISTIC_H
> > +#define STATISTIC_H
> > +
> > +#include <linux/fs.h>
> > +#include <linux/types.h>
> > +#include <linux/percpu.h>
> > +
> > +#define STATISTIC_ROOT_DIR	"statistics"
> > +
> > +#define STATISTIC_FILENAME_DATA	"data"
> > +#define STATISTIC_FILENAME_DEF	"definition"
> > +
> > +#define STATISTIC_NEED_BARRIER	1
> 
> Meta-comment about this file, does most of the stuff in this file,
> really belong here?  At first glance, this should only hold the public
> interface to the statistic code, not everything else needed by the
> internal workings of that code.  It looks like it could be made a lot
> smaller.

I slimmed the header file down, for example by moving some structures
to lib/statistic.c. Do you think a lib/statistic.h would be a better
place?

> > +enum statistic_state {
> > +	STATISTIC_STATE_INVALID,
> > +	STATISTIC_STATE_UNCONFIGURED,
> > +	STATISTIC_STATE_RELEASED,
> > +	STATISTIC_STATE_OFF,
> > +	STATISTIC_STATE_ON
> > +};
> > +
> > +enum statistic_type {
> > +	STATISTIC_TYPE_COUNTER_INC,
> > +	STATISTIC_TYPE_COUNTER_PROD,
> > +	STATISTIC_TYPE_UTIL,
> > +	STATISTIC_TYPE_HISTOGRAM_LIN,
> > +	STATISTIC_TYPE_HISTOGRAM_LOG2,
> > +	STATISTIC_TYPE_SPARSE,
> > +	STATISTIC_TYPE_NONE
> > +};
> 
> Make these bit-safe so sparse can catch mistakes?
> 
> > +#define STATISTIC_FLAGS_NOINCR	0x01
> 
> What's this for?

added a comment with an explanation

> > +struct sgrb_seg {
> > +	struct list_head list;
> > +	char *address;
> > +	int offset;
> > +	int size;
> > +};
> > +
> > +struct statistic_file_private {
> > +	struct list_head read_seg_lh;
> > +	struct list_head write_seg_lh;
> > +	size_t write_seg_total_size;
> > +};
> > +
> > +struct statistic_merge_private {
> > +	struct statistic *stat;
> > +	spinlock_t lock;
> > +	void *dst;
> > +};
> 
> I'm guessing these three structures aren't needed here.  Otherwise,
> please document them.

moved to lib/statistic.c

> > +#ifdef CONFIG_STATISTICS
> 
> Why ifdef now, so late?

added a comment with an explanation

> > +extern int statistic_create(struct statistic_interface *, const char *);
> > +extern int statistic_remove(struct statistic_interface *);
> > +
> > +/**
> > + * statistic_add - update statistic with incremental data in (X, Y) pair
> > + * @stat: struct statistic array
> > + * @i: index of statistic to be updated
> > + * @value: X
> > + * @incr: Y
> > + *
> > + * The actual processing of the (X, Y) data pair is determined by the current
> > + * definition applied to the statistic. See Documentation/statistics.txt.
> > + *
> > + * This variant takes care of protecting per-cpu data. It is preferred whenever
> > + * exploiters don't update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_add(struct statistic *stat, int i,
> > +				 s64 value, u64 incr)
> > +{
> > +	unsigned long flags;
> > +	local_irq_save(flags);
> > +	if (stat[i].state == STATISTIC_STATE_ON)
> > +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> > +	local_irq_restore(flags);
> > +}
> 
> These are all inline, which I guess is acceptable.  But see the current
> inline-or-not comments on lkml which may make you rethink this.

I still have to look up that thread. I might change it later.

> > +/**
> > + * statistic_add_nolock - update statistic with incremental data in (X, Y) pair
> > + * @stat: struct statistic array
> > + * @i: index of statistic to be updated
> > + * @value: X
> > + * @incr: Y
> > + *
> > + * The actual processing of the (X, Y) data pair is determined by the current
> > + * definition applied to the statistic. See Documentation/statistics.txt.
> > + *
> > + * This variant leaves protecting per-cpu data to exploiters. It is preferred
> > + * whenever exploiters update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_add_nolock(struct statistic *stat, int i,
> > +					s64 value, u64 incr)
> > +{
> > +	if (stat[i].state == STATISTIC_STATE_ON)
> > +		stat[i].add(&stat[i], smp_processor_id(), value, incr);
> > +}
> > +
> > +/**
> > + * statistic_inc - update statistic with incremental data in (X, 1) pair
> > + * @stat: struct statistic array
> > + * @i: index of statistic to be updated
> > + * @value: X
> > + *
> > + * The actual processing of the (X, Y) data pair is determined by the current
> > + * definition applied to the statistic. See Documentation/statistics.txt.
> > + *
> > + * This variant takes care of protecting per-cpu data. It is preferred whenever
> > + * exploiters don't update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_inc(struct statistic *stat, int i, s64 value)
> > +{
> > +	unsigned long flags;
> > +	local_irq_save(flags);
> > +	if (stat[i].state == STATISTIC_STATE_ON)
> > +		stat[i].add(&stat[i], smp_processor_id(), value, 1);
> > +	local_irq_restore(flags);
> > +}
> 
> Shouldn't this just call statistic_add() with a incr of 1?

correct - changed

> > +
> > +/**
> > + * statistic_inc_nolock - update statistic with incremental data in (X, 1) pair
> > + * @stat: struct statistic array
> > + * @i: index of statistic to be updated
> > + * @value: X
> > + *
> > + * The actual processing of the (X, Y) data pair is determined by the current
> > + * definition applied to the statistic. See Documentation/statistics.txt.
> > + *
> > + * This variant leaves protecting per-cpu data to exploiters. It is preferred
> > + * whenever exploiters update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_inc_nolock(struct statistic *stat, int i,
> > +					s64 value)
> > +{
> > +	if (stat[i].state == STATISTIC_STATE_ON)
> > +		stat[i].add(&stat[i], smp_processor_id(), value, 1);
> > +}
> 
> Shouldn't this just call statistic_add_nolock with a incr of 1?

ditto
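
A minimal sketch of the change discussed above, written as a userspace
mock-up so it compiles on its own. The struct layout, the `total` member,
mock_add_counter() and the no-op irq/cpu stand-ins are assumptions made
for illustration; they are not taken from the patch. Only the shape of
the two _inc wrappers is the point here.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;
typedef uint64_t u64;

enum statistic_state { STATISTIC_STATE_OFF, STATISTIC_STATE_ON };

/* Mock-up: the real struct statistic has many more members. */
struct statistic {
	enum statistic_state state;
	void (*add)(struct statistic *stat, int cpu, s64 value, u64 incr);
	u64 total;	/* hypothetical accumulator, only for this sketch */
};

/* Userspace stand-ins for the kernel primitives; they do nothing here. */
#define local_irq_save(flags)		((void)((flags) = 0))
#define local_irq_restore(flags)	((void)(flags))
static inline int smp_processor_id(void) { return 0; }

static inline void statistic_add(struct statistic *stat, int i,
				 s64 value, u64 incr)
{
	unsigned long flags;

	assert(stat);
	local_irq_save(flags);
	if (stat[i].state == STATISTIC_STATE_ON)
		stat[i].add(&stat[i], smp_processor_id(), value, incr);
	local_irq_restore(flags);
}

static inline void statistic_add_nolock(struct statistic *stat, int i,
					s64 value, u64 incr)
{
	assert(stat);
	if (stat[i].state == STATISTIC_STATE_ON)
		stat[i].add(&stat[i], smp_processor_id(), value, incr);
}

/* The _inc variants reduce to the _add variants with an incr of 1. */
static inline void statistic_inc(struct statistic *stat, int i, s64 value)
{
	statistic_add(stat, i, value, 1);
}

static inline void statistic_inc_nolock(struct statistic *stat, int i,
					s64 value)
{
	statistic_add_nolock(stat, i, value, 1);
}

/* Hypothetical callback: sums the increments, ignores the X value. */
static void mock_add_counter(struct statistic *stat, int cpu,
			     s64 value, u64 incr)
{
	(void)cpu;
	(void)value;
	stat->total += incr;
}
```

Since all four helpers are static inline, the extra call level should
cost nothing once the compiler has inlined it.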

> > diff -puN /dev/null lib/Kconfig.statistic
> > --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> > +++ devel-akpm/lib/Kconfig.statistic	2006-06-09 15:22:58.000000000 -0700
> > @@ -0,0 +1,11 @@
> > +config STATISTICS
> > +	bool "Statistics infrastructure"
> > +	depends on DEBUG_FS
> > +	help
> > +	  The statistics infrastructure provides a debug-fs based user interface
> 
> No "-" in debugfs :)

sorry, has been fixed.

> > +	  for statistics of kernel components, that is, usually device drivers.
> 
> Why mention drivers?  Other things might use this (see original comments
> at the start of the message.)

yep, changed that as well

> > --- /dev/null	2006-06-03 22:34:36.282200750 -0700
> > +++ devel-akpm/lib/statistic.c	2006-06-09 15:22:58.000000000 -0700
> > @@ -0,0 +1,1459 @@
> > +/*
> > + *  lib/statistic.c
> > + *    statistics facility
> > + *
> > + *    Copyright (C) 2005, 2006
> > + *		IBM Deutschland Entwicklung GmbH,
> > + *		IBM Corporation
> > + *
> > + *    Author(s): Martin Peschke (mpeschke@de.ibm.com),
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2, or (at your option)
> > + * any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> 
> Again with the verbose license :)

I will see...

> > +static void _statistic_barrier(void *unused)
> > +{
> > +}
> > +
> > +static inline int statistic_stop(struct statistic *stat)
> > +{
> > +	stat->stopped = sched_clock();
> > +	stat->state = STATISTIC_STATE_OFF;
> > +	/* ensures that all CPUs have ceased updating statistics */
> > +	smp_mb();
> > +	on_each_cpu(_statistic_barrier, NULL, 0, 1);
> > +	return 0;
> > +}
> 
> Isn't there a way to use rcu for this instead?  Just a suggestion, it
> might be totally wrong...

I am not an RCU expert, but I don't think RCU helps here.
My barrier makes sure that all concurrent updates have ceased
before I go on to free the underlying memory. It's a "many
writers" scenario.

> > +
> > +static int statistic_transition(struct statistic *stat,
> > +				struct statistic_info *info,
> > +				enum statistic_state requested_state)
> > +{
> > +	int z = (requested_state < stat->state ? 1 : 0);
> > +	int retval = -EINVAL;
> 
> 	int retval = 0;
> 
> > +
> > +	while (stat->state != requested_state) {
> > +		switch (stat->state) {
> > +		case STATISTIC_STATE_INVALID:
> > +			retval = ( z ? -EINVAL : statistic_initialise(stat) );
> > +			break;
> > +		case STATISTIC_STATE_UNCONFIGURED:
> > +			retval = ( z ? statistic_uninitialise(stat)
> > +				     : statistic_define(stat) );
> > +			break;
> > +		case STATISTIC_STATE_RELEASED:
> > +			retval = ( z ? statistic_initialise(stat)
> > +				     : statistic_alloc(stat, info) );
> > +			break;
> > +		case STATISTIC_STATE_OFF:
> > +			retval = ( z ? statistic_free(stat, info)
> > +				     : statistic_start(stat) );
> > +			break;
> > +		case STATISTIC_STATE_ON:
> > +			retval = ( z ? statistic_stop(stat) : -EINVAL );
> > +			break;
> > +		}
> > +		if (unlikely(retval))
> > +			return retval;
> 
> delete these two lines.
> 
> > +	}
> > +	return 0;
> 
> 	return retval;

I have simplified this loop.
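
One possible shape for such a simplification, as a userspace mock-up.
This is a sketch under assumptions, not the code that went into the
patch: mock_step_up(), mock_step_down() and the `fail_alloc` knob are
hypothetical stand-ins for the real initialise/define/alloc/start and
stop/free/uninitialise helpers. Note that if the early return is dropped
as suggested, the loop condition has to test retval itself; assuming a
failing step leaves the state unchanged, the loop would otherwise never
terminate.

```c
#include <assert.h>
#include <errno.h>

enum statistic_state {
	STATISTIC_STATE_INVALID,
	STATISTIC_STATE_UNCONFIGURED,
	STATISTIC_STATE_RELEASED,
	STATISTIC_STATE_OFF,
	STATISTIC_STATE_ON
};

/* Mock-up: the real struct statistic has many more members. */
struct statistic {
	enum statistic_state state;
	int fail_alloc;	/* hypothetical knob: make RELEASED -> OFF fail */
};

/* Stand-in for statistic_initialise()/define()/alloc()/start(). */
static int mock_step_up(struct statistic *stat)
{
	if (stat->state == STATISTIC_STATE_ON)
		return -EINVAL;
	if (stat->state == STATISTIC_STATE_RELEASED && stat->fail_alloc)
		return -ENOMEM;	/* state is left unchanged on failure */
	stat->state = (enum statistic_state)(stat->state + 1);
	return 0;
}

/* Stand-in for statistic_stop()/free()/uninitialise(). */
static int mock_step_down(struct statistic *stat)
{
	if (stat->state == STATISTIC_STATE_INVALID)
		return -EINVAL;
	stat->state = (enum statistic_state)(stat->state - 1);
	return 0;
}

static int statistic_transition(struct statistic *stat,
				enum statistic_state requested_state)
{
	int down = requested_state < stat->state;
	int retval = 0;

	assert(stat);
	/* Walk one state at a time and stop as soon as a step fails. */
	while (!retval && stat->state != requested_state)
		retval = down ? mock_step_down(stat) : mock_step_up(stat);
	return retval;
}
```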

> > +static match_table_t statistic_match_type = {
> > +	{1, "type=%s"},
> > +	{9, NULL}
> > +};
> 
> named field initializers please.

done

> > +static match_table_t statistic_match_common = {
> > +	{STATISTIC_STATE_UNCONFIGURED, "state=unconfigured"},
> > +	{STATISTIC_STATE_RELEASED, "state=released"},
> > +	{STATISTIC_STATE_OFF, "state=off"},
> > +	{STATISTIC_STATE_ON, "state=on"},
> > +	{1001, "name=%s"},
> > +	{1002, "data=reset"},
> > +	{1003, "defaults"},
> > +	{9999, NULL}
> > +};
> 
> Same here.

Well, no one appears to do this with match_table_t,
and I think it would be overkill here.

> And why do you have numbers and a mix of enums here?  Shouldn't you
> define the name=, data= and defaults too?

Just for my convenience. It simplifies the (single) function using it.

> Also, just null terminate the list, is 9999 really needed?

match_token() requires this array to be terminated.

> > +static struct statistic_discipline statistic_discs[] = {
> > +	{ /* STATISTIC_TYPE_COUNTER_INC */
> > +	  NULL,
> > +	  statistic_alloc_generic,
> > +	  NULL,
> > +	  statistic_reset_counter,
> > +	  statistic_merge_counter,
> > +	  statistic_fdata_counter,
> > +	  NULL,
> > +	  statistic_add_counter_inc,
> > +	  statistic_set_counter_inc,
> > +	  "counter_inc", sizeof(u64)
> > +	},
> 
> named initializers please.  That will let you not have to specify the
> NULL fields, making it much easier to read overall.

You are right. Done.
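
For readers of the archive, a small sketch of what named (designated)
initializers buy here. The member names of struct statistic_discipline
are not shown in the quoted hunk, so the ones below (parse, alloc,
reset, name, size), as well as mock_reset_counter() and
statistic_disc(), are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum statistic_type {
	STATISTIC_TYPE_COUNTER_INC,
	STATISTIC_TYPE_NONE
};

struct statistic;

/* Hypothetical member names; the real definition may differ. */
struct statistic_discipline {
	int (*parse)(struct statistic *stat, const char *def);
	void *(*alloc)(struct statistic *stat, size_t size);
	void (*reset)(struct statistic *stat, void *data);
	const char *name;
	size_t size;
};

/* Stand-in for statistic_reset_counter(): zeroes a u64 counter. */
static void mock_reset_counter(struct statistic *stat, void *data)
{
	(void)stat;
	*(uint64_t *)data = 0;
}

/*
 * Members that are not named are zero-initialised, so the NULL
 * callbacks no longer have to be spelled out. The array index is
 * designated as well, which removes the need for a comment naming
 * the type each entry belongs to.
 */
static const struct statistic_discipline statistic_discs[] = {
	[STATISTIC_TYPE_COUNTER_INC] = {
		.reset	= mock_reset_counter,
		.name	= "counter_inc",
		.size	= sizeof(uint64_t),
	},
};

/* Hypothetical lookup helper with a bounds check. */
static const struct statistic_discipline *
statistic_disc(enum statistic_type type)
{
	assert((size_t)type <
	       sizeof(statistic_discs) / sizeof(statistic_discs[0]));
	return &statistic_discs[type];
}
```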

Thanks, Martin


^ permalink raw reply	[flat|nested] 166+ messages in thread

* wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (9 preceding siblings ...)
       [not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
@ 2006-06-05  1:06 ` Jeff Garzik
  2006-06-05  1:15   ` Andrew Morton
  2006-06-05  8:54   ` Christoph Hellwig
  2006-06-05  1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
                   ` (9 subsequent siblings)
  20 siblings, 2 replies; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05  1:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, netdev, linville

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> acx1xx-wireless-driver.patch
> fix-tiacx-on-alpha.patch
> tiacx-fix-attribute-packed-warnings.patch
> tiacx-pci-build-fix.patch
> tiacx-ia64-fix.patch
> 
>   It is about time we did something with this large and presumably useful
>   wireless driver.

I've never had technical objections to merging this, just AFAIK it had a
highly questionable origin, namely being reverse-engineered in a
non-clean-room environment that might leave Linux legally vulnerable.

If we can clear that hurdle, by all means pass it on to John Linville
and get it moving towards upstream.

	Jeff




^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
@ 2006-06-05  1:15   ` Andrew Morton
  2006-06-05  8:33     ` Andreas Mohr
  2006-06-05  8:54   ` Christoph Hellwig
  1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05  1:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, netdev, linville, Denis Vlasenko

On Sun, 4 Jun 2006 21:06:36 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > acx1xx-wireless-driver.patch
> > fix-tiacx-on-alpha.patch
> > tiacx-fix-attribute-packed-warnings.patch
> > tiacx-pci-build-fix.patch
> > tiacx-ia64-fix.patch
> > 
> >   It is about time we did something with this large and presumably useful
> >   wireless driver.
> 
> I've never had technical objections to merging this, just AFAIK it had a
> highly questionable origin, namely being reverse-engineered in a
> non-clean-room environment that might leave Linux legally vulnerable.

I never knew that.

<reads changelog>
<reads website>
<reads wiki>

I still don't know that.  Denis, do you know the details?

> If we can clear that hurdle, by all means pass it on to John Linville
> and get it moving towards upstream.

OK, thanks.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  1:15   ` Andrew Morton
@ 2006-06-05  8:33     ` Andreas Mohr
  2006-06-05  8:45       ` Arjan van de Ven
  0 siblings, 1 reply; 166+ messages in thread
From: Andreas Mohr @ 2006-06-05  8:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, linux-kernel, netdev, linville, Denis Vlasenko,
	acx100-devel, acx100-users

Hi,

On Sun, Jun 04, 2006 at 06:15:15PM -0700, Andrew Morton wrote:
> On Sun, 4 Jun 2006 21:06:36 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> > >   It is about time we did something with this large and presumably useful
> > >   wireless driver.
> > 
> > I've never had technical objections to merging this, just AFAIK it had a
> > highly questionable origin, namely being reverse-engineered in a
> > non-clean-room environment that might leave Linux legally vulnerable.
> 
> I never knew that.
> 
> <reads changelog>
> <reads website>
> <reads wiki>
> 
> I still don't know that.  Denis, do you know the details?

The acx100 project was started by about 5 people examining the various
acx100 binary Linux driver "releases" for distro kernels around 2.4.18 etc.
Since this might fail to comply with usual "clean-room" practices
(e.g. one party examining a driver and then a separate party implementing
a new driver with the data gained from examining the original driver),
it may fail to be seen as acceptable for Linux inclusion.

Since missing kernel inclusion is both a maintenance overhead and
(most importantly!) a huge user-level issue, I'd see this as a big problem.

In case there are development-unrelated obstacles against kernel inclusion,
I see (at least?) two possibilities:

a) asking TI to sprinkle our driver effort with the (ahem) holy penguin pee
   required to have it blessed sufficiently for kernel inclusion (preferably
   in combination with nice firmware blob licensing and specs for those
   chipsets)
   This might be a problem given that Theo de Raadt and many other people had
   fun repeatedly trying to contact TI for a useful statement concerning WLAN
   support.

b) abandoning our unfortunately not as blessed as intended (stability,
   community involvement, ...) big-effort driver efforts ("3 years and still
   going strong...") [1] and suggesting donating about 100000 OEM WLAN cards
   equipped with TI chipsets to various beautiful landfills in various
   countries ;-)

Whichever way this irons out, at this point I'm quite indifferent to what
happens, given that I really don't feel like spending too many endless weekends
with hardware and driver puzzles any more in exchange for rather dubious gains.
There's also a lot of fun in generic Linux kernel hacking, so...

Andreas Mohr

[1] we're *still* having issues with spotty ACK reception and radio
temperature drift recalibration on those unsupported chipsets,
which requires quite some focused development efforts and close examination
of WLAN traffic in order to really find out what the heck is going wrong here.
And please note that there's now the newer TNETW1450 chipset variant (most
prominently used by AVM hardware with its initial x86-only Linux USB2.0 driver)
with similar support issues which would require even more development.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  8:33     ` Andreas Mohr
@ 2006-06-05  8:45       ` Arjan van de Ven
  2006-06-05 10:26         ` Alan Cox
  0 siblings, 1 reply; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05  8:45 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Andrew Morton, Jeff Garzik, linux-kernel, netdev, linville,
	Denis Vlasenko, acx100-devel, acx100-users

On Mon, 2006-06-05 at 10:33 +0200, Andreas Mohr wrote:
> Hi,
> 
> On Sun, Jun 04, 2006 at 06:15:15PM -0700, Andrew Morton wrote:
> > On Sun, 4 Jun 2006 21:06:36 -0400
> > Jeff Garzik <jeff@garzik.org> wrote:
> > > >   It is about time we did something with this large and presumably useful
> > > >   wireless driver.
> > > 
> > > I've never had technical objections to merging this, just AFAIK it had a
> > > highly questionable origin, namely being reverse-engineered in a
> > > non-clean-room environment that might leave Linux legally vulnerable.
> > 
> > I never knew that.
> > 
> > <reads changelog>
> > <reads website>
> > <reads wiki>
> > 
> > I still don't know that.  Denis, do you know the details?
> 
> The acx100 project was started by about 5 people examining the various
> acx100 binary Linux driver "releases" for distro kernels around 2.4.18 etc.
> Since this might fail to comply with usual "clean-room" practices
> (e.g. one party examining a driver and then a separate party implementing
> a new driver with the data gained from examining the original driver),
> it may fail to be seen as acceptable for Linux inclusion.

I disagree there (not speaking for any company just for myself here):
the "clean room" thing is ONLY a USA thing, and is not even required in
the USA. It is a "we want to be extra safe in the USA" thing only. E.g. if
you want to be triple safe and do this in the USA, the clean room is a
good way to be sure.

If you do things in Europe or elsewhere, and/or as long as you don't
copy from the original, only use it to learn how it works, you should be
fine as well. It's just that a cleanroom approach is a sure way to prove
you didn't copy. That's all.

If "clean room" now is a requirement for a driver to hit the kernel,
then we need to remove about half the drivers in the kernel I suspect;
that'd just be silly.

I would say that as long as you and the others can certify that you
didn't copy from the original driver, but only used it to learn how it
worked, the kernel should be fine with it.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  8:45       ` Arjan van de Ven
@ 2006-06-05 10:26         ` Alan Cox
  2006-06-05 10:35           ` Arjan van de Ven
  0 siblings, 1 reply; 166+ messages in thread
From: Alan Cox @ 2006-06-05 10:26 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev,
	linville, Denis Vlasenko, acx100-devel, acx100-users

Ar Llu, 2006-06-05 am 10:45 +0200, ysgrifennodd Arjan van de Ven:
>  It's just that a cleanroom approach is a sure way to prove
> you didn't copy. That's all.

Which is an extremely important detail especially if you have been
reverse engineering another driver for the same or similar OS where it
is likely that people will retain knowledge and copy rather than
re-implement things.

We've had "fun" with this before and the pwc camera driver. I don't want
to see a repeat of that.

Alan

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 10:26         ` Alan Cox
@ 2006-06-05 10:35           ` Arjan van de Ven
  2006-06-05 10:59             ` Alan Cox
  2006-06-10  6:58             ` Pavel Machek
  0 siblings, 2 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 10:35 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev,
	linville, Denis Vlasenko, acx100-devel, acx100-users

On Mon, 2006-06-05 at 11:26 +0100, Alan Cox wrote:
> Ar Llu, 2006-06-05 am 10:45 +0200, ysgrifennodd Arjan van de Ven:
> >  It's just that a cleanroom approach is a sure way to prove
> > you didn't copy. That's all.
> 
> Which is an extremely important detail especially if you have been
> reverse engineering another driver for the same or similar OS where it
> is likely that people will retain knowledge and copy rather than
> re-implement things.

oh don't get me wrong, it's important to not copy from the original.
(even if that original did copy from linux ;)

> We've had "fun" with this before and the pwc camera driver. I don't want
> to see a repeat of that.

yet at the same time, the cleanroom approach is not the ONLY way to do
it right. And making following that exact approach a strict requirement
is just silly. And it would mean we'd need to remove quite a few drivers
from the tree if you follow that logic.

And to be fair the pwc camera driver was just a guy with a personality
problem rather than any real legal standing. 

Again doing things right is important. But I would say that if you do
the rev-engineering in Europe, just being careful and avoiding copying
should be enough (well and certifying that you were in fact careful and
didn't do any copying).

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 10:35           ` Arjan van de Ven
@ 2006-06-05 10:59             ` Alan Cox
  2006-06-10  6:58             ` Pavel Machek
  1 sibling, 0 replies; 166+ messages in thread
From: Alan Cox @ 2006-06-05 10:59 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel, netdev,
	linville, Denis Vlasenko, acx100-devel, acx100-users

Ar Llu, 2006-06-05 am 12:35 +0200, ysgrifennodd Arjan van de Ven:
> And to be fair the pwc camera driver was just a guy with a personality
> problem rather than any real legal standing. 

I must disagree there having reviewed the code in question and been
directly involved in the fallout. 

Alan


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 10:35           ` Arjan van de Ven
  2006-06-05 10:59             ` Alan Cox
@ 2006-06-10  6:58             ` Pavel Machek
  1 sibling, 0 replies; 166+ messages in thread
From: Pavel Machek @ 2006-06-10  6:58 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan Cox, Andreas Mohr, Andrew Morton, Jeff Garzik, linux-kernel,
	netdev, linville, Denis Vlasenko, acx100-devel, acx100-users

Hi!

> > >  It's just that a cleanroom approach is a sure way to prove
> > > you didn't copy. That's all.
> > 
> > Which is an extremely important detail especially if you have been
> > reverse engineering another driver for the same or similar OS where it
> > is likely that people will retain knowledge and copy rather than
> > re-implement things.
> 
> oh don't get me wrong, it's important to not copy from the original.
> (even if that original did copy from linux ;)

Well, if original did copy from linux, it surely is GPLed and case
closed, no? Being sued from vendor not respecting the GPL would
probably only do harm to them.

Like US courts are crazy, but hopefully not _that_ crazy.
							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
  2006-06-05  1:15   ` Andrew Morton
@ 2006-06-05  8:54   ` Christoph Hellwig
  2006-06-05 12:33     ` Jeff Garzik
  2006-06-05 13:27     ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
  1 sibling, 2 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-05  8:54 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, netdev, linville

On Sun, Jun 04, 2006 at 09:06:36PM -0400, Jeff Garzik wrote:
> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > acx1xx-wireless-driver.patch
> > fix-tiacx-on-alpha.patch
> > tiacx-fix-attribute-packed-warnings.patch
> > tiacx-pci-build-fix.patch
> > tiacx-ia64-fix.patch
> > 
> >   It is about time we did something with this large and presumably useful
> >   wireless driver.
> 
> I've never had technical objections to merging this, just AFAIK it had a
> highly questionable origin, namely being reverse-engineered in a
> non-clean-room environment that might leave Linux legally vulnerable.

As are at least a fourth of Linux drivers.  Andrew, please just go ahead
and merge it (I'll do another review ASAP).

Please don't let this reverse engineering idiocy hinder wireless driver
adoption; we're already falling far behind OpenBSD, who are very successful
at reverse engineering lots of wireless chipsets.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  8:54   ` Christoph Hellwig
@ 2006-06-05 12:33     ` Jeff Garzik
  2006-06-05 12:48       ` Arjan van de Ven
  2006-06-05 13:27     ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
  1 sibling, 1 reply; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05 12:33 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville

On Mon, Jun 05, 2006 at 09:54:51AM +0100, Christoph Hellwig wrote:
> On Sun, Jun 04, 2006 at 09:06:36PM -0400, Jeff Garzik wrote:
> > On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > > acx1xx-wireless-driver.patch
> > > fix-tiacx-on-alpha.patch
> > > tiacx-fix-attribute-packed-warnings.patch
> > > tiacx-pci-build-fix.patch
> > > tiacx-ia64-fix.patch
> > > 
> > >   It is about time we did something with this large and presumably useful
> > >   wireless driver.
> > 
> > I've never had technical objections to merging this, just AFAIK it had a
> > highly questionable origin, namely being reverse-engineered in a
> > non-clean-room environment that might leave Linux legally vulnerable.
> 
> As are at least a fourth of Linux drivers.  Andrew, please just go ahead

Hardly.  The -vast majority- of drivers I've dealt with in my time
hacking the kernel are either blessed by the vendor, or are of
unquestionably legal origin.

It's a good thing I pay attention to this issue, too, Mr. Just Go Ahead
And Merge It.


> Please don't let this reverse engineering idiocy hinder wireless driver
> adoption; we're already falling far behind OpenBSD, who are very successful
> at reverse engineering lots of wireless chipsets.

Thanks for your highly professional, legal opinion :)

	Jeff




^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 12:33     ` Jeff Garzik
@ 2006-06-05 12:48       ` Arjan van de Ven
  2006-06-05 12:52         ` Jeff Garzik
  0 siblings, 1 reply; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 12:48 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville

> 
> It's a good thing I pay attention to this issue, too, Mr. Just Go Ahead
> And Merge It.

dude, name calling is way out of line here.

Why is it a good thing you are blocking this driver? Do you have ANY
indication AT ALL that there is anything fishy about it?
(and don't say "they didn't follow cleanroom procedure", because you
know that cleanroom is not the only way to do reverse engineering
properly).

Paying attention to proper reverse engineering is good. Being
overzealous is not.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 12:48       ` Arjan van de Ven
@ 2006-06-05 12:52         ` Jeff Garzik
  2006-06-05 14:02           ` Linux kernel and laws Adrian Bunk
  0 siblings, 1 reply; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05 12:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Hellwig, Andrew Morton, linux-kernel, netdev, linville

On Mon, Jun 05, 2006 at 02:48:27PM +0200, Arjan van de Ven wrote:
> Why is it a good thing you are blocking this driver? Do you have ANY
> indication AT ALL that there is anything fishy about it?

Yes.


> Paying attention to proper reverse engineering is good. Being
> overzealous is not.

Being overzealous about merging drivers without first checking the legal
ramifications is a good way to torpedo Linux.

Far too many people have a careless "U.S.A. laws suck, merge it anyway"
attitude.

	Jeff




^ permalink raw reply	[flat|nested] 166+ messages in thread

* Linux kernel and laws
  2006-06-05 12:52         ` Jeff Garzik
@ 2006-06-05 14:02           ` Adrian Bunk
  2006-06-05 14:21             ` linux-os (Dick Johnson)
  2006-06-06  5:33             ` Evgeniy Polyakov
  0 siblings, 2 replies; 166+ messages in thread
From: Adrian Bunk @ 2006-06-05 14:02 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Arjan van de Ven, Christoph Hellwig, Andrew Morton, linux-kernel,
	netdev, linville

On Mon, Jun 05, 2006 at 08:52:35AM -0400, Jeff Garzik wrote:
>...
> > Paying attention to proper reverse engineering is good. Being
> > overzealous is not.
> 
> Being overzealous about merging drivers without first checking the legal
> ramifications is a good way to torpedo Linux.
> 
> Far too many people have a careless "U.S.A. laws suck, merge it anyway"
> attitude.

Independent of this issue:

An interesting question is how to handle legal issues properly.

Where is the borderline for rejecting code due to legal issues?
Might not be 100% correct according to laws in the USA.
Might not be 100% correct according to laws in Germany.
Might not be 100% correct according to laws in Finland.
Might not be 100% correct according to laws in Norway.
Might not be 100% correct according to laws in Brazil.
Might not be 100% correct according to laws in Japan.
Might not be 100% correct according to laws in India.
Might not be 100% correct according to laws in Russia.
Might not be 100% correct according to laws in China.
Might not be 100% correct according to laws in Saudi Arabia.
Might not be 100% correct according to laws in Iran.

For me living in Germany, none of these laws except for the German one 
has any relevance.

I've never seen people on this list pointing to probable problems with 
Chinese laws although these laws are relevant for four times as many 
people as US laws.

If someone would state a submission to the kernel might have issues 
according to Chinese laws, or Iranian laws, or Russian laws, would this 
be enough for keeping code out of the kernel?

This might sound like a theoretical question, but e.g. considering that 
the kernel contains cryptography code it's a question that might have 
wide practical implications.

> 	Jeff

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Linux kernel and laws
  2006-06-05 14:02           ` Linux kernel and laws Adrian Bunk
@ 2006-06-05 14:21             ` linux-os (Dick Johnson)
  2006-06-06  5:33             ` Evgeniy Polyakov
  1 sibling, 0 replies; 166+ messages in thread
From: linux-os (Dick Johnson) @ 2006-06-05 14:21 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Jeff Garzik, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	linux-kernel, netdev, linville

On Mon, 5 Jun 2006, Adrian Bunk wrote:

> On Mon, Jun 05, 2006 at 08:52:35AM -0400, Jeff Garzik wrote:
>> ...
>>> Paying attention to proper reverse engineering is good. Being
>>> overzealous is not.
>>
>> Being overzealous about merging drivers without first checking the legal
>> ramifications is a good way to torpedo Linux.
>>
>> Far too many people have a careless "U.S.A. laws suck, merge it anyway"
>> attitude.
>
> Independent of this issue:
>
> An interesting question is how to handle legal issues properly.
>
> Where is the borderline for rejecting code due to legal issues?
> Might not be 100% correct according to laws in the USA.
> Might not be 100% correct according to laws in Germany.
> Might not be 100% correct according to laws in Finland.
> Might not be 100% correct according to laws in Norway.
> Might not be 100% correct according to laws in Brazil.
> Might not be 100% correct according to laws in Japan.
> Might not be 100% correct according to laws in India.
> Might not be 100% correct according to laws in Russia.
> Might not be 100% correct according to laws in China.
> Might not be 100% correct according to laws in Saudi Arabia.
> Might not be 100% correct according to laws in Iran.
>
> For me living in Germany, none of these laws except for the German one
> has any relevance.
>
> I've never seen people on this list pointing to probable problems with
> Chinese laws although these laws are relevant for four times as many
> people as US laws.
>
> If someone would state a submission to the kernel might have issues
> according to Chinese laws, or Iranian laws, or Russian laws, would this
> be enough for keeping code out of the kernel?
>
> This might sound like a theoretical question, but e.g. considering that
> the kernel contains cryptography code it's a question that might have
> wide practical implications.
>
>> 	Jeff
>
> cu
> Adrian

If the kernel represented simply a knowledge base, then the burden
about whether or not someone could use it used to rest entirely
upon the user. That's why some Pacific rim governments are reportedly
fire-walling information.

In most western cultures, knowledge was not a crime. For many years,
someone could write a book, telling you how to kill somebody and,
as long as he didn't carry it out, he could not be held culpable.

Recently, in the US and some other countries, knowledge has become
criminalized. If you know how to defeat copy protection, and
you are not in a protected industry, you could be tried and
convicted of a federal crime.

That's one of the reasons why there are now no general guidelines
about kernel code, or any intellectual property use, for that matter.
The conditions could occur where the government thinks that you
know too much and are, therefore, a threat to "national security".

So, again, see a lawyer. The fact that you sought and accepted
legal opinion may in the future be your only viable defense as
governments bring charges against you! Sorry state of affairs for
sure.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Linux kernel and laws
  2006-06-05 14:02           ` Linux kernel and laws Adrian Bunk
  2006-06-05 14:21             ` linux-os (Dick Johnson)
@ 2006-06-06  5:33             ` Evgeniy Polyakov
  1 sibling, 0 replies; 166+ messages in thread
From: Evgeniy Polyakov @ 2006-06-06  5:33 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Jeff Garzik, Arjan van de Ven, Christoph Hellwig, Andrew Morton,
	linux-kernel, netdev, linville

On Mon, Jun 05, 2006 at 04:02:26PM +0200, Adrian Bunk (bunk@stusta.de) wrote:
> > Far too many people have a careless "U.S.A. laws suck, merge it anyway"
> > attitude.
> If someone would state a submission to the kernel might have issues 
> according to Chinese laws, or Iranian laws, or Russian laws, would this 
> be enough for keeping code out of the kernel?

Btw, did kernel hackers consult Papua New Guinea or bloody
Russian laws? It is possible that they have a law which forbids writing
open source code. So we should stop Linux kernel development and completely
remove its sources from the Internet ASAP.

P.S. Reverse engineering is explicitly permitted in Russia.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05  8:54   ` Christoph Hellwig
  2006-06-05 12:33     ` Jeff Garzik
@ 2006-06-05 13:27     ` John W. Linville
  2006-06-05 13:31       ` Christoph Hellwig
                         ` (2 more replies)
  1 sibling, 3 replies; 166+ messages in thread
From: John W. Linville @ 2006-06-05 13:27 UTC (permalink / raw)
  To: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel,
	netdev

On Mon, Jun 05, 2006 at 09:54:51AM +0100, Christoph Hellwig wrote:
> On Sun, Jun 04, 2006 at 09:06:36PM -0400, Jeff Garzik wrote:
> > On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > > acx1xx-wireless-driver.patch
> > > fix-tiacx-on-alpha.patch
> > > tiacx-fix-attribute-packed-warnings.patch
> > > tiacx-pci-build-fix.patch
> > > tiacx-ia64-fix.patch
> > > 
> > >   It is about time we did something with this large and presumably useful
> > >   wireless driver.
> > 
> > I've never had technical objections to merging this, just AFAIK it had a
> > highly questionable origin, namely being reverse-engineered in a
> > non-clean-room environment that might leave Linux legally vulnerable.
> 
> As are at least a fourth of Linux drivers.  Andrew, please just go ahead
> and merge it (I'll do another review ASAP).

Actually, I was planning to merge the softmac-based version for 2.6.18.
It looks like I may want some of Andrew's patches on top (ia64, alpha, etc).

http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/master/

	0003-wireless-add-acx-driver.txt
	0004-acxsm-merge-from-acx-0.3.32.txt
	0005-tiacx-Let-only-ACX_PCI-ACX_USB-be-user-visible.txt
	0007-tiacx-revert-neither-PCI-nor-USB-is-selected-change.txt
	0008-tiacx-implement-much-more-flexible-firmware-statistics-parsing.txt
	0009-tiacx-Change-acx_ioctl_-get-set-_encode-to-use-kernel-80211-stack.txt
	0010-tiacx-fix-breakage-of-Get-rid-of-circular-list-of-adev-s.txt
	0011-tiacx-split-module-into-acx-common-acx-pci-acx-usb.txt

Of course, I didn't know there were serious concerns about this
driver's origin.  I hope we aren't confusing this with the atheros
driver...?

> Please don't let this reverse engineering idiocy hinder wireless driver
> adoption, we're already falling far behind OpenBSD who are very successful
> reverse engineering lots of wireless chipsets.

This bugbear does seem to keep visiting us.  It is a bit of a
minefield.

I'm inclined to think that Christoph and Arjan are right, that we
have been too cautious.  Of course, neither of these fine gentlemen
are known for their timidity... :-)

Does not the Signed-off-by: line on a patch submission give us some
level of "good faith" protection?

I'm tempted to take contributors at their word, that they have produced
their own work and not copied from others.  What else do we need?

John
-- 
John W. Linville
linville@tuxdriver.com

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 13:27     ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
@ 2006-06-05 13:31       ` Christoph Hellwig
  2006-06-05 13:42       ` Arjan van de Ven
  2006-06-05 16:24       ` Alan Cox
  2 siblings, 0 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-05 13:31 UTC (permalink / raw)
  To: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel,
	netdev

On Mon, Jun 05, 2006 at 09:27:37AM -0400, John W. Linville wrote:
> Actually, I was planning to merge the softmac-based version for 2.6.18.
> It looks like I may want some of Andrew's patches on top (ia64, alpha, etc).

duh, didn't know that wasn't in -mm.  we want the softmac version of course.


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 13:27     ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
  2006-06-05 13:31       ` Christoph Hellwig
@ 2006-06-05 13:42       ` Arjan van de Ven
  2006-06-05 16:24       ` Alan Cox
  2 siblings, 0 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 13:42 UTC (permalink / raw)
  To: John W. Linville
  Cc: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel,
	netdev


> Of course, I didn't know there were serious concerns about this
> driver's origin.  I hope we aren't confusing this with the atheros
> driver...?
> 
> > Please don't let this reverse engineering idiocy hinder wireless driver
> > adoption, we're already falling far behind OpenBSD who are very successful
> > reverse engineering lots of wireless chipsets.
> 
> This bugbear does seem to keep visiting us.  It is a bit of a
> minefield.
> 
> I'm inclined to think that Christoph and Arjan are right, that we
> have been too cautious.  Of course, neither of these fine gentlemen
> are known for their timidity... :-)
> 
> Does not the Signed-off-by: line on a patch submission give us some
> level of "good faith" protection?

I would suggest asking them an explicit "did you copy anything" and make
sure their "we didn't copy" answer is in the description of the original
patch submission.
> 
> I'm tempted to take contributors at their word, that they have produced
> their own work and not copied from others.  What else do we need?

to a large degree that's all you can do. (of course you can look at the
code for something that looks "obviously not from here" as well, and we
all tend to do that anyway since such stuff tends to highly violate
coding style anyway)


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 13:27     ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
  2006-06-05 13:31       ` Christoph Hellwig
  2006-06-05 13:42       ` Arjan van de Ven
@ 2006-06-05 16:24       ` Alan Cox
  2006-06-29 14:26         ` ACX100 (softmac-based) driver ready to merge, but is it legal? -- " John W. Linville
  2 siblings, 1 reply; 166+ messages in thread
From: Alan Cox @ 2006-06-05 16:24 UTC (permalink / raw)
  To: John W. Linville
  Cc: Christoph Hellwig, Jeff Garzik, Andrew Morton, linux-kernel,
	netdev

Ar Llu, 2006-06-05 am 09:27 -0400, ysgrifennodd John W. Linville:
> Does not the Signed-off-by: line on a patch submission give us some
> level of "good faith" protection?
> 
> I'm tempted to take contributors at their word, that they have produced
> their own work and not copied from others.  What else do we need?

To keep an eye out for problems. Given the questions raised the tiacx
people need to clarify their position and someone needs to look into it.
Until that is done it certainly isn't "good faith" any more.

Alan


^ permalink raw reply	[flat|nested] 166+ messages in thread

* ACX100 (softmac-based) driver ready to merge, but is it legal? -- Re: wireless (was Re: 2.6.18 -mm merge plans)
  2006-06-05 16:24       ` Alan Cox
@ 2006-06-29 14:26         ` John W. Linville
       [not found]           ` <20060629144233.GB24463@tuxdriver.com>
  0 siblings, 1 reply; 166+ messages in thread
From: John W. Linville @ 2006-06-29 14:26 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Denis Vlasenko, Carlos Martin, Andreas Mohr, acx100-devel,
	acx100-users, Arjan van de Ven, Adrian Bunk, Alan Cox,
	Christoph Hellwig, linux-os (Dick Johnson), Evgeniy Polyakov,
	Jeff Garzik, Andrew Morton, Linus Torvalds

On Mon, Jun 05, 2006 at 05:24:51PM +0100, Alan Cox wrote:
> Ar Llu, 2006-06-05 am 09:27 -0400, ysgrifennodd John W. Linville:
> > Does not the Signed-off-by: line on a patch submission give us some
> > level of "good faith" protection?
> > 
> > I'm tempted to take contributors at their word, that they have produced
> > their own work and not copied from others.  What else do we need?
> 
> To keep an eye out for problems. Given the questions raised the tiacx
> people need to clarify their position and someone needs to look into it.
> Until that is done it certainly isn't "good faith" any more.

I apologize for the long copy list.  I have tried to include all
known interested parties.

This is a follow-up to a thread started by Andrew a few weeks ago
about what should be merged for 2.6.18.  One of the topics he cited
was the ACX100 driver which he has carried in -mm for quite some time.
I have a slightly different (softmac based) version of that driver
in wireless-2.6 which I think is worth merging now.

In the aforementioned thread, some questions were raised about the
legality of the ACX100 driver (i.e. tiacx) code base, but no one
had any specific points other than that it is not 100% "clean room"
derived.  Others point out that this is not strictly a requirement.
The matter dropped without a strong defense from the tiacx team.

I hereby invite the tiacx team to defend their work by making public,
affirmative statements indicating a) how they produced their code; and,
b) that they have the legal right to license it as part of the Linux
kernel under the GPL.  As an incentive to this, I have already made
the necessary preparations for this driver to be merged immediately.

This is the softmac-based tiacx that has been in wireless-2.6 for
some time, with the addition of a few patches that akpm had in -mm
which I did not previously have.  For easy review, a tarball with
the full driver is available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/tiacx.tar.gz

A git pull request follows.  I am confident that if the legal status
of this code can be confirmed, it will be merged upstream ASAP.

Comments welcome!

Thanks,

John

---

The following changes since commit 70a332b048e4d90635dfa47fc5d91cf87b5cc3a5:
  John W. Linville:
        softmac: fix build-break from 881ee6999d66c8fc903b429b73bbe6045b38c549

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git tiacx

Andreas Mohr:
      tiacx: implement much more flexible firmware statistics parsing

Andrew Morton:
      tiacx: pci build fix

Carlos Martin:
      tiacx: fix breakage of "Get rid of circular list of adev's"
      tiacx: split module into acx-common + acx-pci + acx-usb

Denis Vlasenko:
      acxsm: merge from acx 0.3.32
      tiacx: revert "neither PCI nor USB is selected" change
      tiacx: Change acx_ioctl_{get,set}_encode to use kernel 80211 stack
      fix tiacx on alpha
      tiacx: fix attribute packed warnings

John W. Linville:
      wireless: add acx driver
      tiacx: Let only ACX_PCI/ACX_USB be user-visible
      tiacx: support ia64

 drivers/net/wireless/Kconfig             |    1 
 drivers/net/wireless/Makefile            |    2 
 drivers/net/wireless/tiacx/Changelog     |  114 
 drivers/net/wireless/tiacx/Kconfig       |   65 
 drivers/net/wireless/tiacx/Makefile      |    6 
 drivers/net/wireless/tiacx/README        |   61 
 drivers/net/wireless/tiacx/acx.h         |   11 
 drivers/net/wireless/tiacx/acx_config.h  |   40 
 drivers/net/wireless/tiacx/acx_func.h    |  598 ++
 drivers/net/wireless/tiacx/acx_struct.h  | 2048 ++++++++
 drivers/net/wireless/tiacx/common.c      | 7542 ++++++++++++++++++++++++++++++
 drivers/net/wireless/tiacx/ioctl.c       | 2738 +++++++++++
 drivers/net/wireless/tiacx/pci.c         | 4243 +++++++++++++++++
 drivers/net/wireless/tiacx/setrate.c     |  213 +
 drivers/net/wireless/tiacx/usb.c         | 1954 ++++++++
 drivers/net/wireless/tiacx/wlan.c        |  422 ++
 drivers/net/wireless/tiacx/wlan_compat.h |  267 +
 drivers/net/wireless/tiacx/wlan_hdr.h    |  497 ++
 drivers/net/wireless/tiacx/wlan_mgmt.h   |  582 ++
 19 files changed, 21404 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/wireless/tiacx/Changelog
 create mode 100644 drivers/net/wireless/tiacx/Kconfig
 create mode 100644 drivers/net/wireless/tiacx/Makefile
 create mode 100644 drivers/net/wireless/tiacx/README
 create mode 100644 drivers/net/wireless/tiacx/acx.h
 create mode 100644 drivers/net/wireless/tiacx/acx_config.h
 create mode 100644 drivers/net/wireless/tiacx/acx_func.h
 create mode 100644 drivers/net/wireless/tiacx/acx_struct.h
 create mode 100644 drivers/net/wireless/tiacx/common.c
 create mode 100644 drivers/net/wireless/tiacx/ioctl.c
 create mode 100644 drivers/net/wireless/tiacx/pci.c
 create mode 100644 drivers/net/wireless/tiacx/setrate.c
 create mode 100644 drivers/net/wireless/tiacx/usb.c
 create mode 100644 drivers/net/wireless/tiacx/wlan.c
 create mode 100644 drivers/net/wireless/tiacx/wlan_compat.h
 create mode 100644 drivers/net/wireless/tiacx/wlan_hdr.h
 create mode 100644 drivers/net/wireless/tiacx/wlan_mgmt.h

The complete (history-free) patch is available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/tiacx.patch.gz

-- 
John W. Linville
linville@tuxdriver.com

^ permalink raw reply	[flat|nested] 166+ messages in thread

[parent not found: <20060629144233.GB24463@tuxdriver.com>]

* Re: [Acx100-users] Denis Vlasenko, where are you? (mail bounced)
       [not found]           ` <20060629144233.GB24463@tuxdriver.com>
@ 2006-06-29 14:47             ` Andreas Mohr
  0 siblings, 0 replies; 166+ messages in thread
From: Andreas Mohr @ 2006-06-29 14:47 UTC (permalink / raw)
  To: acx100-users; +Cc: netdev, linux-kernel, acx100-devel

Hi,

On Thu, Jun 29, 2006 at 10:42:39AM -0400, John W. Linville wrote:
> If anyone knows how to get in touch w/ Denis, I'd appreciate it...

He sent me (and a few other addresses) his new address recently
(*important* mails only!):

vda.linux AT a server called googlemail.com

(he got a new job and moved)

Andreas Mohr

^ permalink raw reply	[flat|nested] 166+ messages in thread

* merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (10 preceding siblings ...)
  2006-06-05  1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
@ 2006-06-05  1:32 ` Jeff Garzik
  2006-06-05  1:47   ` Andrew Morton
  2006-06-05  6:58   ` Francois Romieu
  2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
                   ` (8 subsequent siblings)
  20 siblings, 2 replies; 166+ messages in thread
From: Jeff Garzik @ 2006-06-05  1:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> areca-raid-linux-scsi-driver.patch

>   I'm going to start sending the Areca driver to James, too.  The vendor
>   has worked hard and the hardware is becoming more important - let's help
>   them get it in.

In general, I'm a bit disappointed at the time it takes new drivers to
reach the upstream kernel.  I grant that a lot of vendor drivers are
unreadable, unmergable shite...  but on the other side of the coin, I
see a lot of decent drivers get stalled simply because they aren't
perfect.

I would rather see a driver get "95% there" -- because once a driver is
merged into the upstream kernel, it has a lot more visibility, and will
inevitably receive the remaining changes and cleanups anyway.

	Jeff

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
@ 2006-06-05  1:47   ` Andrew Morton
  2006-06-05  8:59     ` Christoph Hellwig
  2006-06-05  6:58   ` Francois Romieu
  1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05  1:47 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

On Sun, 4 Jun 2006 21:32:23 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> On Sun, Jun 04, 2006 at 01:50:11PM -0700, Andrew Morton wrote:
> > areca-raid-linux-scsi-driver.patch
> 
> >   I'm going to start sending the Areca driver to James, too.  The vendor
> >   has worked hard and the hardware is becoming more important - let's help
> >   them get it in.
> 
> 
> In general, I'm a bit disappointed at the time it takes new drivers to
> reach the upstream kernel.  I grant that a lot of vendor drivers are
> unreadable, unmergable shite...  but on the other side of the coin, I
> see a lot of decent drivers get stalled simply because they aren't
> perfect.
> 
> I would rather see a driver get "95% there" -- because once a driver is
> merged into the upstream kernel, it has a lot more visibility, and will
> inevitably receive the remaining changes and cleanups anyway.
> 

Yes, I agree.  As long as we reasonably think that a piece of code *will*
become acceptable within a reasonable amount of time then going early is
safe.

A large part of that calculation is non-technical: do we believe that the
originator will do the work to get things finished off.  Often, that's a
pretty easy call to make.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  1:47   ` Andrew Morton
@ 2006-06-05  8:59     ` Christoph Hellwig
  2006-06-05  9:10       ` Andrew Morton
  2006-06-05 11:10       ` Ivan Novick
  0 siblings, 2 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-05  8:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jeff Garzik, linux-kernel

On Sun, Jun 04, 2006 at 06:47:11PM -0700, Andrew Morton wrote:
> Yes, I agree.  As long as we reasonably think that a piece of code *will*
> become acceptable within a reasonable amount of time then going early is
> safe.

Definitely not the case for areca.  The only progress at all is where people
like Arjan, Randy or me did very intensive babysitting.  And it's still far
from being there.

And especially in scsi land I'm absolutely against putting in more substandard
drivers.  The subsystem is still badly plagued by lots of old drivers that
aren't up to any standards, and we need to decrease the maintenance load due
to odd drivers, not increase it even further.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  8:59     ` Christoph Hellwig
@ 2006-06-05  9:10       ` Andrew Morton
  2006-06-05  9:16         ` Arjan van de Ven
  2006-06-05 11:10       ` Ivan Novick
  1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-05  9:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: jeff, linux-kernel

On Mon, 5 Jun 2006 09:59:18 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Sun, Jun 04, 2006 at 06:47:11PM -0700, Andrew Morton wrote:
> > Yes, I agree.  As long as we reasonably think that a piece of code *will*
> > become acceptable within a reasonable amount of time then going early is
> > safe.
> 
> 
> Definitely not the case for areca.  The only progress at all is where people
> like Arjan, Randy or me did very intensive babysitting.  And it's still far
> from being there.
> 
> And especially in scsi land I'm absolutely against putting in more substandard
> drivers.  The subsystem is still badly plagued by lots of old drivers that
> aren't up to any standards, and we need to decrease the maintenance load due
> to odd drivers, not increase it even further.

So..  How are we going to get the Areca controllers supported in Linux? 
The code's been sitting in -mm for over a year and the vendor does have
staff assigned to work on it.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  9:10       ` Andrew Morton
@ 2006-06-05  9:16         ` Arjan van de Ven
  0 siblings, 0 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05  9:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, jeff, linux-kernel

On Mon, 2006-06-05 at 02:10 -0700, Andrew Morton wrote:
> On Mon, 5 Jun 2006 09:59:18 +0100
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Sun, Jun 04, 2006 at 06:47:11PM -0700, Andrew Morton wrote:
> > > Yes, I agree.  As long as we reasonably think that a piece of code *will*
> > > become acceptable within a reasonable amount of time then going early is
> > > safe.
> > 
> > 
> > Definitely not the case for areca.  The only progress at all is where people
> > like Arjan, Randy or me did very intensive babysitting.  And it's still far
> > from being there.
> > 
> > And especially in scsi land I'm absolutely against putting in more substandard
> > drivers.  The subsystem is still badly plagued by lots of old drivers that
> > aren't up to any standards, and we need to decrease the maintenance load due
> > to odd drivers, not increase it even further.
> 
> So..  How are we going to get the Areca controllers supported in Linux? 
> The code's been sitting in -mm for over a year and the vendor does have
> staff assigned to work on it.

the driver is improving for sure.

What seems to work well is when we make a work-to-do list, the vendor
then goes about and fixes most of that quite quickly. I think I'm
approaching the end of useful review input I can give (they fixed most
if not all the stuff I flagged before), it would be really nice if
Christoph or some other scsi person would do a review again and make a
list of "these should be fixed and then we can merge" (and a list of
"these can be fixed post merge" as well)



^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  8:59     ` Christoph Hellwig
  2006-06-05  9:10       ` Andrew Morton
@ 2006-06-05 11:10       ` Ivan Novick
  2006-06-05 11:26         ` Adrian Bunk
  1 sibling, 1 reply; 166+ messages in thread
From: Ivan Novick @ 2006-06-05 11:10 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton; +Cc: Jeff Garzik, linux-kernel

> And especially in scsi land I'm absolutely against putting in more
> substandard drivers.  The subsystem is still badly plagued by lots of old drivers
> that aren't up to any standards, and we need to decrease the maintenance load
> due to odd drivers, not increase it even further.

Is there a hit-list of old drivers that need work, in case someone is
interested in helping?

Ivan

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05 11:10       ` Ivan Novick
@ 2006-06-05 11:26         ` Adrian Bunk
  0 siblings, 0 replies; 166+ messages in thread
From: Adrian Bunk @ 2006-06-05 11:26 UTC (permalink / raw)
  To: Ivan Novick; +Cc: Christoph Hellwig, Andrew Morton, Jeff Garzik, linux-kernel

On Mon, Jun 05, 2006 at 12:10:20PM +0100, Ivan Novick wrote:
> > And especially in scsi land I'm absolutely against putting in more
> > substandard drivers.  The subsystem is still badly plagued by lots of old drivers
> > that aren't up to any standards, and we need to decrease the maintenance load
> > due to odd drivers, not increase it even further.
> 
> Is there a hit-list of old drivers that need work, in case someone is
> interested in helping?

Not a complete list, but a good way for finding 50 drivers that need 
work:
  grep scsi_module.c drivers/scsi/*
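
For anyone who would rather get a plain list of file names than the
matching lines, the grep above can be approximated with a short script.
This is only a minimal sketch in Python: the function name
drivers_including and its arguments are made up for illustration, and it
assumes it is pointed at a checked-out kernel tree.  Like the shell glob,
it only looks at files directly under the given directory.

```python
import pathlib


def drivers_including(tree, needle="scsi_module.c"):
    """Return the sorted names of files directly under `tree` that mention `needle`.

    Roughly what `grep -l scsi_module.c drivers/scsi/*` reports, minus the
    directory prefix.  Subdirectories are skipped rather than descended into.
    """
    hits = []
    for path in sorted(pathlib.Path(tree).iterdir()):
        if not path.is_file():
            continue
        # Kernel sources are not guaranteed to be valid UTF-8, so drop
        # undecodable bytes instead of failing on them.
        text = path.read_text(errors="ignore")
        if needle in text:
            hits.append(path.name)
    return hits
```

Calling drivers_including("drivers/scsi") from the top of a source tree
should then give the candidate list; the count will depend on the kernel
version in use.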

> Ivan

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
  2006-06-05  1:47   ` Andrew Morton
@ 2006-06-05  6:58   ` Francois Romieu
  2006-06-05 10:32     ` Alan Cox
  1 sibling, 1 reply; 166+ messages in thread
From: Francois Romieu @ 2006-06-05  6:58 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel

Jeff Garzik <jeff@garzik.org> :
[...]
> In general, I'm a bit disappointed at the time it takes new drivers to
> reach the upstream kernel.  I grant that a lot of vendor drivers are
> unreadable, unmergable shite...  but on the other side of the coin, I
> see a lot of decent drivers get stalled simply because they aren't
> perfect.

Could you provide an informal list of a few drivers which are currently
stalled ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05  6:58   ` Francois Romieu
@ 2006-06-05 10:32     ` Alan Cox
  2006-06-05 10:36       ` Arjan van de Ven
  0 siblings, 1 reply; 166+ messages in thread
From: Alan Cox @ 2006-06-05 10:32 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Jeff Garzik, Andrew Morton, linux-kernel

Ar Llu, 2006-06-05 am 08:58 +0200, ysgrifennodd Francois Romieu:
> Jeff Garzik <jeff@garzik.org> :
> [...]
> > In general, I'm a bit disappointed at the time it takes new drivers to
> > reach the upstream kernel.  I grant that a lot of vendor drivers are
> > unreadable, unmergable shite...  but on the other side of the coin, I
> > see a lot of decent drivers get stalled simply because they aren't
> > perfect.
> 
> Could you provide an informal list of a few drivers which are currently
> stalled ?

It isn't just drivers. Xen has the same problem. All large code blocks
have this problem. The older policy was to get stuff roughly right,
merge it into a tree then beat on it. Now everyone is blocking anything
that is the slightest bit imperfect which makes it impossible to add
anything large to the tree because it will *never* be perfect before a
merge and hack session and it will never be perfect in everyone's eyes.
Plus of course some people have personal dislikes of Xen, and of various
other projects that get in the way.

Perfection is the enemy of progress and of success. We risk moving back
to the case we got into in 2.4 when merging got so hard that most
vendors shipped kernels bearing no relationship to the "upstream" tree.
Probably worse this time as there is no common "unofficial" tree like
-ac so they will all ship different variants and combinations.

Perfect is the wrong test. In the overall interest of the kernel is the
right test.

Alan

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05 10:32     ` Alan Cox
@ 2006-06-05 10:36       ` Arjan van de Ven
  2006-06-06  2:02         ` Chris Wright
  0 siblings, 1 reply; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-05 10:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: Francois Romieu, Jeff Garzik, Andrew Morton, linux-kernel

On Mon, 2006-06-05 at 11:32 +0100, Alan Cox wrote:
> Ar Llu, 2006-06-05 am 08:58 +0200, ysgrifennodd Francois Romieu:
> > Jeff Garzik <jeff@garzik.org> :
> > [...]
> > > In general, I'm a bit disappointed at the time it takes new drivers to
> > > reach the upstream kernel.  I grant that a lot of vendor drivers are
> > > unreadable, unmergable shite...  but on the other side of the coin, I
> > > see a lot of decent drivers get stalled simply because they aren't
> > > perfect.
> > 
> > Could you provide an informal list of a few drivers which are currently
> > stalled ?
> 
> It isn't just drivers. Xen has the same problem.

Xen has many problems. This is not nearly their biggest ;)




^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-05 10:36       ` Arjan van de Ven
@ 2006-06-06  2:02         ` Chris Wright
  2006-06-06  7:01           ` Andi Kleen
  0 siblings, 1 reply; 166+ messages in thread
From: Chris Wright @ 2006-06-06  2:02 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan Cox, Francois Romieu, Jeff Garzik, Andrew Morton,
	linux-kernel

* Arjan van de Ven (arjan@infradead.org) wrote:
> On Mon, 2006-06-05 at 11:32 +0100, Alan Cox wrote:
> > It isn't just drivers. Xen has the same problem.
> 
> Xen has many problems. This is not nearly their biggest ;)

What is the biggest, or even top 3 or 5?  I've a todo list of some
140-odd entries which are being worked on.  It's slow and tedious,
but in progress.  I'd be happy to hear some top issues (guess we're
talking high-level technical categories here) to help prioritize and
perhaps even augment that todo list.  I've got my ideas and priorities,
but I'm interested to hear yours.

thanks,
-chris

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-06  2:02         ` Chris Wright
@ 2006-06-06  7:01           ` Andi Kleen
  2006-06-06 13:04             ` Steven Rostedt
  0 siblings, 1 reply; 166+ messages in thread
From: Andi Kleen @ 2006-06-06  7:01 UTC (permalink / raw)
  To: Chris Wright
  Cc: Alan Cox, Francois Romieu, Jeff Garzik, Andrew Morton,
	linux-kernel

Chris Wright <chrisw@sous-sol.org> writes:

> * Arjan van de Ven (arjan@infradead.org) wrote:
> > On Mon, 2006-06-05 at 11:32 +0100, Alan Cox wrote:
> > > It isn't just drivers. Xen has the same problem.
> > 
> > Xen has many problems. This is not nearly their biggest ;)
> 
> What is the biggest, or even top 3 or 5?  I've a todo list of some

I would say the biggest is that things haven't gotten submitted
for so long and aren't resubmitted quickly.

e.g. Xen code needs a lot of arch/* cleanups in small patches that should
be just submitted, fixed, resubmitted quickly.  Many of them
could be already merged.

For example Jan Beulich has been sending many of the cleanups he 
needed for x86-64/i386 Xen immediately and at least for x86-64 
I merged most of them. If the other things were submitted
earlier a lot of it could be already merged too.

Then Xen net/block/char etc. drivers should be submitted to the
respective maintainers independently (they are useful even without the
rest of Xen in HVM guests)

> 140-odd entries which are being worked on.  It's slow and tedious,
> but in progress. 

What I would do is to concentrate on the small cleanup patches first
and post them as soon as you fix them. I think a lot of them were
actually ok without changes. Then bigger stuff piece by piece. You
don't need to wait to fix everything first.

-Andi

* Re: merging new drivers (was Re: 2.6.18 -mm merge plans)
  2006-06-06  7:01           ` Andi Kleen
@ 2006-06-06 13:04             ` Steven Rostedt
  0 siblings, 0 replies; 166+ messages in thread
From: Steven Rostedt @ 2006-06-06 13:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Chris Wright, Alan Cox, Francois Romieu, Jeff Garzik,
	Andrew Morton, linux-kernel

On Tue, 2006-06-06 at 09:01 +0200, Andi Kleen wrote:

> 
> > 140-odd entries which are being worked on.  It's slow and tedious,
> > but in progress. 
> 
> What I would do is to concentrate on the small cleanup patches first
> and post them as soon as you fix them. I think a lot of them were
> actually ok without changes. Then bigger stuff piece by piece. You
> don't need to wait to fix everything first.

I totally agree with this approach.  There are probably over 900 patches
that have been accepted into mainline as cleanups and bug fixes that
originated from Ingo's -rt patch set.  A lot of developers don't realize
how much the -rt patch has helped the mainline kernel.

But to keep the -rt patch maintainable, when a problem or cleanup is
discovered, we try very hard to get that fix into mainline.  That way we
don't need to keep maintaining it. Doesn't the Xen team do the same?

-- Steve

* Re: 2.6.18 -mm merge plans -- GFS
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (11 preceding siblings ...)
  2006-06-05  1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
@ 2006-06-05 13:38 ` David Woodhouse
  2006-06-05 14:10   ` Russell King
  2006-06-05 15:01   ` Steven Whitehouse
  2006-06-05 14:08 ` 2.6.18 -mm merge plans Oleg Nesterov
                   ` (7 subsequent siblings)
  20 siblings, 2 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-05 13:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Patrick Caulfield, Steven Whitehouse, davej,
	David Teigland

On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> It's time to take a look at the -mm queue for 2.6.18.

You didn't mention GFS2 either -- is that ready to go upstream?
It contains this in its user<->kernel communication (whitespace sic)...

/* struct passed to the lock write */
struct dlm_lock_params {
       __u8 mode;
       __u16 flags;
       __u32 lkid;
       __u32 parent;
       __u8 namelen;
        void __user *castparam;
       void __user *castaddr;
       void __user *bastparam;
        void __user *bastaddr;
       struct dlm_lksb __user *lksb;
       char lvb[DLM_USER_LVB_LEN];
       char name[1];
};

struct dlm_lspace_params {
       __u32 flags;
       __u32 minor;
       char name[1];
};

struct dlm_write_request {
       __u32 version[3];
       __u8 cmd;

       union  {
               struct dlm_lock_params   lock;
               struct dlm_lspace_params lspace;
       } i;
};

/* struct read from the "device" fd,
   consists mainly of userspace pointers for the library to use */
struct dlm_lock_result {
       __u32 length;
       void __user * user_astaddr;
       void __user * user_astparam;
       struct dlm_lksb __user * user_lksb;
       struct dlm_lksb lksb;
       __u8 bast_mode;
       /* Offsets may be zero if no data is present */
       __u32 lvb_offset;
};

Now, the intention seems to be that instead of doing CONFIG_COMPAT stuff
in the kernel for backwards-compatibility with 32-bit userspace on
64-bit kernels, we _instead_ attempt to make the 32-bit userspace
_forward_ compatible with 64-bit kernels.

The userspace side of this is implemented (for sparc and s390 only) at
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/cluster/dlm/lib/dlm32.c?rev=1.3&content-type=text/plain&cvsroot=cluster

That approach looks broken when i386 binaries are run on x86_64, because
the offset of the 'qinfo' member in a struct which starts like this...

struct dlm_query_params64 {
	uint32_t query;
	uint64_t qinfo;
	...

... is going to be _different_ between the 32-bit userspace code and the
64-bit kernel anyway, despite the fact that this structure is supposed
to match the 64-bit kernel's idea of the structure.
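
The offset mismatch described above can be reproduced on a single
machine.  What follows is a minimal sketch, not code from the DLM tree:
the struct and function names are hypothetical, and the i386 layout is
imitated with a GCC/Clang aligned(4) typedef, on the assumption that the
i386 ABI only aligns a 64-bit integer inside a struct to 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for i386 struct layout rules: a 64-bit integer
 * member only needs 4-byte alignment there (assumption stated above). */
typedef uint64_t __attribute__((aligned(4))) u64_i386_align;

/* Layout as a 64-bit kernel sees it: natural alignment for uint64_t. */
struct query_params_lp64 {
	uint32_t query;
	uint64_t qinfo;
};

/* The same two members laid out the way an i386 compiler would. */
struct query_params_i386 {
	uint32_t query;
	u64_i386_align qinfo;
};

/* One possible repair: spell the padding out so both layouts agree. */
struct query_params_fixed {
	uint32_t query;
	uint32_t pad;		/* explicit, always present */
	u64_i386_align qinfo;
};

/* Byte offset of qinfo under each layout. */
size_t qinfo_offset_lp64(void)
{
	return offsetof(struct query_params_lp64, qinfo);
}

size_t qinfo_offset_i386(void)
{
	return offsetof(struct query_params_i386, qinfo);
}

size_t qinfo_offset_fixed(void)
{
	return offsetof(struct query_params_fixed, qinfo);
}
```

On an x86_64 build the first two offsets should differ, which is the
disagreement between the 64-bit kernel and the 32-bit library, while the
explicitly padded variant places qinfo at the same offset under both
alignment rules.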

The hdrcleanup and hdrinstall stuff helped to highlight this, because it
now stands out as being immediately obvious that we're adding pointers
to user-visible structures. I'm being asked to add the GFS2 headers to
Fedora's glibc-kernheaders package, but I _don't_ want to diverge from
upstream -- part of the reason for 'make headers_install' was that all
the distros will be able to ship a _consistent_ set of headers. So if
we're going to barf at the compat stuff above, can we do it ASAP and get
it fixed, or if we're going to accept the "userspace must be
forward-compatible" approach and send it to Linus as is, can we reach a
consensus on that instead?

-- 
dwmw2


* Re: 2.6.18 -mm merge plans -- GFS
  2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
@ 2006-06-05 14:10   ` Russell King
  2006-06-05 15:01   ` Steven Whitehouse
  1 sibling, 0 replies; 166+ messages in thread
From: Russell King @ 2006-06-05 14:10 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andrew Morton, linux-kernel, Patrick Caulfield, Steven Whitehouse,
	davej, David Teigland

On Mon, Jun 05, 2006 at 02:38:50PM +0100, David Woodhouse wrote:
> You didn't mention GFS2 either -- is that ready to go upstream?
> It contains this in its user<->kernel communication (whitespace sic)...
> 
> /* struct passed to the lock write */
> struct dlm_lock_params {
>        __u8 mode;
>        __u16 flags;
>        __u32 lkid;
>        __u32 parent;
>        __u8 namelen;

Hmm.  This is going to be subject to random compiler padding.  It would
be much better to have:

	__u8 mode;
	__u8 namelen;
	__u16 flags;
	__u32 lkid;
	__u32 parent;

which should be less subject to compiler padding.
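
The effect of the reordering can be checked with sizeof.  This is only a
sketch under stated assumptions: the two structs below are hypothetical
and contain just the five leading integer members of dlm_lock_params (the
pointers, lvb and name are left out), and the numbers depend on the usual
natural-alignment rules of the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading members in the order they were posted. */
struct lock_head_posted {
	uint8_t  mode;
	uint16_t flags;
	uint32_t lkid;
	uint32_t parent;
	uint8_t  namelen;
};

/* Same members, sorted so that each one is naturally aligned. */
struct lock_head_reordered {
	uint8_t  mode;
	uint8_t  namelen;
	uint16_t flags;
	uint32_t lkid;
	uint32_t parent;
};

/* Bytes the members themselves need: 1 + 2 + 4 + 4 + 1. */
#define LOCK_HEAD_PAYLOAD 12u

/* Padding bytes the compiler inserted into each layout. */
size_t posted_padding(void)
{
	return sizeof(struct lock_head_posted) - LOCK_HEAD_PAYLOAD;
}

size_t reordered_padding(void)
{
	return sizeof(struct lock_head_reordered) - LOCK_HEAD_PAYLOAD;
}
```

With natural alignment the posted order needs a hole after mode and tail
padding after namelen, whereas the reordered version has no holes for the
compiler to fill, so there is nothing left for different compilers to
disagree about.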

> struct dlm_write_request {
>        __u32 version[3];
>        __u8 cmd;

Ditto - though maybe following this by:

	__u8 unused[3];

would be a sane solution.

> struct dlm_lock_result {
>        __u32 length;
>        void __user * user_astaddr;
>        void __user * user_astparam;
>        struct dlm_lksb __user * user_lksb;
>        struct dlm_lksb lksb;
>        __u8 bast_mode;

Ditto.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

* Re: 2.6.18 -mm merge plans -- GFS
  2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
  2006-06-05 14:10   ` Russell King
@ 2006-06-05 15:01   ` Steven Whitehouse
  2006-06-07  7:12     ` Steven Whitehouse
  1 sibling, 1 reply; 166+ messages in thread
From: Steven Whitehouse @ 2006-06-05 15:01 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andrew Morton, linux-kernel, Patrick Caulfield, davej,
	David Teigland

Hi,

On Mon, 2006-06-05 at 14:38 +0100, David Woodhouse wrote:
> On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> > It's time to take a look at the -mm queue for 2.6.18.
> 
> You didn't mention GFS2 either -- is that ready to go upstream?

Assuming that 2.6.18 is imminent, as I understand it is, then my
preferred option would be for GFS2 to spend one more cycle in -mm,
assuming that nobody would disagree with that. I'm pretty sure that we'll
be ready by the time 2.6.19 comes around to request inclusion upstream
but 2.6.18 might be just a bit too soon,

Steve.

* Re: 2.6.18 -mm merge plans -- GFS
  2006-06-05 15:01   ` Steven Whitehouse
@ 2006-06-07  7:12     ` Steven Whitehouse
  0 siblings, 0 replies; 166+ messages in thread
From: Steven Whitehouse @ 2006-06-07  7:12 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: David Teigland, davej, Patrick Caulfield, linux-kernel,
	Andrew Morton, David Woodhouse

Hi,

[at the risk of appearing to be mad by replying to myself...]

On Mon, 2006-06-05 at 16:01 +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Mon, 2006-06-05 at 14:38 +0100, David Woodhouse wrote:
> > On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> > > It's time to take a look at the -mm queue for 2.6.18.
> > 
> > You didn't mention GFS2 either -- is that ready to go upstream?
> 
> Assuming that 2.6.18 is imminent, as I understand it is, then my
> preferred option would be for GFS2 to spend one more cycle in -mm,
> assuming that nobody would disagree with that. I'm pretty sure that we'll
> be ready by the time 2.6.19 comes around to request inclusion upstream
> but 2.6.18 might be just a bit too soon,
> 
> Steve.
> 

To clarify the above a bit more, there is one regression in GFS2 in the
current -mm tree that needs to be fixed. Since Linus has announced
2.6.17-rc6, there is now in fact time for that to happen, so that we
will be ready to merge for 2.6.18. Sorry for any confusion my earlier
comment might have caused,

Steve.
[now recovered from jet lag :-) ]


* Re: 2.6.18 -mm merge plans
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (12 preceding siblings ...)
  2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
@ 2006-06-05 14:08 ` Oleg Nesterov
  2006-06-05 14:43 ` Serge E. Hallyn
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 166+ messages in thread
From: Oleg Nesterov @ 2006-06-05 14:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On 06/04, Andrew Morton wrote:
>
> de_thread-fix-lockless-do_each_thread.patch
> coredump-optimize-mm-users-traversal.patch
> coredump-speedup-sigkill-sending.patch
> coredump-kill-ptrace-related-stuff.patch
> coredump-kill-ptrace-related-stuff-fix.patch
> coredump-dont-take-tasklist_lock.patch
> coredump-some-code-relocations.patch
> coredump-shutdown-current-process-first.patch
> coredump-copy_process-dont-check-signal_group_exit.patch
>
>  Will merge.  I have a note here that Roland had issues with
>  coredump-kill-ptrace-related-stuff.patch?

Should be solved by coredump-kill-ptrace-related-stuff-fix.patch.
(There was no explicit ack from Roland though).

Oleg.


* Re: 2.6.18 -mm merge plans
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (13 preceding siblings ...)
  2006-06-05 14:08 ` 2.6.18 -mm merge plans Oleg Nesterov
@ 2006-06-05 14:43 ` Serge E. Hallyn
  2006-06-08 19:56   ` Eric W. Biederman
  2006-06-06  0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 166+ messages in thread
From: Serge E. Hallyn @ 2006-06-05 14:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Eric W. Biederman, Kirill Korotaev, Dave Hansen,
	Hubertus Franke, Cedric Le Goater

Quoting Andrew Morton (akpm@osdl.org):
> proc-sysctl-add-_proc_do_string-helper.patch
> namespaces-add-nsproxy.patch
> namespaces-add-nsproxy-dont-include-compileh.patch
> namespaces-incorporate-fs-namespace-into-nsproxy.patch
> namespaces-utsname-introduce-temporary-helpers.patch
> namespaces-utsname-switch-to-using-uts-namespaces.patch
> namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch
> namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch
> namespaces-utsname-use-init_utsname-when-appropriate.patch
> namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch
> namespaces-utsname-implement-utsname-namespaces.patch
> namespaces-utsname-implement-utsname-namespaces-export.patch
> namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch
> namespaces-utsname-sysctl-hack.patch
> namespaces-utsname-sysctl-hack-cleanup.patch
> namespaces-utsname-sysctl-hack-cleanup-2.patch
> namespaces-utsname-sysctl-hack-cleanup-2-fix.patch
> namespaces-utsname-remove-system_utsname.patch
> namespaces-utsname-implement-clone_newuts-flag.patch
> uts-copy-nsproxy-only-when-needed.patch
> # needed if git-klibc isn't there:
> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch
> #namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch
> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch
> 
>  utsname virtualisation.  This doesn't seem very pointful as a standalone
>  thing.  That's a general problem with infrastructural work for a very
>  large new feature.
> 
>  So probably I'll continue to babysit these patches, unless someone can
>  identify a decent reason why mainline needs this work.
> 
>  I don't want to carry an ever-growing stream of OS-virtualisation
>  groundwork patches for ever and ever so if we're going to do this thing...
>  faster, please.

Eric, Kirill, Dave, Hubertus,

In the spirit of 'faster, please', does someone care to port and
resubmit a pidspace patch?

I'll do it if no one else wants to, just don't want to step on anyone's
toes if you were already working on it.

thanks,
-serge

* Re: 2.6.18 -mm merge plans
  2006-06-05 14:43 ` Serge E. Hallyn
@ 2006-06-08 19:56   ` Eric W. Biederman
  2006-06-09 13:02     ` Serge E. Hallyn
  2006-06-09 23:25     ` Serge E. Hallyn
  0 siblings, 2 replies; 166+ messages in thread
From: Eric W. Biederman @ 2006-06-08 19:56 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andrew Morton, linux-kernel, Kirill Korotaev, Dave Hansen,
	Hubertus Franke, Cedric Le Goater

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Andrew Morton (akpm@osdl.org):
>> proc-sysctl-add-_proc_do_string-helper.patch
>> namespaces-add-nsproxy.patch
>> namespaces-add-nsproxy-dont-include-compileh.patch
>> namespaces-incorporate-fs-namespace-into-nsproxy.patch
>> namespaces-utsname-introduce-temporary-helpers.patch
>> namespaces-utsname-switch-to-using-uts-namespaces.patch
>> namespaces-utsname-switch-to-using-uts-namespaces-alpha-fix.patch
>> namespaces-utsname-switch-to-using-uts-namespaces-cleanup.patch
>> namespaces-utsname-use-init_utsname-when-appropriate.patch
>> namespaces-utsname-use-init_utsname-when-appropriate-cifs-update.patch
>> namespaces-utsname-implement-utsname-namespaces.patch
>> namespaces-utsname-implement-utsname-namespaces-export.patch
>> namespaces-utsname-implement-utsname-namespaces-dont-include-compileh.patch
>> namespaces-utsname-sysctl-hack.patch
>> namespaces-utsname-sysctl-hack-cleanup.patch
>> namespaces-utsname-sysctl-hack-cleanup-2.patch
>> namespaces-utsname-sysctl-hack-cleanup-2-fix.patch
>> namespaces-utsname-remove-system_utsname.patch
>> namespaces-utsname-implement-clone_newuts-flag.patch
>> uts-copy-nsproxy-only-when-needed.patch
>> # needed if git-klibc isn't there:
>> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit.patch
>> #namespaces-utsname-use-init_utsname-when-appropriate-klibc-bit.patch
>> #namespaces-utsname-switch-to-using-uts-namespaces-klibc-bit-2.patch
>> 
>>  utsname virtualisation.  This doesn't seem very pointful as a standalone
>>  thing.  That's a general problem with infrastructural work for a very
>>  large new feature.
>> 
>>  So probably I'll continue to babysit these patches, unless someone can
>>  identify a decent reason why mainline needs this work.
>> 
>>  I don't want to carry an ever-growing stream of OS-virtualisation
>>  groundwork patches for ever and ever so if we're going to do this thing...
>>  faster, please.

Ack.  I agree we need to start moving faster.
I had a couple of distractions but I should be sending out some
relevant patches in a bit.  The more we can get out for review
before kernel summit the better the conversation will be I suspect.

> Eric, Kirill, Dave, Hubertus,
>
> In the spirit of 'faster, please', does someone care to port and
> resubmit a pidspace patch?

I think I can get that one. Except for the very tail end though
most of my patches probably won't be directly pidspace patches.
I'm going to work on killing sys_sysctl a little before I
get too far into that.  A pidspace is one of the most controversial
patches so it is a bit tricky.

> I'll do it if no one else wants to, just don't want to step on anyone's
> toes if you were already working on it.

If you want to help with the bare pid to struct pid conversion I
don't have any outstanding patches, and getting that done kills
some theoretical pid wrap around problems as well as laying the ground
work for a simple pidspace implementation.

Eric

* Re: 2.6.18 -mm merge plans
  2006-06-08 19:56   ` Eric W. Biederman
@ 2006-06-09 13:02     ` Serge E. Hallyn
  2006-06-09 23:25     ` Serge E. Hallyn
  1 sibling, 0 replies; 166+ messages in thread
From: Serge E. Hallyn @ 2006-06-09 13:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Andrew Morton, linux-kernel, Kirill Korotaev,
	Dave Hansen, Hubertus Franke, Cedric Le Goater

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> > Eric, Kirill, Dave, Hubertus,
> >
> > In the spirit of 'faster, please', does someone care to port and
> > resubmit a pidspace patch?
> 
> I think I can get that one. Except for the very tail end though
> most of my patches probably won't be directly pidspace patches.
> I'm going to work on killing sys_sysctl a little before I
> get too far into that.  A pidspace is one of the most controversial
> patches so it is a bit tricky.
> 
> > I'll do it if no one else wants to, just don't want to step on anyone's
> > toes if you were already working on it.
> 
> If you want to help with the bare pid to struct pid conversion I
> don't have any outstanding patches, and getting that done kills
> some theoretical pid wrap around problems as well as laying the ground
> work for a simple pidspace implementation.

Yeah, I'll get going on that over the next week.  A quick lxr search
shows quite a few remaining hits on pid_t  :)

-serge

* Re: 2.6.18 -mm merge plans
  2006-06-08 19:56   ` Eric W. Biederman
  2006-06-09 13:02     ` Serge E. Hallyn
@ 2006-06-09 23:25     ` Serge E. Hallyn
  2006-06-10  0:39       ` Eric W. Biederman
  2006-06-10  9:53       ` Christoph Hellwig
  1 sibling, 2 replies; 166+ messages in thread
From: Serge E. Hallyn @ 2006-06-09 23:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> If you want to help with the bare pid to struct pid conversion I
> don't have any outstanding patches, and getting that done kills
> some theoretical pid wrap around problems as well as laying the ground
> work for a simple pidspace implementation.
> 
> Eric

Is this the sort of thing you are looking for?  Is this worthwhile for
kernel_threads, or only for userspace threads - i.e. do we expect kernel
threads to live?

If we do want to do this for kernel threads, then I assume that
eventually we'll want to change kernel_thread() itself.  I actually
started to do that earlier, but of course that way every user would
have to be changed in the same patch :)

Subject: [PATCH] struct pid: convert ieee1394 to hold struct pid

ieee1394 driver caches pid_t's for kernel threads.  Switch to
holding a reference to a struct pid.  This prevents concern
about the cached pid pointing to the wrong process after the
kernel thread dies and pids wrap around.

Signed-off-by: Serge Hallyn <serue@us.ibm.com>

---

 drivers/ieee1394/ieee1394_core.c |   16 ++++++++++------
 drivers/ieee1394/nodemgr.c       |   12 ++++++++----
 2 files changed, 18 insertions(+), 10 deletions(-)

ca429eb5558988a34815c8cdfcecd26a06170f4f
diff --git a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c
index be6854e..4db5c54 100644
--- a/drivers/ieee1394/ieee1394_core.c
+++ b/drivers/ieee1394/ieee1394_core.c
@@ -33,6 +33,7 @@
 #include <linux/kdev_t.h>
 #include <linux/skbuff.h>
 #include <linux/suspend.h>
+#include <linux/pid.h>
 
 #include <asm/byteorder.h>
 #include <asm/semaphore.h>
@@ -997,7 +998,8 @@ void abort_timedouts(unsigned long __opa
  * packets that have a "complete" function are sent here. This way, the
  * completion is run out of kernel context, and doesn't block the rest of
  * the stack. */
-static int khpsbpkt_pid = -1, khpsbpkt_kill;
+static int khpsbpkt_kill;
+static struct pid *khpsbpkt_pid;
 static DECLARE_COMPLETION(khpsbpkt_complete);
 static struct sk_buff_head hpsbpkt_queue;
 static DECLARE_MUTEX_LOCKED(khpsbpkt_sig);
@@ -1056,6 +1058,7 @@ static int hpsbpkt_thread(void *__hi)
 static int __init ieee1394_init(void)
 {
 	int i, ret;
+	pid_t nr;
 
 	skb_queue_head_init(&hpsbpkt_queue);
 
@@ -1065,12 +1068,13 @@ static int __init ieee1394_init(void)
 		HPSB_ERR("Some features may not be available\n");
 	}
 
-	khpsbpkt_pid = kernel_thread(hpsbpkt_thread, NULL, CLONE_KERNEL);
-	if (khpsbpkt_pid < 0) {
+	nr = kernel_thread(hpsbpkt_thread, NULL, CLONE_KERNEL);
+	if (nr < 0) {
 		HPSB_ERR("Failed to start hpsbpkt thread!\n");
 		ret = -ENOMEM;
 		goto exit_cleanup_config_roms;
 	}
+	khpsbpkt_pid = find_get_pid(nr);
 
 	if (register_chrdev_region(IEEE1394_CORE_DEV, 256, "ieee1394")) {
 		HPSB_ERR("unable to register character device major %d!\n", IEEE1394_MAJOR);
@@ -1148,8 +1152,8 @@ release_all_bus:
 release_chrdev:
 	unregister_chrdev_region(IEEE1394_CORE_DEV, 256);
 exit_release_kernel_thread:
-	if (khpsbpkt_pid >= 0) {
-		kill_proc(khpsbpkt_pid, SIGTERM, 1);
+	if (khpsbpkt_pid) {
+		kill_proc(khpsbpkt_pid->nr, SIGTERM, 1);
 		wait_for_completion(&khpsbpkt_complete);
 	}
 exit_cleanup_config_roms:
@@ -1172,7 +1176,7 @@ static void __exit ieee1394_cleanup(void
 		bus_remove_file(&ieee1394_bus_type, fw_bus_attrs[i]);
 	bus_unregister(&ieee1394_bus_type);
 
-	if (khpsbpkt_pid >= 0) {
+	if (khpsbpkt_pid) {
 		khpsbpkt_kill = 1;
 		mb();
 		up(&khpsbpkt_sig);
diff --git a/drivers/ieee1394/nodemgr.c b/drivers/ieee1394/nodemgr.c
index 082c7fd..d33f2fe 100644
--- a/drivers/ieee1394/nodemgr.c
+++ b/drivers/ieee1394/nodemgr.c
@@ -19,6 +19,7 @@
 #include <linux/delay.h>
 #include <linux/pci.h>
 #include <linux/moduleparam.h>
+#include <linux/pid.h>
 #include <asm/atomic.h>
 
 #include "ieee1394_types.h"
@@ -115,7 +116,7 @@ struct host_info {
 	struct list_head list;
 	struct completion exited;
 	struct semaphore reset_sem;
-	int pid;
+	struct pid *pid;
 	char daemon_name[15];
 	int kill_me;
 };
@@ -1705,6 +1706,7 @@ int hpsb_node_write(struct node_entry *n
 static void nodemgr_add_host(struct hpsb_host *host)
 {
 	struct host_info *hi;
+	pid_t nr;
 
 	hi = hpsb_create_hostinfo(&nodemgr_highlevel, host, sizeof(*hi));
 
@@ -1719,14 +1721,15 @@ static void nodemgr_add_host(struct hpsb
 
 	sprintf(hi->daemon_name, "knodemgrd_%d", host->id);
 
-	hi->pid = kernel_thread(nodemgr_host_thread, hi, CLONE_KERNEL);
+	nr = kernel_thread(nodemgr_host_thread, hi, CLONE_KERNEL);
 
-	if (hi->pid < 0) {
+	if (nr < 0) {
 		HPSB_ERR ("NodeMgr: failed to start %s thread for %s",
 			  hi->daemon_name, host->driver->name);
 		hpsb_destroy_hostinfo(&nodemgr_highlevel, host);
 		return;
 	}
+	hi->pid = find_get_pid(nr);
 
 	return;
 }
@@ -1749,11 +1752,12 @@ static void nodemgr_remove_host(struct h
 	struct host_info *hi = hpsb_get_hostinfo(&nodemgr_highlevel, host);
 
 	if (hi) {
-		if (hi->pid >= 0) {
+		if (hi->pid->nr >= 0) {
 			hi->kill_me = 1;
 			mb();
 			up(&hi->reset_sem);
 			wait_for_completion(&hi->exited);
+			put_pid(hi->pid);
 			nodemgr_remove_host_dev(&host->device);
 		}
 	} else
-- 
1.1.6

* Re: 2.6.18 -mm merge plans
  2006-06-09 23:25     ` Serge E. Hallyn
@ 2006-06-10  0:39       ` Eric W. Biederman
  2006-06-10  1:23         ` Serge E. Hallyn
  2006-06-10  9:53       ` Christoph Hellwig
  1 sibling, 1 reply; 166+ messages in thread
From: Eric W. Biederman @ 2006-06-10  0:39 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: linux-kernel

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> If you want to help with the bare pid to struct pid conversion I
>> don't have any outstanding patches, and getting that done kills
>> some theoretical pid wrap around problems as well as laying the ground
>> work for a simple pidspace implementation.
>> 
>> Eric
>
> Is this the sort of thing you are looking for?  Is this worthwhile for
> kernel_threads, or only for userspace threads - i.e. do we expect kernel
> threads to live?

For kernel threads we should simply be able to use their task
struct.

In this instance we have hit upon a different problem.  Anything
using the kernel_thread API instead of the kthread api needs 
to be updated.

The basic problem is that kernel_threads can show up
inside of containers.

We can fix that by updating daemonize or we can simply
universally use the kthread api.  Since the kernel_thread
api is deprecated because of these kinds of reasons
what really makes sense is to work on the transition
to the kthread api.

> If we do want to do this for kernel threads, then I assume that
> eventually we'll want to change kernel_thread() itself.  I actually
> started to do that earlier, but of course that way every user would
> have to be changed in the same patch :)
>
> Subject: [PATCH] struct pid: convert ieee1394 to hold struct pid
>
> ieee1394 driver caches pid_t's for kernel threads.  Switch to
> holding a reference to a struct pid.  This prevents concern
> about the cached pid pointing to the wrong process after the
> kernel thread dies and pids wrap around.
>
> Signed-off-by: Serge Hallyn <serue@us.ibm.com>

Ok a couple of comments.

As I recall there are some pretty sane ways of going
from struct pid to a task_struct and then we can use things
like group_send_sig.

But otherwise you seem to be using struct pid ok.

Eric

* Re: 2.6.18 -mm merge plans
  2006-06-10  0:39       ` Eric W. Biederman
@ 2006-06-10  1:23         ` Serge E. Hallyn
  2006-06-10  7:52           ` Eric W. Biederman
  2006-06-10  8:09           ` Eric W. Biederman
  0 siblings, 2 replies; 166+ messages in thread
From: Serge E. Hallyn @ 2006-06-10  1:23 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> If you want to help with the bare pid to struct pid conversion I
> >> don't have any outstanding patches, and getting that done kills
> >> some theoretical pid wrap around problems as well as laying the ground
> >> work for a simple pidspace implementation.
> >> 
> >> Eric
> >
> > Is this the sort of thing you are looking for?  Is this worthwhile for
> > kernel_threads, or only for userspace threads - i.e. do we expect kernel
> > threads to live?
> 
> For kernel threads we should simply be able to use their task
> struct.
> 
> In this instance we have hit upon a different problem.  Anything
> using the kernel_thread API instead of the kthread api needs 
> to be updated.
> 
> The basic problem is that kernel_threads can show up
> inside of containers.
> 
> We can fix that by updating daemonize or we can simply
> universally use the kthread api.  Since the kernel_thread
> api is deprecated because of these kinds of reasons
> what really makes sense is to work on the transition
> to the kthread api.

Egads, I apologize.

Apparently I was in a daze, as I'd forgotten that converting
all kernel_thread users to kthread was something else we wanted
to work towards, and which Christoph had explicitly asked for
help with.

> Ok a couple of comments.
> 
> As I recall there are some pretty sane ways of going
> from struct pid to a task_struct and then we can use things
> like group_send_sig.

Oh, you mean instead of doing kill_proc(struct pid->nr), which
I guess was pretty braindead?  :)

Ok, futile as this may have seemed overall, I think it's helped
me figure out what to actually do.

thanks,
-serge

* Re: 2.6.18 -mm merge plans
  2006-06-10  1:23         ` Serge E. Hallyn
@ 2006-06-10  7:52           ` Eric W. Biederman
  2006-06-10  8:09           ` Eric W. Biederman
  1 sibling, 0 replies; 166+ messages in thread
From: Eric W. Biederman @ 2006-06-10  7:52 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: linux-kernel

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Egads, I apologize.
>
> Apparently I was in a daze, as I'd forgotten that converting
> all kernel_thread users to kthread was something else we wanted
> to work towards, and which Christoph had explicitly asked for
> help with.

Yep.  And the linux-vserver guys discovered that the hard way.

>> Ok a couple of comments.
>> 
>> As I recall there are some pretty sane ways of going
>> from struct pid to a task_struct and then we can use things
>> like group_send_sig.
>
> Oh, you mean instead of doing kill_proc(struct pid->nr), which
> I guess was pretty braindead?  :)

I think it defeats half our purpose.

> Ok, futile as this may have seemed overall, I think it's helped
> me figure out what to actually do.

Sure and that is what it was aimed to do.

You want to attack the kernel_thread -> kthread thing?

Eric

* Re: 2.6.18 -mm merge plans
  2006-06-10  1:23         ` Serge E. Hallyn
  2006-06-10  7:52           ` Eric W. Biederman
@ 2006-06-10  8:09           ` Eric W. Biederman
  1 sibling, 0 replies; 166+ messages in thread
From: Eric W. Biederman @ 2006-06-10  8:09 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: linux-kernel

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Oh, you mean instead of doing kill_proc(struct pid->nr), which
> I guess was pretty braindead?  :)

For a single process we should be able to do:
struct pid *pid = ( some value ... );
struct task_struct *task;
rcu_read_lock();
task = pid_task(pid, PIDTYPE_PID);
if (task)
	group_send_sig_info(sig, info, task);
rcu_read_unlock();

If it comes up very often that looks like an idiom
that would appreciate a helper function.
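
To make the point of such a helper concrete, here is a small userspace
model, not kernel code.  Everything in it is hypothetical (task_model,
pid_model, model_kill_pid and friends are invented for illustration), and
it ignores locking and reference counting entirely.  It assumes only the
behaviour discussed in this thread: once a task exits, a struct pid that
is still held no longer leads to any task, while a cached number can be
handed out again after pids wrap around.

```c
#include <stddef.h>

/* Hypothetical model of a task: its number and how many signals it got. */
struct task_model {
	int nr;
	int signals;
};

/* Hypothetical model of struct pid: outlives the task it referred to. */
struct pid_model {
	int nr;
	struct task_model *task;	/* NULL once the task has exited */
};

/* Lookup by raw number: returns whoever owns that number right now. */
struct task_model *model_find_by_nr(struct task_model **live, size_t n, int nr)
{
	for (size_t i = 0; i < n; i++)
		if (live[i] && live[i]->nr == nr)
			return live[i];
	return NULL;
}

/* Helper in the shape of the idiom: signal through the pid object. */
int model_kill_pid(struct pid_model *pid)
{
	struct task_model *task = pid ? pid->task : NULL;

	if (!task)
		return -1;	/* no such task any more */
	task->signals++;
	return 0;
}

/* Returns how many signals hit an unrelated task that reused the number. */
int model_wraparound_demo(int use_struct_pid)
{
	struct task_model worker = { .nr = 42, .signals = 0 };
	struct task_model stranger = { .nr = 42, .signals = 0 };
	struct task_model *live[1] = { &worker };
	struct pid_model cached = { .nr = 42, .task = &worker };
	int cached_nr = 42;

	/* The worker exits and is detached from its pid object. */
	cached.task = NULL;
	/* Numbers wrap around and 42 goes to an unrelated task. */
	live[0] = &stranger;

	if (use_struct_pid) {
		model_kill_pid(&cached);
	} else {
		struct task_model *t = model_find_by_nr(live, 1, cached_nr);

		if (t)
			t->signals++;
	}
	return stranger.signals;
}
```

In this model, signalling by the cached number reaches the unrelated task,
while going through the held pid object simply fails, which is the
property the struct pid conversion is after.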

For process groups we must get a read_lock on tasklist_lock
because otherwise the atomicity guarantees of sending a signal
to a process group are broken.

Eric

* Re: 2.6.18 -mm merge plans
  2006-06-09 23:25     ` Serge E. Hallyn
  2006-06-10  0:39       ` Eric W. Biederman
@ 2006-06-10  9:53       ` Christoph Hellwig
  1 sibling, 0 replies; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-10  9:53 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Eric W. Biederman, linux-kernel

On Fri, Jun 09, 2006 at 06:25:51PM -0500, Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > If you want to help with the bare pid to struct pid conversion I
> > don't have any outstanding patches, and getting that done kills
> > some theoretical pid wrap around problems as well as laying the ground
> > work for a simple pidspace implementation.
> > 
> > Eric
> 
> Is this the sort of thing you are looking for?  Is this worthwhile for
> kernel_threads, or only for userspace threads - i.e. do we expect kernel
> threads to live?
> 
> If we do want to do this for kernel threads, then I assume that
> eventually we'll want to change kernel_thread() itself.  I actually
> started to do that earlier, but of course that way every user would
> have to be changed in the same patch :)


> 
> Subject: [PATCH] struct pid: convert ieee1394 to hold struct pid
> 
> ieee1394 driver caches pid_t's for kernel threads.  Switch to
> holding a reference to a struct pid.  This prevents concern
> about the cached pid pointing to the wrong process after the
> kernel thread dies and pids wrap around.

NACK.  Please convert to the kthread_ API instead.  A reference to a pid_t
in a driver should generally be treated as a bug; the few exceptions should
be discussed on lkml and commented verbosely.


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (14 preceding siblings ...)
  2006-06-05 14:43 ` Serge E. Hallyn
@ 2006-06-06  0:54 ` Balbir Singh
  2006-06-06 22:28   ` Shailabh Nagar
  2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 166+ messages in thread
From: Balbir Singh @ 2006-06-06  0:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Jay Lan, Peter Chubb

Andrew Morton wrote:
> per-task-delay-accounting-setup.patch
> per-task-delay-accounting-setup-fix-1.patch
> per-task-delay-accounting-setup-fix-2.patch
> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
> per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
> per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
> per-task-delay-accounting-utilities-for-genetlink-usage.patch
> per-task-delay-accounting-taskstats-interface.patch
> per-task-delay-accounting-taskstats-interface-fix-1.patch
> per-task-delay-accounting-taskstats-interface-fix-2.patch
> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
> per-task-delay-accounting-documentation.patch
> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
> 
>  I just don't know.  There are a number of groups who pop up with various
>  enhanced accounting requirements and patches (all quite different) but I
>  haven't heard a lot of enthusiasm from any of them over this work, which
>  attempts to provide an extensible framework for accumulation and querying
>  of per-task metrics.
> 
>  But then again, we cannot just sit there and wait for everyone to be 100%
>  happy.  So I'm 51% inclined to push this along.
> 
>  Anyone else who has an interest in this sort of thing needs to be aware
>  that there will be an expectation that any future statistics submissions
>  should use these interfaces.  So the time to pay attention is right now.
> 

Hi, Andrew,

Here is a brief summary of the status of the response we have received from
the stakeholders (some of it has been duplicated in previous postings)

Project                                         Response

1. CSA accounting/PAGG/JOB:                    Has agreed to use taskstats
  Jay Lan <jlan@engr.sgi.com>                  interface

2. per-process IO statistics:                  None
  Levent Serinol <lserinol@gmail.com>          Needs are subset of CSA

3. per-cpu time statistics:                    None (email bounced)
  Erich Focht <efocht@ess.nec.de>              Needs can be met by taskstats
                                               Statistics not yet submitted

4. Microstate accounting:                      None
  Peter Chubb <peterc@gelato.unsw.edu.au>      overlap with delay accounting
                                               prefers /proc due to convenience
                                               taskstats can meet the needs


5. ELSA: Guillaume Thouvenin                   None
  <guillaume.thouvenin@bull.net>               ELSA is not a direct user
                                               of new kernel statistics
                                               Consumer of CSA/BSD accounting
                                               statistics

6. pnotify: Jes Sorensen <jes@sgi.com>         None
(taken over pnotify from Erik Jacobson)        Informed over private email
                                               that pnotify replacement is
                                               being worked on. pnotify
                                               or its replacement will
                                               not be concerned with
                                               exporting data to user space
                                               or collecting any statistics.


7. Scalable statistics counters with /proc      Not working on it
  reporting:                                   anymore
  Ravikiran G Thirumalai,
  Dipankar Sarma <dipankar@in.ibm.com>

Studying the responses from all stakeholders, Jay Lan's was the most
encouraging. Peter Chubb prefers the /proc interface due to the text interface
and ease of parsing. (in our opinion, taskstats can meet the needs easily
and the getdelays utility can provide the same ease for parsing).
The others did not respond. 

Some performance numbers of taskstats were posted at
http://lkml.org/lkml/2006/3/23/141. The result highlights are included
below

    Results highlights

    - Configuring delay accounting adds < 0.5%
      overhead in most cases and even reduces overhead
      in some cases

    - Enabling delay accounting has similar results
      with a maximum overhead of 1.2% for hackbench,
      most other overheads < 1% and reduction in
      overhead in some cases

These statistics are _per task_ and can be extended easily by anyone
who wishes to obtain per task data. An example of per task improved
scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381
(I am not sure if the email refers to our per-task statistics). If not,
the new statistics could easily use the taskstats interface.

These statistics can be used by software product stacks to monitor
usage information about the various tasks they create and control.
I also informally spoke to a group of students (verbally), who were
excited at the possibility of using the per-task statistics to do
dynamic deadline based power management. They want to use the delay data
(CPU and IO) to predict deadlines for a task and then use these results
for dynamically scaling CPU frequency.


The ability to monitor the CPU run and delay data and IO delay data is useful.

I would request you to consider the inclusion of per-task delay accounting
in 2.6.18.

-- 

	Thanks,
	Balbir Singh,
	Linux Technology Center,
	IBM Software Labs

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-06  0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh
@ 2006-06-06 22:28   ` Shailabh Nagar
  2006-06-06 22:40     ` Andrew Morton
  2006-06-06 22:52     ` Jay Lan
  0 siblings, 2 replies; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-06 22:28 UTC (permalink / raw)
  To: balbir; +Cc: Andrew Morton, linux-kernel, Jay Lan, Peter Chubb

Balbir Singh wrote:
> Andrew Morton wrote:
> 
>> per-task-delay-accounting-setup.patch
>> per-task-delay-accounting-setup-fix-1.patch
>> per-task-delay-accounting-setup-fix-2.patch
>> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
>>
>> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
>>
>> per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
>> per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
>> per-task-delay-accounting-utilities-for-genetlink-usage.patch
>> per-task-delay-accounting-taskstats-interface.patch
>> per-task-delay-accounting-taskstats-interface-fix-1.patch
>> per-task-delay-accounting-taskstats-interface-fix-2.patch
>> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
>>
>> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
>>
>> per-task-delay-accounting-documentation.patch
>> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
>>
>> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
>>
>>
>>  I just don't know.  There are a number of groups who pop up with various
>>  enhanced accounting requirements and patches (all quite different) but I
>>  haven't heard a lot of enthusiasm from any of them over this work, which
>>  attempts to provide an extensible framework for accumulation and
>> querying
>>  of per-task metrics.
>>
>>  But then again, we cannot just sit there and wait for everyone to be
>> 100%
>>  happy.  So I'm 51% inclined to push this along.
>>
>>  Anyone else who has an interest in this sort of thing needs to be aware
>>  that there will be an expectation that any future statistics submissions
>>  should use these interfaces.  So the time to pay attention is right now.
>>
> 
> Hi, Andrew,
> 
> Here is a brief summary of the status of the response we have received from
> the stakeholders (some of it has been duplicated in previous postings)
> 
> Project                                         Response
> 
> 1. CSA accounting/PAGG/JOB:                    Has agreed to use taskstats
>  Jay Lan <jlan@engr.sgi.com>                  interface
> 
> 2. per-process IO statistics:                  None
>  Levent Serinol <lserinol@gmail.com>          Needs are subset of CSA
> 
> 3. per-cpu time statistics:                    None (email bounced)
>  Erich Focht <efocht@ess.nec.de>              Needs can be met by taskstats
>                                               Statistics not yet submitted
> 
> 4. Microstate accounting:                      None
>  Peter Chubb <peterc@gelato.unsw.edu.au>      overlap with delay accounting
>                                               prefers /proc due to
> convenience
>                                               taskstats can meet the needs
> 
> 
> 5. ELSA: Guillaume Thouvenin                   None
>  <guillaume.thouvenin@bull.net>               ELSA is not a direct user
>                                               of new kernel statistics
>                                               Consumer of CSA/BSD
> accounting
>                                               statistics
> 
> 6. pnotify: Jes Sorensen <jes@sgi.com>         None
> (taken over pnotify from Erik Jacobson)        Informed over private email
>                                               that pnotify replacement is
>                                               being worked on. pnotify
>                                               or its replacement will
>                                               not be concerned with
>                                               exporting data to user space
>                                               or collecting any statistics.
> 
> 
> 7. Scalable statistics counters with /proc      Not working on it
>  reporting:                                   anymore
>  Ravikiran G Thirumalai,
>  Dipankar Sarma <dipankar@in.ibm.com>
> 
> Studying the responses from all stake holders, Jay Lan's was the most
> encouraging. Peter Chubb prefers the /proc interface due to the text
> interface
> and ease of parsing. (in our opinion, taskstats can meet the needs easily
> and the getdelays utility can provide the same ease for parsing).
> The others did not respond.
> Some performance numbers of taskstats were posted at
> http://lkml.org/lkml/2006/3/23/141. The result highlights are included
> below
> 
>    Results highlights
> 
>    - Configuring delay accounting adds < 0.5%
>      overhead in most cases and even reduces overhead
>      in some cases
> 
>    - Enabling delay accounting has similar results
>      with a maximum overhead of 1.2% for hackbench,
>      most other overheads < 1% and reduction in
>      overhead in some cases
> 
> These statistics are _per task_ and can be extended easily by anyone
> who wishes to obtain per task data. An example of per task improved
> scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381
> (I am not sure if the email refers to our per-task statistics). If not,
> the new statistics could easily use the taskstats interface.
> 
> These statistics can be used by software product stacks to monitor
> usage information about the various tasks they create and control.
> I also informally spoke to a group of students (verbally), who were
> excited at the possibility of using the per-task statistics to do
> dynamic deadline based power management. They want to use the delay data
> (CPU and IO) to predict deadlines for a task and then use these results
> for dynamically scaling CPU frequency.
> 
> 
> The ability to monitor the CPU run and delay data and IO delay data is
> useful.
> 
> I would request you to consider the inclusion per-task delay accounting
> into
> 2.6.18.
> 


Andrew,

The only other new set of patches to be discussed in this context are the
statistics-infrastructure patches from Martin Peschke.

That infrastructure cannot meet the needs of delay accounting, CSA etc. because
- it only provides "user pull" model of getting stats whereas "kernel push" is
needed for delay accounting
- it uses a relatively slow interface unsuitable for high volumes of data. Each
statistic has its own definition, needs to be read separately using ASCII,
reading data continuously means open/read/close each time.....all of
which is not very conducive to large structures being sent to userspace.
- it's oriented towards sampled data whereas taskstats isn't.

So, we have a good consensus from existing/potential users of taskstats and would
very much appreciate it being included in 2.6.18.

--Shailabh





^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-06 22:28   ` Shailabh Nagar
@ 2006-06-06 22:40     ` Andrew Morton
  2006-06-08 14:27       ` Shailabh Nagar
  2006-06-06 22:52     ` Jay Lan
  1 sibling, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-06 22:40 UTC (permalink / raw)
  To: Shailabh Nagar; +Cc: balbir, linux-kernel, jlan, peterc

On Tue, 06 Jun 2006 18:28:15 -0400
Shailabh Nagar <nagar@watson.ibm.com> wrote:

> So, we have a good consensus from existing/potential users of taskstats and would
> very much appreciate it being included in 2.6.18.

Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play
wait-and-see on the statistics infrastructure.  Greg is taking a look at
the stats code, which is good.


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-06 22:40     ` Andrew Morton
@ 2006-06-08 14:27       ` Shailabh Nagar
  2006-06-08 17:42         ` Andrew Morton
  0 siblings, 1 reply; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-08 14:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: balbir, linux-kernel, jlan, peterc

Andrew Morton wrote:

>On Tue, 06 Jun 2006 18:28:15 -0400
>Shailabh Nagar <nagar@watson.ibm.com> wrote:
>
>  
>
>>So, we have a good consensus from existing/potential users of taskstats and would
>>very much appreciate it being included in 2.6.18.
>>    
>>
>
>Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play
>wait-and-see on the statistics infrastructure.  Greg is taking a look at
>the stats code, which is good.
>
>  
>
Thanks !

The suggestion from Jay Lan to extend the interface by making sending
of tgid stats configurable is quite reasonable and can be done relatively
simply: set some parameter, either by sending a separate command (verify
sender is privileged) or by some sysfs parameter and use that to control
sending of tgid stats on task exit (as well as allocation of any tgid
stat related structures).

Would you recommend we submit a patch for it now or wait till after
delay accounting has gone into 2.6.18?

Such requests for extending the interface are likely to happen as more
users start using the interface. But since any patch will need some
testing etc. and we are very close to the 2.6.18 merge window, I wanted
your advice on whether this should wait until later.

Regards,
Shailabh

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-08 14:27       ` Shailabh Nagar
@ 2006-06-08 17:42         ` Andrew Morton
  2006-06-08 18:36           ` Shailabh Nagar
  0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-08 17:42 UTC (permalink / raw)
  To: Shailabh Nagar; +Cc: balbir, linux-kernel, jlan, peterc

On Thu, 08 Jun 2006 10:27:46 -0400
Shailabh Nagar <nagar@watson.ibm.com> wrote:

> Andrew Morton wrote:
> 
> >On Tue, 06 Jun 2006 18:28:15 -0400
> >Shailabh Nagar <nagar@watson.ibm.com> wrote:
> >
> >  
> >
> >>So, we have a good consensus from existing/potential users of taskstats and would
> >>very much appreciate it being included in 2.6.18.
> >>    
> >>
> >
> >Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play
> >wait-and-see on the statistics infrastructure.  Greg is taking a look at
> >the stats code, which is good.
> >
> >  
> >
> Thanks !
> 
> The suggestion from  Jay Lan to extend the interface by making sending 
> of tgid stats configurable
> is quite reasonable and can be done relatively simply:
> set some parameter, either by sending a separate command (verify sender 
> is privileged) or by
> some sysfs parameter and use that to control sending of tgid stats on 
> task exit (as well as allocation of
> any tgid stat related structures).

hm.  Is it possible to check the privileges of a netlink message sender?

> Would you recommend we submit a patch for it now or wait till after 
> delay accounting has gone into
> 2.6.18 ?

Earlier, please.

> Such requests for extending the interface are likely to happen as more 
> users start using the interface.
> But since any patch will need some testing etc. and we are very close to 
> the 2.6.18 merge window, I
> wanted your advice on whether this should wait until later.

If it's merged, we'll have a couple more months to test it, and to fix any
little problems.


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-08 17:42         ` Andrew Morton
@ 2006-06-08 18:36           ` Shailabh Nagar
  2006-06-08 19:33             ` Balbir Singh
  0 siblings, 1 reply; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-08 18:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: balbir, linux-kernel, jlan, peterc

Andrew Morton wrote:

>On Thu, 08 Jun 2006 10:27:46 -0400
>Shailabh Nagar <nagar@watson.ibm.com> wrote:
>
>  
>
>>Andrew Morton wrote:
>>
>>    
>>
>>>On Tue, 06 Jun 2006 18:28:15 -0400
>>>Shailabh Nagar <nagar@watson.ibm.com> wrote:
>>>
>>> 
>>>
>>>      
>>>
>>>>So, we have a good consensus from existing/potential users of taskstats and would
>>>>very much appreciate it being included in 2.6.18.
>>>>   
>>>>
>>>>        
>>>>
>>>Yes, for 2.6.18 I'm inclined to send taskstats and to continue to play
>>>wait-and-see on the statistics infrastructure.  Greg is taking a look at
>>>the stats code, which is good.
>>>
>>> 
>>>
>>>      
>>>
>>Thanks !
>>
>>The suggestion from  Jay Lan to extend the interface by making sending 
>>of tgid stats configurable
>>is quite reasonable and can be done relatively simply:
>>set some parameter, either by sending a separate command (verify sender 
>>is privileged) or by
>>some sysfs parameter and use that to control sending of tgid stats on 
>>task exit (as well as allocation of
>>any tgid stat related structures).
>>    
>>
>
>hm.  Is it possible to check the privileges of a netlink message sender?
>  
>
Not entirely sure. But there's a check in net/netlink/genetlink.c:
genl_rcv_msg() for

	if ((ops->flags & GENL_ADMIN_PERM) && security_netlink_recv(skb)) {
		err = -EPERM;
		goto errout;
	}

and security_netlink_recv(skb), normally set to cap_netlink_recv, checks
that the skb's effective capability set includes CAP_NET_ADMIN, which I
thought would be sufficient.
Need to look further.

If it doesn't turn out to fit properly, sysfs config variable can be used.

>>Would you recommend we submit a patch for it now or wait till after 
>>delay accounting has gone into
>>2.6.18 ?
>>    
>>
>
>Earlier, please.
>  
>
Ok. will submit asap.

>  
>
>>Such requests for extending the interface are likely to happen as more 
>>users start using the interface.
>>But since any patch will need some testing etc. and we are very close to 
>>the 2.6.18 merge window, I
>>wanted your advice on whether this should wait until later.
>>    
>>
>
>If it's merged, we'll have a couple more months to test it, and to fix any
>little problems.
>  
>
Sounds good.


Thanks,
Shailabh


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-08 18:36           ` Shailabh Nagar
@ 2006-06-08 19:33             ` Balbir Singh
  0 siblings, 0 replies; 166+ messages in thread
From: Balbir Singh @ 2006-06-08 19:33 UTC (permalink / raw)
  To: Shailabh Nagar; +Cc: Andrew Morton, linux-kernel, jlan, peterc

Shailabh Nagar wrote:
>> hm.  Is it possible to check the privileges of a netlink message sender?
>>  
>>
> Not entirely sure. But there's a check in net/netlink/genetlink.c: 
> genl_rcv_msg()
> for
> if ((ops->flags & GENL_ADMIN_PERM) && security_netlink_recv(skb))
> {    err = -EPERM;
>    goto errout;
> }
> 
> and security_netlink_recv(skb), normally set to cap_netlink_recv, checks 
> on the skb's effective capability
> being CAP_NET_ADMIN which I thought would be sufficient.
> Need to look further.
> 
> If it doesn't turn out to fit properly, sysfs config variable can be used.
>

The genl_ops has a flags field. If the flags field is initialized to
GENL_ADMIN_PERM, then privileges are checked as you pointed out.
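
To illustrate the gating described above, here is a small userspace
model of the check. Everything prefixed mock_ is a stand-in assumed for
this sketch (the real structures are struct genl_ops and struct sk_buff,
and the real hook is security_netlink_recv()); the flag value and the
set_tgid_stats handler are likewise assumptions and not taken from the
patches.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag name as used by genetlink; the value is chosen for this model. */
#define GENL_ADMIN_PERM 0x01

/* Stand-in for the skb: only remembers whether the sender had CAP_NET_ADMIN. */
struct mock_skb {
	bool sender_has_net_admin;
};

/* Stand-in for struct genl_ops: a flags field plus the command handler. */
struct mock_genl_ops {
	unsigned int flags;
	int (*doit)(struct mock_skb *skb);
};

/* Stand-in for security_netlink_recv(): 0 if permitted, -EPERM if not. */
int mock_security_netlink_recv(struct mock_skb *skb)
{
	return skb->sender_has_net_admin ? 0 : -EPERM;
}

/* Models the dispatch step: privileged ops are refused to unprivileged senders. */
int mock_genl_rcv_msg(struct mock_genl_ops *ops, struct mock_skb *skb)
{
	if ((ops->flags & GENL_ADMIN_PERM) && mock_security_netlink_recv(skb))
		return -EPERM;

	return ops->doit(skb);
}

/* Hypothetical handler for a "configure tgid stats" command. */
int set_tgid_stats(struct mock_skb *skb)
{
	(void)skb;
	return 0;
}
```

With this shape, a command that changes configuration would set
GENL_ADMIN_PERM in its ops flags, while a read-only query could leave
the flag clear and stay available to unprivileged senders.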
 
-- 

	Balbir Singh,
	Linux Technology Center,
	IBM Software Labs

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-06 22:28   ` Shailabh Nagar
  2006-06-06 22:40     ` Andrew Morton
@ 2006-06-06 22:52     ` Jay Lan
  2006-06-06 22:55       ` Shailabh Nagar
  2006-06-12 12:02       ` Martin Peschke
  1 sibling, 2 replies; 166+ messages in thread
From: Jay Lan @ 2006-06-06 22:52 UTC (permalink / raw)
  To: Shailabh Nagar
  Cc: balbir, Andrew Morton, linux-kernel, Chris Sturtivant,
	Peter Chubb

Shailabh Nagar wrote:
> Balbir Singh wrote:
> 
>>Andrew Morton wrote:
>>
>>
>>>per-task-delay-accounting-setup.patch
>>>per-task-delay-accounting-setup-fix-1.patch
>>>per-task-delay-accounting-setup-fix-2.patch
>>>per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
>>>
>>>per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
>>>
>>>per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
>>>per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
>>>per-task-delay-accounting-utilities-for-genetlink-usage.patch
>>>per-task-delay-accounting-taskstats-interface.patch
>>>per-task-delay-accounting-taskstats-interface-fix-1.patch
>>>per-task-delay-accounting-taskstats-interface-fix-2.patch
>>>per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
>>>
>>>per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
>>>
>>>per-task-delay-accounting-documentation.patch
>>>per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
>>>
>>>per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
>>>
>>>
>>> I just don't know.  There are a number of groups who pop up with various
>>> enhanced accounting requirements and patches (all quite different) but I
>>> haven't heard a lot of enthusiasm from any of them over this work, which
>>> attempts to provide an extensible framework for accumulation and
>>>querying
>>> of per-task metrics.
>>>
>>> But then again, we cannot just sit there and wait for everyone to be
>>>100%
>>> happy.  So I'm 51% inclined to push this along.
>>>
>>> Anyone else who has an interest in this sort of thing needs to be aware
>>> that there will be an expectation that any future statistics submissions
>>> should use these interfaces.  So the time to pay attention is right now.
>>>
>>
>>Hi, Andrew,
>>
>>Here is a brief summary of the status of the response we have received from
>>the stakeholders (some of it has been duplicated in previous postings)
>>
>>Project                                         Response
>>
>>1. CSA accounting/PAGG/JOB:                    Has agreed to use taskstats
>> Jay Lan <jlan@engr.sgi.com>                  interface
>>
>>2. per-process IO statistics:                  None
>> Levent Serinol <lserinol@gmail.com>          Needs are subset of CSA
>>
>>3. per-cpu time statistics:                    None (email bounced)
>> Erich Focht <efocht@ess.nec.de>              Needs can be met by taskstats
>>                                              Statistics not yet submitted
>>
>>4. Microstate accounting:                      None
>> Peter Chubb <peterc@gelato.unsw.edu.au>      overlap with delay accounting
>>                                              prefers /proc due to
>>convenience
>>                                              taskstats can meet the needs
>>
>>
>>5. ELSA: Guillaume Thouvenin                   None
>> <guillaume.thouvenin@bull.net>               ELSA is not a direct user
>>                                              of new kernel statistics
>>                                              Consumer of CSA/BSD
>>accounting
>>                                              statistics
>>
>>6. pnotify: Jes Sorensen <jes@sgi.com>         None
>>(taken over pnotify from Erik Jacobson)        Informed over private email
>>                                              that pnotify replacement is
>>                                              being worked on. pnotify
>>                                              or its replacement will
>>                                              not be concerned with
>>                                              exporting data to user space
>>                                              or collecting any statistics.
>>
>>
>>7. Scalable statistics counters with /proc      Not working on it
>> reporting:                                   anymore
>> Ravikiran G Thirumalai,
>> Dipankar Sarma <dipankar@in.ibm.com>
>>
>>Studying the responses from all stake holders, Jay Lan's was the most
>>encouraging. Peter Chubb prefers the /proc interface due to the text
>>interface
>>and ease of parsing. (in our opinion, taskstats can meet the needs easily
>>and the getdelays utility can provide the same ease for parsing).
>>The others did not respond.
>>Some performance numbers of taskstats were posted at
>>http://lkml.org/lkml/2006/3/23/141. The result highlights are included
>>below
>>
>>   Results highlights
>>
>>   - Configuring delay accounting adds < 0.5%
>>     overhead in most cases and even reduces overhead
>>     in some cases
>>
>>   - Enabling delay accounting has similar results
>>     with a maximum overhead of 1.2% for hackbench,
>>     most other overheads < 1% and reduction in
>>     overhead in some cases
>>
>>These statistics are _per task_ and can be extended easily by anyone
>>who wishes to obtain per task data. An example of per task improved
>>scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381
>>(I am not sure if the email refers to our per-task statistics). If not,
>>the new statistics could easily use the taskstats interface.
>>
>>These statistics can be used by software product stacks to monitor
>>usage information about the various tasks they create and control.
>>I also informally spoke to a group of students (verbally), who were
>>excited at the possibility of using the per-task statistics to do
>>dynamic deadline based power management. They want to use the delay data
>>(CPU and IO) to predict deadlines for a task and then use these results
>>for dynamically scaling CPU frequency.
>>
>>
>>The ability to monitor the CPU run and delay data and IO delay data is
>>useful.
>>
>>I would request you to consider the inclusion per-task delay accounting
>>into
>>2.6.18.
>>
> 
> 
> 
> Andrew,
> 
> The only other new set of patches to be discussed in this context are the
> statistics-infrastructure patches from Martin Peschke.
> 
> That infrastructure cannot meet the needs of delay accounting, CSA etc. because
> - it only provides "user pull" model of getting stats whereas "kernel push" is
> needed for delay accounting

Doesn't the taskstats interface also provide a "user pull" request-reply
model? Serious accounting needs to push accounting data as soon as
possible.

> - it uses a relatively slow interface unsuitable for high volumes of data. Each
> statistic has its own definition, needs to be read separately using ASCII,
> reading data continuously means open/read/close each time.....all of
> which is not very conducive to large structures being sent to userspace.

Yes, I second the point. It won't be able to keep up with the traffic.

> - its oriented towards sampled data whereas taskstats isn't.
> 
> So, we have a good consensus from existing/potential users of taskstats and would
> very much appreciate it being included in 2.6.18.

Andrew, it has become clear that the community wants to see accounting
data processing moved to userspace. Thus there is a need for a
common accounting interface that does minimal work in the kernel (via
hooks at fork and exit) and delivers data to userspace.

The delayacct patchset provides a good framework and example that
I believe CSA/Job can follow and build upon to move most of our work
to userspace and thus drop the dependency on PAGG. We will submit a CSA
patch soon based on the taskstats interface.

Thanks,
  - jay

P.S. Balbir and Shailabh, Chris Sturtivant will continue the CSA work
      at SGI. Please also cc Chris <csturtiv@sgi.com> in the future.
      Thanks!


> 
> --Shailabh
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-06 22:52     ` Jay Lan
@ 2006-06-06 22:55       ` Shailabh Nagar
  2006-06-12 12:02       ` Martin Peschke
  1 sibling, 0 replies; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-06 22:55 UTC (permalink / raw)
  To: Jay Lan; +Cc: balbir, Andrew Morton, linux-kernel, Chris Sturtivant,
	Peter Chubb

Jay Lan wrote:
> Shailabh Nagar wrote:
> 
>> Balbir Singh wrote:
>>
>>> Andrew Morton wrote:
>>>
>>>
>>>> per-task-delay-accounting-setup.patch
>>>> per-task-delay-accounting-setup-fix-1.patch
>>>> per-task-delay-accounting-setup-fix-2.patch
>>>> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection.patch
>>>> per-task-delay-accounting-sync-block-i-o-and-swapin-delay-collection-fix-1.patch
>>>> per-task-delay-accounting-cpu-delay-collection-via-schedstats.patch
>>>> per-task-delay-accounting-cpu-delay-collection-via-schedstats-fix-1.patch
>>>> per-task-delay-accounting-utilities-for-genetlink-usage.patch
>>>> per-task-delay-accounting-taskstats-interface.patch
>>>> per-task-delay-accounting-taskstats-interface-fix-1.patch
>>>> per-task-delay-accounting-taskstats-interface-fix-2.patch
>>>> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface.patch
>>>> per-task-delay-accounting-delay-accounting-usage-of-taskstats-interface-use-portable-cputime-api-in-__delayacct_add_tsk.patch
>>>> per-task-delay-accounting-documentation.patch
>>>> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays.patch
>>>> per-task-delay-accounting-proc-export-of-aggregated-block-i-o-delays-warning-fix.patch
>>>>
>>>> I just don't know.  There are a number of groups who pop up with various
>>>> enhanced accounting requirements and patches (all quite different) but I
>>>> haven't heard a lot of enthusiasm from any of them over this work, which
>>>> attempts to provide an extensible framework for accumulation and querying
>>>> of per-task metrics.
>>>>
>>>> But then again, we cannot just sit there and wait for everyone to be 100%
>>>> happy.  So I'm 51% inclined to push this along.
>>>>
>>>> Anyone else who has an interest in this sort of thing needs to be aware
>>>> that there will be an expectation that any future statistics submissions
>>>> should use these interfaces.  So the time to pay attention is right now.
>>>>
>>>
>>> Hi, Andrew,
>>>
>>> Here is a brief summary of the status of the response we have received
>>> from the stakeholders (some of it has been duplicated in previous postings)
>>>
>>> Project                                      Response
>>>
>>> 1. CSA accounting/PAGG/JOB:                  Has agreed to use taskstats
>>>    Jay Lan <jlan@engr.sgi.com>               interface
>>>
>>> 2. per-process IO statistics:                None
>>>    Levent Serinol <lserinol@gmail.com>       Needs are subset of CSA
>>>
>>> 3. per-cpu time statistics:                  None (email bounced)
>>>    Erich Focht <efocht@ess.nec.de>           Needs can be met by taskstats
>>>                                              Statistics not yet submitted
>>>
>>> 4. Microstate accounting:                    None
>>>    Peter Chubb <peterc@gelato.unsw.edu.au>   overlap with delay accounting
>>>                                              prefers /proc due to convenience
>>>                                              taskstats can meet the needs
>>>
>>> 5. ELSA: Guillaume Thouvenin                 None
>>>    <guillaume.thouvenin@bull.net>            ELSA is not a direct user
>>>                                              of new kernel statistics
>>>                                              Consumer of CSA/BSD accounting
>>>                                              statistics
>>>
>>> 6. pnotify: Jes Sorensen <jes@sgi.com>       None
>>>    (taken over pnotify from Erik Jacobson)   Informed over private email
>>>                                              that pnotify replacement is
>>>                                              being worked on. pnotify
>>>                                              or its replacement will
>>>                                              not be concerned with
>>>                                              exporting data to user space
>>>                                              or collecting any statistics.
>>>
>>> 7. Scalable statistics counters with /proc   Not working on it
>>>    reporting:                                anymore
>>>    Ravikiran G Thirumalai,
>>>    Dipankar Sarma <dipankar@in.ibm.com>
>>>
>>> Studying the responses from all stakeholders, Jay Lan's was the most
>>> encouraging. Peter Chubb prefers the /proc interface due to the text
>>> interface and ease of parsing. (in our opinion, taskstats can meet the
>>> needs easily and the getdelays utility can provide the same ease for
>>> parsing). The others did not respond.
>>> Some performance numbers of taskstats were posted at
>>> http://lkml.org/lkml/2006/3/23/141. The result highlights are included
>>> below
>>>
>>>   Results highlights
>>>
>>>   - Configuring delay accounting adds < 0.5%
>>>     overhead in most cases and even reduces overhead
>>>     in some cases
>>>
>>>   - Enabling delay accounting has similar results
>>>     with a maximum overhead of 1.2% for hackbench,
>>>     most other overheads < 1% and reduction in
>>>     overhead in some cases
>>>
>>> These statistics are _per task_ and can be extended easily by anyone
>>> who wishes to obtain per task data. An example of per task improved
>>> scheduler statistics was mentioned in http://lkml.org/lkml/2006/6/1/381
>>> (I am not sure if the email refers to our per-task statistics). If not,
>>> the new statistics could easily use the taskstats interface.
>>>
>>> These statistics can be used by software product stacks to monitor
>>> usage information about the various tasks they create and control.
>>> I also informally spoke to a group of students (verbally), who were
>>> excited at the possibility of using the per-task statistics to do
>>> dynamic deadline based power management. They want to use the delay data
>>> (CPU and IO) to predict deadlines for a task and then use these results
>>> for dynamically scaling CPU frequency.
>>>
>>>
>>> The ability to monitor the CPU run and delay data and IO delay data is
>>> useful.
>>>
>>> I would request you to consider the inclusion of per-task delay
>>> accounting into 2.6.18.
>>>
>>
>>
>>
>> Andrew,
>>
>> The only other new set of patches to be discussed in this context are the
>> statistics-infrastructure patches from Martin Peschke.
>>
>> That infrastructure cannot meet the needs of delay accounting, CSA etc.
>> because
>> - it only provides "user pull" model of getting stats whereas "kernel push"
>>   is needed for delay accounting
> 
> 
> Doesn't taskstats interface provide "user pull" request-reply model
> also? Serious accounting needs to push accounting data as soon as
> possible.

Yes, I meant to say "kernel push" is also needed for delay accounting.
So taskstats provides both pull and push, whereas the statistics
infrastructure, because it uses an fs-based interface, provides only user-pull.
> 
>> - it uses a relatively slow interface unsuitable for high volumes of data.
>> Each statistic has its own definition, needs to be read separately using
>> ASCII, reading data continuously means open/read/close each time.....all of
>> which is not very conducive to large structures being sent to userspace.
> 
> 
> Yes, i second the point. It won't be able to catch up the traffic.
> 
>> - its oriented towards sampled data whereas taskstats isn't.
>>
>> So, we have a good consensus from existing/potential users of taskstats
>> and would very much appreciate it being included in 2.6.18.
> 
> 
> Andrew, it has become clear that the community wants to see accounting
> data processing being moved to userspace. Thus there is a need for a
> common accounting interface to provide minimal works at kernel (via
> hooks at fork and exit) and deliver data to userspace.
> 
> The delayacct patchset provides a good framework and example that
> i believe CSA/Job can follow and build upon to move most of our work
> to userspace and thus cut off dependency of PAGG. We will submit CSA
> patch soon based on the taskstats interface.
> 
> Thanks,
>  - jay
> 
> P.S. Balbir and Shailabh, Chris Sturtivant will continue the CSA work
>      at SGI. Please also cc Chris <csturtiv@sgi.com> in the future.
>      Thanks!

Sure.

Thanks,
Shailabh
> 
> 
>>
>> --Shailabh
>>
>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-06 22:52     ` Jay Lan
  2006-06-06 22:55       ` Shailabh Nagar
@ 2006-06-12 12:02       ` Martin Peschke
  2006-06-12 13:28         ` Shailabh Nagar
  1 sibling, 1 reply; 166+ messages in thread
From: Martin Peschke @ 2006-06-12 12:02 UTC (permalink / raw)
  To: Jay Lan
  Cc: Shailabh Nagar, balbir, Andrew Morton, linux-kernel,
	Chris Sturtivant, Peter Chubb

Jay Lan wrote:
> Shailabh Nagar wrote:
>> Balbir Singh wrote:

>> Andrew,
>>
>> The only other new set of patches to be discussed in this context are the
>> statistics-infrastructure patches from Martin Peschke.
>>
>> That infrastructure cannot meet the needs of delay accounting, CSA 
>> etc. because
>> - it only provides "user pull" model of getting stats whereas "kernel 
>> push" is
>> needed for delay accounting
> 
> Doesn't taskstats interface provide "user pull" request-reply model
> also? Serious accounting needs to push accounting data as soon as
> possible.
> 
>> - it uses a relatively slow interface unsuitable for high volumes of 
>> data. 

By design.

I think it would be fatal to report every event relevant
to statistical data gathering up to user space. It's fine to have
the kernel maintain counters and to provide preprocessed data.

Given that, is there a need for a high-speed interface for a
huge amount of unprocessed statistical data?

However, the user interface is just one building block,
which can be enhanced or replaced with moderate effort, if
there is a need.

 >> Each statistic has its own definition,

Allowing users to restrict accounting to what they need in their
particular case. Sensible defaults are usually available.

>> needs to be read separately using ASCII,
>> reading data continuously means open/read/close each time.....all of
>> which is not very conducive to large structures being sent to userspace.

Debugfs files are fine for larger structures.

Unless one keeps reading statistics dozens of times per second,
I don't see an issue with that.

The question is: what are the requirements to be covered?

> Yes, i second the point. It won't be able to catch up the traffic.
> 
>> - its oriented towards sampled data whereas taskstats isn't.
>>
>> So, we have a good consensus from existing/potential users of 
>> taskstats and would
>> very much appreciate it being included in 2.6.18.
> 
> Andrew, it has become clear that the community wants to see accounting
> data processing being moved to userspace. Thus there is a need for a
> common accounting interface to provide minimal works at kernel (via
> hooks at fork and exit) and deliver data to userspace.

Both the statistics infrastructure (on behalf of its exploiters) and
the exploiters of the taskstats interface do data preprocessing,
that is, maintain counters in the kernel.
User space counters won't perform, of course.

AFAICS, actual differences are:
- triggers for data delivery to user space
   (statistics infrastructure: when user reads statistics through file,
    taskstats: on certain task related events, right?)
- and, therewith, frequency of data delivery to user space

Martin

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans)
  2006-06-12 12:02       ` Martin Peschke
@ 2006-06-12 13:28         ` Shailabh Nagar
  0 siblings, 0 replies; 166+ messages in thread
From: Shailabh Nagar @ 2006-06-12 13:28 UTC (permalink / raw)
  To: Martin Peschke
  Cc: Jay Lan, balbir, Andrew Morton, linux-kernel, Chris Sturtivant,
	Peter Chubb

Martin Peschke wrote:

> Jay Lan wrote:
>
>> Shailabh Nagar wrote:
>>
>>> Balbir Singh wrote:
>>
>
>>> Andrew,
>>>
>>> The only other new set of patches to be discussed in this context 
>>> are the
>>> statistics-infrastructure patches from Martin Peschke.
>>>
>>> That infrastructure cannot meet the needs of delay accounting, CSA 
>>> etc. because
>>> - it only provides "user pull" model of getting stats whereas 
>>> "kernel push" is
>>> needed for delay accounting
>>
>>
>> Doesn't taskstats interface provide "user pull" request-reply model
>> also? Serious accounting needs to push accounting data as soon as
>> possible.
>>
>>> - it uses a relatively slow interface unsuitable for high volumes of 
>>> data. 
>>
>
> By design.
>
> I think it would be fatal to report every event relevant
> to statistical data gathering up to user space. It's fine to have
> the kernel maintain counters and to provide preprocessed data.
>
> Given that, is there a need for a high-speed interface for a
> huge amount of unprocessed statistical data?

Broadly speaking, yes. Inserting policy into the kernel will no doubt
reduce the data being sent to userspace, but it also
- limits flexibility of what userspace can do with it and
- adds to the kernel code base unnecessarily

Specifically for taskstats, there is a need for a high-speed interface
because of the potential volume of data resulting from
- large number of tasks in a single kernel
- high frequency of task exits

Some kernel-based preprocessing, such as per-tgid aggregation, can
help cut down the volume but as long as we have a need for getting per-task
data, an efficient interface will matter, at least for our needs.
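
A minimal sketch of what per-tgid aggregation means, written in Python purely for illustration. It is not the kernel implementation; the record layout, the `aggregate_by_tgid` name and the numbers are all invented for the example:

```python
from collections import defaultdict

# Hypothetical per-task records: (pid, tgid, cpu_delay_ns). Values are made up.
per_task = [
    (101, 100, 500),
    (102, 100, 700),
    (201, 200, 50),
]


def aggregate_by_tgid(records):
    """Sum the delay of every thread into its thread group (tgid)."""
    totals = defaultdict(int)
    for _pid, tgid, cpu_delay_ns in records:
        totals[tgid] += cpu_delay_ns
    return dict(totals)


# Three per-task records collapse into two per-tgid records.
per_tgid = aggregate_by_tgid(per_task)
```

Fewer records cross into userspace, but the per-task detail is gone, which is why an efficient per-task path still matters.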

>
> However, the user interface is a just one building brick,
> which can be enhanced or replaced with moderate effort, if
> there is a need.

True. This is not to suggest the statistics infrastructure's interface
choice isn't correct, just that it's not enough for the needs we seek
to serve.

A filesystem-based interface has plenty of usability benefits, so it's
primarily a question of which stats you want to export using its
interface.

>
> >> Each statistic has its own definition,
>
> Allowing users to restrict accounting to what they need in their
> particular case. Sensible defaults are usually available.
>
>>> needs to be read separately using ASCII,
>>> reading data continuously means open/read/close each time.....all of
>>> which is not very conducive to large structures being sent to 
>>> userspace.
>>
>
> Debugfs file are fine for larger structures.
>
> Unless one keeps reading statistics dozens of times per second,
> I don't see an issue with that.

Since delay accounting stats can be exploited for resource management in
user space, if one wants to get data for all tasks/processes periodically,
it could add up to a fairly high demand for user<->kernel bandwidth even
if the frequency needed for reading one task's stats isn't that high.
It's a question of scalability as the number of tasks increases.
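
A back-of-the-envelope estimate of that scaling, in Python. Every number below (task counts, statistics per task, record size, polling period) is invented for illustration and is not a measurement; the factor of three stands for the open/read/close sequence mentioned earlier in the thread:

```python
def polling_cost(num_tasks, stats_per_task, record_bytes, period_s):
    """Rough per-second cost of polling every task once per period."""
    # File-style interface: one open/read/close per statistic per task.
    syscalls_per_s = num_tasks * stats_per_task * 3 / period_s
    # Payload moved across the user<->kernel boundary.
    bytes_per_s = num_tasks * record_bytes / period_s
    return syscalls_per_s, bytes_per_s


# Hypothetical workloads: same per-task polling rate, different task counts.
few = polling_cost(num_tasks=100, stats_per_task=4, record_bytes=256, period_s=1.0)
many = polling_cost(num_tasks=10_000, stats_per_task=4, record_bytes=256, period_s=1.0)
```

The per-task polling rate is identical in both cases; only the task count changes, and the total cost grows linearly with it.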

For systemwide stats, your point is well taken...unlikely to be an issue,
unless of course, one needs to read lots of them.

>
> The question is: what are the requirements to be covered?

Yup...I think the infrastructures are serving differing needs.

>
>> Yes, i second the point. It won't be able to catch up the traffic.
>>
>>> - its oriented towards sampled data whereas taskstats isn't.
>>>
>>> So, we have a good consensus from existing/potential users of 
>>> taskstats and would
>>> very much appreciate it being included in 2.6.18.
>>
>>
>> Andrew, it has become clear that the community wants to see accounting
>> data processing being moved to userspace. Thus there is a need for a
>> common accounting interface to provide minimal works at kernel (via
>> hooks at fork and exit) and deliver data to userspace.
>
>
> Both, the statistics infrastructure on behalf of its exploiters as well
> as the exploiters of the taskstats interface do data preprocessing,
> that is, maintain counters in the kernel.
> User space counters won't perform, of course.
>
> AFAICS, actual differences are:
> - triggers for data delivery to user space
>   (statistics infrastructure: when user reads statistics through file,
>    taskstats: on certain task related events, right?)

Taskstats data delivery is triggered by
- user asking for data (i.e. akin to reading through a file, only done
  through a command-response interface)
- on task exit event
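
A toy model of these two triggers, in Python rather than the real genetlink interface. The `TaskStats` and `StatsSource` classes and their methods are invented for the sketch and do not correspond to any kernel or library API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskStats:
    pid: int
    cpu_delay_ns: int = 0
    blkio_delay_ns: int = 0


@dataclass
class StatsSource:
    """Toy stand-in for the kernel side; all names are hypothetical."""
    tasks: Dict[int, TaskStats] = field(default_factory=dict)
    listeners: List[Callable[[TaskStats], None]] = field(default_factory=list)

    def query(self, pid: int) -> TaskStats:
        # "User pull": a command-response request for one live task.
        return self.tasks[pid]

    def subscribe(self, callback: Callable[[TaskStats], None]) -> None:
        # Register a listener for exit-time records.
        self.listeners.append(callback)

    def task_exit(self, pid: int) -> None:
        # "Kernel push": the final record is delivered when the task exits,
        # because it can no longer be queried afterwards.
        stats = self.tasks.pop(pid)
        for callback in self.listeners:
            callback(stats)


source = StatsSource()
source.tasks[42] = TaskStats(pid=42, cpu_delay_ns=1500, blkio_delay_ns=300)

received = []
source.subscribe(received.append)

pulled = source.query(42)   # pull while the task is alive
source.task_exit(42)        # push on exit
```

A purely file-based interface covers only the `query` half: once the task is gone there is nothing left to read, so the exit-time record has to be pushed.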

> - and, therewith, frequency of data delivery to user space
>
>
> Martin


--Shailabh


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: 2.6.18 -mm pi-futex merge
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (15 preceding siblings ...)
  2006-06-06  0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh
@ 2006-06-06 12:32 ` Steven Rostedt
  2006-06-06 13:34   ` Roman Zippel
  2006-06-06 14:42 ` genirq Ingo Molnar
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 166+ messages in thread
From: Steven Rostedt @ 2006-06-06 12:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:

> pi-futex-futex-code-cleanups.patch
> pi-futex-robust-futex-docs-fix.patch
> pi-futex-introduce-debug_check_no_locks_freed.patch
> pi-futex-introduce-warn_on_smp.patch
> pi-futex-add-plist-implementation.patch
> pi-futex-scheduler-support-for-pi.patch
> pi-futex-rt-mutex-core.patch
> pi-futex-rt-mutex-docs.patch
> pi-futex-rt-mutex-docs-update.patch
> pi-futex-rt-mutex-debug.patch
> pi-futex-rt-mutex-tester.patch
> pi-futex-rt-mutex-futex-api.patch
> pi-futex-futex_lock_pi-futex_unlock_pi-support.patch
> #
> futex_requeue-optimization.patch
> 
>  Priority-inheriting futexes.  I don't have a clue how this code works,
>  but it sure has a lot of trylocks for something which allegedly works. 
>  Will merge.

Andrew, I wrote the rt-mutex-design.txt just so you would have a clue :)

If you have any questions, I would be happy to update it to make it
clearer.

-- Steve



^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: 2.6.18 -mm pi-futex merge
  2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
@ 2006-06-06 13:34   ` Roman Zippel
  2006-06-06 13:44     ` Steven Rostedt
  0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-06 13:34 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Andrew Morton, linux-kernel, tglx

Hi,

On Tue, 6 Jun 2006, Steven Rostedt wrote:

> On Sun, 2006-06-04 at 13:50 -0700, Andrew Morton wrote:
> 
> > pi-futex-futex-code-cleanups.patch
> > pi-futex-robust-futex-docs-fix.patch
> > pi-futex-introduce-debug_check_no_locks_freed.patch
> > pi-futex-introduce-warn_on_smp.patch
> > pi-futex-add-plist-implementation.patch
> > pi-futex-scheduler-support-for-pi.patch
> > pi-futex-rt-mutex-core.patch
> > pi-futex-rt-mutex-docs.patch
> > pi-futex-rt-mutex-docs-update.patch
> > pi-futex-rt-mutex-debug.patch
> > pi-futex-rt-mutex-tester.patch
> > pi-futex-rt-mutex-futex-api.patch
> > pi-futex-futex_lock_pi-futex_unlock_pi-support.patch
> > #
> > futex_requeue-optimization.patch
> > 
> >  Priority-inheriting futexes.  I don't have a clue how this code works,
> >  but it sure has a lot of trylocks for something which allegedly works. 
> >  Will merge.

Please also include the patch below to fix defaults and dependencies. 
Thomas, could you please also provide a little more verbose help text?
BTW what's the correct spelling - RT Mutex, rt mutex or rt-mutex?

bye, Roman


[PATCH] fix rt-mutex defaults and dependencies

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>

---

 lib/Kconfig.debug |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Index: linux-2.6-mm/lib/Kconfig.debug
===================================================================
--- linux-2.6-mm.orig/lib/Kconfig.debug	2006-06-06 15:24:45.000000000 +0200
+++ linux-2.6-mm/lib/Kconfig.debug	2006-06-06 15:25:30.000000000 +0200
@@ -158,7 +158,6 @@ config DEBUG_MUTEX_DEADLOCKS
 
 config DEBUG_RT_MUTEXES
 	bool "RT Mutex debugging, deadlock detection"
-	default y
 	depends on DEBUG_KERNEL && RT_MUTEXES
 	help
 	 This allows rt mutex semantics violations and rt mutex related
@@ -171,8 +170,7 @@ config DEBUG_PI_LIST
 
 config RT_MUTEX_TESTER
 	bool "Built-in scriptable tester for rt-mutexes"
-	depends on RT_MUTEXES
-	default n
+	depends on DEBUG_KERNEL && RT_MUTEXES
 	help
 	  This option enables a rt-mutex tester.
 

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: 2.6.18 -mm pi-futex merge
  2006-06-06 13:34   ` Roman Zippel
@ 2006-06-06 13:44     ` Steven Rostedt
  0 siblings, 0 replies; 166+ messages in thread
From: Steven Rostedt @ 2006-06-06 13:44 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Andrew Morton, linux-kernel, tglx

On Tue, 2006-06-06 at 15:34 +0200, Roman Zippel wrote:

> BTW what's the correct spelling - RT Mutex, rt mutex or rt-mutex?

I'd recommend "RT Mutex" for when it is in titles (as it is now) and
rt-mutex when explaining the code.  IMO "rt mutex" is just wrong.

-- Steve



^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: genirq
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (16 preceding siblings ...)
  2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
@ 2006-06-06 14:42 ` Ingo Molnar
  2006-06-06 16:56   ` genirq Daniel Walker
  2006-06-07  3:46   ` genirq Benjamin Herrenschmidt
  2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar
                   ` (2 subsequent siblings)
  20 siblings, 2 replies; 166+ messages in thread
From: Ingo Molnar @ 2006-06-06 14:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Thomas Gleixner


* Andrew Morton <akpm@osdl.org> wrote:

> genirq-rename-desc-handler-to-desc-chip.patch
> genirq-rename-desc-handler-to-desc-chip-power-fix.patch
> genirq-rename-desc-handler-to-desc-chip-ia64-fix.patch
> genirq-rename-desc-handler-to-desc-chip-ia64-fix-2.patch
> genirq-sem2mutex-probe_sem-probing_active.patch
> genirq-cleanup-merge-irq_affinity-into-irq_desc.patch
> genirq-cleanup-remove-irq_descp.patch
> genirq-cleanup-remove-irq_descp-fix.patch
> genirq-cleanup-remove-fastcall.patch
> genirq-cleanup-misc-code-cleanups.patch
> genirq-cleanup-reduce-irq_desc_t-use-mark-it-obsolete.patch
> genirq-cleanup-include-linux-irqh.patch
> genirq-cleanup-merge-irq_dir-smp_affinity_entry-into-irq_desc.patch
> genirq-cleanup-merge-pending_irq_cpumask-into-irq_desc.patch
> genirq-cleanup-turn-arch_has_irq_per_cpu-into-config_irq_per_cpu.patch
> genirq-debug-better-debug-printout-in-enable_irq.patch
> genirq-add-retrigger-irq-op-to-consolidate-hw_irq_resend.patch
> genirq-doc-comment-include-linux-irqh-structures.patch
> genirq-doc-handle_irq_event-and-__do_irq-comments.patch
> genirq-cleanup-no_irq_type-cleanups.patch
> genirq-doc-add-design-documentation.patch
> genirq-add-genirq-sw-irq-retrigger.patch
> genirq-add-irq_noprobe-support.patch
> genirq-add-irq_norequest-support.patch
> genirq-add-irq_noautoen-support.patch
> genirq-update-copyrights.patch
> genirq-core.patch
> genirq-msi-fixes-2.patch
> genirq-add-irq-chip-support.patch
> genirq-add-irq-chip-support-fix.patch
> genirq-add-handle_bad_irq.patch
> genirq-add-irq-wake-power-management-support.patch
> genirq-add-sa_trigger-support.patch
> genirq-cleanup-no_irq_type-no_irq_chip-rename.patch
> genirq-convert-the-x86_64-architecture-to-irq-chips.patch
> genirq-convert-the-i386-architecture-to-irq-chips.patch
> genirq-convert-the-i386-architecture-to-irq-chips-fix-2.patch
> genirq-more-verbose-debugging-on-unexpected-irq-vectors.patch
> genirq-add-chip-eoi-fastack-fasteoi.patch
> genirq-add-chip-eoi-fastack-fasteoi-fix.patch
> 
>  Still stabilising.  It's looking more like 2.6.19 material.  Needs 
>  more review from arch maintainers too.

there hasn't been any real problem since the MSI one. The core bits are
rather stable. The patch-queue had positive input from the maintainers
of the two architectures with the most complex IRQ hardware (arm and
ppc*), and that's reassuring. But in any case, other architectures are
not affected at all (sans brown-paperbag build bugs and typos), their
__do_IRQ() handling remains unchanged. So i'd like to see this in
2.6.18. (there's a good deal of stuff we have on top of genirq)

(the irqpoll discussions are unrelated to genirq - they are fixes for an
irqpoll problem that the lock validator uncovered, and naturally those
patches were on top of genirq.)

	Ingo

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: genirq
  2006-06-06 14:42 ` genirq Ingo Molnar
@ 2006-06-06 16:56   ` Daniel Walker
  2006-06-07  8:42     ` genirq Ingo Molnar
  2006-06-07  3:46   ` genirq Benjamin Herrenschmidt
  1 sibling, 1 reply; 166+ messages in thread
From: Daniel Walker @ 2006-06-06 16:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner

On Tue, 2006-06-06 at 16:42 +0200, Ingo Molnar wrote:

> there hasnt been any real problem since the MSI one. The core bits are 
> rather stable. The patch-queue had positive input from the maintainers 
> of the two architectures with the most complex IRQ hardware (arm and 
> ppc*), and that's reassuring. But in any case, other architectures are 
> not affected at all (sans brow paperbag build bugs and typos), their 
> __do_IRQ() handling remains unchanged. So i'd like to see this in 
> 2.6.18. (there a good deal of stuff we have ontop of genirq)

There was a problem reported by Kevin Hillman: the -v5 version was not
functional on ARM omap boards. Was that handled already in -v6?

Daniel


^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: genirq
  2006-06-06 16:56   ` genirq Daniel Walker
@ 2006-06-07  8:42     ` Ingo Molnar
  0 siblings, 0 replies; 166+ messages in thread
From: Ingo Molnar @ 2006-06-07  8:42 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner


* Daniel Walker <dwalker@mvista.com> wrote:

> On Tue, 2006-06-06 at 16:42 +0200, Ingo Molnar wrote:
> 
> > there hasnt been any real problem since the MSI one. The core bits are 
> > rather stable. The patch-queue had positive input from the maintainers 
> > of the two architectures with the most complex IRQ hardware (arm and 
> > ppc*), and that's reassuring. But in any case, other architectures are 
> > not affected at all (sans brow paperbag build bugs and typos), their 
> > __do_IRQ() handling remains unchanged. So i'd like to see this in 
> > 2.6.18. (there a good deal of stuff we have ontop of genirq)
> 
> There was a problem reported by Kevin Hillman , the -v5 version was 
> not functional on ARM omap boards .. Was that handled already in -v6?

Daniel - you should be aware that the -mm genirq lineup does _not_
include the ARM bits. Those changes go via the normal ARM QA and merge
path, i.e. via rmk. The -mm lineup only includes the generic bits (for
type-based platforms). In any case, they don't impact upstream genirq
merging plans.

(The omap thing in the armirq queue is likely some small thing. Thomas 
is checking it.)

	Ingo

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: genirq
  2006-06-06 14:42 ` genirq Ingo Molnar
  2006-06-06 16:56   ` genirq Daniel Walker
@ 2006-06-07  3:46   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07  3:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner

> >  Still stabilising.  It's looking more like 2.6.19 material.  Needs 
> >  more review from arch maintainers too.
> 
> there hasnt been any real problem since the MSI one. The core bits are 
> rather stable. The patch-queue had positive input from the maintainers 
> of the two architectures with the most complex IRQ hardware (arm and 
> ppc*), and that's reassuring. But in any case, other architectures are 
> not affected at all (sans brow paperbag build bugs and typos), their 
> __do_IRQ() handling remains unchanged. So i'd like to see this in 
> 2.6.18. (there a good deal of stuff we have ontop of genirq)
> 
> (the irqpoll discussions are unrelated to genirq - they are fixes for an 
> irqpoll problem that the lock validator uncovered, and naturally those 
> patches were ontop of genirq.)

I vote for genirq inclusion in 2.6.18 too. I've almost finished porting
powerpc over to it and so far it looks good. In addition, I'm pretty
confident the patches have a very low impact (if any at all) on archs that
haven't been ported over (the old mechanism is still there, mostly
untouched).

In addition, I'm doing some fairly heavy rework of some of the powerpc
irq management that is based on top of the genirq port and I'd really
want it in 2.6.18...

Finally, we are doing some crash-work on MSI (trying to get some basic
support for powerpc separate from the current unusable
drivers/pci/msi.c) so we can at least get something working in 2.6.18
and that too will be based on my work mentioned above.

So as far as I'm concerned, genirq is pretty important to have in right
at the beginning.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 166+ messages in thread

* Re: 2.6.18 -mm merge plans
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (17 preceding siblings ...)
  2006-06-06 14:42 ` genirq Ingo Molnar
@ 2006-06-06 14:53 ` Ingo Molnar
  2006-06-06 16:02   ` Andrew Morton
  2006-06-07  3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt
  2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig
  20 siblings, 1 reply; 166+ messages in thread
From: Ingo Molnar @ 2006-06-06 14:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Arjan van de Ven


* Andrew Morton <akpm@osdl.org> wrote:

> lock-validator-floppyc-irq-release-fix.patch
> lock-validator-floppyc-irq-release-fix-fix.patch
> lock-validator-forcedethc-fix.patch
> lock-validator-mutex-section-binutils-workaround.patch
> lock-validator-add-__module_address-method.patch
> lock-validator-better-lock-debugging.patch
> lock-validator-locking-api-self-tests.patch
> lock-validator-locking-api-self-tests-self-test-fix.patch
> lock-validator-locking-init-debugging-improvement.patch
> lock-validator-beautify-x86_64-stacktraces.patch
> lock-validator-beautify-x86_64-stacktraces-fix.patch
> lock-validator-beautify-x86_64-stacktraces-fix-2.patch
> lock-validator-beautify-x86_64-stacktraces-fix-3.patch
> lock-validator-beautify-x86_64-stacktraces-fix-4.patch
> lock-validator-x86_64-document-stack-frame-internals.patch
> lock-validator-stacktrace.patch
> lock-validator-stacktrace-build-fix.patch
> lock-validator-stacktrace-warning-fix.patch
> lock-validator-stacktrace-fix-on-x86_64.patch
> lock-validator-fown-locking-workaround.patch
> lock-validator-sk_callback_lock-workaround.patch
> lock-validator-irqtrace-core.patch
> lock-validator-irqtrace-core-powerpc-fix-1.patch
> lock-validator-irqtrace-core-non-x86-fix.patch
> lock-validator-irqtrace-core-non-x86-fix-2.patch
> lock-validator-irqtrace-core-non-x86-fix-3.patch
> lock-validator-irqtrace-entrys-fix.patch
> lock-validator-irqtrace-core-remove-softirqc-warn_on.patch
> lock-validator-irqtrace-cleanup-include-asm-i386-irqflagsh.patch
> lock-validator-irqtrace-cleanup-include-asm-x86_64-irqflagsh.patch
> lock-validator-x86_64-irqflags-trace-entrys-fix.patch
> lock-validator-lockdep-add-local_irq_enable_in_hardirq-api.patch
> lock-validator-add-per_cpu_offset.patch
> lock-validator-add-per_cpu_offset-fix.patch
> lock-validator-core.patch
> lock-validator-core-early_boot_irqs_-build-fix.patch
> lock-validator-core-fix-compiler-warning.patch
> lock-validator-procfs.patch
> lock-validator-core-multichar-fix.patch
> lock-validator-core-count_matching_names-fix.patch
> lock-validator-design-docs.patch
> lock-validator-prove-rwsem-locking-correctness.patch
> lock-validator-prove-rwsem-locking-correctness-fix.patch
> lock-validator-prove-rwsem-locking-correctness-powerpc-fix.patch
> lock-validator-prove-spinlock-rwlock-locking-correctness.patch
> lock-validator-prove-mutex-locking-correctness.patch
> lock-validator-prove-mutex-locking-correctness-fix-null-type-name-bug.patch
> lock-validator-print-all-lock-types-on-sysrq-d.patch
> lock-validator-x86_64-early-init.patch
> lock-validator-smp-alternatives-workaround.patch
> lock-validator-do-not-recurse-in-printk.patch
> lock-validator-disable-nmi-watchdog-if-config_lockdep.patch
> lock-validator-disable-nmi-watchdog-if-config_lockdep-i386.patch
> lock-validator-disable-nmi-watchdog-if-config_lockdep-x86_64.patch
> lock-validator-special-locking-bdev.patch
> lock-validator-special-locking-direct-io.patch
> lock-validator-special-locking-serial.patch
> lock-validator-special-locking-serial-fix.patch
> lock-validator-special-locking-dcache.patch
> lock-validator-special-locking-i_mutex.patch
> lock-validator-special-locking-s_lock.patch
> lock-validator-special-locking-futex.patch
> lock-validator-special-locking-genirq.patch
> lock-validator-special-locking-completions.patch
> lock-validator-special-locking-waitqueues.patch
> lock-validator-special-locking-mm.patch
> lock-validator-special-locking-serio.patch
> lock-validator-special-locking-slab.patch
> lock-validator-special-locking-skb_queue_head_init.patch
> lock-validator-special-locking-net-ipv4-igmpcpatch.patch
> lock-validator-special-locking-net-ipv4-igmpc-2.patch
> lock-validator-special-locking-timerc.patch
> lock-validator-special-locking-schedc.patch
> lock-validator-special-locking-hrtimerc.patch
> lock-validator-special-locking-sock_lock_init.patch
> lock-validator-special-locking-af_unix.patch
> lock-validator-special-locking-bh_lock_sock.patch
> lock-validator-special-locking-mmap_sem.patch
> lock-validator-special-locking-sb-s_umount.patch
> lock-validator-special-locking-sb-s_umount-fix.patch
> lock-validator-special-locking-sb-s_umount-2.patch
> lock-validator-special-locking-sb-s_umount-2-fix.patch
> lockdep-annotate-rpc_populate-for.patch
> lock-validator-special-locking-jbd.patch
> lock-validator-special-locking-posix-timers.patch
> lock-validator-special-locking-sch_genericc.patch
> lock-validator-special-locking-xfrm.patch
> lockdep-add-i_mutex-ordering-annotations-to-the-sunrpc.patch
> lockdep-add-parent-child-annotations-to-usbfs.patch
> lock-validator-special-locking-sound-core-seq-seq_portsc.patch
> lock-validator-special-locking-sound-core-seq-seq_devicec.patch
> lock-validator-special-locking-sound-core-seq-seq_devicec-fix.patch
> lock-validator-fix-rt_hash_lock_sz.patch
> lock-validator-introduce-irq__lockdep.patch
> locking-validator-special-rule-8390c-disable_irq.patch
> locking-validator-special-rule-3c59xc-disable_irq.patch
> lock-validator-enable-lock-validator-in-kconfig.patch
> lock-validator-enable-lock-validator-in-kconfig-require-trace_irqflags_support.patch
> lock-validator-enable-lock-validator-in-kconfig-not-yet.patch
> lockdep-one-stacktrace-column-if-config_lockdep=y.patch
> i386-remove-multi-entry-backtraces.patch
> lockdep-further-improve-stacktrace-output.patch
> lock-validator-irqtrace-support-non-x86-architectures.patch
> lock-validator-disable-oprofile-if-lockdep=y.patch
> lock-validator-select-kallsyms_all.patch
> 
>  I'm not really sure that this has as good a bugfixes/effort ratio as 
>  would, say, working on our ever-growing bugzilla list.

well, the two sets of bugs are pretty much disjoint. Deadlocks that 
trigger (and produce an NMI watchdog output) are easy to fix. But the 
overwhelming majority of the deadlocks the lock validator found were not 
actually triggered.
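
How a deadlock can be reported without ever being triggered can be
sketched in user space. The following is only a model under stated
assumptions, not the lock validator's code: lock classes are small
integers, MAX_CLASSES and record_dependency() are hypothetical names,
and every "B taken while A held" event is stored as an edge in a graph.
An acquisition that would close a cycle in that graph is reported, even
if the two conflicting code paths never ran at the same time.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8	/* hypothetical limit, for illustration only */

/* dep[a][b] is true once lock class b has been taken while a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'taken' while holding 'held'".
 * Returns false, and records nothing, if the new edge would close a
 * cycle, i.e. some earlier path already took the locks in the
 * opposite order.
 */
static bool record_dependency(int held, int taken)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(taken >= 0 && taken < MAX_CLASSES);

	if (reachable(taken, held, seen))
		return false;	/* potential deadlock, none occurred */
	dep[held][taken] = true;
	return true;
}
```

With the orders 0 -> 1 and 1 -> 2 already recorded, a later attempt to
record 2 -> 0 is rejected on a single CPU, with no second thread and no
actual lockup involved.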

>  But given that it exists, and that it'll fix (or rather prevent) 
>  future bugs at a constant-but-low rate for a long time, I guess it's 
>  something we want.
> 
>  I think it's more like 2.6.19 material.  The number of 
>  teach-lockdep-about-this-unusual-but-correct-locking-code patches 
>  continues to grow and I don't think we fully have a handle on how 
>  it'll all end up looking.

the biggest proportion of fixlets were due to out-of-order unlocking, 
which i took care of with CONFIG_DEBUG_NON_NESTED_UNLOCKS. Note that 
most of those annotations are trivial, and i think we've now got most of 
them. Also, those annotations are definitely useful in documenting 
"unusual" locking sequences - and we very much want to document the 
locking details of Linux. Also note that for example the 
local_irq_enable_in_hardirq() annotation found at least one real 
deadlock as well.

So unless something unexpected happens in -mm, i'd like to see this 
merged into 2.6.18 too.

	Ingo


* Re: 2.6.18 -mm merge plans
  2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar
@ 2006-06-06 16:02   ` Andrew Morton
  2006-06-06 16:35     ` Arjan van de Ven
  2006-06-06 20:47     ` lock validator [2.6.18 -mm merge plans] Ingo Molnar
  0 siblings, 2 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-06 16:02 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan

On Tue, 6 Jun 2006 16:53:37 +0200
Ingo Molnar <mingo@elte.hu> wrote:

>
> [ lockdep ]
>
> So unless something unexpected happens in -mm, i'd like to see this 
> merged into 2.6.18 too.

Well, we _could_, and I guess that we'd get things acceptably sorted out in
time for release.  But it'll be pretty chaotic and we don't want chaos
happening in Linus's tree.

I don't think there's any rush here - the code is only now reaching
sort-of-ready-for-mm status.  And..

- I think we still have a problem with the raid/bdev changes in
  block_dev.c.

- the changes to block_dev.c _do_ impact non-lockdep kernels

- we need to take a second look to see which other
  dont-affect-non-lockdep-kernels patches are in fact affecting non-lockdep
  kernels

- the changes to block_dev.c were pretty awful anyway

- did the various review comments I sent get disposed of in some fashion?

My overarching concern is the rate at which false-positive workaround
patches are piling up.  At some point we need to step back and decide
whether the goodness justifies the badness.  I expect we'll be OK, but I
don't think we're yet in a position to know that for sure.

(I'm actually quite surprised at how few real bugs this checker has
revealed.  We must rock, or something).


* Re: 2.6.18 -mm merge plans
  2006-06-06 16:02   ` Andrew Morton
@ 2006-06-06 16:35     ` Arjan van de Ven
  2006-06-06 20:47     ` lock validator [2.6.18 -mm merge plans] Ingo Molnar
  1 sibling, 0 replies; 166+ messages in thread
From: Arjan van de Ven @ 2006-06-06 16:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, linux-kernel

On Tue, 2006-06-06 at 09:02 -0700, Andrew Morton wrote:
> 
> (I'm actually quite surprised at how few real bugs this checker has
> revealed.  We must rock, or something).
> 

in part that is because we sent fixes (or bugreports) to the various
maintainers for issues that we found already during the development;
over the last few months that is. 



* Re: lock validator [2.6.18 -mm merge plans]
  2006-06-06 16:02   ` Andrew Morton
  2006-06-06 16:35     ` Arjan van de Ven
@ 2006-06-06 20:47     ` Ingo Molnar
  1 sibling, 0 replies; 166+ messages in thread
From: Ingo Molnar @ 2006-06-06 20:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, arjan

* Andrew Morton <akpm@osdl.org> wrote:

> - I think we still have a problem with the raid/bdev changes in
>   block_dev.c.

> - the changes to block_dev.c _do_ impact non-lockdep kernels

yes, there are a few more functions that do things explicitly. If you 
worry about the stack footprint, there should be little if any impact: 
the stack footprint you looked at yesterday was on a kernel that 
included patches that are not in -mm (the -fno-sibling-calls patch) and 
the lockdep tracer (-pg) which both increase stack footprint.

> - we need to take a second look to see which other
>   dont-affect-non-lockdep-kernels patches are in fact affecting 
>   non-lockdep kernels

there should be no change to the logic of non-lockdep kernels. There 
might be some minimal impact to the code built. Wanna have an explicit 
list of what the affected functions are? [but unless you think it causes 
problems i dont think we should worry about it - we do changes all the 
time that affect the generated code but dont affect the logic. The 
inlining overhead thing is a red herring i believe.]

> - the changes to block_dev.c were pretty awful anyway

yeah - but it just matches the code there. I'll think about finding 
better ways.

> - did the various review comments I sent get disposed of in some 
>   fashion?

they were not forgotten at all (and there are others too who have sent 
feedback), they are next on my list.

> My overarching concern is the rate at which false-positive workaround 
> patches are piling up. [...]

i feel a bit frustrated that my arguments regarding these "false 
positives" remain apparently unanswered. I expressed it lots of times 
that i find most of the semantics restrictions a necessary step towards 
having a more robust kernel. The restrictions are:

 - unordered unlocks. I still think we want to document all of them. 
   They pointed to real bugs multiple times. They pointed to suboptimal 
   code. There has been _one_ case so far that i'd declare a true false 
   positive. Nevertheless i gave up resistance and implemented the 
   CONFIG_DEBUG_NON_NESTED_UNLOCKS (default-off) option, which makes 
   these messages totally voluntary.

 - stealth locking via disable_irq(). We know that this is both wrong 
   and dangerous. It's wrong because it affects all other handlers on 
   the same IRQ line, for a possibly long period of time. This also 
   uncovered the real deadlock and design flaw related to irqpoll.

 - nested locking of the same lock-type. These are not broken,
   nevertheless it's useful to extend validation to these categories too 
   - it found real bugs (about 5) in the networking code for example - 
   some of them were in core networking code. The majority of these are 
   related to i_mutex: here too i think we want to document all the
   locking rules.

(and there are some other restrictions too, which i started enforcing 
via earlier cleanups of the locking code. As i mentioned in the big 
mutex flamewar^H^H^H^Hdiscussion, restriction of semantics to a natural 
model is what i believe leads to a kernel that can be trusted more.)
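
The "nested locking of the same lock-type" case can be illustrated with
a small user-space model. It is a sketch under assumptions, not the
validator's implementation: struct held_lock, acquire() and the subclass
numbering are hypothetical. Two locks of the same class held together
look like recursive locking unless the code documents, via a distinct
subclass, that they play different roles (for example parent and child
directory).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HELD 8	/* hypothetical depth limit for the sketch */

struct held_lock {
	int class_id;	/* lock type, e.g. "inode mutex" */
	int subclass;	/* annotation: 0 = default, 1 = parent, 2 = child */
};

static struct held_lock held[MAX_HELD];
static int nr_held;

/* Returns false if the same (class, subclass) pair is already held. */
static bool acquire(int class_id, int subclass)
{
	assert(nr_held < MAX_HELD);

	for (int i = 0; i < nr_held; i++)
		if (held[i].class_id == class_id &&
		    held[i].subclass == subclass)
			return false;	/* looks like recursive locking */

	held[nr_held].class_id = class_id;
	held[nr_held].subclass = subclass;
	nr_held++;
	return true;
}

/* Drop everything, as at the end of the critical section. */
static void release_all(void)
{
	nr_held = 0;
}
```

In this model an unannotated second acquisition of class 1 is flagged,
while taking class 1 once as "parent" and once as "child" passes, which
is the sense in which an annotation documents the locking rule.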

> (I'm actually quite surprised at how few real bugs this checker has 
> revealed.  We must rock, or something).

The number of deadlocks found was actually much higher (and happened 
much sooner) than i expected. I expected there to be less than 10 bugs 
left, which would trickle in the timeframe of weeks. (because we thought 
we covered a lot of code in our testing) What happened is that we are 
well above 10 deadlocks found so far, in just a few days.

The focus of the validator is on _new code_. Lock dependencies can now 
reach near-perfect quality almost immediately, while they needed to sit 
in the kernel for many months before. (and even then the validator found 
deadlock bugs that were years old) The fact that we now check and 
document the existing and pretty well-tested kernel too is just an added 
bonus.

Also, the validator was in the works for months and we fixed a bunch of 
bugs before we posted the patches. In fact in the past month it was 
based on -mm and was tested on an allyesconfig bzImage bootup which 
further decreased the rate of detection. I guess i should quote Davem's 
blog:

   http://vger.kernel.org/~davem/cgi-bin/blog.cgi

  "[...] I've known about this for some time, because as he was writing 
   it Ingo passed along some networking locking bugs that this sucker 
   has found. "

The networking code is the subsystem with probably the best locking 
design in place. It's nearly half a million lines of code (not counting 
drivers) and needed only about 5 annotations so far.

Another thing that also reduced the number of deadlock bugs is the 
effect of the -rt kernel's deadlock checker and aggressive preemption 
model: we found dozens of deadlocks there too. The -rt kernel's deadlock 
checker covers all lock types too. (while the upstream kernel only 
covered about 50% of all locking APIs via in-situ deadlock checking.)

So there has been a lot of focus on the locking APIs in the past year or 
so, so dont be surprised that the quality of locking in existing kernel 
code isnt all that bad.

Regarding annotations, my current estimation is that we'll have at most 
~0.2% rate of explicit annotations (== 'false positives'), which with 
the ~50,000 locking APIs will be at most 100 places.

The deadlock rate for newly released locking code is at least 0.5%-2%, 
and we introduce about 2000 new locking API uses per kernel release 
(about 5% of all the existing locking code in the kernel), which means 
the validator can find 10-40 new deadlocks per kernel release, at the 
price of 2 new annotations. (where 90% of the annotations are trivial, 
and the rest is only difficult because no-one knows/remembers the 
locking rules anymore ...) Furthermore, chances are that problematic 
locking constructs will be introduced with a lower probability due to 
the validator. Also, code around existing annotations could be cleaned 
up too. (as it happened in a number of cases already) So maybe the 
annotation rate will go down as well.
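
The arithmetic behind these estimates can be checked with a few lines of
C. This is only a worked calculation using the figures quoted above;
expected_count() is a hypothetical helper, and rates are expressed in
basis points (1 bp = 0.01%) so that everything stays in integers.

```c
#include <assert.h>

/*
 * Expected number of events among 'lock_uses' lock API uses, with the
 * rate given in basis points (1 bp = 0.01%).
 */
static long expected_count(long lock_uses, long rate_bp)
{
	assert(lock_uses >= 0 && rate_bp >= 0);
	return lock_uses * rate_bp / 10000;
}
```

With 2000 new lock uses per release, 0.5% (50 bp) gives 10 deadlocks and
2% (200 bp) gives 40. With ~50,000 lock uses in total, 0.2% (20 bp)
gives 100 annotations. Applied to the 2000 new uses, the 0.2% upper
bound gives 4 annotations per release; the figure of 2 corresponds to a
rate of about 0.1%, which is within the "at most" estimate.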

Furthermore, an untold amount of developer time is wasted on finding 
deadlocks. The fresher the code is, the more likely it is that it has a 
deadlock bug. The bug rate of lock uses could be as high as 10% in 
totally new code. With the validator, many of these bugs will be found 
_much_ earlier, improving productivity - and putting less crap into your 
tree! That will also mean that we wont see most of those bugs.

Plus Linux support engineers at sw/hw vendors are spending a significant 
amount of time fixing deadlocks. This has the added twist that changing 
locking code is always risky in a product, and it's not bad to have some 
good proof in place that the code they trigger during re-QA is actually 
correct in terms of locking dependencies.

Users also get totally frustrated at deadlocks. If something doesnt work 
as expected that's usually an annoyance, but if the box locks up 
totally, which necessitates the destruction of its current volatile data 
(its memory, via a hard reset) that's a complete and immediate 
showstopper. If you look at the kind of bugs that annoy users most 
you'll find 'lockups' really high on the list.

The validator found bugs in my _own code_ that i thought to be 
production quality, and i thought i can write correct locking code. We 
should really let the computer do this job.

	Ingo


* mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (18 preceding siblings ...)
  2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar
@ 2006-06-07  3:52 ` Benjamin Herrenschmidt
  2006-06-07  4:29   ` Andrew Morton
  2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig
  20 siblings, 1 reply; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07  3:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Ingo Molnar

> work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> kernel-kernel-cpuc-to-mutexes.patch
> 
>  ug.  We cannot convert the cpu.c semaphore into a mutex until we work out
>  why power4 goes titsup if you enable local interrupts during boot.

What is the exact problem ? Some mutex is forcing local irqs enabled
before init_IRQ() ? (Before the normal enabling of IRQ done by
init/main.c just after init_IRQ() more precisely ?)

This is bad for any architecture. Basically, at this point, the
interrupt controller can be in _any_ state, with possible pending
interrupts for whatever sources, etc...

As we discussed before, that problem should really be fixed in the mutex
code by not hard-enabling.

There is an incredible amount of crap that could be cleaned up for
example by re-ordering a bit the init code and making things like slab
available before init_IRQ/time_init etc... but all of those will break
because of that.

In addition, even without that re-ordering, I'm pretty sure we are
hitting semaphores/mutexes early, before init_IRQ(), already and if not
in generic code, in arch code somewhere down the call stacks.

I don't think that whole pile of problems lurking around the corner is
worth the couple of cycles saved by hard-enabling irq in the mutex
instead of doing a save/restore.

Ben.


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07  3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt
@ 2006-06-07  4:29   ` Andrew Morton
  2006-06-07  5:04     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-07  4:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo

On Wed, 07 Jun 2006 13:52:58 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> 
> > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> > kernel-kernel-cpuc-to-mutexes.patch
> > 
> >  ug.  We cannot convert the cpu.c semaphore into a mutex until we work out
> >  why power4 goes titsup if you enable local interrupts during boot.
> 
> What is the exact problem ? Some mutex is forcing local irqs enabled
> before init_IRQ() ? (Before the normal enabling of IRQ done by
> init/main.c just after init_IRQ() more precisely ?)

Any code which does mutex_lock() will have interrupts reenabled if the
mutex code was compiled in debug mode.
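
The difference between hard-enabling and save/restore can be modelled in
user space. This is a sketch under assumptions, not the mutex debugging
code: the interrupt-enable flag is a plain variable, and the model_*()
and debug_lock_*() functions are hypothetical stand-ins for the real
arch-specific primitives and lock paths.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled;	/* models the CPU interrupt-enable flag */

/* Simplified stand-ins for the kernel's irq primitives. */
static bool model_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_irq_restore(bool flags)
{
	irqs_enabled = flags;
}

static void model_irq_disable(void)
{
	irqs_enabled = false;
}

static void model_irq_enable(void)
{
	irqs_enabled = true;
}

/* Variant 1: bookkeeping ends with an unconditional enable. */
static void debug_lock_hard_enable(void)
{
	model_irq_disable();
	/* ... debug bookkeeping would happen here ... */
	model_irq_enable();
}

/* Variant 2: bookkeeping puts the flag back the way it found it. */
static void debug_lock_save_restore(void)
{
	bool flags = model_irq_save();

	assert(!irqs_enabled);	/* bookkeeping runs with irqs off */
	/* ... debug bookkeeping would happen here ... */
	model_irq_restore(flags);
}
```

If the caller entered with interrupts disabled, as early boot code does,
variant 1 returns with them enabled while variant 2 leaves them
disabled; a caller that had interrupts enabled sees no difference.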

> This is bad for any architecture. Basically, at this point, the
> interrupt controller can be in _any_ state, with possible pending
> interrupts for whatever sources, etc...
> 
> As we discussed before, that problem should really be fixed in the mutex
> code by not hard-enabling.
> 
> There is an incredible amount of crap that could be cleaned up for
> example by re-ordering a bit the init code and making things like slab
> available before init_IRQ/time_init etc... but all of those will break
> because of that.
> 
> In addition, even without that re-ordering, I'm pretty sure we are
> hitting semaphores/mutexes early, before init_IRQ(), already and if not
> in generic code, in arch code somewhere down the call stacks.
> 
> I don't think that whole pile of problems lurking around the corner is
> worth the couple of cycles saved by hard-enabling irq in the mutex
> instead of doing a save/restore.

A couple of cycles repeated a zillion times per second for the entire
uptime, just because we cannot get our act together in the first few
seconds of booting.  How much does that suck?

And how much does it suck that we require that an attempt to take a
sleeping lock must keep local interrupts disabled if the lock wasn't
contended?

Fortunately, it only happens (or at least, is only _known_ to happen) when
mutex debugging is enabled, so the performance loss is moot.

I do not know where the offending mutex_lock()s are occurring (although it
would be super-simple to find out).

By far the best solution to this would be to remove this requirement that
local interrupts remain disabled for impractical amounts of time during boot.
Either whack the PIC in setup_arch() or reorganise start_kernel() in some
appropriate manner.

But I'll be merging
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
so we'll just continue to suck I guess.


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07  4:29   ` Andrew Morton
@ 2006-06-07  5:04     ` Benjamin Herrenschmidt
  2006-06-07  5:29       ` Andrew Morton
  0 siblings, 1 reply; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07  5:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo, Paul Mackerras

> A couple of cycles repeated a zillion times per second for the entire
> uptime, just because we cannot get our act together in the first few
> seconds of booting.  How much does that suck?

I don't follow you... what would you call "getting our act together"
then ? Being able to initialize half of the kernel data structures
without ever going near a mutex ?

The current stuff is just crap.

> And how much does it suck that we require that an attempt to take a
> sleeping lock must keep local interrupts disabled if the lock wasn't
> contended?

That is a more interesting point :)

> Fortunately, it only happens (or at least, is only _known_ to happen) when
> mutex debugging is enabled, so the performance loss is moot.
> 
> I do not know where the offending mutex_lock()s are occurring (although it
> would be super-simple to find out).

And even if we fix those, we'll ultimately get more. 

> By far the best solution to this would be to remove this requirement that
> local interrupts remain disabled for impractical amounts of time during boot.

That is not possible in any remotely sane way across the board.

> Either whack the PIC in setup_arch() or reorganise start_kernel() in some
> appropriate manner.

Neither would be satisfactory. Whacking the PIC means accessing
hardware, which for a lot of architectures means having page tables up,
some kind of ioremap, etc... Hence the bunch of workarounds done by
various archs like having their PTE allocation function do horrors like
if (mem_init_done) kmalloc() else alloc_bootmem().

It's just too ugly for words.

As you said, we need to get our act together. That means having basic
kernel services that do _not_ rely on any hardware (interrupts, timer,
whatever...) be initialized first before we start needing ioremap's and
friends. That means having things like init_IRQ() which has to handle
allocating and initializing PICs all over the range and all sorts of
data structures that are related to interrupt handling, be able to use
said kernel services instead of having dodgy things like do half init
now, and another half later from a hook somewhere or an initcall while
hoping that nobody will get in the middle.

It would make so much more sense to have the init code do something
like:

 setup_arch();
 init_basic_kernel_services(); <--- that's the blob you spotted with
                                    mem init, slab init, ...
 init_arch(); <--- new arch hook

and later on, as part of the various inits, you get init_IRQ() and so
on...

In my example, init_arch() would be where the arch code moves the bits
currently in setup_arch() that do things like ioremap system devices and
do things that may want to use the slab etc... thus leaving setup_arch()
to very basic initialisations.
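
The proposed ordering can be written out as a runnable user-space
sketch. Everything here is a stub: init_basic_kernel_services() and
init_arch() are the hypothetical hooks named above, the bodies only
record the order in which they ran, and slab_ready is an assumed flag
standing in for "the allocator is usable".

```c
#include <assert.h>
#include <stdbool.h>

enum boot_step { SETUP_ARCH, BASIC_SERVICES, INIT_ARCH, INIT_IRQ, NR_STEPS };

static enum boot_step boot_log[NR_STEPS];	/* order in which steps ran */
static int boot_steps;
static bool slab_ready;				/* allocator usable? */

static void log_step(enum boot_step step)
{
	assert(boot_steps < NR_STEPS);
	boot_log[boot_steps++] = step;
}

/* Very basic initialisation only: no ioremap, no allocations. */
static void setup_arch(void)
{
	log_step(SETUP_ARCH);
}

/* Mem init, slab init, ...: nothing here depends on hardware. */
static void init_basic_kernel_services(void)
{
	slab_ready = true;
	log_step(BASIC_SERVICES);
}

/* New arch hook: may ioremap system devices and use the slab. */
static void init_arch(void)
{
	assert(slab_ready);
	log_step(INIT_ARCH);
}

/* Can allocate its data structures (e.g. a radix tree) normally. */
static void init_IRQ(void)
{
	assert(slab_ready);
	log_step(INIT_IRQ);
}

static void start_kernel_sketch(void)
{
	setup_arch();
	init_basic_kernel_services();
	init_arch();
	init_IRQ();
}
```

The asserts in init_arch() and init_IRQ() express the point of the
reordering: by the time arch and interrupt setup run, they can rely on
the normal allocator instead of special-casing early boot.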

Not being able to do all of those because we have this
hyper-optimized-mutex-blah thing that hard enables interrupt all over
the place seems like a stupid thing to me. In fact, as you mentioned, it
only affects a debug code path which thus could perfectly take the
performance hit.

> But I'll be merging
> work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> so we'll just continue to suck I guess.

How so ? Can you tell me how making the mutex debug code path do
something sane makes it 'suck' ? Don't argue about the couple of cycles
benefit, as you mentioned yourself, it's a debug code path.

Ben.


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07  5:04     ` Benjamin Herrenschmidt
@ 2006-06-07  5:29       ` Andrew Morton
  2006-06-07  6:44         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 166+ messages in thread
From: Andrew Morton @ 2006-06-07  5:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, paulus

On Wed, 07 Jun 2006 15:04:07 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> > Either whack the PIC in setup_arch() or reorganise start_kernel() in some
> > appropriate manner.
> 
> Neither would be satisfactory. Whacking the PIC means accessing
> hardware, which for a lot of architectures means having page tables up,
> some kind of ioremap, etc... Hence the bunch of workarounds done by
> various archs like having their PTE allocation function do horrors like
> if (mem_init_done) kmalloc() else alloc_bootmem().

Why on earth does the PIC come up pulling an interrupt when it hasn't been
spoken to yet?

> It would make so much more sense to have the init code do something
> like:
> 
>  setup_arch();
>  init_basic_kernel_services(); <--- that's the blob you spotted with
>                                     mem init, slab init, ...
>  init_arch(); <--- new arch hook
> 
> and later on, as part of the various inits, you get init_IRQ() and so
> on...
> 
> In my example, init_arch() would be where the arch code moves the bits
> currently in setup_arch() that do things like ioremap system devices and
> do things that may want to use the slab etc... thus leaving setup_arch()
> to very basic initialisations.
> 
> Not being able to do all of those because we have this
> hyper-optimized-mutex-blah thing that hard enables interrupt all over
> the place seems like a stupid thing to me. In fact, as you mentioned, it
> only affects a debug code path which thus could perfectly take the
> performance hit.

Nonsense.  mutex_lock() can sleep.  Sleeping will enable interrupts. 
Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
that's it.

But now, because some broken piece of hardware is coming out of
reset/firmware asserting an interrupt we need to change the rules to be
"mutex_lock() must preserve local interrupts if the lock is uncontended". 
Ditto down(), down_read() and down_write().

And why does this bizarre restriction upon the implementation of our
locking primitives exist?  Because of your broken PIC and because of our
inability to sort out the early boot code.  And because the early boot code
has this implicit knowledge that the locks will be uncontended, else we're
toast.

We're doing mutex_lock(), down(), down_read() and down_write() with local
interrupts disabled, which is a bug.  We have explicit code in there to
*disable* our runtime debugging checks because we know about this bug but
don't know how to fix it.

I call that sucky.

> > But I'll be merging
> > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> > so we'll just continue to suck I guess.
> 
> How so ? Can you tell me how making the mutex debug code path do
> something sane makes it 'suck' ? Don't argue about the couple of cycles
> > benefit, as you mentioned yourself, it's a debug code path.
> 

Would you prefer "wildly idiotic"?


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07  5:29       ` Andrew Morton
@ 2006-06-07  6:44         ` Benjamin Herrenschmidt
  2006-06-07  7:03           ` Andrew Morton
  2006-06-07 13:21           ` Ingo Molnar
  0 siblings, 2 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-07  6:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo, paulus

On Tue, 2006-06-06 at 22:29 -0700, Andrew Morton wrote:

> Why on earth does the PIC come up pulling an interrupt when it hasn't been
> spoken to yet?

You live in an ideal world :) Unfortunately the harsh reality is broken
firmwares or bootloaders, PICs that can't mask interrupts, virtual PICs
from hypervisors that want to talk to you as soon as you can take it,
kexec/kdump-style boots which didn't or couldn't completely shut the PIC
up, etc etc etc ....

In addition, on PowerPC (and possibly others), there is the decrementer
too that never stops ticking and interrupting (it simply can't be
stopped).

I'm sure other archs, especially embedded, can come up with a gazillion
other reasons.

> Nonsense.  mutex_lock() can sleep.  Sleeping will enable interrupts. 
> Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
> that's it.

Yes, except that we are talking about a well defined path which is the
kernel initialization here where it is guaranteed that there will be no
contention.

I think it's a fairly sane thing to require mutexes and other
synchronisation primitives not to hard enable interrupts when there is
no contention.

> But now, because some broken piece of hardware is coming out of
> reset/firmware asserting an interrupt we need to change the rules to be
> "mutex_lock() must preserve local interrupts if the lock is uncontended". 
> Ditto down(), down_read() and down_write().

I'm fairly convinced that "broken piece of hardware" is the general case
and that an idle PIC is the exception :) Ask Linus why we have carefully
kept interrupts disabled until after init_IRQ() in the first place ?

> And why does this bizarre restriction upon the implementation of our
> locking primitives exist?  Because of your broken PIC and because of our
> inability to sort out the early boot code.  And because the early boot code
> has this implicit knowledge that the locks will be uncontended, else we're
> toast.

Because we live in a world where you can't assume that a PIC will be
well behaved. That's not a ppc specific problem, by far. I remember
having similar issues on some ARM bits I did ages ago for example. And
I'm sure kdump kind of things will bite on x86 as well.

> We're doing mutex_lock(), down(), down_read() and down_write() with local
> interrupts disabled, which is a bug.  We have explicit code in there to
> *disable* our runtime debugging checks because we know about this bug but
> don't know how to fix it.
> 
> I call that sucky.

So what can we do ? There is simply no option I can see other than
having an entire pile of infrastructure and hacks (which is the case
today) dedicated to being able to do IOs and smashing the PIC down
before we can allocate memory in a remotely sane way... I really
need/want to clean that shit up. It all got triggered by the fact that I
need a radix tree at init_IRQ() time, but then I went looking for all
the cases where we hack around the lack of slab from there and it's
really not funny...

> > > But I'll be merging
> > > work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> > > so we'll just continue to suck I guess.
> > 
> > How so ? Can you tell me how making the mutex debug code path do
> > something sane makes it 'suck' ? Don't argue about the couple of cycles
> > > benefit, as you mentioned yourself, it's a debug code path.
> > 
> 
> Would you prefer "wildly idiotic"?

Heh. I still think not but you seem to have a pretty firm idea about
what makes sense and what not... I just happen not to share it ;)

Now, let's talk about solutions :) One is to ignore the problem: I can
do a band-aid for PowerPC (I think I know what the problem is: probably
the decrementer, not the PIC) and we consider that it must always be safe
to have interrupts enabled during the entire boot sequence of the
kernel. If you do that, I strongly recommend that you put a
local_irq_enable() somewhere in start_kernel(), maybe as early as
setup_arch(), in -mm and see what breaks :)

I do have a solution in mind though that could work around the problem
of a badly behaved PIC or spurious decrementer interrupts on powerpc, and
other architectures could probably do something similar. Basically, the
idea is to keep irqs disabled by default at startup (so we still need
your test to silence might_sleep() at least until the main
local_irq_enable() is done, not later, so we still get useful warnings
during boot). In addition to that, archs need to add something to their
actual interrupt entry:

	if (no_irq_boot) {
		local_irq_disable();
		return;
	}

That will have the effect of "cleaning up" after a mutex/semaphore
re-enabling interrupts, at least until no_irq_boot is cleared, which would
be done right before the local_irq_enable() in init/main.c.
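
A minimal userspace sketch of the guard above, for illustration only. The
globals stand in for CPU and kernel state, and every model_* name is
hypothetical; nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these globals stand in for CPU and kernel state. */
static bool irqs_enabled;        /* models the CPU interrupt-enable bit */
static bool no_irq_boot = true;  /* set until init is ready for IRQs */
static int handled;              /* interrupts actually dispatched */

/* Models the arch interrupt entry with the proposed early-boot guard. */
static void model_irq_entry(void)
{
	/* An interrupt can only be taken while interrupts are enabled. */
	assert(irqs_enabled);

	if (no_irq_boot) {
		/* Early boot: undo the stray enable and drop the event. */
		irqs_enabled = false;
		return;
	}
	handled++;
}

/* Models a sleeping lock that leaves interrupts enabled on return. */
static void model_mutex_lock_unlock(void)
{
	irqs_enabled = true;
}

/* Hypothetical boot sequence: returns the number of IRQs dispatched early. */
static int model_early_boot(void)
{
	irqs_enabled = false;
	model_mutex_lock_unlock();   /* lock taken before init_IRQ() */
	model_irq_entry();           /* pending interrupt fires at once */
	return handled;
}
```

In this model the stray enable is undone by the first interrupt that
arrives, and nothing is dispatched until no_irq_boot is cleared.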

Ben.


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07  6:44         ` Benjamin Herrenschmidt
@ 2006-06-07  7:03           ` Andrew Morton
  2006-06-07 13:21           ` Ingo Molnar
  1 sibling, 0 replies; 166+ messages in thread
From: Andrew Morton @ 2006-06-07  7:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, paulus

On Wed, 07 Jun 2006 16:44:31 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> So what can we do ?

Well my plan is to keep being sucky, hence
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch.

The rule is that sleeping locks need to preserve local IRQs in the
non-contended case.   So be it, move on to more pressing things.
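
The rule can be sketched in a small userspace model, assuming a single
boolean for the interrupt-enable state. The function names are
hypothetical and the comments mention kernel primitives only by analogy;
this is not the mutex debugging code itself.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled;   /* models the CPU interrupt-enable bit */

struct model_mutex {
	bool locked;
};

/* Uncontended lock that unconditionally enables interrupts on exit. */
static void lock_debug_hard_enable(struct model_mutex *m)
{
	irqs_enabled = false;        /* analogous to local_irq_disable() */
	assert(!m->locked);          /* uncontended: nobody holds it */
	m->locked = true;
	irqs_enabled = true;         /* analogous to local_irq_enable() */
}

/* Uncontended lock that hands back whatever state the caller had. */
static void lock_debug_save_restore(struct model_mutex *m)
{
	bool flags = irqs_enabled;   /* analogous to local_irq_save() */

	irqs_enabled = false;
	assert(!m->locked);
	m->locked = true;
	irqs_enabled = flags;        /* analogous to local_irq_restore() */
}
```

Called with interrupts off, the first variant returns with them on and
the second returns with them still off, which is the whole difference
being argued about.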


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07  6:44         ` Benjamin Herrenschmidt
  2006-06-07  7:03           ` Andrew Morton
@ 2006-06-07 13:21           ` Ingo Molnar
  2006-06-08  0:31             ` Benjamin Herrenschmidt
  2006-06-08 22:59             ` Paul Mackerras
  1 sibling, 2 replies; 166+ messages in thread
From: Ingo Molnar @ 2006-06-07 13:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Andrew Morton, linux-kernel, paulus

* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> during boot). In addition to that, archs need to add something to their
> actual interrupt entry:
> 
> 	if (no_irq_boot) {
> 		local_irq_disable();
> 		return;
> 	}

that just moves the suckage from the mutex-debugging slowpath to the 
irq-handling hotpath. (at which point i still prefer to have that in the 
mutex-debugging path)

a better solution would be to install boot-time IRQ vectors that just do
nothing but return. They dont mask, they dont ACK nor EOI - they just
return. The only thing that could break this is a screaming interrupt,
and even that one probably just slows things down a tiny bit until we
get so far in the init sequence to set up the PIC.

	Ingo


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07 13:21           ` Ingo Molnar
@ 2006-06-08  0:31             ` Benjamin Herrenschmidt
  2006-06-08 10:49               ` David Woodhouse
  2006-06-08 11:17               ` Roman Zippel
  2006-06-08 22:59             ` Paul Mackerras
  1 sibling, 2 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-08  0:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, paulus

> a better solution would be to install boot-time IRQ vectors that just do
> nothing but return. They dont mask, they dont ACK nor EOI - they just
> return. The only thing that could break this is a screaming interrupt,
> and even that one probably just slows things down a tiny bit until we
> get so far in the init sequence to set up the PIC.

Changing vectors on the fly is hard on some platforms.... We could
change our toplevel ppc_md.get_irq() on powerpc, but we still need to do
something about decrementer interrupts.
A screaming level interrupt will lock up the machine at least on some
platforms.

The problem with all those approaches is that they require changes to
all archs interrupt handling to make the situation safe vs. mutexes...

I still don't see where the suckage is in just not hard-enabling in
the mutex debug code...

Ben.


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08  0:31             ` Benjamin Herrenschmidt
@ 2006-06-08 10:49               ` David Woodhouse
  2006-06-08 10:53                 ` Ingo Molnar
  2006-06-08 11:17               ` Roman Zippel
  1 sibling, 1 reply; 166+ messages in thread
From: David Woodhouse @ 2006-06-08 10:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 10:31 +1000, Benjamin Herrenschmidt wrote:
> I still don't think where is the suckage in just not hard-enabling in
> the mutex debug code... 

If the mutex debugging code is hard-enabling interrupts before
init_IRQ() ever got called, that's just broken. Fixing that can hardly
be called 'suckage'.

-- 
dwmw2



* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08 10:49               ` David Woodhouse
@ 2006-06-08 10:53                 ` Ingo Molnar
  2006-06-08 11:01                   ` David Woodhouse
  0 siblings, 1 reply; 166+ messages in thread
From: Ingo Molnar @ 2006-06-08 10:53 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel, paulus

* David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2006-06-08 at 10:31 +1000, Benjamin Herrenschmidt wrote:
> > I still don't think where is the suckage in just not hard-enabling in
> > the mutex debug code... 
> 
> If the mutex debugging code is hard-enabling interrupts before 
> init_IRQ() ever got called, that's just broken. Fixing that can hardly 
> be called 'suckage'.

to quote Andrew:

--------------->
Nonsense.  mutex_lock() can sleep.  Sleeping will enable interrupts.
Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
that's it.

But now, because some broken piece of hardware is coming out of
reset/firmware asserting an interrupt we need to change the rules to be
"mutex_lock() must preserve local interrupts if the lock is uncontended".
Ditto down(), down_read() and down_write().

And why does this bizarre restriction upon the implementation of our
locking primitives exist?  Because of your broken PIC and because of our
inability to sort out the early boot code.  And because the early boot code
has this implicit knowledge that the locks will be uncontended, else we're
toast.

We're doing mutex_lock(), down(), down_read() and down_write() with local
interrupts disabled, which is a bug.  We have explicit code in there to
*disable* our runtime debugging checks because we know about this bug but
don't know how to fix it.

I call that sucky.


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08 10:53                 ` Ingo Molnar
@ 2006-06-08 11:01                   ` David Woodhouse
  0 siblings, 0 replies; 166+ messages in thread
From: David Woodhouse @ 2006-06-08 11:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 12:53 +0200, Ingo Molnar quoted Andrew:
> Nonsense.  mutex_lock() can sleep.  Sleeping will enable interrupts.
> Therefore, hence, ergo ipso facto mutex_lock() can enable interrupts. QED,
> that's it.
> 
> But now, because some broken piece of hardware is coming out of
> reset/firmware asserting an interrupt we need to change the rules to be
> "mutex_lock() must preserve local interrupts if the lock is uncontended".
> Ditto down(), down_read() and down_write().
> 
> And why does this bizarre restriction upon the implementation of our
> locking primitives exist?  Because of your broken PIC and because of our
> inability to sort out the early boot code.  And because the early boot code
> has this implicit knowledge that the locks will be uncontended, else we're
> toast.
> 
> We're doing mutex_lock(), down(), down_read() and down_write() with local
> interrupts disabled, which is a bug.  We have explicit code in there to
> *disable* our runtime debugging checks because we know about this bug but
> don't know how to fix it.
> 
> I call that sucky. 

OK, if you put it like that, and you're going to be consistent by
declaring the disabling of __might_sleep() warnings to be sucky too,
then I suppose we can buy that argument.

Yes, we need to sort out the early boot code. It isn't so much that
we're unable as that nobody's really tried very hard. People seem scared
of it -- they even invent pointless special cases like the
'earlyconsole' crap instead of just registering the damn consoles
earlier, for example. register_console() has _always_ worked right from
the beginning of setup_arch(), and I've often put it there.

Let's make a concerted effort to reorder the startup code so that we
_can_ enable interrupts and have slab working quite early. Ben's plans
for this look sane enough to me.

-- 
dwmw2


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08  0:31             ` Benjamin Herrenschmidt
  2006-06-08 10:49               ` David Woodhouse
@ 2006-06-08 11:17               ` Roman Zippel
  2006-06-08 13:38                 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-08 11:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

Hi,

On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:

> > a better solution would be to install boot-time IRQ vectors that just do
> > nothing but return. They dont mask, they dont ACK nor EOI - they just
> > return. The only thing that could break this is a screaming interrupt,
> > and even that one probably just slows things down a tiny bit until we
> > get so far in the init sequence to set up the PIC.
> 
> Changing vectors on the fly is hard on some platforms.... We could
> change our toplevel ppc_md.get_irq() on powerpc, but we still to do
> something about decrementer interrupts.

On ppc it should not be that difficult to even modify the exception entry 
code. Instead of calling do_IRQ use do_early_IRQ and only install the real 
handler later.

> A screaming level interrupt will lockup the machine at least on some
> platforms.

I guess that's even deadly on most platforms.

> The problem with all those approaches is that they require changes to
> all archs interrupt handling to make the situation safe vs. mutexes...

Only those archs that want to delay interrupt initialization, and they at 
least have to provide minimal support to survive enabled interrupts.
init_IRQ() stays the same for all other archs and we add another hook to 
allow the delayed initialization.


> I still don't think where is the suckage in just not hard-enabling in
> the mutex debug code...

If you want to have full services, then irqs are part of it. :)

bye, Roman


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08 11:17               ` Roman Zippel
@ 2006-06-08 13:38                 ` Benjamin Herrenschmidt
  2006-06-08 14:02                   ` Roman Zippel
  0 siblings, 1 reply; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-08 13:38 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 13:17 +0200, Roman Zippel wrote:
> Hi,
> 
> On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:
> 
> > > a better solution would be to install boot-time IRQ vectors that just do
> > > nothing but return. They dont mask, they dont ACK nor EOI - they just
> > > return. The only thing that could break this is a screaming interrupt,
> > > and even that one probably just slows things down a tiny bit until we
> > > get so far in the init sequence to set up the PIC.
> > 
> > Changing vectors on the fly is hard on some platforms.... We could
> > change our toplevel ppc_md.get_irq() on powerpc, but we still to do
> > something about decrementer interrupts.
> 
> On ppc it should not be that difficult to even modify the exception entry 
> code. Instead of calling do_IRQ use do_early_IRQ and only install the real 
> handler later.

Yes, it's possible, but it will add overhead to the common IRQ path just
to handle an early boot special case.

> > A screaming level interrupt will lockup the machine at least on some
> > platforms.
> 
> I guess that's even deadly on most platforms.

Yup.

> > The problem with all those approaches is that they require changes to
> > all archs interrupt handling to make the situation safe vs. mutexes...
> 
> Only those archs that want to delay interrupt initialization and they at 
> least have to provide minimal support to survive enabled interrupts.
> init_IRQ() stays the same for all other archs and we add another hook to 
> allow the delayed initializtion.

I'm taking a broader point of view here. More than just interrupt init,
it's in general about having basic kernel services such as the memory
allocator, which shouldn't need any special hardware initialization
outside of the mmu, be set up before we start banging hardware.

> > I still don't think where is the suckage in just not hard-enabling in
> > the mutex debug code...
> 
> If you want to have full services, then irqs are part of it. :)

No. There is, imho, a very clear difference between services that do
rely on hw devices, ioremap, and such major infrastructure as page
tables, and services like slab which, essentially, can be initialized
with the CPU itself being ready and nothing else.

Cheers,
Ben.




* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08 13:38                 ` Benjamin Herrenschmidt
@ 2006-06-08 14:02                   ` Roman Zippel
  2006-06-08 23:40                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 166+ messages in thread
From: Roman Zippel @ 2006-06-08 14:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

Hi,

On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:

> > On ppc it should not be that difficult to even modify the exception entry 
> > code. Instead of calling do_IRQ use do_early_IRQ and only install the real 
> > handler later.
> 
> Yes, it's possible, but will add overhead to the common  IRQ path just
> to handle an early boot special case.

What I mean is to directly patch the exception entry code, so after the 
initialization is complete you'll have no additional overhead.
In the EXC_XFER_TEMPLATE() macro the handler is stored at i##n. You can 
either export that address or you can use a special transfer handler, 
which automatically patches the values once some flag is set.
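
As a rough userspace model of that idea, the stored handler address can
be pictured as a function pointer that is rewritten once initialization
is complete. All names below are hypothetical, and the indirect call is
only a stand-in for the patched branch target in the real entry code,
which would not carry that extra indirection.

```c
#include <assert.h>
#include <stddef.h>

static int early_dropped;   /* interrupts seen before the real handler */
static int dispatched;      /* interrupts seen by the real handler */

/* Early handler: counts the event and returns. */
static void model_do_early_irq(int irq)
{
	(void)irq;
	early_dropped++;
}

/* Real handler, installed once the interrupt controller is set up. */
static void model_do_irq(int irq)
{
	(void)irq;
	dispatched++;
}

/* Stands in for the handler address stored in the exception entry code. */
static void (*irq_entry_handler)(int irq) = model_do_early_irq;

/* Models the one-time patch performed after initialization. */
static void model_install_real_handler(void)
{
	irq_entry_handler = model_do_irq;
}

/* Models the exception entry transferring to whatever is installed. */
static void model_exception_entry(int irq)
{
	assert(irq_entry_handler != NULL);
	irq_entry_handler(irq);
}
```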

bye, Roman


* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-08 14:02                   ` Roman Zippel
@ 2006-06-08 23:40                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 166+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-08 23:40 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, paulus

On Thu, 2006-06-08 at 16:02 +0200, Roman Zippel wrote:
> Hi,
> 
> On Thu, 8 Jun 2006, Benjamin Herrenschmidt wrote:
> 
> > > On ppc it should not be that difficult to even modify the exception entry 
> > > code. Instead of calling do_IRQ use do_early_IRQ and only install the real 
> > > handler later.
> > 
> > Yes, it's possible, but will add overhead to the common  IRQ path just
> > to handle an early boot special case.
> 
> What I mean is to directly patch the exception entry code, so after the 
> initialization is complete you'll have no additional overhead.
> In the EXC_XFER_TEMPLATE() macro the handler is stored at i##n. You can 
> either export that address or you can use a special transfer handler, 
> which automatically patches the values once some flag is set.

That is a possibility. Also totally PPC specific for a problem that will
hit every arch once I start moving things around in the init code. I
still think that the best way is to fix the mutex code. You remember
what you can read on most public toilets about leaving them in the state
you found them ? Sounds like a pretty good rule to me here as well.

Ben.



* Re: mutex vs. local irqs (Was: 2.6.18 -mm merge plans)
  2006-06-07 13:21           ` Ingo Molnar
  2006-06-08  0:31             ` Benjamin Herrenschmidt
@ 2006-06-08 22:59             ` Paul Mackerras
  1 sibling, 0 replies; 166+ messages in thread
From: Paul Mackerras @ 2006-06-08 22:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Benjamin Herrenschmidt, Andrew Morton, linux-kernel

Ingo Molnar writes:

> a better solution would be to install boot-time IRQ vectors that just do
> nothing but return. They dont mask, they dont ACK nor EOI - they just
> return.

How would that help?  We'd just end up taking the interrupt over and
over again.  We have to either poke the PIC to tell it to shut up
somehow (which we can't do before ioremap is available) or arrange for
interrupts to be disabled after the return (which means that
might_sleep() will scream at us).
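
The objection can be illustrated with a toy userspace model of a
level-triggered line, assuming the line stays asserted until something
quiets the source. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_line {
	bool asserted;   /* level-triggered: stays high until quieted */
};

/* Handler that only returns: no mask, no ACK, no EOI. */
static void handler_noop(struct model_line *line)
{
	(void)line;
}

/* Handler that quiets the source, which needs access to the PIC. */
static void handler_quiesce(struct model_line *line)
{
	line->asserted = false;
}

/*
 * Count how many times the interrupt is taken before the interrupted
 * code gets to run again, giving up after `limit` entries.
 */
static int model_entries_until_progress(struct model_line *line,
					void (*handler)(struct model_line *),
					int limit)
{
	int entries = 0;

	assert(limit > 0);
	while (line->asserted && entries < limit) {
		handler(line);   /* returns with interrupts still enabled */
		entries++;
	}
	return entries;
}
```

With the do-nothing handler the loop only stops at the artificial limit,
while the quiescing handler lets the interrupted code continue after a
single entry.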

Paul.


* Re: 2.6.18 -mm merge plans
  2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
                   ` (19 preceding siblings ...)
  2006-06-07  3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt
@ 2006-06-10 10:22 ` Christoph Hellwig
  2006-06-14 15:18   ` Michael Halcrow
  20 siblings, 1 reply; 166+ messages in thread
From: Christoph Hellwig @ 2006-06-10 10:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

> ecryptfs-crypto-functions.patch
> ecryptfs-debug-functions.patch
> ecryptfs-alpha-build-fix.patch
> ecryptfs-convert-assert-to-bug_on.patch
> ecryptfs-remove-unnecessary-null-checks.patch
> ecryptfs-rewrite-ecryptfs_fsync.patch
> ecryptfs-overhaul-file-locking.patch
> 
>  Christoph has half-reviewed this and all the issues arising from that
>  have, I believe, been addressed.  With the exception of the "we should
>  have a generic stacking layer" issue.  Which is true.  Michael's take is
>  "yes, but that's not my job".  Which also is true.

It's far from ready.  There are various things that simply can't be done
properly in a lowlevel fs or absolutely shouldn't.  And I think a few
unique gems in there.   Most urgent thing of course is that we somehow
need to deal with the idiocy of the nameidata passed into most namespace
methods.



* Re: 2.6.18 -mm merge plans
  2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig
@ 2006-06-14 15:18   ` Michael Halcrow
  0 siblings, 0 replies; 166+ messages in thread
From: Michael Halcrow @ 2006-06-14 15:18 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, linux-kernel

On Sat, Jun 10, 2006 at 11:22:11AM +0100, Christoph Hellwig wrote:
> > ecryptfs-crypto-functions.patch
> > ecryptfs-debug-functions.patch
> > ecryptfs-alpha-build-fix.patch
> > ecryptfs-convert-assert-to-bug_on.patch
> > ecryptfs-remove-unnecessary-null-checks.patch
> > ecryptfs-rewrite-ecryptfs_fsync.patch
> > ecryptfs-overhaul-file-locking.patch
> > 
> >  Christoph has half-reviewed this and all the issues arising from
> >  that have, I believe, been addressed.  With the exception of the
> >  "we should have a generic stacking layer" issue.  Which is true.
> >  Michael's take is "yes, but that's not my job".  Which also is
> >  true.

We are looking into how this can be best accomplished, given the
requirements of the various stackable filesystems out there.

> It's far from ready.  There are various things that simply can't be
> done properly in a lowlevel fs or absolutely shouldn't.  And I
> think a few unique gems in there.

We will work on fixes for any such issues brought to our attention. Up
to this point, we have provided fixes for all but two of the items
Christoph brought up in his initial analysis of eCryptfs. The
setlk/getlk code is redundant with the existing VFS implementations,
and so we are working on a fix for that.

> Most urgent thing of course is that we somehow need to deal with the
> idiocy of the nameidata passed into most namespace methods.

We would appreciate any suggestions folks in the community could give
on this.

Until then, is this patch a reasonable approach to address the problem
of just replacing the vfsmount and dentry in the existing nameidata
struct?

Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>

---

Don't muck with the existing nameidata structures.

---

 fs/ecryptfs/dentry.c |   18 +++++++-----------
 fs/ecryptfs/inode.c  |   14 +++++---------
 2 files changed, 12 insertions(+), 20 deletions(-)

5f0a8c57f8b51ba87cc950d1d2bac6873f73a8b3
diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index 7b1018a..6f19fc4 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -41,23 +41,19 @@ #include "ecryptfs_kernel.h"
  */
 static int ecryptfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
 {
-	int err = 1;
+	int rc = 1;
 	struct dentry *lower_dentry;
-	struct dentry *saved_dentry;
-	struct vfsmount *saved_vfsmount;
+	struct nameidata lower_nd;
 
 	lower_dentry = ecryptfs_dentry_to_lower(dentry);
 	if (!lower_dentry->d_op || !lower_dentry->d_op->d_revalidate)
 		goto out;
-	saved_dentry = nd->dentry;
-	saved_vfsmount = nd->mnt;
-	nd->dentry = lower_dentry;
-	nd->mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt;
-	err = lower_dentry->d_op->d_revalidate(lower_dentry, nd);
-	nd->dentry = saved_dentry;
-	nd->mnt = saved_vfsmount;
+	memcpy(&lower_nd, nd, sizeof(struct nameidata));
+	lower_nd.dentry = lower_dentry;
+	lower_nd.mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt;
+	rc = lower_dentry->d_op->d_revalidate(lower_dentry, &lower_nd);
 out:
-	return err;
+	return rc;
 }
 
 struct kmem_cache *ecryptfs_dentry_info_cache;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 47e4202..342b0fa 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -122,17 +122,13 @@ ecryptfs_create_underlying_file(struct i
 				struct nameidata *nd)
 {
 	int rc;
-	struct dentry *saved_dentry = NULL;
-	struct vfsmount *saved_vfsmount = NULL;
+	struct nameidata lower_nd;
 
-	saved_dentry = nd->dentry;
-	saved_vfsmount = nd->mnt;
-	nd->dentry = lower_dentry;
-	nd->mnt = ecryptfs_superblock_to_private(
+	memcpy(&lower_nd, nd, sizeof(struct nameidata));
+	lower_nd.dentry = lower_dentry;
+	lower_nd.mnt = ecryptfs_superblock_to_private(
 		ecryptfs_dentry->d_sb)->lower_mnt;
-	rc = vfs_create(lower_dir_inode, lower_dentry, mode, nd);
-	nd->dentry = saved_dentry;
-	nd->mnt = saved_vfsmount;
+	rc = vfs_create(lower_dir_inode, lower_dentry, mode, &lower_nd);
 	return rc;
 }
 
-- 
1.3.3
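
For readers outside the VFS, the pattern in the patch above can be
reduced to a self-contained sketch: copy the caller's structure, override
two fields in the copy, and pass the copy down. The model_* types and
functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures. */
struct model_dentry { int id; };
struct model_vfsmount { int id; };
struct model_nameidata {
	struct model_dentry *dentry;
	struct model_vfsmount *mnt;
	unsigned int flags;
};

/* Stand-in for the lower filesystem's method; reports which dentry it saw. */
static int lower_revalidate(struct model_dentry *d, struct model_nameidata *nd)
{
	return nd->dentry == d ? d->id : -1;
}

/* Stacked method: works on a private copy instead of the caller's struct. */
static int stacked_revalidate(struct model_nameidata *nd,
			      struct model_dentry *lower_dentry,
			      struct model_vfsmount *lower_mnt)
{
	struct model_nameidata lower_nd;

	assert(nd != NULL);
	memcpy(&lower_nd, nd, sizeof(lower_nd));  /* caller's copy stays untouched */
	lower_nd.dentry = lower_dentry;
	lower_nd.mnt = lower_mnt;
	return lower_revalidate(lower_dentry, &lower_nd);
}
```

The trade-off in this sketch is that anything the lower method writes
into the copy is not visible to the caller.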




Thread overview: 166+ messages
2006-06-04 20:50 2.6.18 -mm merge plans Andrew Morton
2006-06-04 21:20 ` 2.6.18 hdrinstall (Re: 2.6.18 -mm merge plans) Bernhard Rosenkraenzer
2006-06-04 21:33 ` header cleanup and install David Woodhouse
2006-06-04 21:43   ` Andrew Morton
2006-06-05 10:52   ` Jens Axboe
2006-06-05 10:54     ` David Woodhouse
2006-06-05 10:59       ` Jens Axboe
2006-06-05 10:57         ` David Woodhouse
2006-06-05 11:03           ` Jens Axboe
2006-06-05 18:09             ` Andrew Morton
2006-06-05 19:19               ` David Woodhouse
2006-06-17 20:35                 ` Alistair John Strachan
2006-06-17 21:20                   ` David Woodhouse
2006-06-04 21:36 ` 2.6.18 -mm merge plans Alan Cox
2006-06-04 21:41 ` kbuild, kconfig and hrdinstall stuff Sam Ravnborg
2006-06-04 21:54   ` David Woodhouse
2006-06-04 23:04 ` klibc (was: 2.6.18 -mm merge plans) H. Peter Anvin
2006-06-05 18:09   ` Roman Zippel
2006-06-06 15:20   ` Pavel Machek
2006-06-06 20:56     ` Rafael J. Wysocki
2006-06-07  3:37       ` H. Peter Anvin
2006-06-07  4:00       ` Nigel Cunningham
2006-06-07  4:10         ` H. Peter Anvin
2006-06-07  4:25           ` Nigel Cunningham
2006-06-07  4:26             ` klibc H. Peter Anvin
2006-06-07  6:22               ` klibc Nigel Cunningham
2006-06-07  6:38                 ` klibc H. Peter Anvin
2006-06-07  6:51             ` klibc (was: 2.6.18 -mm merge plans) Joshua Hudson
2006-06-07 21:12               ` H. Peter Anvin
2006-06-09  8:03                 ` klibc Nix
2006-06-09 18:45                   ` klibc H. Peter Anvin
     [not found]                   ` <bda6d13a0606091050n40fda044v668eef09af3c29a7@mail.gmail.com>
     [not found]                     ` <871wty6rl9.fsf@hades.wkstn.nix>
2006-06-09 22:28                       ` klibc Joshua Hudson
2006-06-09 22:48                         ` klibc H. Peter Anvin
2006-06-09 23:13                           ` klibc Joshua Hudson
2006-06-09 23:44                             ` klibc H. Peter Anvin
2006-06-16  6:02                               ` klibc Joshua Hudson
2006-06-16 19:19                                 ` klibc H. Peter Anvin
2006-06-07  8:44       ` klibc (was: 2.6.18 -mm merge plans) Pavel Machek
2006-06-07  9:44         ` Rafael J. Wysocki
2006-06-04 23:50 ` clocksource Roman Zippel
2006-06-05 20:20   ` clocksource john stultz
2006-06-05 20:53     ` clocksource john stultz
2006-06-05 21:07     ` clocksource Roman Zippel
2006-06-06 19:42       ` clocksource john stultz
2006-06-07  0:41         ` clocksource Roman Zippel
2006-06-08  8:05           ` clocksource john stultz
2006-06-15 11:40             ` clocksource Roman Zippel
2006-06-16  3:21               ` clocksource john stultz
2006-06-16  3:35                 ` clocksource john stultz
2006-06-16 15:33                 ` clocksource Roman Zippel
2006-06-16 18:48                   ` clocksource john stultz
2006-06-17 19:45                     ` clocksource Roman Zippel
2006-06-17 17:04                 ` clocksource Andrew Morton
2006-06-05  0:02 ` utsname/hostname Randy.Dunlap
2006-06-05  1:06   ` utsname/hostname Andrew Morton
2006-06-05  3:10     ` utsname/hostname Randy.Dunlap
     [not found] ` <20060605002807.GA4919@mail.ustc.edu.cn>
2006-06-05  0:28   ` readahead benchmark Fengguang Wu
2006-06-05  1:02     ` Andrew Morton
2006-06-05  0:32 ` new SCSI drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
     [not found] ` <20060605010501.GA4931@mail.ustc.edu.cn>
2006-06-05  1:05   ` statistics infrastructure Fengguang Wu
2006-06-05 16:30   ` Greg KH
2006-06-13 23:47     ` statistics infrastructure (in -mm tree) review Greg KH
2006-06-14  0:18       ` Randy.Dunlap
2006-06-14 16:45         ` Greg KH
2006-06-14 22:48         ` Martin Peschke
2006-06-19 22:12           ` Greg KH
2006-06-20 15:40             ` Martin Peschke
2006-06-20 16:50               ` Randy.Dunlap
2006-06-21 18:51                 ` Martin Peschke
2006-06-21 19:38                   ` Matthew Frost
2006-06-22 11:43                     ` Martin Peschke
2006-06-14  5:04       ` Andi Kleen
2006-06-14 22:49         ` Martin Peschke
2006-06-16 20:40           ` Greg KH
2006-06-16 21:34             ` Martin Peschke
2006-06-17  6:51           ` Andi Kleen
2006-06-17 11:03             ` Martin Peschke
2006-06-17 10:30       ` Martin Peschke
2006-06-05  1:06 ` wireless (was Re: 2.6.18 -mm merge plans) Jeff Garzik
2006-06-05  1:15   ` Andrew Morton
2006-06-05  8:33     ` Andreas Mohr
2006-06-05  8:45       ` Arjan van de Ven
2006-06-05 10:26         ` Alan Cox
2006-06-05 10:35           ` Arjan van de Ven
2006-06-05 10:59             ` Alan Cox
2006-06-10  6:58             ` Pavel Machek
2006-06-05  8:54   ` Christoph Hellwig
2006-06-05 12:33     ` Jeff Garzik
2006-06-05 12:48       ` Arjan van de Ven
2006-06-05 12:52         ` Jeff Garzik
2006-06-05 14:02           ` Linux kernel and laws Adrian Bunk
2006-06-05 14:21             ` linux-os (Dick Johnson)
2006-06-06  5:33             ` Evgeniy Polyakov
2006-06-05 13:27     ` wireless (was Re: 2.6.18 -mm merge plans) John W. Linville
2006-06-05 13:31       ` Christoph Hellwig
2006-06-05 13:42       ` Arjan van de Ven
2006-06-05 16:24       ` Alan Cox
2006-06-29 14:26         ` ACX100 (softmac-based) driver ready to merge, but is it legal? -- " John W. Linville
     [not found]           ` <20060629144233.GB24463@tuxdriver.com>
2006-06-29 14:47             ` [Acx100-users] Denis Vlasenko, where are you? (mail bounced) Andreas Mohr
2006-06-05  1:32 ` merging new drivers (was Re: 2.6.18 -mm merge plans) Jeff Garzik
2006-06-05  1:47   ` Andrew Morton
2006-06-05  8:59     ` Christoph Hellwig
2006-06-05  9:10       ` Andrew Morton
2006-06-05  9:16         ` Arjan van de Ven
2006-06-05 11:10       ` Ivan Novick
2006-06-05 11:26         ` Adrian Bunk
2006-06-05  6:58   ` Francois Romieu
2006-06-05 10:32     ` Alan Cox
2006-06-05 10:36       ` Arjan van de Ven
2006-06-06  2:02         ` Chris Wright
2006-06-06  7:01           ` Andi Kleen
2006-06-06 13:04             ` Steven Rostedt
2006-06-05 13:38 ` 2.6.18 -mm merge plans -- GFS David Woodhouse
2006-06-05 14:10   ` Russell King
2006-06-05 15:01   ` Steven Whitehouse
2006-06-07  7:12     ` Steven Whitehouse
2006-06-05 14:08 ` 2.6.18 -mm merge plans Oleg Nesterov
2006-06-05 14:43 ` Serge E. Hallyn
2006-06-08 19:56   ` Eric W. Biederman
2006-06-09 13:02     ` Serge E. Hallyn
2006-06-09 23:25     ` Serge E. Hallyn
2006-06-10  0:39       ` Eric W. Biederman
2006-06-10  1:23         ` Serge E. Hallyn
2006-06-10  7:52           ` Eric W. Biederman
2006-06-10  8:09           ` Eric W. Biederman
2006-06-10  9:53       ` Christoph Hellwig
2006-06-06  0:54 ` Merge of per task delay accounting (was Re: 2.6.18 -mm merge plans) Balbir Singh
2006-06-06 22:28   ` Shailabh Nagar
2006-06-06 22:40     ` Andrew Morton
2006-06-08 14:27       ` Shailabh Nagar
2006-06-08 17:42         ` Andrew Morton
2006-06-08 18:36           ` Shailabh Nagar
2006-06-08 19:33             ` Balbir Singh
2006-06-06 22:52     ` Jay Lan
2006-06-06 22:55       ` Shailabh Nagar
2006-06-12 12:02       ` Martin Peschke
2006-06-12 13:28         ` Shailabh Nagar
2006-06-06 12:32 ` 2.6.18 -mm pi-futex merge Steven Rostedt
2006-06-06 13:34   ` Roman Zippel
2006-06-06 13:44     ` Steven Rostedt
2006-06-06 14:42 ` genirq Ingo Molnar
2006-06-06 16:56   ` genirq Daniel Walker
2006-06-07  8:42     ` genirq Ingo Molnar
2006-06-07  3:46   ` genirq Benjamin Herrenschmidt
2006-06-06 14:53 ` 2.6.18 -mm merge plans Ingo Molnar
2006-06-06 16:02   ` Andrew Morton
2006-06-06 16:35     ` Arjan van de Ven
2006-06-06 20:47     ` lock validator [2.6.18 -mm merge plans] Ingo Molnar
2006-06-07  3:52 ` mutex vs. local irqs (Was: 2.6.18 -mm merge plans) Benjamin Herrenschmidt
2006-06-07  4:29   ` Andrew Morton
2006-06-07  5:04     ` Benjamin Herrenschmidt
2006-06-07  5:29       ` Andrew Morton
2006-06-07  6:44         ` Benjamin Herrenschmidt
2006-06-07  7:03           ` Andrew Morton
2006-06-07 13:21           ` Ingo Molnar
2006-06-08  0:31             ` Benjamin Herrenschmidt
2006-06-08 10:49               ` David Woodhouse
2006-06-08 10:53                 ` Ingo Molnar
2006-06-08 11:01                   ` David Woodhouse
2006-06-08 11:17               ` Roman Zippel
2006-06-08 13:38                 ` Benjamin Herrenschmidt
2006-06-08 14:02                   ` Roman Zippel
2006-06-08 23:40                     ` Benjamin Herrenschmidt
2006-06-08 22:59             ` Paul Mackerras
2006-06-10 10:22 ` 2.6.18 -mm merge plans Christoph Hellwig
2006-06-14 15:18   ` Michael Halcrow
