stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Joe Perches <joe@perches.com>, Will Deacon <will.deacon@arm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Anshul Garg <aksgarg1989@gmail.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	David Miller <davem@davemloft.net>,
	Ingo Molnar <mingo@kernel.org>, Kees Cook <keescook@chromium.org>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Michael Davidson <md@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 4.9 04/91] lib/int_sqrt: optimize initial value compute
Date: Thu,  4 Apr 2019 10:46:48 +0200	[thread overview]
Message-ID: <20190404084535.672303864@linuxfoundation.org> (raw)
In-Reply-To: <20190404084535.450029272@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit f8ae107eef209bff29a5816bc1aad40d5cd69a80 upstream.

The initial value (@m) compute is:

	m = 1UL << (BITS_PER_LONG - 2);
	while (m > x)
		m >>= 2;

Which is a linear search for the highest even bit smaller or equal to @x
We can implement this using a binary search using __fls() (or better when
its hardware implemented).

	m = 1UL << (__fls(x) & ~1UL);

Especially for small values of @x; which are the more common arguments
when doing a CDF on idle times; the linear search is near to worst case,
while the binary search of __fls() is a constant 6 (or 5 on 32bit)
branches.

      cycles:                 branches:              branch-misses:

PRE:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

SOFTWARE FLS:

hot:   29.576176 +- 0.028850  26.666730 +- 0.004511  0.019463 +- 0.000663
cold: 165.947136 +- 0.188406  26.666746 +- 0.004511  6.133897 +- 0.004386

HARDWARE FLS:

hot:   24.720922 +- 0.025161  20.666784 +- 0.004509  0.020836 +- 0.000677
cold: 132.777197 +- 0.127471  20.666776 +- 0.004509  5.080285 +- 0.003874

Averages computed over all values <128k using a LFSR to generate order.
Cold numbers have a LFSR based branch trace buffer 'confuser' ran between
each int_sqrt() invocation.

Link: http://lkml.kernel.org/r/20171020164644.936577234@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Anshul Garg <aksgarg1989@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Michael Davidson <md@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 lib/int_sqrt.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -7,6 +7,7 @@
 
 #include <linux/kernel.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /**
  * int_sqrt - rough approximation to sqrt
@@ -21,10 +22,7 @@ unsigned long int_sqrt(unsigned long x)
 	if (x <= 1)
 		return x;
 
-	m = 1UL << (BITS_PER_LONG - 2);
-	while (m > x)
-		m >>= 2;
-
+	m = 1UL << (__fls(x) & ~1UL);
 	while (m != 0) {
 		b = y + m;
 		y >>= 1;



  parent reply	other threads:[~2019-04-04  9:57 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-04  8:46 [PATCH 4.9 00/91] 4.9.168-stable review Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 01/91] arm64: debug: Dont propagate UNKNOWN FAR into si_code for debug signals Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 02/91] arm64: debug: Ensure debug handlers check triggering exception level Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 03/91] ext4: cleanup bh release code in ext4_ind_remove_space() Greg Kroah-Hartman
2019-04-04  8:46 ` Greg Kroah-Hartman [this message]
2019-04-04  8:46 ` [PATCH 4.9 05/91] tty/serial: atmel: Add is_half_duplex helper Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 06/91] tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 07/91] mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 08/91] i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 09/91] CIFS: fix POSIX lock leak and invalid ptr deref Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 10/91] h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 11/91] tracing: kdb: Fix ftdump to not sleep Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 12/91] gpio: gpio-omap: fix level interrupt idling Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 13/91] include/linux/relay.h: fix percpu annotation in struct rchan Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 14/91] sysctl: handle overflow for file-max Greg Kroah-Hartman
2019-04-04  8:46 ` [PATCH 4.9 15/91] enic: fix build warning without CONFIG_CPUMASK_OFFSTACK Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 16/91] scsi: hisi_sas: Set PHY linkrate when disconnected Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 17/91] mm/cma.c: cma_declare_contiguous: correct err handling Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 18/91] mm/page_ext.c: fix an imbalance with kmemleak Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 19/91] mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 20/91] mm/slab.c: kmemleak no scan alien caches Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 21/91] ocfs2: fix a panic problem caused by o2cb_ctl Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 22/91] f2fs: do not use mutex lock in atomic context Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 23/91] fs/file.c: initialize init_files.resize_wait Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 24/91] cifs: use correct format characters Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 25/91] dm thin: add sanity checks to thin-pool and external snapshot creation Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 26/91] cifs: Fix NULL pointer dereference of devname Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 27/91] fs: Make splice() and tee() take into account O_NONBLOCK flag on pipes Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 28/91] jbd2: fix invalid descriptor block checksum Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 29/91] fs: fix guard_bio_eod to check for real EOD errors Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 30/91] tools lib traceevent: Fix buffer overflow in arg_eval Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 31/91] wil6210: check null pointer in _wil_cfg80211_merge_extra_ies Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 32/91] crypto: crypto4xx - add missing of_node_put after of_device_is_available Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 33/91] usb: chipidea: Grab the (legacy) USB PHY by phandle first Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 34/91] scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 35/91] coresight: etm4x: Add support to enable ETMv4.2 Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 36/91] ARM: 8840/1: use a raw_spinlock_t in unwind Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 37/91] iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 38/91] mmc: omap: fix the maximum timeout setting Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 39/91] e1000e: Fix -Wformat-truncation warnings Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 40/91] mlxsw: spectrum: Avoid " Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 41/91] IB/mlx4: Increase the timeout for CM cache Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 42/91] scsi: megaraid_sas: return error when create DMA pool failed Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 43/91] perf test: Fix failure of evsel-tp-sched test on s390 Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 44/91] SoC: imx-sgtl5000: add missing put_device() Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 45/91] media: sh_veu: Correct return type for mem2mem buffer helpers Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 46/91] media: s5p-jpeg: " Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 47/91] media: s5p-g2d: " Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 48/91] media: mx2_emmaprp: " Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 49/91] vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1 Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 50/91] HID: intel-ish-hid: avoid binding wrong ishtp_cl_device Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 51/91] leds: lp55xx: fix null deref on firmware load failure Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 52/91] iwlwifi: pcie: fix emergency path Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 53/91] ACPI / video: Refactor and fix dmi_is_desktop() Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 54/91] kprobes: Prohibit probing on bsearch() Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 55/91] ARM: 8833/1: Ensure that NEON code always compiles with Clang Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 56/91] ALSA: PCM: check if ops are defined before suspending PCM Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 57/91] usb: f_fs: Avoid crash due to out-of-scope stack ptr access Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 58/91] bcache: fix input overflow to cache set sysfs file io_error_halflife Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 59/91] bcache: fix input overflow to sequential_cutoff Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 60/91] bcache: improve sysfs_strtoul_clamp() Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 61/91] genirq: Avoid summation loops for /proc/stat Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 62/91] iw_cxgb4: fix srqidx leak during connection abort Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 63/91] fbdev: fbmem: fix memory access if logo is bigger than the screen Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 64/91] cdrom: Fix race condition in cdrom_sysctl_register Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 65/91] e1000e: fix cyclic resets at link up with active tx Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 66/91] ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 67/91] efi/memattr: Dont bail on zero VA if it equals the regions PA Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 68/91] ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 69/91] soc: qcom: gsbi: Fix error handling in gsbi_probe() Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 70/91] mt7601u: bump supported EEPROM version Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 71/91] ARM: avoid Cortex-A9 livelock on tight dmb loops Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 72/91] tty: increase the default flip buffer limit to 2*640K Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 73/91] powerpc/pseries: Perform full re-add of CPU for topology update post-migration Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 74/91] media: mt9m111: set initial frame size other than 0x0 Greg Kroah-Hartman
2019-04-04  8:47 ` [PATCH 4.9 75/91] hwrng: virtio - Avoid repeated init of completion Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 76/91] soc/tegra: fuse: Fix illegal free of IO base address Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 77/91] HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 78/91] hpet: Fix missing = character in the __setup() code of hpet_mmap_enable Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 79/91] dmaengine: imx-dma: fix warning comparison of distinct pointer types Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 80/91] dmaengine: qcom_hidma: assign channel cookie correctly Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 81/91] netfilter: physdev: relax br_netfilter dependency Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 82/91] media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 83/91] regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 84/91] drm/nouveau: Stop using drm_crtc_force_disable Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 85/91] x86/build: Specify elf_i386 linker emulation explicitly for i386 objects Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 86/91] selinux: do not override context on context mounts Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 87/91] wlcore: Fix memory leak in case wl12xx_fetch_firmware failure Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 88/91] x86/build: Mark per-CPU symbols as absolute explicitly for LLD Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 89/91] dmaengine: tegra: avoid overflow of byte tracking Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 90/91] drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers Greg Kroah-Hartman
2019-04-04  8:48 ` [PATCH 4.9 91/91] ACPI / video: Extend chassis-type detection with a "Lunch Box" check Greg Kroah-Hartman
2019-04-04 16:44 ` [PATCH 4.9 00/91] 4.9.168-stable review kernelci.org bot
2019-04-05  3:14 ` Naresh Kamboju
2019-04-05 15:26 ` shuah
2019-04-05 15:36 ` Jon Hunter
2019-04-05 18:30 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190404084535.672303864@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aksgarg1989@gmail.com \
    --cc=dave@stgolabs.net \
    --cc=davem@davemloft.net \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=md@google.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).