All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Anshul Garg <aksgarg1989@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Will Deacon <will.deacon@arm.com>,
	Joe Perches <joe@perches.com>, David Miller <davem@davemloft.net>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Kees Cook <keescook@chromium.org>,
	Michael Davidson <md@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>
Subject: [PATCH 4.14 31/41] lib/int_sqrt: optimize small argument
Date: Tue, 26 Mar 2019 15:30:08 +0900	[thread overview]
Message-ID: <20190326042651.600856386@linuxfoundation.org> (raw)
In-Reply-To: <20190326042649.889479098@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit 3f3295709edea6268ff1609855f498035286af73 upstream.

The current int_sqrt() computation is sub-optimal for the case of small
@x.  Which is the interesting case when we're going to do cumulative
distribution functions on idle times, which we assume to be a random
variable, where the target residency of the deepest idle state gives an
upper bound on the variable (5e6ns on recent Intel chips).

In the case of small @x, the compute loop:

	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

can be reduced to:

	while (m > x)
		m >>= 2;

Because y==0, b==m and until x>=m y will remain 0.

And while this is computationally equivalent, it runs much faster
because there's less code, in particular less branches.

      cycles:                 branches:              branch-misses:

OLD:

hot:   45.109444 +- 0.044117  44.333392 +- 0.002254  0.018723 +- 0.000593
cold: 187.737379 +- 0.156678  44.333407 +- 0.002254  6.272844 +- 0.004305

PRE:

hot:   67.937492 +- 0.064124  66.999535 +- 0.000488  0.066720 +- 0.001113
cold: 232.004379 +- 0.332811  66.999527 +- 0.000488  6.914634 +- 0.006568

POST:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

Averages computed over all values <128k using a LFSR to generate order.
Cold numbers have a LFSR based branch trace buffer 'confuser' ran between
each int_sqrt() invocation.

Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org
Fixes: 30493cc9dddb ("lib/int_sqrt.c: optimize square root algorithm")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Anshul Garg <aksgarg1989@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 lib/int_sqrt.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -23,6 +23,9 @@ unsigned long int_sqrt(unsigned long x)
 		return x;
 
 	m = 1UL << (BITS_PER_LONG - 2);
+	while (m > x)
+		m >>= 2;
+
 	while (m != 0) {
 		b = y + m;
 		y >>= 1;



  parent reply	other threads:[~2019-03-26  6:34 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-26  6:29 [PATCH 4.14 00/41] 4.14.109-stable review Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 01/41] mmc: pxamci: fix enum type confusion Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 02/41] drm/vmwgfx: Dont double-free the mode stored in par->set_mode Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 03/41] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 04/41] libceph: wait for latest osdmap in ceph_monc_blacklist_add() Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 05/41] udf: Fix crash on IO error during truncate Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 06/41] mips: loongson64: lemote-2f: Add IRQF_NO_SUSPEND to "cascade" irqaction Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 07/41] MIPS: Ensure ELF appended dtb is relocated Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 08/41] MIPS: Fix kernel crash for R6 in jump label branch function Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 09/41] scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 10/41] scsi: ibmvscsi: Fix empty event pool access during host removal Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 11/41] futex: Ensure that futex address is aligned in handle_futex_death() Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 12/41] perf probe: Fix getting the kernel map Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 13/41] objtool: Move objtool_file struct off the stack Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 14/41] ALSA: x86: Fix runtime PM for hdmi-lpe-audio Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 15/41] ext4: fix NULL pointer dereference while journal is aborted Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 16/41] ext4: fix data corruption caused by unaligned direct AIO Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 17/41] ext4: brelse all indirect buffer in ext4_ind_remove_space() Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 18/41] media: v4l2-ctrls.c/uvc: zero v4l2_event Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 19/41] Bluetooth: hci_uart: Check if socket buffer is ERR_PTR in h4_recv_buf() Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 20/41] Bluetooth: Fix decrementing reference count twice in releasing socket Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 21/41] Bluetooth: hci_ldisc: Initialize hci_dev before open() Greg Kroah-Hartman
2019-03-26  6:29 ` [PATCH 4.14 22/41] Bluetooth: hci_ldisc: Postpone HCI_UART_PROTO_READY bit set in hci_uart_set_proto() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 23/41] drm: Reorder set_property_atomic to avoid returning with an active ww_ctx Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 24/41] netfilter: ebtables: remove BUGPRINT messages Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 25/41] x86/unwind: Handle NULL pointer calls better in frame unwinder Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 26/41] x86/unwind: Add hardcoded ORC entry for NULL Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 27/41] locking/lockdep: Add debug_locks check in __lock_downgrade() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 28/41] mm, mempolicy: fix uninit memory access Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 29/41] ALSA: hda - Record the current power state before suspend/resume calls Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 30/41] ALSA: hda - Enforces runtime_resume after S3 and S4 for each codec Greg Kroah-Hartman
2019-03-26  6:30 ` Greg Kroah-Hartman [this message]
2019-03-26  6:30 ` [PATCH 4.14 32/41] USB: core: only clean up what we allocated Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 33/41] scsi: ufs: fix wrong command type of UTRD for UFSHCI v2.1 Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 34/41] PCI: designware-ep: dw_pcie_ep_set_msi() should only set MMC bits Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 35/41] PCI: designware-ep: Read-only registers need DBI_RO_WR_EN to be writable Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 36/41] PCI: endpoint: Use EPCs device in dma_alloc_coherent()/dma_free_coherent() Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 37/41] rtc: Fix overflow when converting time64_t to rtc_time Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 38/41] sched/cpufreq/schedutil: Fix error path mutex unlock Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 39/41] pwm-backlight: Enable/disable the PWM before/after LCD enable toggle Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 40/41] power: supply: charger-manager: Fix incorrect return value Greg Kroah-Hartman
2019-03-26  6:30 ` [PATCH 4.14 41/41] ath10k: avoid possible string overflow Greg Kroah-Hartman
2019-03-26 10:23 ` [PATCH 4.14 00/41] 4.14.109-stable review kernelci.org bot
2019-03-26 15:19 ` Jon Hunter
2019-03-26 15:19   ` Jon Hunter
2019-03-26 16:39 ` Naresh Kamboju
2019-03-26 17:49 ` Guenter Roeck
2019-03-26 23:15 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190326042651.600856386@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aksgarg1989@gmail.com \
    --cc=arnd@arndb.de \
    --cc=dave@stgolabs.net \
    --cc=davem@davemloft.net \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=md@google.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.