From: William Lee Irwin III <wli@holomorphy.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Ray Bryant <raybry@sgi.com>, Jesse Barnes <jbarnes@engr.sgi.com>,
linux-kernel@vger.kernel.org
Subject: [profile] amortize atomic hit count increments
Date: Mon, 13 Sep 2004 21:47:48 -0700
Message-ID: <20040914044748.GZ9106@holomorphy.com>
In-Reply-To: <20040913015003.5406abae.akpm@osdl.org>
On Mon, Sep 13, 2004 at 01:50:03AM -0700, Andrew Morton wrote:
> Due to master.kernel.org being on the blink, 2.6.9-rc1-mm5 is currently at
> http://www.zip.com.au/~akpm/linux/patches/2.6.9-rc1-mm5/
> and will later appear at
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc1/2.6.9-rc1-mm5/
> Please check kernel.org before using zip.com.au.
> - Added the `bk-scsi-target' tree to the -mm lineup. It is managed by James
> Bottomley
> - Some enhancements to the ext3 block reservation code here. Please cc
> sct@redhat.com on oops reports ;)
> - There's a patch here which will cause warnings if a PCI device driver is
> removed without having called pci_disable_device(). Please try to cc the
> appropriate mailing list or maintainer when reporting any instances.

I've been informed that /proc/profile livelocks some systems in the
timer interrupt, usually at boot. The following patch attempts to
amortize the atomic operations done on the profile buffer to address
this stability concern. This patch has nothing to do with performance.
Kernels using periodic timer interrupts are under realtime constraints:
they must complete whatever work they perform within a timer interrupt
before the next timer interrupt arrives, lest they livelock, performing
no work whatsoever apart from servicing timer interrupts. The latency of
the cacheline bounce for prof_buffer contributes to the time spent in
the timer interrupt, hence it must be amortized when remote access
latencies or deviations from fair exclusive cacheline acquisition may
cause cacheline bounces to take longer than the interval between timer
ticks.

This patch creates a per-cpu open-addressed hashtable, indexed by
profile buffer slot, whose values are the numbers of pending profile
buffer hits. When this hashtable overflows, the overflowing hit is
accounted directly, every pair of profile buffer slot and hit count in
the hashtable is accounted to the global profile buffer, and the
hashtable is cleared. Zero is a legitimate profile buffer slot, so a
zero hit count, rather than a zero slot, marks an unused hashtable
entry. The hashtable is furthermore protected from reentry into the
timer interrupt by disabling interrupts. read_proc_profile() does not
flush the per-cpu hashtables because flushing may cause timeslice
overrun on the systems where prof_buffer cacheline bounces are so
problematic as to livelock the timer interrupt.
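
To illustrate the scheme outside the kernel, here is a minimal userspace
sketch of one such hashtable. It is an illustration under stated
assumptions, not the patch itself: record_hit(), global_buf, table,
nr_flushes, NR_SLOTS and the tiny NR_HIT are hypothetical names and
sizes, plain increments stand in for atomic_inc()/atomic_add(), and
there is no per-cpu data or interrupt disabling.

```c
#include <assert.h>
#include <string.h>

#define NR_SLOTS 8UL	/* hypothetical number of profile buffer slots */
#define NR_HIT   4UL	/* hypothetical table size; must be a power of two */

struct hit {
	unsigned long pc, hits;
};

static unsigned long global_buf[NR_SLOTS];	/* stands in for prof_buffer */
static struct hit table[NR_HIT];		/* stands in for one cpu's hashtable */
static unsigned long nr_flushes;		/* counts table overflows */

/* Record one hit against slot pc, flushing the table when it is full. */
static void record_hit(unsigned long pc)
{
	unsigned long i, primary, secondary;

	assert(pc < NR_SLOTS);
	i = primary = pc & (NR_HIT - 1);
	/* An odd stride visits every entry of a power-of-two sized table. */
	secondary = ((~pc << 1) | 1) & (NR_HIT - 1);
	do {
		if (table[i].pc == pc) {
			/* Also covers an unused entry when pc is zero. */
			table[i].hits++;
			return;
		}
		if (!table[i].hits) {
			/* A zero hit count marks an unused entry. */
			table[i].pc = pc;
			table[i].hits = 1;
			return;
		}
		i = (i + secondary) & (NR_HIT - 1);
	} while (i != primary);

	/* Table full: account this hit directly, then flush every entry. */
	global_buf[pc]++;
	for (i = 0; i < NR_HIT; i++)
		global_buf[table[i].pc] += table[i].hits;
	memset(table, 0, sizeof(table));
	nr_flushes++;
}
```

Repeated hits on one slot stay in the table and touch global_buf only
once the table fills with distinct slots and a further slot arrives.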

This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable, because numerous hits may be held for each of its entries.
Before the patch, every hit costs one atomic increment. After the patch,
the hits accumulated between flushes cost one atomic_add() per entry in
the per-cpu hashtable, where before they would have cost a number of
atomic increments equal to the sum of the hits held across those
entries. The savings are nondeterministic, but as the profile hits tend
to be concentrated in a very small number of profile buffer slots during
any given timing interval, they are likely to amount to a very large
number of atomic increments. This amortization of atomic increments does
not depend on the hash function, only on the (lack of) scattering of
profile buffer hits.
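
As a rough worked example of that accounting, the sketch below compares
the two costs for the contents of one full table at flush time. The
names struct flush_cost and cost_of_flush() are hypothetical, and the
figures in the note after it are made up for illustration, not measured.

```c
#include <assert.h>
#include <stddef.h>

struct hit {
	unsigned long pc, hits;
};

struct flush_cost {
	unsigned long unbatched;	/* atomic_inc() calls without the hashtable */
	unsigned long batched;		/* atomic operations with the hashtable */
};

/* Compare the atomic operation counts for one full table of n entries. */
static struct flush_cost cost_of_flush(const struct hit *table, size_t n)
{
	/* The hit that overflows the table costs one atomic_inc() either way. */
	struct flush_cost c = { 1, 1 + n };
	size_t i;

	assert(table != NULL);
	for (i = 0; i < n; i++)
		c.unbatched += table[i].hits;	/* one atomic_inc() per held hit */
	return c;
}
```

For a hypothetical four-entry table holding 1000, 1, 1 and 1 hits, this
gives 1004 atomic increments without the hashtable against 5 atomic
operations with it.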
I would be much obliged if the reporters of this issue could verify
whether this resolves their livelock. Untested, as I was hoping the
bug reporters could do that bit for me.

Index: mm5-2.6.9-rc1/kernel/profile.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/profile.c 2004-09-13 16:27:36.639247200 -0700
+++ mm5-2.6.9-rc1/kernel/profile.c 2004-09-13 21:36:35.498912144 -0700
@@ -12,10 +12,18 @@
 #include <linux/profile.h>
 #include <asm/sections.h>
 
+struct profile_hit {
+	unsigned long pc, hits;
+};
+#define NR_PROFILE_HIT (PAGE_SIZE/sizeof(struct profile_hit))
+
 static atomic_t *prof_buffer;
 static unsigned long prof_len, prof_shift;
 static int prof_on;
 static cpumask_t prof_cpu_mask = CPU_MASK_ALL;
+#ifdef CONFIG_SMP
+static DEFINE_PER_CPU(struct profile_hit [NR_PROFILE_HIT], cpu_profile_hits);
+#endif /* CONFIG_SMP */
 
 static int __init profile_setup(char * str)
 {
@@ -181,6 +189,41 @@
 EXPORT_SYMBOL_GPL(profile_event_register);
 EXPORT_SYMBOL_GPL(profile_event_unregister);
 
+#ifdef CONFIG_SMP
+void profile_hit(int type, void *__pc)
+{
+	unsigned long primary, secondary, flags, pc = (unsigned long)__pc;
+	int i, cpu;
+	struct profile_hit *hits;
+
+	if (prof_on != type || !prof_buffer)
+		return;
+	pc = min((pc - (unsigned long)_stext) >> prof_shift, prof_len - 1);
+	cpu = get_cpu();
+	i = primary = pc & (NR_PROFILE_HIT - 1);
+	secondary = ((~pc << 1) | 1) & (NR_PROFILE_HIT - 1);
+	hits = per_cpu(cpu_profile_hits, cpu);
+	local_irq_save(flags);
+	do {
+		if (hits[i].pc == pc) {
+			hits[i].hits++;
+			goto out;
+		} else if (!hits[i].hits) {
+			hits[i].pc = pc;
+			hits[i].hits = 1;
+			goto out;
+		} else
+			i = (i + secondary) & (NR_PROFILE_HIT - 1);
+	} while (i != primary);
+	atomic_inc(&prof_buffer[pc]);
+	for (i = 0; i < NR_PROFILE_HIT; ++i)
+		atomic_add(hits[i].hits, &prof_buffer[hits[i].pc]);
+	memset(hits, 0, NR_PROFILE_HIT*sizeof(struct profile_hit));
+out:
+	local_irq_restore(flags);
+	put_cpu();
+}
+#else
 void profile_hit(int type, void *__pc)
 {
 	unsigned long pc;
@@ -190,6 +233,7 @@
 	pc = ((unsigned long)__pc - (unsigned long)_stext) >> prof_shift;
 	atomic_inc(&prof_buffer[min(pc, prof_len - 1)]);
 }
+#endif
 
 void profile_tick(int type, struct pt_regs *regs)
 {