linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/3] New algorithm for ASID allocation and rollover
@ 2012-08-15 16:53 Will Deacon
From: Will Deacon @ 2012-08-15 16:53 UTC
  To: linux-arm-kernel

Hello,

Following some investigation into preempt-rt Linux, it became apparent
that ASID rollover can happen fairly regularly under certain heavy
scheduling workloads. Each time this happens, we broadcast an interrupt
to the secondary CPUs so that we can reset the global ASID number space
without assigning duplicate ASIDs to different tasks or accidentally
assigning different ASIDs to threads of the same process.

This leads to a large number of expensive IPIs between cores:

           CPU0       CPU1
IPI0:          0          0  Timer broadcast interrupts
IPI1:      23165     115888  Rescheduling interrupts
IPI2:          0          0  Function call interrupts
IPI3:       6619       1123  Single function call interrupts <---- IPIs
IPI4:          0          0  CPU stop interrupts

Digging deeper, this also leads to a highly variable waittime on
cpu_asid_lock. Granted, the lock is contended for less than 1% of the
time, but the waittime varies between 0.5 and 734 us!

After some discussion, it became apparent that tracking the ASIDs
currently active on the cores in the system means that, on rollover, we
can automatically reserve those that are in use without having to stop
the world.
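The reservation step can be sketched in single-threaded user-space C. This is a hypothetical model, not the code from the patches: every name is invented, the ASID space is shrunk to 3 bits so that rollover happens quickly, ASID 0 is simply never handed out, and locking and TLB maintenance are left out. On rollover, whatever each CPU is currently running is marked as in use in the new generation, so a task that is still live keeps its ASID number and nothing has to be broadcast to the other CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen so that rollover happens quickly. */
#define ASID_BITS 3
#define NUM_ASIDS (1u << ASID_BITS)            /* ASIDs 0..7; 0 is never handed out */
#define ASID_MASK ((uint64_t)NUM_ASIDS - 1)
#define NR_CPUS   2

/* Context ID = generation | ASID; the generation advances in steps of NUM_ASIDS. */
static uint64_t generation = NUM_ASIDS;        /* generation 0 is never current */
static bool asid_map[NUM_ASIDS];               /* ASIDs in use this generation */
static uint64_t active[NR_CPUS];               /* context ID each CPU is running */
static uint64_t reserved[NR_CPUS];             /* context ID live at the last rollover */

/* Start a new generation without IPIs: drop every ASID except the live ones. */
static void rollover(void)
{
    generation += NUM_ASIDS;
    memset(asid_map, 0, sizeof(asid_map));
    asid_map[0] = true;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        uint64_t ctx = active[cpu];
        if (ctx == 0)                          /* no switch since the previous rollover */
            ctx = reserved[cpu];
        active[cpu] = 0;
        reserved[cpu] = ctx;
        asid_map[ctx & ASID_MASK] = true;      /* keep it out of the free pool */
    }
    /* A real implementation would also schedule a local TLB flush on each CPU. */
}

/* If old_ctx was live on some CPU at rollover, carry it over as new_ctx. */
static bool check_update_reserved(uint64_t old_ctx, uint64_t new_ctx)
{
    bool hit = false;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (reserved[cpu] == old_ctx) {
            reserved[cpu] = new_ctx;
            hit = true;
        }
    }
    return hit;
}

/* Lowest free ASID, or 0 if the space is exhausted. */
static uint64_t find_free_asid(void)
{
    for (uint64_t i = 1; i < NUM_ASIDS; i++)
        if (!asid_map[i])
            return i;
    return 0;
}

/* Give an mm with a stale (or zero) context ID one from the current generation. */
static uint64_t new_context(uint64_t old_ctx)
{
    if (old_ctx != 0) {
        uint64_t new_ctx = generation | (old_ctx & ASID_MASK);
        if (check_update_reserved(old_ctx, new_ctx))
            return new_ctx;                    /* still live: keep the same ASID */
    }
    uint64_t asid = find_free_asid();
    if (asid == 0) {
        rollover();
        asid = find_free_asid();
        assert(asid != 0);                     /* holds because NR_CPUS + 1 < NUM_ASIDS */
    }
    asid_map[asid] = true;
    return generation | asid;
}

/* Single-threaded model of switching the mm owning *mm_ctx onto cpu. */
static void switch_to(int cpu, uint64_t *mm_ctx)
{
    if ((*mm_ctx & ~ASID_MASK) != generation)  /* stale or never allocated */
        *mm_ctx = new_context(*mm_ctx);
    active[cpu] = *mm_ctx;
}
```

For example, if task A is still running on CPU 0 when CPU 1 exhausts the ASID space, `rollover()` reserves A's ASID, and a later `switch_to()` of A on either CPU keeps the same ASID number in the new generation, while a task that was not running anywhere simply receives the next free ASID.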

This patch series develops that idea so that:

  - We can support cores that lack hardware broadcasting of TLB
    maintenance operations, without resorting to IPIs.
  - The fastpath (that is, when the task already has a valid ASID)
    remains lockless.
  - Assuming that the number of CPUs is less than the number of ASIDs,
    the algorithm scales as these numbers increase (using a bitmap for
    searching).
  - Generation overflow is not a problem (we use a u64).
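The lockless fastpath, the bitmap search and the u64 generation from the list above can be sketched together in user-space C11. This is a hypothetical sketch, not the code from the patches: all names are invented, an `atomic_flag` spinlock stands in for cpu_asid_lock, a plain array of `uint64_t` words stands in for the kernel bitmap helpers, and TLB flushing is omitted. A task whose context ID carries the current generation needs only one atomic exchange on its CPU's slot; the lock is taken only when the generation is stale or a rollover has cleared the slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 8-bit ASID in the low bits, generation above it. */
#define ASID_BITS 8
#define NUM_ASIDS (1u << ASID_BITS)
#define ASID_MASK ((uint64_t)NUM_ASIDS - 1)
#define NR_CPUS   4

static _Atomic uint64_t generation = NUM_ASIDS;   /* u64: wrap-around is not a concern */
static _Atomic uint64_t active[NR_CPUS];          /* 0 means "cleared by a rollover" */
static uint64_t reserved[NR_CPUS];                /* guarded by asid_lock */
static uint64_t asid_map[NUM_ASIDS / 64] = { 1 }; /* guarded; bit 0 set: ASID 0 unused */
static atomic_flag asid_lock = ATOMIC_FLAG_INIT;
static unsigned long slow_switches;               /* demo instrumentation only */

static void map_set(uint64_t nr) { asid_map[nr / 64] |= UINT64_C(1) << (nr % 64); }

/* Lowest clear bit, or -1; a full word rejects 64 ASIDs with one compare. */
static int map_find_zero(void)
{
    for (unsigned w = 0; w < NUM_ASIDS / 64; w++) {
        if (asid_map[w] == UINT64_MAX)
            continue;
        for (unsigned b = 0; b < 64; b++)
            if (!((asid_map[w] >> b) & 1))
                return (int)(w * 64 + b);
    }
    return -1;
}

static bool is_current(uint64_t ctx)
{
    return ((ctx ^ atomic_load(&generation)) >> ASID_BITS) == 0;
}

/* Called with asid_lock held. */
static uint64_t new_context(uint64_t old)
{
    uint64_t gen = atomic_load(&generation);
    bool hit = false;

    for (int c = 0; old != 0 && c < NR_CPUS; c++)
        if (reserved[c] == old) {                 /* live at rollover: keep the ASID */
            reserved[c] = gen | (old & ASID_MASK);
            hit = true;
        }
    if (hit)
        return gen | (old & ASID_MASK);

    int asid = map_find_zero();
    if (asid < 0) {                               /* exhausted: roll over, keep live ASIDs */
        gen = atomic_fetch_add(&generation, NUM_ASIDS) + NUM_ASIDS;
        memset(asid_map, 0, sizeof(asid_map));
        map_set(0);
        for (int c = 0; c < NR_CPUS; c++) {
            uint64_t ctx = atomic_exchange(&active[c], 0);
            if (ctx == 0)
                ctx = reserved[c];
            reserved[c] = ctx;
            map_set(ctx & ASID_MASK);
        }
        asid = map_find_zero();
        assert(asid >= 0);                        /* needs NR_CPUS + 1 < NUM_ASIDS */
    }
    map_set((uint64_t)asid);
    return gen | (uint64_t)asid;
}

/* Switch the mm whose context ID is *mm_ctx onto cpu. */
static void check_and_switch(int cpu, _Atomic uint64_t *mm_ctx)
{
    uint64_t ctx = atomic_load(mm_ctx);

    /* Fastpath: current generation, and no rollover has cleared our slot. */
    if (is_current(ctx) && atomic_exchange(&active[cpu], ctx) != 0)
        return;

    while (atomic_flag_test_and_set(&asid_lock))
        ;                                         /* spin */
    slow_switches++;
    ctx = atomic_load(mm_ctx);
    if (!is_current(ctx)) {
        ctx = new_context(ctx);
        atomic_store(mm_ctx, ctx);
    }
    atomic_store(&active[cpu], ctx);
    atomic_flag_clear(&asid_lock);
}
```

A second switch of the same task onto the same CPU touches only `active[cpu]`, which is the lockless fastpath. As for overflow: with 8 ASID bits, the remaining 56 bits of the u64 give 2^56 generations, so even at a hypothetical one million rollovers per second the counter would take roughly 2,280 years to wrap.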

With these patches applied, I saw a ~2% improvement in hackbench scores
on my dual-core Cortex-A15 board, and the interrupt statistics now appear as:

           CPU0       CPU1
IPI0:          0          0  Timer broadcast interrupts
IPI1:      64888      74560  Rescheduling interrupts
IPI2:          0          0  Function call interrupts
IPI3:          1          3  Single function call interrupts <--- Much better!
IPI4:          0          0  CPU stop interrupts

Finally, the waittime on cpu_asid_lock dropped to between 0.5 and 4.6 us.

All feedback welcome.

Will


Will Deacon (3):
  ARM: mm: remove IPI broadcasting on ASID rollover
  ARM: mm: avoid taking ASID spinlock on fastpath
  ARM: mm: use bitmap operations when allocating new ASIDs

 arch/arm/include/asm/mmu.h         |   11 +--
 arch/arm/include/asm/mmu_context.h |   82 +--------------
 arch/arm/mm/context.c              |  207 +++++++++++++++++++-----------------
 3 files changed, 115 insertions(+), 185 deletions(-)

-- 
1.7.4.1



Thread overview: 7+ messages
2012-08-15 16:53 [PATCH 0/3] New algorithm for ASID allocation and rollover Will Deacon
2012-08-15 16:54 ` [PATCH 1/3] ARM: mm: remove IPI broadcasting on ASID rollover Will Deacon
2012-08-15 16:54 ` [PATCH 2/3] ARM: mm: avoid taking ASID spinlock on fastpath Will Deacon
2012-08-15 16:54 ` [PATCH 3/3] ARM: mm: use bitmap operations when allocating new ASIDs Will Deacon
2012-08-15 17:05 ` [PATCH 0/3] New algorithm for ASID allocation and rollover Marc Zyngier
2012-08-19 15:21 ` Arnd Bergmann
2012-08-20 12:51   ` Will Deacon
