Re: [PATCH 0/6] Drain remote per-cpu directly v3

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Qian Cai <quic_qiancai@quicinc.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 0/6] Drain remote per-cpu directly v3
Date: Thu, 26 May 2022 13:19:38 -0400	[thread overview]
Message-ID: <Yo+2qqHqSdpE5l7m@qian> (raw)
In-Reply-To: <20220512085043.5234-1-mgorman@techsingularity.net>

On Thu, May 12, 2022 at 09:50:37AM +0100, Mel Gorman wrote:
> Changelog since v2
> o More conversions from page->lru to page->[pcp_list|buddy_list]
> o Additional test results in changelogs
> 
> Changelog since v1
> o Fix unsafe RT locking scheme
> o Use spin_trylock on UP PREEMPT_RT
> 
> This series has the same intent as Nicolas' series "mm/page_alloc: Remote
> per-cpu lists drain support" -- avoid interference of a high priority
> task due to a workqueue item draining per-cpu page lists. While many
> workloads can tolerate a brief interruption, it may be cause a real-time
> task runnning on a NOHZ_FULL CPU to miss a deadline and at minimum,
> the draining in non-deterministic.
> 
> Currently an IRQ-safe local_lock protects the page allocator per-cpu lists.
> The local_lock on its own prevents migration and the IRQ disabling protects
> from corruption due to an interrupt arriving while a page allocation is
> in progress. The locking is inherently unsafe for remote access unless
> the CPU is hot-removed.
> 
> This series adjusts the locking. A spinlock is added to struct
> per_cpu_pages to protect the list contents while local_lock_irq continues
> to prevent migration and IRQ reentry. This allows a remote CPU to safely
> drain a remote per-cpu list.
> 
> This series is a partial series. Follow-on work should allow the
> local_irq_save to be converted to a local_irq to avoid IRQs being
> disabled/enabled in most cases. Consequently, there are some TODO comments
> highlighting the places that would change if local_irq was used. However,
> there are enough corner cases that it deserves a series on its own
> separated by one kernel release and the priority right now is to avoid
> interference of high priority tasks.
> 
> Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages
> 	and when it is storing per-cpu pages.
> 
> Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking
> 	this is not necessary but it avoids per_cpu_pages consuming another
> 	cache line.
> 
> Patch 3 is a preparation patch to avoid code duplication.
> 
> Patch 4 is a simple micro-optimisation that improves code flow necessary for
> 	a later patch to avoid code duplication.
> 
> Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still
> 	relying on local_lock to prevent migration, stabilise the pcp
> 	lookup and prevent IRQ reentrancy.
> 
> Patch 6 remote drains per-cpu pages directly instead of using a workqueue.

Mel, we saw spontanous "mm_percpu_wq" crash on today's linux-next tree
while running CPU offlining/onlining, and wondering if you have any
thoughts?

 WARNING: CPU: 31 PID: 173 at kernel/kthread.c:524 __kthread_bind_mask
 CPU: 31 PID: 173 Comm: kworker/31:0 Not tainted 5.18.0-next-20220526-dirty #127
 Workqueue:  0x0
  (mm_percpu_wq)

 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __kthread_bind_mask
 lr : __kthread_bind_mask
 sp : ffff800018667c50
 x29: ffff800018667c50
  x28: ffff800018667d20
  x27: ffff083678bc2458
 x26: 1fffe1002f5b17a8
  x25: ffff08017ad8bd40
  x24: 1fffe106cf17848b
 x23: 1ffff000030ccfa0 x22: ffff0801de2d1ac0
  x21: ffff0801de2d1ac0
 x20: ffff07ff80286f08 x19: ffff0801de2d1ac0
  x18: ffffd6056a577d1c
 x17: ffffffffffffffff
  x16: 1fffe0fff158eb18
  x15: 1fffe106cf176138
 x14: 000000000000f1f1
  x13: 00000000f3f3f3f3 x12: ffff7000030ccf3b
 x11: 1ffff000030ccf3a x10: ffff7000030ccf3a x9 : dfff800000000000
 x8 : ffff8000186679d7
  x7 : 0000000000000001
  x6 : ffff7000030ccf3a
 x5 : 1ffff000030ccf39 x4 : 1ffff000030ccf4e x3 : 0000000000000000
 x2 : 0000000000000000
  x1 : ffff07ff8ac74fc0 x0 : 0000000000000000
 Call trace:
  __kthread_bind_mask
  kthread_bind_mask
  create_worker
  worker_thread
  kthread
  ret_from_fork
 irq event stamp: 146
 hardirqs last  enabled at (145):  _raw_spin_unlock_irqrestore
 hardirqs last disabled at (146):  el1_dbg
 softirqs last  enabled at (0):  copy_process
 softirqs last disabled at (0):  0x0

 WARNING: CPU: 31 PID: 173 at kernel/kthread.c:593 kthread_set_per_cpu
 CPU: 31 PID: 173 Comm: kworker/31:0 Tainted: G        W         5.18.0-next-20220526-dirty #127
 Workqueue:  0x0 (mm_percpu_wq)
 pstate: 10400009 (nzcV daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : kthread_set_per_cpu
 lr : worker_attach_to_pool
 sp : ffff800018667be0
 x29: ffff800018667be0 x28: ffff800018667d20 x27: ffff083678bc2458
 x26: 1fffe1002f5b17a8 x25: ffff08017ad8bd40 x24: 1fffe106cf17848b
 x23: 1fffe1003bc5a35d x22: ffff0801de2d1aec x21: 0000000000000007
 x20: ffff4026d8adae00 x19: ffff0801de2d1ac0 x18: ffffd6056a577d1c
 x17: ffffffffffffffff x16: 1fffe0fff158eb18 x15: 1fffe106cf176138
 x14: 000000000000f1f1 x13: 00000000f3f3f3f3 x12: ffff7000030ccf53
 x11: 1ffff000030ccf52 x10: ffff7000030ccf52 x9 : ffffd60563f9a038
 x8 : ffff800018667a97 x7 : 0000000000000001 x6 : ffff7000030ccf52
 x5 : ffff800018667a90 x4 : ffff7000030ccf53 x3 : 1fffe1003bc5a408
 x2 : 0000000000000000 x1 : 000000000000001f x0 : 0000000000208060
 Call trace:
  kthread_set_per_cpu
  worker_attach_to_pool at kernel/workqueue.c:1873
  create_worker
  worker_thread
  kthread
  ret_from_fork
 irq event stamp: 146
 hardirqs last  enabled at (145):  _raw_spin_unlock_irqrestore
 hardirqs last disabled at (146):  el1_dbg
 softirqs last  enabled at (0):  copy_process
 softirqs last disabled at (0):  0x0

 Unable to handle kernel paging request at virtual address dfff800000000003
 KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 [dfff800000000003] address between user and kernel address ranges
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 CPU: 83 PID: 23994 Comm: kworker/31:2 Not tainted 5.18.0-next-20220526-dirty #127
 pstate: 104000c9 (nzcV daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __lock_acquire
 lr : lock_acquire.part.0
 sp : ffff800071777ac0
 x29: ffff800071777ac0 x28: ffffd60563fa6380
  x27: 0000000000000018
 x26: 0000000000000080 x25: 0000000000000018 x24: 0000000000000000
 x23: ffff0801de2d1ac0 x22: ffffd6056a66a7e0
  x21: 0000000000000000
 x20: 0000000000000000 x19: 0000000000000000
  x18: 0000000000000767
 x17: 0000000000000000 x16: 1fffe1003bc5a473
  x15: 1fffe806c88e9338
 x14: 000000000000f1f1
  x13: 00000000f3f3f3f3
  x12: ffff0801de2d1ac8
 x11: 1ffffac0ad4aefa3
  x10: ffffd6056a577d18 x9 : 0000000000000000
 x8 : 0000000000000003
  x7 : ffffd60563fa6380
  x6 : 0000000000000000
 x5 : 0000000000000080
  x4 : 0000000000000001
  x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 0000000000000003
  x0 : dfff800000000000

 Call trace:
  __lock_acquire at kernel/locking/lockdep.c:4923
  lock_acquire
  _raw_spin_lock_irq
  worker_thread at kernel/workqueue.c:2389
  kthread
  ret_from_fork
 Code: d65f03c0 d343ff61 d2d00000 f2fbffe0 (38e06820)
 ---[ end trace 0000000000000000 ]---
 1424.464630][T23994] Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: 0x56055bdf0000 from 0xffff800008000000
 PHYS_OFFSET: 0x80000000
 CPU features: 0x000,0042e015,19801c82
 Memory Limit: none

next prev parent reply	other threads:[~2022-05-26 17:19 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12  8:50 [PATCH 0/6] Drain remote per-cpu directly v3 Mel Gorman
2022-05-12  8:50 ` [PATCH 1/6] mm/page_alloc: Add page->buddy_list and page->pcp_list Mel Gorman
2022-05-13 11:59   ` Nicolas Saenz Julienne
2022-05-19  9:36   ` Vlastimil Babka
2022-05-12  8:50 ` [PATCH 2/6] mm/page_alloc: Use only one PCP list for THP-sized allocations Mel Gorman
2022-05-19  9:45   ` Vlastimil Babka
2022-05-12  8:50 ` [PATCH 3/6] mm/page_alloc: Split out buddy removal code from rmqueue into separate helper Mel Gorman
2022-05-13 12:01   ` Nicolas Saenz Julienne
2022-05-19  9:52   ` Vlastimil Babka
2022-05-23 16:09   ` Qais Yousef
2022-05-24 11:55     ` Mel Gorman
2022-05-25 11:23       ` Qais Yousef
2022-05-12  8:50 ` [PATCH 4/6] mm/page_alloc: Remove unnecessary page == NULL check in rmqueue Mel Gorman
2022-05-13 12:03   ` Nicolas Saenz Julienne
2022-05-19 10:57   ` Vlastimil Babka
2022-05-19 12:13     ` Mel Gorman
2022-05-19 12:26       ` Vlastimil Babka
2022-05-12  8:50 ` [PATCH 5/6] mm/page_alloc: Protect PCP lists with a spinlock Mel Gorman
2022-05-13 12:22   ` Nicolas Saenz Julienne
2022-05-12  8:50 ` [PATCH 6/6] mm/page_alloc: Remotely drain per-cpu lists Mel Gorman
2022-05-12 19:37   ` Andrew Morton
2022-05-13 15:04     ` Mel Gorman
2022-05-13 15:19       ` Nicolas Saenz Julienne
2022-05-13 18:23         ` Mel Gorman
2022-05-17 12:57           ` Mel Gorman
2022-05-12 19:43 ` [PATCH 0/6] Drain remote per-cpu directly v3 Andrew Morton
2022-05-13 14:23   ` Mel Gorman
2022-05-13 19:38     ` Andrew Morton
2022-05-16 10:53       ` Mel Gorman
2022-05-13 12:24 ` Nicolas Saenz Julienne
2022-05-17 23:35 ` Qian Cai
2022-05-18 12:51   ` Mel Gorman
2022-05-18 16:27     ` Qian Cai
2022-05-18 17:15       ` Paul E. McKenney
2022-05-19 13:29         ` Qian Cai
2022-05-19 19:15           ` Paul E. McKenney
2022-05-19 21:05             ` Qian Cai
2022-05-19 21:29               ` Paul E. McKenney
2022-05-18 17:26   ` Marcelo Tosatti
2022-05-18 17:44     ` Marcelo Tosatti
2022-05-18 18:01 ` Nicolas Saenz Julienne
2022-05-26 17:19 ` Qian Cai [this message]
2022-05-27  8:39   ` Mel Gorman
2022-05-27 12:58     ` Qian Cai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yo+2qqHqSdpE5l7m@qian \
    --to=quic_qiancai@quicinc.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.