From: Andrea Parri <parri.andrea@gmail.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: paulmck@kernel.org, palmer@dabbelt.com, paul.walmsley@sifive.com,
aou@eecs.berkeley.edu, mmaas@google.com, hboehm@google.com,
striker@us.ibm.com, charlie@rivosinc.com, rehn@rivosinc.com,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] membarrier: riscv: Provide core serializing command
Date: Tue, 28 Nov 2023 16:13:56 +0100 [thread overview]
Message-ID: <ZWYDtB/otYvTMZWd@andrea> (raw)
In-Reply-To: <91ab0210-07f9-42c4-af7f-a98799250cf7@efficios.com>
> I am concerned about the possibility that this change lacks two barriers in the
> following scenario:
>
> On a transition from uthread -> uthread on [CPU 0], from a thread belonging to
> another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent
> membarrier sync-core is done on [CPU 1]:
>
> - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers
> associated with these stores.
>
> - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C]
> within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes
> cpu_rq(0)->curr->mm != mm, so it skips the IPI.
>
> - This means membarrier relies on switch_mm() to issue the sync-core.
>
> - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm()
> may incorrectly skip the sync-core.
>
> AFAIU, [C] can be reordered before [A] because there is no barrier between those
> operations within membarrier. I suspect it can cause the switch_mm() code to skip
> a needed sync-core.
>
> AFAIU, [D] can be reordered before [B] because there is no documented barrier
> between those operations within the scheduler, which can also cause switch_mm()
> to skip a needed sync-core.
>
> We possibly have a similar scenario for uthread->uthread when the scheduler
> switches between mm -> !mm.
>
> One way to fix this would be to add the following barriers:
>
> - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in
> prepare_sync_core_cmd(), with comments detailing the ordering it guarantees,
> - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in
> flush_icache_deferred(), with appropriate comments.
>
> Am I missing something ?
Thanks for the detailed analysis.
AFAIU, the following barrier (in membarrier_private_expedited())
/*
* Matches memory barriers around rq->curr modification in
* scheduler.
*/
smp_mb(); /* system call entry is not a mb. */
can serve the purpose of ordering [A] before [C] (to be documented in v2).
But I agree that [B] and [D] are unordered /missing suitable synchronization.
Worse, RISC-V has currently no full barrier after [B] and before returning to
user-space: I'm thinking (inspired by the PowerPC implementation),
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 217fd4de61342..f63222513076d 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
if (unlikely(prev == next))
return;
+#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP)
+ /*
+ * The membarrier system call requires a full memory barrier
+ * after storing to rq->curr, before going back to user-space.
+ *
+ * Only need the full barrier when switching between processes:
+ * barrier when switching from kernel to userspace is not
+ * required here, given that it is implied by mmdrop(); barrier
+ * when switching from userspace to kernel is not needed after
+ * store to rq->curr.
+ */
+ if (unlikely(atomic_read(&next->membarrier_state) &
+ (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+ MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev)
+ smp_mb();
+#endif
+
/*
* Mark the current MM context as inactive, and the next as
* active. This is at least used by the icache flushing
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e8..a1c749fddd095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode)
*
* Here are the schemes providing that barrier on the
* various architectures:
- * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
- * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+ * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC,
+ * RISC-V. switch_mm() relies on membarrier_arch_switch_mm()
+ * on PowerPC.
* - finish_lock_switch() for weakly-ordered
* architectures where spin_unlock is a full barrier,
* - switch_to() for arm64 (weakly-ordered, spin_unlock
The silver lining is that similar changes (probably as a separate/preliminary
patch) also restore the desired order between [B] and [D] AFAIU; so with them,
2/2 would just need additions to document the above SYNC_CORE scenario.
Thoughts?
Andrea
next prev parent reply other threads:[~2023-11-28 15:14 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-27 10:32 [PATCH 0/2] membarrier: riscv: Provide core serializing command Andrea Parri
2023-11-27 10:32 ` [PATCH 1/2] locking: Introduce prepare_sync_core_cmd() Andrea Parri
2023-11-27 12:53 ` Mathieu Desnoyers
2023-11-27 10:32 ` [PATCH 2/2] membarrier: riscv: Provide core serializing command Andrea Parri
2023-11-27 13:28 ` Mathieu Desnoyers
2023-11-28 15:13 ` Andrea Parri [this message]
2023-11-28 18:39 ` Mathieu Desnoyers
2023-11-29 18:29 ` Andrea Parri
2023-11-29 20:00 ` Mathieu Desnoyers
2023-11-29 21:25 ` Andrea Parri
2023-11-29 21:32 ` Mathieu Desnoyers
2023-11-29 22:43 ` Andrea Parri
2023-12-06 13:05 ` Palmer Dabbelt
2023-12-06 14:11 ` Andrea Parri
2023-12-06 14:15 ` Palmer Dabbelt
2023-12-06 17:42 ` Andrea Parri
2023-12-06 17:56 ` Palmer Dabbelt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZWYDtB/otYvTMZWd@andrea \
--to=parri.andrea@gmail.com \
--cc=aou@eecs.berkeley.edu \
--cc=charlie@rivosinc.com \
--cc=hboehm@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mmaas@google.com \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=paulmck@kernel.org \
--cc=rehn@rivosinc.com \
--cc=striker@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox