Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>, Arnd Bergmann <arnd@arndb.de>,
	linux-arch <linux-arch@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Mon, 20 Jul 2020 12:46:30 -0400 (EDT)	[thread overview]
Message-ID: <2055788870.20749.1595263590675.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <1595213677.kxru89dqy2.astroid@bobo.none>

----- On Jul 19, 2020, at 11:03 PM, Nicholas Piggin npiggin@gmail.com wrote:

> Excerpts from Mathieu Desnoyers's message of July 17, 2020 11:42 pm:
>> ----- On Jul 16, 2020, at 7:26 PM, Nicholas Piggin npiggin@gmail.com wrote:
>> [...]
>>> 
>>> membarrier does replace barrier instructions on remote CPUs, which do
>>> order accesses performed by the kernel on the user address space. So
>>> membarrier should too I guess.
>>> 
>>> Normal process context accesses like read(2) will do so because they
>>> don't get filtered out from IPIs, but kernel threads using the mm may
>>> not.
>> 
>> But it should not be an issue, because membarrier's ordering is only with
>> respect
>> to submit and completion of io_uring requests, which are performed through
>> system calls from the context of user-space threads, which are called from the
>> right mm.
> 
> Is that true? Can io completions be written into an address space via a
> kernel thread? I don't know the io_uring code well but it looks like
> that's asynchonously using the user mm context.

Indeed, the io completion appears to be signaled asynchronously between kernel
and user-space. Therefore, both kernel and userspace code need to have proper
memory barriers in place to signal completion, otherwise user-space could read
garbage after it notices completion of a read.

I did not review the entire io_uring implementation, but the publish side
for completion appears to be:

static void __io_commit_cqring(struct io_ring_ctx *ctx)
{
        struct io_rings *rings = ctx->rings;

        /* order cqe stores with ring update */
        smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);

        if (wq_has_sleeper(&ctx->cq_wait)) {
                wake_up_interruptible(&ctx->cq_wait);
                kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
        }
}

The store-release on tail should be paired with a load_acquire on the
reader-side (it's called "read_barrier()" in the code):

tools/io_uring/queue.c:

static int __io_uring_get_cqe(struct io_uring *ring,
                              struct io_uring_cqe **cqe_ptr, int wait)
{
        struct io_uring_cq *cq = &ring->cq;
        const unsigned mask = *cq->kring_mask;
        unsigned head;
        int ret;

        *cqe_ptr = NULL;
        head = *cq->khead;
        do {
                /*
                 * It's necessary to use a read_barrier() before reading
                 * the CQ tail, since the kernel updates it locklessly. The
                 * kernel has the matching store barrier for the update. The
                 * kernel also ensures that previous stores to CQEs are ordered
                 * with the tail update.
                 */
                read_barrier();
                if (head != *cq->ktail) {
                        *cqe_ptr = &cq->cqes[head & mask];
                        break;
                }
                if (!wait)
                        break;
                ret = io_uring_enter(ring->ring_fd, 0, 1,
                                        IORING_ENTER_GETEVENTS, NULL);
                if (ret < 0)
                        return -errno;
        } while (1);

        return 0;
}

So as far as membarrier memory ordering dependencies are concerned, it relies
on the store-release/load-acquire dependency chain in the completion queue to
order against anything that was done prior to the completed requests.

What is in-flight while the requests are being serviced provides no memory
ordering guarantee whatsoever.

> How about other memory accesses via kthread_use_mm? Presumably there is
> still ordering requirement there for membarrier,

Please provide an example case with memory accesses via kthread_use_mm where
ordering matters to support your concern.

> so I really think
> it's a fragile interface with no real way for the user to know how
> kernel threads may use its mm for any particular reason, so membarrier
> should synchronize all possible kernel users as well.

I strongly doubt so, but perhaps something should be clarified in the documentation
if you have that feeling.

Thanks,

Mathieu

> 
> Thanks,
> Nick

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

WARNING: multiple messages have this Message-ID (diff)

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: linux-arch <linux-arch@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Arnd Bergmann <arnd@arndb.de>,
	Peter Zijlstra <peterz@infradead.org>, x86 <x86@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	linux-mm <linux-mm@kvack.org>, Andy Lutomirski <luto@kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Date: Mon, 20 Jul 2020 12:46:30 -0400 (EDT)	[thread overview]
Message-ID: <2055788870.20749.1595263590675.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <1595213677.kxru89dqy2.astroid@bobo.none>

----- On Jul 19, 2020, at 11:03 PM, Nicholas Piggin npiggin@gmail.com wrote:

> Excerpts from Mathieu Desnoyers's message of July 17, 2020 11:42 pm:
>> ----- On Jul 16, 2020, at 7:26 PM, Nicholas Piggin npiggin@gmail.com wrote:
>> [...]
>>> 
>>> membarrier does replace barrier instructions on remote CPUs, which do
>>> order accesses performed by the kernel on the user address space. So
>>> membarrier should too I guess.
>>> 
>>> Normal process context accesses like read(2) will do so because they
>>> don't get filtered out from IPIs, but kernel threads using the mm may
>>> not.
>> 
>> But it should not be an issue, because membarrier's ordering is only with
>> respect
>> to submit and completion of io_uring requests, which are performed through
>> system calls from the context of user-space threads, which are called from the
>> right mm.
> 
> Is that true? Can io completions be written into an address space via a
> kernel thread? I don't know the io_uring code well but it looks like
> that's asynchonously using the user mm context.

Indeed, the io completion appears to be signaled asynchronously between kernel
and user-space. Therefore, both kernel and userspace code need to have proper
memory barriers in place to signal completion, otherwise user-space could read
garbage after it notices completion of a read.

I did not review the entire io_uring implementation, but the publish side
for completion appears to be:

static void __io_commit_cqring(struct io_ring_ctx *ctx)
{
        struct io_rings *rings = ctx->rings;

        /* order cqe stores with ring update */
        smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);

        if (wq_has_sleeper(&ctx->cq_wait)) {
                wake_up_interruptible(&ctx->cq_wait);
                kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
        }
}

The store-release on tail should be paired with a load_acquire on the
reader-side (it's called "read_barrier()" in the code):

tools/io_uring/queue.c:

static int __io_uring_get_cqe(struct io_uring *ring,
                              struct io_uring_cqe **cqe_ptr, int wait)
{
        struct io_uring_cq *cq = &ring->cq;
        const unsigned mask = *cq->kring_mask;
        unsigned head;
        int ret;

        *cqe_ptr = NULL;
        head = *cq->khead;
        do {
                /*
                 * It's necessary to use a read_barrier() before reading
                 * the CQ tail, since the kernel updates it locklessly. The
                 * kernel has the matching store barrier for the update. The
                 * kernel also ensures that previous stores to CQEs are ordered
                 * with the tail update.
                 */
                read_barrier();
                if (head != *cq->ktail) {
                        *cqe_ptr = &cq->cqes[head & mask];
                        break;
                }
                if (!wait)
                        break;
                ret = io_uring_enter(ring->ring_fd, 0, 1,
                                        IORING_ENTER_GETEVENTS, NULL);
                if (ret < 0)
                        return -errno;
        } while (1);

        return 0;
}

So as far as membarrier memory ordering dependencies are concerned, it relies
on the store-release/load-acquire dependency chain in the completion queue to
order against anything that was done prior to the completed requests.

What is in-flight while the requests are being serviced provides no memory
ordering guarantee whatsoever.

> How about other memory accesses via kthread_use_mm? Presumably there is
> still ordering requirement there for membarrier,

Please provide an example case with memory accesses via kthread_use_mm where
ordering matters to support your concern.

> so I really think
> it's a fragile interface with no real way for the user to know how
> kernel threads may use its mm for any particular reason, so membarrier
> should synchronize all possible kernel users as well.

I strongly doubt so, but perhaps something should be clarified in the documentation
if you have that feeling.

Thanks,

Mathieu

> 
> Thanks,
> Nick

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

next prev parent reply	other threads:[~2020-07-20 16:46 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-10  1:56 [RFC PATCH 0/7] mmu context cleanup, lazy tlb cleanup, Nicholas Piggin
2020-07-10  1:56 ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 1/7] asm-generic: add generic MMU versions of mmu context functions Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 2/7] arch: use asm-generic mmu context for no-op implementations Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 3/7] mm: introduce exit_lazy_tlb Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  9:42   ` Peter Zijlstra
2020-07-10  9:42     ` Peter Zijlstra
2020-07-10 14:02   ` Mathieu Desnoyers
2020-07-10 14:02     ` Mathieu Desnoyers
2020-07-10 17:04   ` Andy Lutomirski
2020-07-10 17:04     ` Andy Lutomirski
2020-07-13  4:45     ` Nicholas Piggin
2020-07-13  4:45       ` Nicholas Piggin
2020-07-13 13:47       ` Nicholas Piggin
2020-07-13 13:47         ` Nicholas Piggin
2020-07-13 14:13         ` Mathieu Desnoyers
2020-07-13 14:13           ` Mathieu Desnoyers
2020-07-13 15:48           ` Andy Lutomirski
2020-07-13 15:48             ` Andy Lutomirski
2020-07-13 16:37             ` Nicholas Piggin
2020-07-13 16:37               ` Nicholas Piggin
2020-07-16  4:15           ` Nicholas Piggin
2020-07-16  4:15             ` Nicholas Piggin
2020-07-16  4:42             ` Nicholas Piggin
2020-07-16  4:42               ` Nicholas Piggin
2020-07-16 15:46               ` Mathieu Desnoyers
2020-07-16 15:46                 ` Mathieu Desnoyers
2020-07-16 16:03                 ` Mathieu Desnoyers
2020-07-16 16:03                   ` Mathieu Desnoyers
2020-07-16 18:58                   ` Mathieu Desnoyers
2020-07-16 18:58                     ` Mathieu Desnoyers
2020-07-16 21:24                     ` Alan Stern
2020-07-16 21:24                       ` Alan Stern
2020-07-17 13:39                       ` Mathieu Desnoyers
2020-07-17 13:39                         ` Mathieu Desnoyers
2020-07-17 14:51                         ` Alan Stern
2020-07-17 14:51                           ` Alan Stern
2020-07-17 15:39                           ` Mathieu Desnoyers
2020-07-17 15:39                             ` Mathieu Desnoyers
2020-07-17 16:11                             ` Alan Stern
2020-07-17 16:11                               ` Alan Stern
2020-07-17 16:22                               ` Mathieu Desnoyers
2020-07-17 16:22                                 ` Mathieu Desnoyers
2020-07-17 17:44                                 ` Alan Stern
2020-07-17 17:44                                   ` Alan Stern
2020-07-17 17:52                                   ` Mathieu Desnoyers
2020-07-17 17:52                                     ` Mathieu Desnoyers
2020-07-17  0:00                     ` Nicholas Piggin
2020-07-17  0:00                       ` Nicholas Piggin
2020-07-16  5:18             ` Andy Lutomirski
2020-07-16  5:18               ` Andy Lutomirski
2020-07-16  6:06               ` Nicholas Piggin
2020-07-16  6:06                 ` Nicholas Piggin
2020-07-16  8:50               ` Peter Zijlstra
2020-07-16  8:50                 ` Peter Zijlstra
2020-07-16 10:03                 ` Nicholas Piggin
2020-07-16 10:03                   ` Nicholas Piggin
2020-07-16 11:00                   ` peterz
2020-07-16 11:00                     ` peterz
2020-07-16 15:34                     ` Mathieu Desnoyers
2020-07-16 15:34                       ` Mathieu Desnoyers
2020-07-16 23:26                     ` Nicholas Piggin
2020-07-16 23:26                       ` Nicholas Piggin
2020-07-17 13:42                       ` Mathieu Desnoyers
2020-07-17 13:42                         ` Mathieu Desnoyers
2020-07-20  3:03                         ` Nicholas Piggin
2020-07-20  3:03                           ` Nicholas Piggin
2020-07-20 16:46                           ` Mathieu Desnoyers [this message]
2020-07-20 16:46                             ` Mathieu Desnoyers
2020-07-21 10:04                             ` Nicholas Piggin
2020-07-21 10:04                               ` Nicholas Piggin
2020-07-21 13:11                               ` Mathieu Desnoyers
2020-07-21 13:11                                 ` Mathieu Desnoyers
2020-07-21 14:30                                 ` Nicholas Piggin
2020-07-21 14:30                                   ` Nicholas Piggin
2020-07-21 15:06                               ` peterz
2020-07-21 15:06                                 ` peterz
2020-07-21 15:15                                 ` Mathieu Desnoyers
2020-07-21 15:15                                   ` Mathieu Desnoyers
2020-07-21 15:19                                   ` Peter Zijlstra
2020-07-21 15:19                                     ` Peter Zijlstra
2020-07-21 15:22                                     ` Mathieu Desnoyers
2020-07-21 15:22                                       ` Mathieu Desnoyers
2020-07-10  1:56 ` [RFC PATCH 5/7] lazy tlb: introduce lazy mm refcount helper functions Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  9:48   ` Peter Zijlstra
2020-07-10  9:48     ` Peter Zijlstra
2020-07-10  1:56 ` [RFC PATCH 6/7] lazy tlb: allow lazy tlb mm switching to be configurable Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  1:56 ` [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option Nicholas Piggin
2020-07-10  1:56   ` Nicholas Piggin
2020-07-10  9:35   ` Peter Zijlstra
2020-07-10  9:35     ` Peter Zijlstra
2020-07-13  4:58     ` Nicholas Piggin
2020-07-13  4:58       ` Nicholas Piggin
2020-07-13 15:59   ` Andy Lutomirski
2020-07-13 15:59     ` Andy Lutomirski
2020-07-13 16:48     ` Nicholas Piggin
2020-07-13 16:48       ` Nicholas Piggin
2020-07-13 18:18       ` Andy Lutomirski
2020-07-13 18:18         ` Andy Lutomirski
2020-07-14  5:04         ` Nicholas Piggin
2020-07-14  5:04           ` Nicholas Piggin
2020-07-14  6:31           ` Nicholas Piggin
2020-07-14  6:31             ` Nicholas Piggin
2020-07-14 12:46             ` Andy Lutomirski
2020-07-14 12:46               ` Andy Lutomirski
2020-07-14 13:23               ` Peter Zijlstra
2020-07-14 13:23                 ` Peter Zijlstra
2020-07-16  2:26               ` Nicholas Piggin
2020-07-16  2:26                 ` Nicholas Piggin
2020-07-16  2:35               ` Nicholas Piggin
2020-07-16  2:35                 ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2055788870.20749.1595263590675.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=anton@ozlabs.org \
    --cc=arnd@arndb.de \
    --cc=axboe@kernel.dk \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.