All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Chris Kennelly <ckennelly@google.com>
Cc: "Joel Fernandes, Google" <joel@joelfernandes.org>,
	Paul Turner <pjt@google.com>, Florian Weimer <fweimer@redhat.com>,
	Carlos O'Donell <codonell@redhat.com>,
	libc-alpha <libc-alpha@sourceware.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	Brian Geffon <bgeffon@google.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc
Date: Wed, 26 Feb 2020 16:21:44 -0500 (EST)	[thread overview]
Message-ID: <1484567919.9131.1582752104880.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <CAEE+ybkz0YDddbLh+f0UnykcAcH+FoZrysQcckjh0S6YjYQvFg@mail.gmail.com>

----- On Feb 26, 2020, at 2:12 PM, Chris Kennelly ckennelly@google.com wrote:

> On Wed, Feb 26, 2020 at 1:56 PM Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> ----- On Feb 26, 2020, at 12:27 PM, Chris Kennelly ckennelly@google.com wrote:
>>
>> > On Wed, Feb 26, 2020 at 12:01 PM Mathieu Desnoyers
>> > <mathieu.desnoyers@efficios.com> wrote:
>> >>
>> >> ----- On Feb 25, 2020, at 10:38 PM, Chris Kennelly ckennelly@google.com wrote:
>> >>
>> >> > On Tue, Feb 25, 2020 at 10:25 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>> >> >>
>> >> >> On Fri, Feb 21, 2020 at 11:13 AM Mathieu Desnoyers
>> >> >> <mathieu.desnoyers@efficios.com> wrote:
>> >> >> >
>> >> >> > ----- On Feb 21, 2020, at 10:49 AM, Joel Fernandes, Google
>> >> >> > joel@joelfernandes.org wrote:
>> >> >> >
>> >> >> > [...]
>> >> >> > >>
>> >> >> > >> 3) Use the  __rseq_abi TLS cpu_id field to know whether Rseq has been
>> >> >> > >> registered.
>> >> >> > >>
>> >> >> > >> - Current protocol in the most recent glibc integration patch set.
>> >> >> > >> - Not supported yet by Linux kernel rseq selftests,
>> >> >> > >> - Not supported yet by tcmalloc,
>> >> >> > >>
>> >> >> > >> Use the per-thread state to figure out whether each thread need to register
>> >> >> > >> Rseq individually.
>> >> >> > >>
>> >> >> > >> Works for integration between a library which exists for the entire lifetime
>> >> >> > >> of the executable (e.g. glibc) and other libraries. However, it does not
>> >> >> > >> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without
>> >> >> > >> having a library like glibc handling the registration present.
>> >> >> > >
>> >> >> > > Mathieu, could you share more details about why during dlopen/close
>> >> >> > > libraries we cannot use the same __rseq_abi TLS to detect that rseq was
>> >> >> > > registered?
>> >> >> >
>> >> >> > Sure,
>> >> >> >
>> >> >> > A library which is only loaded and never closed during the execution of the
>> >> >> > program can let the kernel implicitly unregister rseq at thread exit. For
>> >> >> > the dlopen/dlclose use-case, we need to be able to explicitly unregister
>> >> >> > each thread's __rseq_abi which sit in a library which is going to be
>> >> >> > dlclose'd.
>> >> >>
>> >> >> Mathieu, Thanks a lot for the explanation, it makes complete sense. It
>> >> >> sounds from Chris's reply that tcmalloc already checks
>> >> >> __rseq_abi.cpu_id and is not dlopened/closed. Considering these, it
>> >> >> seems to already handle things properly - CMIIW.
>> >> >
>> >> > I'll make a note about this, since we can probably benefit from some
>> >> > more comments about the assumptions/invariants the fastpath uses.
>> >>
>> >> I suspect the integration with glibc and with dlopen'd/dlclose'd libraries will
>> >> not
>> >> behave correctly with the current tcmalloc implementation.
>> >>
>> >> Based on the tcmalloc code-base, InitFastPerCpu is only called from IsFast. As
>> >> long
>> >> as this is the only expected caller, having IsFast comparing the RseqCpuId
>> >> detects
>> >> whether glibc (or some other library) has already registered rseq for the
>> >> current
>> >> thread.
>> >>
>> >> However, if the application chooses to invoke InitFastPerCpu() directly, things
>> >> become
>> >> expected, because it invokes:
>> >>
>> >>   absl::base_internal::LowLevelCallOnce(&init_per_cpu_once, InitPerCpu);
>> >>
>> >> which AFAIU invokes InitPerCpu once after execution of the current program.
>> >> Which
>> >> does:
>> >>
>> >> static bool InitThreadPerCpu() {
>> >>   if (__rseq_refcount++ > 0) {
>> >>     return true;
>> >>   }
>> >>
>> >>   auto ret = syscall(__NR_rseq, &__rseq_abi, sizeof(__rseq_abi), 0,
>> >>                      PERCPU_RSEQ_SIGNATURE);
>> >>   if (ret == 0) {
>> >>     return true;
>> >>   } else {
>> >>     __rseq_refcount--;
>> >>   }
>> >>
>> >>   return false;
>> >> }
>> >>
>> >> static void InitPerCpu() {
>> >>   // Based on the results of successfully initializing the first thread, mark
>> >>   // init_status to initialize all subsequent threads.
>> >>   if (InitThreadPerCpu()) {
>> >>     init_status = kFastMode;
>> >>   }
>> >> }
>> >>
>> >> In a scenario where glibc has already registered Rseq, the __rseq_refcount will
>> >> be incremented, the __NR_rseq syscall will fail with -1, errno=EBUSY, so the
>> >> refcount
>> >> will be immediately decremented and it will return false. Therefore,
>> >> "init_status" will
>> >> never be set fo kFastMode, leaving it in kSlowMode for the entire lifetime of
>> >> this
>> >> program. That being said, even though this state can come as a surprise, it
>> >> seems to
>> >> be entirely bypassed by the fast-paths IsFast() and IsFastNoInit(), so maybe it
>> >> won't
>> >> have any observable side-effects other than leaving init_status in a state that
>> >> does not
>> >> match reality.
>> >
>> > I agree that this could potentially violate inviarants, but
>> > InitFastPerCpu is not intended to be called by the application.
>>
>> OK, explicitly documenting this would be a good thing. In my own projects,
>> I prefix those symbols with double-underscores (__) to indicate that those
>> are not meant to be called by other means than the static inlines in the API.
>>
>> There may be use-cases which justify exposing InitFastPerCpu as a public API for
>> applications though, especially for those which require some level of
>> real-time guarantees from the malloc/free APIs. I've run into this situation
>> with liburcu which I maintain.
>>
>> >
>> >> In the other use-case where tcmalloc co-exist with a dlopened/dlclosed library,
>> >> but glibc
>> >> does not provide Rseq registration, we run into issues as well if the dlopened
>> >> library
>> >> registers rseq first for a given thread. The IsFastNoInit() expects that if Rseq
>> >> has been
>> >> observed as registered in the past for a thread, it stays registered. However,
>> >> if a
>> >> dlclosed library unregisters Rseq, we need to be prepared to re-register it. So
>> >> either
>> >> tcmalloc needs to express its use of Rseq by incrementing __rseq_refcount even
>> >> when Rseq
>> >> is registered (this would hurt the fast-path however, and I would hate to have
>> >> to do this),
>> >> or tcmalloc needs to be able to handle the fact that Rseq may be unregistered by
>> >> a dlclosed
>> >> library which was the actual owner of the Rseq registration.
>> >
>> > We have a bit of an opportunity to figure out whether this is the
>> > first time--from TCMalloc's perspective--a thread is doing per-CPU and
>> > bump the __rseq_count accordingly.  I think this could be done off of
>> > the fast path.
>>
>> Is there an explicit tcmalloc API call that each thread need to do before
>> starting
>> to use tcmalloc to allocate and free memory ? If not, you'll probably need to
>> add
>> at least a load of __rseq_refcount (or some other TLS variable), test and
>> conditional
>> branch on the fast-path, which is an additional cost I would ideally prefer to
>> avoid.
>> Or do you have something else in mind ?
> 
> No explicit call is necessary.  This is something that can be done in
> the slow path, since we can recognize the transition from slow -> fast
> path for that thread

Got it, it should work. Thanks!

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  reply	other threads:[~2020-02-26 21:21 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-20 21:33 Rseq registration: Google tcmalloc vs glibc Mathieu Desnoyers
     [not found] ` <CAEE+ybmTzNoBc9YmK2j48MKtSoPHC_1Mgr+ojPpiOTTc+4E=9g@mail.gmail.com>
2020-02-21 13:07   ` Florian Weimer
2020-02-21 15:49 ` Joel Fernandes
2020-02-21 16:13   ` Mathieu Desnoyers
2020-02-26  3:24     ` Joel Fernandes
2020-02-26  3:38       ` Chris Kennelly
2020-02-26 17:01         ` Mathieu Desnoyers
2020-02-26 17:27           ` Chris Kennelly
2020-02-26 18:56             ` Mathieu Desnoyers
2020-02-26 19:12               ` Chris Kennelly
2020-02-26 21:21                 ` Mathieu Desnoyers [this message]
2020-02-27 10:18               ` Szabolcs Nagy
2020-02-27 10:32                 ` Florian Weimer
2020-02-26 21:51       ` Mathieu Desnoyers
2020-02-27 21:11 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1484567919.9131.1582752104880.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=bgeffon@google.com \
    --cc=boqun.feng@gmail.com \
    --cc=ckennelly@google.com \
    --cc=codonell@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=joel@joelfernandes.org \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.