From: Florian Weimer <fweimer@redhat.com>
To: Chris Kennelly <ckennelly@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
"Joel Fernandes\, Google" <joel@joelfernandes.org>,
Paul Turner <pjt@google.com>,
"Carlos O'Donell" <codonell@redhat.com>,
libc-alpha <libc-alpha@sourceware.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
paulmck <paulmck@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
Brian Geffon <bgeffon@google.com>
Subject: Re: Rseq registration: Google tcmalloc vs glibc
Date: Fri, 21 Feb 2020 14:07:58 +0100 [thread overview]
Message-ID: <87a75cgkb5.fsf@oldenburg2.str.redhat.com> (raw)
In-Reply-To: <CAEE+ybmTzNoBc9YmK2j48MKtSoPHC_1Mgr+ojPpiOTTc+4E=9g@mail.gmail.com> (Chris Kennelly's message of "Thu, 20 Feb 2020 23:23:39 -0500")
Quoting in full to get this message to libc-alpha, past the text/html
filter. I wish we had a different list configuration …
Please Note that we have integration patches for glibc which need
review. A fair number of them have been written by me, so I can't help
with that.
* Chris Kennelly:
> On Thu, Feb 20, 2020 at 4:33 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> wrote:
>
> Hi Chris,
>
> As one of the maintainers of the Rseq system call in the Linux kernel, I would
> like to thank the Google team for open sourcing a tcmalloc implementation based
> on Rseq!
>
> I've looked into some critical integration aspects of the tcmalloc implementation,
> and would like to bring up a topic which involves both tcmalloc developers and the
> glibc community.
>
> Thanks. To answer the later questions first:
> * I implemented TCMalloc's upstream rseq based on the conventions I could find from
> (mostly) the kernel self tests. This is probably why it looks like #1 :)
>
> This is less of an intentional preference and more of "it's important that early adopters follow
> a convention" for future glibc upgrades. Initializing __rseq_abi is the most important, but
> there are some other ones, mostly for debugging/tracing
> (https://github.com/compudj/librseq/issues/1), that I'd like to get right too.
>
> * The TCMalloc project does not provide ABI stability, so TCMalloc can change the convention
> it follows.
>
> I have been discussing aspects of co-existence between early Rseq adopter libraries
> and glibc for the past year with the glibc community, and tcmalloc happens to be the
> first project to publicly use Rseq outside of prototype branches or selftests code.
> Considering that there can only be one Rseq registration per thread (as imposed by
> the rseq ABI), there needs to be some kind of protocol between libraries to ensure we
> don't introduce regressions when we eventually combine a newer glibc which takes care
> of registration of the __rseq_abi TLS along with tcmalloc which also try to perform
> that registration within the same thread.
>
> Throughout the various rounds of review of the glibc Rseq integration patch set,
> there were a few solutions envisioned. Here is a brief history:
>
> 1) Introduce a __rseq_refcount TLS variable.
>
> - Currently used by Linux tools/testing/selftests/rseq/rseq.c,
> - Currently used by Google tcmalloc,
> - Emitted by glibc as well my the original patchset (but was later removed),
>
> A user incrementing the refcount from 0 -> 1 performs rseq registration.
> The last user decrementing from 1 -> 0 performs rseq unregistration.
>
> Works for co-existence of dlopen'd/dlclose'd libraries, for dynamically
> linked libraries, and for the main executable.
>
> The refcounting was deemed too complex for glibc's needs (it always
> exists for the entire executable's lifetime), so we moved to
> __rseq_handled instead.
>
> 2) Introduce a __rseq_handled global variable.
>
> - Currently used by Linux tools/testing/selftests/rseq/rseq.c,
> - At some point emitted by glibc as well in my patch set (but was later
> removed),
>
> A library may take rseq ownership if it is still 0 when executing the
> library constructor. Set to 1 by library constructor when handling rseq.
> Set to 0 in destructor if handling rseq.
>
> Not meant to be set by dlopen'd/dlclose'd libraries, only by libraries
> existing for the whole lifetime of the executable and/or the main executable.
>
> This __rseq_handled symbol has been identified as being somewhat redundant
> with the information provided in the __rseq_abi.cpu_id field (uninitialized
> state), which motivated removing this symbol from the glibc integration
> entirely. The only reason for having __rseq_handled separate from
> __rseq_abi.cpu_id was because it was then impossible to touch TLS data
> early in the glibc initialization. This issue was later resolved within
> glibc.
>
> 3) Use the __rseq_abi TLS cpu_id field to know whether Rseq has been
> registered.
>
> - Current protocol in the most recent glibc integration patch set.
> - Not supported yet by Linux kernel rseq selftests,
> - Not supported yet by tcmalloc,
>
> Use the per-thread state to figure out whether each thread need to register
> Rseq individually.
>
> Works for integration between a library which exists for the entire lifetime
> of the executable (e.g. glibc) and other libraries. However, it does not
> allow a set of libraries which are dlopen'd/dlclose'd to co-exist without
> having a library like glibc handling the registration present.
>
> Overall, I like #3 the most due to its simplicity, but I also do not need to support the
> dlopen/dlclose use case (below).
>
> So overall, I suspect the protocol we want for early adopters is that they
> only register Rseq if __rseq_abi.cpu_id == RSEQ_CPU_ID_UNINITIALIZED, which
> ensure they do not get -1, errno = EBUSY when linked against a newer glibc
> which handles Rseq registration. In order to handle multiple early adopters
> dlopen'd/dlclose'd in the same executable, those should synchronize with a
> __rseq_refcount TLS reference count, but it does not have to be taken into
> account by the main executable or libraries present for the entire executable
> lifetime (like glibc).
>
> Based on this, what I think would be missing from the current Google tcmalloc
> implementation is a check for __rseq_abi.cpu_id == RSEQ_CPU_ID_UNINITIALIZED
> in InitThreadPerCpu().
>
> TCMalloc does not get to InitThreadPerCpu without that check.
>
> Before initialization happens, we
> * end up on a slow path
> https://github.com/google/tcmalloc/blob/master/tcmalloc/tcmalloc.cc#L1486
> * which checks UsePerCpuCache
> https://github.com/google/tcmalloc/blob/master/tcmalloc/cpu_cache.h#L222
> * and inspects the TLS variable in IsFast
> https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/percpu.h#L171
> ...which triggers per-thread rseq registration if __rseq_abi.cpu_id is uninitialized.
>
> Otherwise, the IsOnFastPath() call checks also inspects __rseq_abi.cpu_id via IsFastNoInit
> (same thing, but no registration triggered).
>
> Is tcmalloc ever meant to be dlopen'd/dlclose'd (either directly or indirectly),
> or is it required to exist for the entire executable lifetime ? The check and
> increment of __rseq_refcount is only useful to co-exist with dlopen'd/dlclose'd
> libraries, but it would not allow discovering the presence of a glibc which
> takes care of the rseq registration with the planned protocol. A dlopen'd
> library should then only perform rseq unregistration if if brings the
> __rseq_refcount back to 0 (e.g. in a pthread_key destructor).
>
> TCMalloc cannot practically be dlopen'd or dlclose'd.
> * Once memory is allocated with one instance of malloc (or operator new), it needs to be
> free'd to the same heap.
> * dlclose is explicitly not supported by our dependencies ("do not rely on dynamic
> unloading" https://abseil.io/about/compatibility)
>
> Thanks,
> Chris
>
> Adding this check for __rseq_abi.cpu_id == RSEQ_CPU_ID_UNINITIALIZED is something
> I need to do in the Linux rseq selftests, but I refrained from submitting any
> further change to those tests until the glibc rseq integration gets finally
> merged.
>
> Is it something that could be easily changed at this stage in Google tcmalloc,
> or should we reconsider adding back __rseq_refcount within the glibc integration
> patch set, even though it is not strictly useful to glibc ?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
next prev parent reply other threads:[~2020-02-21 13:08 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-20 21:33 Rseq registration: Google tcmalloc vs glibc Mathieu Desnoyers
[not found] ` <CAEE+ybmTzNoBc9YmK2j48MKtSoPHC_1Mgr+ojPpiOTTc+4E=9g@mail.gmail.com>
2020-02-21 13:07 ` Florian Weimer [this message]
2020-02-21 15:49 ` Joel Fernandes
2020-02-21 16:13 ` Mathieu Desnoyers
2020-02-26 3:24 ` Joel Fernandes
2020-02-26 3:38 ` Chris Kennelly
2020-02-26 17:01 ` Mathieu Desnoyers
2020-02-26 17:27 ` Chris Kennelly
2020-02-26 18:56 ` Mathieu Desnoyers
2020-02-26 19:12 ` Chris Kennelly
2020-02-26 21:21 ` Mathieu Desnoyers
2020-02-27 10:18 ` Szabolcs Nagy
2020-02-27 10:32 ` Florian Weimer
2020-02-26 21:51 ` Mathieu Desnoyers
2020-02-27 21:11 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a75cgkb5.fsf@oldenburg2.str.redhat.com \
--to=fweimer@redhat.com \
--cc=bgeffon@google.com \
--cc=boqun.feng@gmail.com \
--cc=ckennelly@google.com \
--cc=codonell@redhat.com \
--cc=joel@joelfernandes.org \
--cc=libc-alpha@sourceware.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.