Linux userland API discussions
 help / color / mirror / Atom feed
* Re: [PATCH v4 14/16] ext4: add basic fs-verity support
From: Theodore Ts'o @ 2019-06-15 15:31 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-15-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:52:03AM -0700, Eric Biggers wrote:
> +/*
> + * Format of ext4 verity xattr.  This points to the location of the verity
> + * descriptor within the file data rather than containing it directly because
> + * the verity descriptor *must* be encrypted when ext4 encryption is used.  But,
> + * ext4 encryption does not encrypt xattrs.
> + */
> +struct fsverity_descriptor_location {
> +	__le32 version;
> +	__le32 size;
> +	__le64 pos;
> +};

What's the benefit of storing the location in an xattr as opposed to
just keying it off the end of i_size, rounded up to next page size (or
64k) as I had suggested earlier?

Using an xattr burns xattr space, which is a limited resource, and it
adds some additional code complexity.  Does the benefits outweigh the
added complexity?

						- Ted

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Andy Lutomirski @ 2019-06-15 15:30 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Yu-cheng Yu, Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, linux-kernel, linux-doc, linux-mm, linux-arch,
	linux-api, Arnd Bergmann, Balbir Singh, Borislav Petkov,
	Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov,
	Florian Weimer, H.J. Lu, Jann Horn, Jonathan Corbet, Kees Cook,
	Mike Kravetz
In-Reply-To: <5d7012f6-7ab9-fd3d-4a11-294258e48fb5@intel.com>



> On Jun 14, 2019, at 3:06 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
>> On 6/14/19 2:34 PM, Yu-cheng Yu wrote:
>> On Fri, 2019-06-14 at 13:57 -0700, Dave Hansen wrote:
>>>> I have a related question:
>>>> 
>>>> Do we allow the application to read the bitmap, or any fault from the
>>>> application on bitmap pages?
>>> 
>>> We have to allow apps to read it.  Otherwise they can't execute
>>> instructions.
>> 
>> What I meant was, if an app executes some legacy code that results in bitmap
>> lookup, but the bitmap page is not yet populated, and if we then populate that
>> page with all-zero, a #CP should follow.  So do we even populate that zero page
>> at all?
>> 
>> I think we should; a #CP is more obvious to the user at least.
> 
> Please make an effort to un-Intel-ificate your messages as much as
> possible.  I'd really prefer that folks say "missing end branch fault"
> rather than #CP.  I had to Google "#CP".
> 
> I *think* you are saying that:  The *only* lookups to this bitmap are on
> "missing end branch" conditions.  Normal, proper-functioning code
> execution that has ENDBR instructions in it will never even look at the
> bitmap.  The only case when we reference the bitmap locations is when
> the processor is about do do a "missing end branch fault" so that it can
> be suppressed.  Any population with the zero page would be done when
> code had already encountered a "missing end branch" condition, and
> populating with a zero-filled page will guarantee that a "missing end
> branch fault" will result.  You're arguing that we should just figure
> this out at fault time and not ever reach the "missing end branch fault"
> at all.
> 
> Is that right?
> 
> If so, that's an architecture subtlety that I missed until now and which
> went entirely unmentioned in the changelog and discussion up to this
> point.  Let's make sure that nobody else has to walk that path by
> improving our changelog, please.
> 
> In any case, I don't think this is worth special-casing our zero-fill
> code, FWIW.  It's not performance critical and not worth the complexity.
> If apps want to handle the signals and abuse this to fill space up with
> boring page table contents, they're welcome to.  There are much easier
> ways to consume a lot of memory.

Isn’t it a special case either way?  Either we look at CR2 and populate a page, or we look at CR2 and the “tracker” state and send a different signal.  Admittedly the former is very common in the kernel.

> 
>>> We don't have to allow them to (popuating) fault on it.  But, if we
>>> don't, we need some kind of kernel interface to avoid the faults.
>> 
>> The plan is:
>> 
>> * Move STACK_TOP (and vdso) down to give space to the bitmap.
> 
> Even for apps with 57-bit address spaces?
> 
>> * Reserve the bitmap space from (mm->start_stack + PAGE_SIZE) to cover a code
>> size of TASK_SIZE_LOW, which is (TASK_SIZE_LOW / PAGE_SIZE / 8).
> 
> The bitmap size is determined by CR4.LA57, not the app.  If you place
> the bitmap here, won't references to it for high addresses go into the
> high address space?
> 
> Specifically, on a CR4.LA57=0 system, we have 48 bits of address space,
> so 128TB for apps.  You are proposing sticking the bitmap above the
> stack which is near the top of that 128TB address space.  But on a
> 5-level paging system with CR4.LA57=1, there could be valid data at
> 129GB.  Is there something keeping that data from being mistaken for
> being part of the bitmap?
> 

I think we need to make the vma be full sized — it should cover the entire range that the CPU might access. If that means it spans the 48-bit boundary, so be it.

> Also, if you're limiting it to TASK_SIZE_LOW, please don't forget that
> this is yet another thing that probably won't work with the vsyscall
> page.  Please make sure you consider it and mention it in your next post.

Why not?  The vsyscall page is at a negative address.

> 
>> * Mmap the space only when the app issues the first mark-legacy prctl.  This
>> avoids the core-dump issue for most apps and the accounting problem that
>> MAP_NORESERVE probably won't solve

What happens if there’s another VMA there by the time you map it?

^ permalink raw reply

* Re: [PATCH v4 13/16] fs-verity: support builtin file signatures
From: Theodore Ts'o @ 2019-06-15 15:21 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-14-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:52:02AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> To meet some users' needs, add optional support for having fs-verity
> handle a portion of the authentication policy in the kernel.  An
> ".fs-verity" keyring is created to which X.509 certificates can be
> added; then a sysctl 'fs.verity.require_signatures' can be set to cause
> the kernel to enforce that all fs-verity files contain a signature of
> their file measurement by a key in this keyring.

I think it might be a good idea to allow the require_signatures
setting to be set on a per-file system basis, via a mount option?  We
could plumb it in via a flag in fsverity_info, set by the file system.

Other than this feature request, looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 12/16] fs-verity: add SHA-512 support
From: Theodore Ts'o @ 2019-06-15 15:11 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-13-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:52:01AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add SHA-512 support to fs-verity.  This is primarily a demonstration of
> the trivial changes needed to support a new hash algorithm in fs-verity;
> most users will still use SHA-256, due to the smaller space required to
> store the hashes.  But some users may prefer SHA-512.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 11/16] fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
From: Theodore Ts'o @ 2019-06-15 15:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-12-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:52:00AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add a function for filesystems to call to implement the
> FS_IOC_MEASURE_VERITY ioctl.  This ioctl retrieves the file measurement
> that fs-verity calculated for the given file and is enforcing for reads;
> i.e., reads that don't match this hash will fail.  This ioctl can be
> used for authentication or logging of file measurements in userspace.
> 
> See the "FS_IOC_MEASURE_VERITY" section of
> Documentation/filesystems/fsverity.rst for the documentation.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 10/16] fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
From: Theodore Ts'o @ 2019-06-15 15:08 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-11-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:59AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add a function for filesystems to call to implement the
> FS_IOC_ENABLE_VERITY ioctl.  This ioctl enables fs-verity on a file.
> 
> See the "FS_IOC_ENABLE_VERITY" section of
> Documentation/filesystems/fsverity.rst for the documentation.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

> diff --git a/fs/verity/enable.c b/fs/verity/enable.c
> new file mode 100644
> index 000000000000..7e7ef9d3c376
> --- /dev/null
> +++ b/fs/verity/enable.c
> +	/* Tell the filesystem to finish enabling verity on the file */
> +	err = vops->end_enable_verity(filp, desc, desc_size, params.tree_size);
> +	if (err) {
> +		fsverity_err(inode, "%ps() failed with err %d",
> +			     vops->end_enable_verity, err);
> +		fsverity_free_info(vi);
> +	} else {
> +		/* Successfully enabled verity */
> +
> +		WARN_ON(!IS_VERITY(inode));
> +
> +		/*
> +		 * Readers can start using ->i_verity_info immediately, so it
> +		 * can't be rolled back once set.  So don't set it until just
> +		 * after the filesystem has successfully enabled verity.
> +		 */
> +		fsverity_set_info(inode, vi);
> +	}

If end_enable_Verity() retuns success, and IS_VERITY is not set, I
would think that we should report the error via fsverity_err() and
return an error to userspace, and *not* call fsverity_set_info().  I
don't think the stack trace printed by WARN_ON is going to very
interesting, since the call path which gets us to enable_verity() is
not going to be surprising.

> +
> +	if (inode->i_size <= 0) {
> +		err = -EINVAL;
> +		goto out_unlock;
> +	}

How hard would it be to support fsverity for zero-length files?  There
would be no Merkle tree, but there still would be an fsverity header
file on which we can calculate a checksum for the digital signature.

     	      	     	       	 - Ted

^ permalink raw reply

* Re: [PATCH v4 09/16] fs-verity: add data verification hooks for ->readpages()
From: Theodore Ts'o @ 2019-06-15 14:48 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-10-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:58AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add functions that verify data pages that have been read from a
> fs-verity file, against that file's Merkle tree.  These will be called
> from filesystems' ->readpage() and ->readpages() methods.
> 
> Since data verification can block, a workqueue is provided for these
> methods to enqueue verification work from their bio completion callback.
> 
> See the "Verifying data" section of
> Documentation/filesystems/fsverity.rst for more information.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 08/16] fs-verity: add the hook for file ->setattr()
From: Theodore Ts'o @ 2019-06-15 14:43 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-9-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:57AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add a function fsverity_prepare_setattr() which filesystems that support
> fs-verity must call to deny truncates of verity files.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 07/16] fs-verity: add the hook for file ->open()
From: Theodore Ts'o @ 2019-06-15 14:42 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-8-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:56AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add the fsverity_file_open() function, which prepares an fs-verity file
> to be read from.  If not already done, it loads the fs-verity descriptor
> from the filesystem and sets up an fsverity_info structure for the inode
> which describes the Merkle tree and contains the file measurement.  It
> also denies all attempts to open verity files for writing.
> 
> This commit also begins the include/linux/fsverity.h header, which
> declares the interface between fs/verity/ and filesystems.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

> +/*
> + * Validate the given fsverity_descriptor and create a new fsverity_info from
> + * it.  The signature (if present) is also checked.
> + */
> +struct fsverity_info *fsverity_create_info(const struct inode *inode,
> +					   const void *_desc, size_t desc_size)

Well, technically it's not checked (yet).  It doesn't get checked
until [PATCH 13/16]: support builtin file signatures.  If we want to
be really nit-picky, that portion of the comment could be moved to
later in the series.

^ permalink raw reply

* Re: [PATCH v4 06/16] fs-verity: add inode and superblock fields
From: Theodore Ts'o @ 2019-06-15 12:57 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-7-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:55AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Analogous to fs/crypto/, add fields to the VFS inode and superblock for
> use by the fs/verity/ support layer:
> 
> - ->s_vop: points to the fsverity_operations if the filesystem supports
>   fs-verity, otherwise is NULL.
> 
> - ->i_verity_info: points to cached fs-verity information for the inode
>   after someone opens it, otherwise is NULL.
> 
> - S_VERITY: bit in ->i_flags that identifies verity inodes, even when
>   they haven't been opened yet and thus still have NULL ->i_verity_info.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 05/16] fs-verity: add Kconfig and the helper functions for hashing
From: Theodore Ts'o @ 2019-06-15 12:57 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-6-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:54AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add the beginnings of the fs/verity/ support layer, including the
> Kconfig option and various helper functions for hashing.  To start, only
> SHA-256 is supported, but other hash algorithms can easily be added.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

One thought for consideration below...


> +
> +/*
> + * Maximum depth of the Merkle tree.  Up to 64 levels are theoretically possible
> + * with a very small block size, but we'd like to limit stack usage during
> + * verification, and in practice this is plenty.  E.g., with SHA-256 and 4K
> + * blocks, a file with size UINT64_MAX bytes needs just 8 levels.
> + */
> +#define FS_VERITY_MAX_LEVELS		16

Maybe we should make FS_VERITY_MAX_LEVELS 8 for now?  This is an
implementation-level restriction, and currently we don't support any
architectures that have a page size < 4k.  We can always bump this
number up in the future if it ever becomes necessary, and limiting max
levels to 8 saves almost 100 bytes of stack space in verify_page().

						- Ted

^ permalink raw reply

* Re: [PATCH v4 04/16] fs: uapi: define verity bit for FS_IOC_GETFLAGS
From: Theodore Ts'o @ 2019-06-15 12:40 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-5-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:53AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add FS_VERITY_FL to the flags for FS_IOC_GETFLAGS, so that applications
> can easily determine whether a file is a verity file at the same time as
> they're checking other file flags.  This flag will be gettable only;
> FS_IOC_SETFLAGS won't allow setting it, since an ioctl must be used
> instead to provide more parameters.
> 
> This flag matches the on-disk bit that was already allocated for ext4.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 02/16] fs-verity: add MAINTAINERS file entry
From: Theodore Ts'o @ 2019-06-15 12:39 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-3-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:51AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> fs-verity will be jointly maintained by Eric Biggers and Theodore Ts'o.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-off-by: Theodore Ts'o <tytso@mit.edu>

						- Ted

^ permalink raw reply

* Re: [PATCH v4 01/16] fs-verity: add a documentation file
From: Theodore Ts'o @ 2019-06-15 12:39 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Darrick J . Wong, linux-api, Dave Chinner, linux-f2fs-devel,
	linux-fscrypt, linux-fsdevel, Jaegeuk Kim, linux-integrity,
	linux-ext4, Linus Torvalds, Christoph Hellwig, Victor Hsieh
In-Reply-To: <20190606155205.2872-2-ebiggers@kernel.org>

On Thu, Jun 06, 2019 at 08:51:50AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Add a documentation file for fs-verity, covering....
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good; you can add:

Reviewed-by: Theodore Ts'o <tytso@mit.edu>


One minor design point below:

> +ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
> +past the end of the file, starting at the first page fully beyond
                                                   ^^^^
> +i_size.  This approach works because (a) verity files are readonly,
> +and (b) pages fully beyond i_size aren't visible to userspace but can
> +be read/written internally by ext4 with only some relatively small
> +changes to ext4.  This approach avoids having to depend on the
> +EA_INODE feature and on rearchitecturing ext4's xattr support to
> +support paging multi-gigabyte xattrs into memory, and to support
> +encrypting xattrs.  Note that the verity metadata *must* be encrypted
> +when the file is, since it contains hashes of the plaintext data.

If we ever want to support mounting, say, a file system with 4k blocks
and fsverity enabled on a architecture with a 16k or 64k page size,
then "page" in that first sentence will need to become "block".  At
the moment we only support fsverity when page size == block size, so
it's not an issue.

However, it's worth reflecting on what this means.  In order to
satisfy this requirement (from the mmap man page):

       A file is mapped in multiples of the page size.  For a file
       that is not a multiple of the page size, the remaining memory
       is zeroed when mapped...

we're going to have to special case how the last page gets mmaped.
The simplest way to do this will be to map in an anonymous page which
just has the blocks that are part of the data block copied in, and the
rest of the page can be zero'ed.

One thing we might consider doing just to make life much easier for
ourselves (should we ever want to support page size != block size ---
which I could imagine some folks like Chandan might find desirable) is
to specify that the fsverity metadata begins at an offset which begins
at i_size rounded up to the next 64k binary, which should handle all
current and future architectures' page sizes.

					- Ted

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Dave Hansen @ 2019-06-14 22:06 UTC (permalink / raw)
  To: Yu-cheng Yu, Andy Lutomirski
  Cc: Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit
In-Reply-To: <359e6f64d646d5305c52f393db5296c469630d11.camel@intel.com>

On 6/14/19 2:34 PM, Yu-cheng Yu wrote:
> On Fri, 2019-06-14 at 13:57 -0700, Dave Hansen wrote:
>>> I have a related question:
>>>
>>> Do we allow the application to read the bitmap, or any fault from the
>>> application on bitmap pages?
>>
>> We have to allow apps to read it.  Otherwise they can't execute
>> instructions.
> 
> What I meant was, if an app executes some legacy code that results in bitmap
> lookup, but the bitmap page is not yet populated, and if we then populate that
> page with all-zero, a #CP should follow.  So do we even populate that zero page
> at all?
> 
> I think we should; a #CP is more obvious to the user at least.

Please make an effort to un-Intel-ificate your messages as much as
possible.  I'd really prefer that folks say "missing end branch fault"
rather than #CP.  I had to Google "#CP".

I *think* you are saying that:  The *only* lookups to this bitmap are on
"missing end branch" conditions.  Normal, proper-functioning code
execution that has ENDBR instructions in it will never even look at the
bitmap.  The only case when we reference the bitmap locations is when
the processor is about do do a "missing end branch fault" so that it can
be suppressed.  Any population with the zero page would be done when
code had already encountered a "missing end branch" condition, and
populating with a zero-filled page will guarantee that a "missing end
branch fault" will result.  You're arguing that we should just figure
this out at fault time and not ever reach the "missing end branch fault"
at all.

Is that right?

If so, that's an architecture subtlety that I missed until now and which
went entirely unmentioned in the changelog and discussion up to this
point.  Let's make sure that nobody else has to walk that path by
improving our changelog, please.

In any case, I don't think this is worth special-casing our zero-fill
code, FWIW.  It's not performance critical and not worth the complexity.
 If apps want to handle the signals and abuse this to fill space up with
boring page table contents, they're welcome to.  There are much easier
ways to consume a lot of memory.

>> We don't have to allow them to (popuating) fault on it.  But, if we
>> don't, we need some kind of kernel interface to avoid the faults.
> 
> The plan is:
> 
> * Move STACK_TOP (and vdso) down to give space to the bitmap.

Even for apps with 57-bit address spaces?

> * Reserve the bitmap space from (mm->start_stack + PAGE_SIZE) to cover a code
> size of TASK_SIZE_LOW, which is (TASK_SIZE_LOW / PAGE_SIZE / 8).

The bitmap size is determined by CR4.LA57, not the app.  If you place
the bitmap here, won't references to it for high addresses go into the
high address space?

Specifically, on a CR4.LA57=0 system, we have 48 bits of address space,
so 128TB for apps.  You are proposing sticking the bitmap above the
stack which is near the top of that 128TB address space.  But on a
5-level paging system with CR4.LA57=1, there could be valid data at
129GB.  Is there something keeping that data from being mistaken for
being part of the bitmap?

Also, if you're limiting it to TASK_SIZE_LOW, please don't forget that
this is yet another thing that probably won't work with the vsyscall
page.  Please make sure you consider it and mention it in your next post.

> * Mmap the space only when the app issues the first mark-legacy prctl.  This
> avoids the core-dump issue for most apps and the accounting problem that
> MAP_NORESERVE probably won't solve completely.

What is this accounting problem?

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Yu-cheng Yu @ 2019-06-14 21:34 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit
In-Reply-To: <598edca7-c36a-a236-3b72-08b2194eb609@intel.com>

On Fri, 2019-06-14 at 13:57 -0700, Dave Hansen wrote:
> On 6/14/19 10:13 AM, Yu-cheng Yu wrote:
> > On Fri, 2019-06-14 at 09:13 -0700, Dave Hansen wrote:
> > > On 6/14/19 8:25 AM, Yu-cheng Yu wrote:
> > > > The bitmap is very big.
> > > 
> > > Really?  It's actually, what, 8*4096=32k, so 1/32,768th of the size of
> > > the libraries legacy libraries you load?  Do our crash dumps really not
> > > know how to represent or deal with sparse mappings?
> > 
> > Ok, even the core dump is not physically big, its size still looks odd,
> > right?
> 
> Hell if I know.
> 
> Could you please go try this in practice so that we're designing this
> thing fixing real actual problems instead of phantoms that we're
> anticipating?
> 
> > Could this also affect how much time for GDB to load it.
> 
> I don't know.  Can you go find out for sure, please?

OK!

> 
> > I have a related question:
> > 
> > Do we allow the application to read the bitmap, or any fault from the
> > application on bitmap pages?
> 
> We have to allow apps to read it.  Otherwise they can't execute
> instructions.

What I meant was, if an app executes some legacy code that results in bitmap
lookup, but the bitmap page is not yet populated, and if we then populate that
page with all-zero, a #CP should follow.  So do we even populate that zero page
at all?

I think we should; a #CP is more obvious to the user at least.

> 
> We don't have to allow them to (popuating) fault on it.  But, if we
> don't, we need some kind of kernel interface to avoid the faults.

The plan is:

* Move STACK_TOP (and vdso) down to give space to the bitmap.

* Reserve the bitmap space from (mm->start_stack + PAGE_SIZE) to cover a code
size of TASK_SIZE_LOW, which is (TASK_SIZE_LOW / PAGE_SIZE / 8).

* Mmap the space only when the app issues the first mark-legacy prctl.  This
avoids the core-dump issue for most apps and the accounting problem that
MAP_NORESERVE probably won't solve completely.

* The bitmap is read-only.  The kernel sets the bitmap with
get_user_pages_fast(FOLL_WRITE) and user_access_begin()/user_addess_end().

I will send out a RFC patch.

Yu-cheng

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Dave Hansen @ 2019-06-14 20:57 UTC (permalink / raw)
  To: Yu-cheng Yu, Andy Lutomirski
  Cc: Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit
In-Reply-To: <b5a915602020a6ce26ea1254f7f60e239c91bc9f.camel@intel.com>

On 6/14/19 10:13 AM, Yu-cheng Yu wrote:
> On Fri, 2019-06-14 at 09:13 -0700, Dave Hansen wrote:
>> On 6/14/19 8:25 AM, Yu-cheng Yu wrote:
>>> The bitmap is very big.
>>
>> Really?  It's actually, what, 8*4096=32k, so 1/32,768th of the size of
>> the libraries legacy libraries you load?  Do our crash dumps really not
>> know how to represent or deal with sparse mappings?
> 
> Ok, even the core dump is not physically big, its size still looks odd, right?

Hell if I know.

Could you please go try this in practice so that we're designing this
thing fixing real actual problems instead of phantoms that we're
anticipating?

> Could this also affect how much time for GDB to load it.

I don't know.  Can you go find out for sure, please?

> I have a related question:
> 
> Do we allow the application to read the bitmap, or any fault from the
> application on bitmap pages?

We have to allow apps to read it.  Otherwise they can't execute
instructions.

We don't have to allow them to (popuating) fault on it.  But, if we
don't, we need some kind of kernel interface to avoid the faults.

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Yu-cheng Yu @ 2019-06-14 17:13 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit
In-Reply-To: <5ddf59e2-c701-3741-eaa1-f63ee741ea55@intel.com>

On Fri, 2019-06-14 at 09:13 -0700, Dave Hansen wrote:
> On 6/14/19 8:25 AM, Yu-cheng Yu wrote:
> > On Mon, 2019-06-10 at 15:59 -0700, Dave Hansen wrote:
> > > On 6/10/19 3:40 PM, Yu-cheng Yu wrote:
> > > > Ok, we will go back to do_mmap() with MAP_PRIVATE, MAP_NORESERVE and
> > > > VM_DONTDUMP.  The bitmap will cover only 48-bit address space.
> > > 
> > > Could you make sure to discuss the downsides of only doing a 48-bit
> > > address space?
> > 
> > The downside is that we cannot load legacy lib's above 48-bit address space,
> > but
> > currently ld-linux does not do that.  Should ld-linux do that in the future,
> > dlopen() fails.  Considering CRIU migration, we probably need to do this
> > anyway?
> 
> Again, I was thinking about JITs.  Please remember that not all code in
> the system is from files on the disk.  Please.  We need to be really,
> really sure that we don't addle this implementation by being narrow
> minded about this.
> 
> Please don't forget about JITs.
> 
> > > What are the reasons behind and implications of VM_DONTDUMP?
> > 
> > The bitmap is very big.
> 
> Really?  It's actually, what, 8*4096=32k, so 1/32,768th of the size of
> the libraries legacy libraries you load?  Do our crash dumps really not
> know how to represent or deal with sparse mappings?

Ok, even the core dump is not physically big, its size still looks odd, right?
Could this also affect how much time for GDB to load it.
We will only mmap the bitmap when the first time the bitmap prctl is called.

I have a related question:

Do we allow the application to read the bitmap, or any fault from the
application on bitmap pages?

We populate a page only when bits are set from a prctl.
Any other fault means either the application tries to find out an address
range's status or it executes legacy code that has not been marked in the
bitmap.

> 
> > In GDB, it should be easy to tell why a control-protection fault occurred
> > without the bitmap.
> 
> How about why one didn't happen?

We'll dump the bitmap if it is allocated.

Yu-cheng

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Dave Hansen @ 2019-06-14 16:13 UTC (permalink / raw)
  To: Yu-cheng Yu, Andy Lutomirski
  Cc: Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit
In-Reply-To: <cf0d1470e95e0a8b88742651d06601a53d6655c1.camel@intel.com>

On 6/14/19 8:25 AM, Yu-cheng Yu wrote:
> On Mon, 2019-06-10 at 15:59 -0700, Dave Hansen wrote:
>> On 6/10/19 3:40 PM, Yu-cheng Yu wrote:
>>> Ok, we will go back to do_mmap() with MAP_PRIVATE, MAP_NORESERVE and
>>> VM_DONTDUMP.  The bitmap will cover only 48-bit address space.
>>
>> Could you make sure to discuss the downsides of only doing a 48-bit
>> address space?
> 
> The downside is that we cannot load legacy lib's above 48-bit address space, but
> currently ld-linux does not do that.  Should ld-linux do that in the future,
> dlopen() fails.  Considering CRIU migration, we probably need to do this anyway?

Again, I was thinking about JITs.  Please remember that not all code in
the system is from files on the disk.  Please.  We need to be really,
really sure that we don't addle this implementation by being narrow
minded about this.

Please don't forget about JITs.

>> What are the reasons behind and implications of VM_DONTDUMP?
> 
> The bitmap is very big.

Really?  It's actually, what, 8*4096=32k, so 1/32,768th of the size of
the libraries legacy libraries you load?  Do our crash dumps really not
know how to represent or deal with sparse mappings?

> In GDB, it should be easy to tell why a control-protection fault occurred
> without the bitmap.

How about why one didn't happen?

^ permalink raw reply

* Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
From: Yu-cheng Yu @ 2019-06-14 15:25 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: Peter Zijlstra, x86, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel, linux-doc, linux-mm, linux-arch, linux-api,
	Arnd Bergmann, Balbir Singh, Borislav Petkov, Cyrill Gorcunov,
	Dave Hansen, Eugene Syromiatnikov, Florian Weimer, H.J. Lu,
	Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit
In-Reply-To: <ea5e333f-8cd6-8396-635f-a9dc580d5364@intel.com>

On Mon, 2019-06-10 at 15:59 -0700, Dave Hansen wrote:
> On 6/10/19 3:40 PM, Yu-cheng Yu wrote:
> > Ok, we will go back to do_mmap() with MAP_PRIVATE, MAP_NORESERVE and
> > VM_DONTDUMP.  The bitmap will cover only 48-bit address space.
> 
> Could you make sure to discuss the downsides of only doing a 48-bit
> address space?

The downside is that we cannot load legacy lib's above 48-bit address space, but
currently ld-linux does not do that.  Should ld-linux do that in the future,
dlopen() fails.  Considering CRIU migration, we probably need to do this anyway?

> What are the reasons behind and implications of VM_DONTDUMP?

The bitmap is very big.

In GDB, it should be easy to tell why a control-protection fault occurred
without the bitmap.

Yu-cheng

^ permalink raw reply

* [PATCH glibc 2/5] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v5)
From: Mathieu Desnoyers @ 2019-06-14 15:23 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson,
	Paul Turner, linux-kernel, linux-api
In-Reply-To: <20190614152304.12321-1-mathieu.desnoyers@efficios.com>

When available, use the cpu_id field from __rseq_abi on Linux to
implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable.

Benchmarks:

x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

glibc sched_getcpu():                     13.7 ns (baseline)
glibc sched_getcpu() using rseq:           2.5 ns (speedup:  5.5x)
inline load cpuid from __rseq_abi TLS:     0.8 ns (speedup: 17.1x)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Joseph Myers <joseph@codesourcery.com>
CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Maurer <bmaurer@fb.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Paul Turner <pjt@google.com>
CC: libc-alpha@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- rseq is only used if both __NR_rseq and RSEQ_SIG are defined.

Changes since v2:
- remove duplicated __rseq_abi extern declaration.

Changes since v3:
- update ChangeLog.

Changes since v4:
- Use atomic_load_relaxed to load the __rseq_abi.cpu_id field, a
  consequence of the fact that __rseq_abi is not volatile anymore.
- Include atomic.h which provides atomic_load_relaxed.
---
 ChangeLog                              |  5 +++++
 sysdeps/unix/sysv/linux/sched_getcpu.c | 25 +++++++++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 01a411acbf..58830cafc2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2019-04-23  Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+
+	* sysdeps/unix/sysv/linux/sched_getcpu.c: use rseq cpu_id TLS on
+	Linux.
+
 2019-04-23  Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
 
 	* NEWS: Add Restartable Sequences feature description.
diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index fb0d317f83..01a818f5b0 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -18,14 +18,15 @@
 #include <errno.h>
 #include <sched.h>
 #include <sysdep.h>
+#include <atomic.h>
 
 #ifdef HAVE_GETCPU_VSYSCALL
 # define HAVE_VSYSCALL
 #endif
 #include <sysdep-vdso.h>
 
-int
-sched_getcpu (void)
+static int
+vsyscall_sched_getcpu (void)
 {
 #ifdef __NR_getcpu
   unsigned int cpu;
@@ -37,3 +38,23 @@ sched_getcpu (void)
   return -1;
 #endif
 }
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+int
+sched_getcpu (void)
+{
+  int cpu_id = atomic_load_relaxed (&__rseq_abi.cpu_id);
+
+  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else
+int
+sched_getcpu (void)
+{
+  return vsyscall_sched_getcpu ();
+}
+#endif
-- 
2.17.1

^ permalink raw reply related

* [PATCH glibc 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v11)
From: Mathieu Desnoyers @ 2019-06-14 15:23 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Florian Weimer, Joseph Myers, Szabolcs Nagy, libc-alpha,
	Mathieu Desnoyers, Thomas Gleixner, Ben Maurer, Peter Zijlstra,
	Paul E. McKenney, Boqun Feng, Will Deacon, Dave Watson,
	Paul Turner, Rich Felker, linux-kernel, linux-api
In-Reply-To: <20190614152304.12321-1-mathieu.desnoyers@efficios.com>

Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

See the rseq(2) man page proposed here:
  https://lkml.org/lkml/2018/9/19/647

This patch is based on glibc-2.29. The rseq(2) system call was merged
into Linux 4.18.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Joseph Myers <joseph@codesourcery.com>
CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Maurer <bmaurer@fb.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Paul Turner <pjt@google.com>
CC: Rich Felker <dalias@libc.org>
CC: libc-alpha@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
  eliminate one symbol.

  All libraries/programs which try to register rseq (glibc,
  early-adopter applications, early-adopter libraries) should use the
  rseq refcount. It becomes part of the ABI within a user-space
  process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
  non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
  nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
  trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
  across glibc, including memory allocator and sched_getcpu(). The
  __rseq_refcount symbol is kept in nptl, because there is no reason
  to use it elsewhere in glibc.

Changes since v3:
- Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
  because glibc is the first/last user.
- Unconditionally register/unregister rseq at thread start/exit, because
  glibc is the first/last user.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.

Changes since v4:
- Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on
  "System V Application Binary Interface", weak only affects the link
  editor, not the dynamic linker.
- Install a new sys/rseq.h system header on Linux, which contains the
  RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
  declaration. Move those definition/declarations from rseq-internal.h
  to the installed sys/rseq.h header.
- Considering that rseq is only available on Linux, move csu/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c.
- Move __rseq_refcount from nptl/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
- Move both ABI definitions for __rseq_abi and __rseq_refcount to
  sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
- Document __rseq_abi and __rseq_refcount volatile.
- Document the RSEQ_SIG signature define.
- Move registration functions from rseq.c to rseq-internal.h static
  inline functions. Introduce empty stubs in misc/rseq-internal.h,
  which can be overridden by architecture code in
  sysdeps/unix/sysv/linux/rseq-internal.h.
- Rename __rseq_register_current_thread and __rseq_unregister_current_thread
  to rseq_register_current_thread and rseq_unregister_current_thread,
  now that those are only visible as internal static inline functions.
- Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
  rather than nptl init, so applications not linked against
  libpthread.so have rseq registered for their main() thread. Note that
  it is invoked separately for SHARED and !SHARED builds.

Changes since v5:
- Replace __rseq_refcount by __rseq_lib_abi, which contains two
  uint32_t: register_state and refcount. The "register_state" field
  allows inhibiting rseq registration from signal handlers nested on top
  of glibc registration and occuring after rseq unregistration by glibc.
- Introduce enum rseq_register_state, which contains the states allowed
  for the struct rseq_lib_abi register_state field.

Changes since v6:
- Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
  The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
  architecture implementing rseq needs to implement bits/rseq.h.
- Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
- Port to glibc-2.29.

Changes since v7:
- Remove __rseq_lib_abi symbol, including refcount and register_state
  fields.
- Remove reference counting and nested signals handling from
  registration/unregistration functions.
- Introduce new __rseq_handled exported symbol, which is set to 1
  by glibc on C startup when it handles restartable sequences.
  This allows glibc to coexist with early adopter libraries and
  applications wishing to register restartable sequences when it
  is not handled by glibc.
- Introduce rseq_init (), which sets __rseq_handled to 1 from
  C startup.
- Update NEWS entry.
- Update comments at the beginning of new files.
- Registration depends on both __NR_rseq and RSEQ_SIG.
- Remove ARM, powerpc, MIPS RSEQ_SIG until we agree with maintainers
  on the signature choice.
- Update x86, s390 RSEQ_SIG based on discussion with arch maintainers.
- Remove rseq-internal.h from headers list of misc/Makefile, so it
  it not installed by make install.

Changes since v8:
- Introduce RSEQ_SIG_CODE and RSEQ_SIG_DATA on aarch64 to handle
  compiling with -mbig-endian.

Changes since v9:
- Update Changelog.
- Remove unneeded new file comment header newlines.

Changes since v10:
- Remove volatile from __rseq_abi declaration.
- Document that __rseq_handled is about library managing rseq
  registration, independently of whether rseq is available or not.
- Move __rseq_handled symbol to ld.so, initialize this symbol within
  the dynamic linker initialization for both shared (rtld.c) and static
  (dl-support.c) builds.
- Only register the rseq TLS on initialization once in multiple-libc
  scenarios. Use rtld_active () for this purpose.
- In the static libc case, register the rseq TLS after LD_PRELOAD
  constructors are run, so it matches the order of this initialization
  vs LD_PRELOAD contructors execution for the shared libc.
- Agreed on signature choice with powerpc and MIPS maintainers,
  re-adding those signatures,
- The main architecture still left out signature-wise is ARM32.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Joseph Myers <joseph@codesourcery.com>
CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Maurer <bmaurer@fb.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Paul Turner <pjt@google.com>
CC: Rich Felker <dalias@libc.org>
CC: libc-alpha@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
 ChangeLog                                     | 77 ++++++++++++++++
 NEWS                                          | 15 ++++
 csu/libc-start.c                              | 14 ++-
 elf/dl-support.c                              | 22 +++++
 elf/rtld.c                                    | 26 ++++++
 misc/rseq-internal.h                          | 38 ++++++++
 nptl/pthread_create.c                         |  9 ++
 sysdeps/unix/sysv/linux/Makefile              |  4 +-
 sysdeps/unix/sysv/linux/Versions              | 10 +++
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   | 43 +++++++++
 sysdeps/unix/sysv/linux/aarch64/ld.abilist    |  1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |  1 +
 sysdeps/unix/sysv/linux/alpha/ld.abilist      |  1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |  1 +
 sysdeps/unix/sysv/linux/arm/ld.abilist        |  1 +
 sysdeps/unix/sysv/linux/arm/libc.abilist      |  1 +
 sysdeps/unix/sysv/linux/bits/rseq.h           | 29 ++++++
 sysdeps/unix/sysv/linux/csky/ld.abilist       |  1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |  1 +
 sysdeps/unix/sysv/linux/hppa/ld.abilist       |  1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |  1 +
 sysdeps/unix/sysv/linux/i386/ld.abilist       |  1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |  1 +
 sysdeps/unix/sysv/linux/ia64/ld.abilist       |  1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |  1 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |  1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |  1 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist    |  1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |  1 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |  1 +
 .../unix/sysv/linux/microblaze/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/mips/bits/rseq.h      | 62 +++++++++++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |  1 +
 .../unix/sysv/linux/mips/mips32/ld.abilist    |  1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |  1 +
 .../sysv/linux/mips/mips64/n32/ld.abilist     |  1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |  1 +
 .../sysv/linux/mips/mips64/n64/ld.abilist     |  1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/nios2/ld.abilist      |  1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |  1 +
 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h   | 37 ++++++++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |  1 +
 .../sysv/linux/powerpc/powerpc32/ld.abilist   |  1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |  1 +
 .../linux/powerpc/powerpc64/be/ld.abilist     |  1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |  1 +
 .../linux/powerpc/powerpc64/le/ld.abilist     |  1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist |  1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  1 +
 sysdeps/unix/sysv/linux/rseq-internal.h       | 88 +++++++++++++++++++
 sysdeps/unix/sysv/linux/rseq-sym.c            | 43 +++++++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      | 37 ++++++++
 .../unix/sysv/linux/s390/s390-32/ld.abilist   |  1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |  1 +
 .../unix/sysv/linux/s390/s390-64/ld.abilist   |  1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |  1 +
 sysdeps/unix/sysv/linux/sh/ld.abilist         |  1 +
 sysdeps/unix/sysv/linux/sh/libc.abilist       |  1 +
 .../unix/sysv/linux/sparc/sparc32/ld.abilist  |  1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |  1 +
 .../unix/sysv/linux/sparc/sparc64/ld.abilist  |  1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |  1 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 52 +++++++++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       | 30 +++++++
 sysdeps/unix/sysv/linux/x86_64/64/ld.abilist  |  1 +
 .../unix/sysv/linux/x86_64/64/libc.abilist    |  1 +
 sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist |  1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |  1 +
 70 files changed, 684 insertions(+), 4 deletions(-)
 create mode 100644 misc/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h

diff --git a/ChangeLog b/ChangeLog
index 59dab18463..01a411acbf 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,80 @@
+2019-04-23  Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+
+	* NEWS: Add Restartable Sequences feature description.
+	* csu/libc-start.c: Perform rseq(2) registration at C startup.
+	* elf/rtld.c: Initialize __rseq_handled at dynamic linker
+	startup for shared libc.
+	* elf/dl-support.c: Initialize __rseq_handled at dynamic linker
+	startup for static libc.
+	* nptl/pthread_create.c: Perform rseq(2) registration at thread
+	creation, unregistration at thread exit.
+	* sysdeps/unix/sysv/linux/Makefile: Add rseq-sym, sys/rseq.h,
+	bits/rseq.h.
+	* sysdeps/unix/sysv/linux/Versions: Export __rseq_abi from libc
+	and __rseq_handled from ld.
+	* sysdeps/unix/sysv/linux/aarch64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/alpha/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/arm/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/arm/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/csky/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/csky/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/hppa/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/hppa/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/i386/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/ia64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/microblaze/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/microblaze/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips32/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/nios2/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/nios2/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sh/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sh/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/64/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
+	* misc/rseq-internal.h: New file.
+	* sysdeps/unix/sysv/linux/rseq-internal.h: Likewise.
+	* sysdeps/unix/sysv/linux/rseq-sym.c: Likewise.
+	* sysdeps/unix/sysv/linux/sys/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/aarch64/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/mips/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/s390/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/x86/bits/rseq.h: Likewise.
+
 2019-01-31  Siddhesh Poyarekar  <siddhesh@sourceware.org>
 
 	* version.h (RELEASE): Set to "stable".
diff --git a/NEWS b/NEWS
index 912a9bdc0f..7276a09b08 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,21 @@ See the end for copying conditions.
 Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
 using `glibc' in the "product" field.
 \f
+Version 2.30
+
+Major new features:
+
+* Support for automatically registering threads with the Linux rseq(2)
+  system call has been added.  This system call is implemented starting
+  from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
+  operations on per-cpu data.  It allows user-space to perform updates
+  on per-cpu data without requiring heavy-weight atomic operations.
+  Automatically registering threads allows all libraries, including libc,
+  to make immediate use of the rseq(2) support by using the documented ABI.
+  See 'man 2 rseq' for the details of the ABI shared between libc and the
+  kernel.
+
+\f
 Version 2.29
 
 Major new features:
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 5d9c3675fa..136d30ace9 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -22,6 +22,8 @@
 #include <ldsodefs.h>
 #include <exit-thread.h>
 #include <libc-internal.h>
+#include <rseq-internal.h>
+#include <ldsodefs.h>
 
 #include <elf/dl-tunables.h>
 
@@ -140,7 +142,13 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;
 
-#ifndef SHARED
+#ifdef SHARED
+  if (rtld_active ())
+    {
+      /* Register rseq ABI to the kernel.   */
+      (void) rseq_register_current_thread ();
+    }
+#else
   _dl_relocate_static_pie ();
 
   char **ev = &argv[argc + 1];
@@ -231,7 +239,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
   __pointer_chk_guard_local = pointer_chk_guard;
 # endif
 
-#endif /* !SHARED  */
+    /* Register rseq ABI to the kernel.   */
+    (void) rseq_register_current_thread ();
+#endif
 
   /* Register the destructor of the dynamic linker if there is any.  */
   if (__glibc_likely (rtld_fini != NULL))
diff --git a/elf/dl-support.c b/elf/dl-support.c
index 42c350c75d..13732b5998 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -34,6 +34,7 @@
 #include <unsecvars.h>
 #include <hp-timing.h>
 #include <stackinfo.h>
+#include <rseq-internal.h>
 
 extern char *__progname;
 char **_dl_argv = &__progname;	/* This is checked for some error messages.  */
@@ -220,6 +221,24 @@ __rtld_lock_define_initialized_recursive (, _dl_load_lock)
    that list.  */
 __rtld_lock_define_initialized_recursive (, _dl_load_write_lock)
 
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.  */
+
+int __rseq_handled;
 
 #ifdef HAVE_AUX_VECTOR
 int _dl_clktck;
@@ -383,6 +402,9 @@ _dl_non_dynamic_init (void)
 	  _dl_stack_flags = _dl_phdr[i].p_flags;
 	  break;
 	}
+
+  /* Publicize rseq registration ownership.  */
+  rseq_init ();
 }
 
 #ifdef DL_SYSINFO_IMPLEMENTATION
diff --git a/elf/rtld.c b/elf/rtld.c
index 5d97f41b7b..759c9bb8e8 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -43,6 +43,7 @@
 #include <stap-probe.h>
 #include <stackinfo.h>
 #include <not-cancel.h>
+#include <rseq-internal.h>
 
 #include <assert.h>
 
@@ -114,6 +115,25 @@ strong_alias (__pointer_chk_guard_local, __pointer_chk_guard)
 # define SECURE_PATH_LIMIT 1024
 #endif
 
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.  */
+
+int __rseq_handled;
+
 /* Check that AT_SECURE=0, or that the passed name does not contain
    directories and is not overly long.  Reject empty names
    unconditionally.  */
@@ -2261,6 +2281,12 @@ ERROR: ld.so: object '%s' cannot be loaded as audit interface: %s; ignored.\n",
       HP_TIMING_ACCUM_NT (relocate_time, add);
     }
 
+  /* Publicize rseq registration ownership.  This must be performed
+     after rtld re-relocation, before invoking constructors of
+     preloaded libraries.  IFUNC resolvers are called before this
+     initialization, so they may not observe the initialized state.  */
+  rseq_init ();
+
   /* Do any necessary cleanups for the startup OS interface code.
      We do these now so that no calls are made after rtld re-relocation
      which might be resolved to different functions than we expect.
diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h
new file mode 100644
index 0000000000..ccad30bca5
--- /dev/null
+++ b/misc/rseq-internal.h
@@ -0,0 +1,38 @@
+/* Restartable Sequences internal API. Stub version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_init (void)
+{
+}
+
+#endif /* rseq-internal.h */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 2bd2b10727..90b3419390 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -33,6 +33,7 @@
 #include <default-sched.h>
 #include <futex-internal.h>
 #include <tls-setup.h>
+#include <rseq-internal.h>
 #include "libioP.h"
 
 #include <shlib-compat.h>
@@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd)
 START_THREAD_DEFN
 {
   struct pthread *pd = START_THREAD_SELF;
+  bool has_rseq = false;
 
 #if HP_TIMING_AVAIL
   /* Remember the time when the thread was started.  */
@@ -396,6 +398,9 @@ START_THREAD_DEFN
   if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
     futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
 
+  /* Register rseq TLS to the kernel. */
+  has_rseq = !rseq_register_current_thread ();
+
 #ifdef __NR_set_robust_list
 # ifndef __ASSUME_SET_ROBUST_LIST
   if (__set_robust_list_avail >= 0)
@@ -573,6 +578,10 @@ START_THREAD_DEFN
     }
 #endif
 
+  /* Unregister rseq TLS from kernel. */
+  if (has_rseq && rseq_unregister_current_thread ())
+    abort();
+
   advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
 		      pd->guardsize);
 
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 5f8c2c7c7d..5b541469ec 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -1,5 +1,5 @@
 ifeq ($(subdir),csu)
-sysdep_routines += errno-loc
+sysdep_routines += errno-loc rseq-sym
 endif
 
 ifeq ($(subdir),assert)
@@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 		  bits/termios-c_iflag.h bits/termios-c_oflag.h \
 		  bits/termios-baud.h bits/termios-c_cflag.h \
 		  bits/termios-c_lflag.h bits/termios-tcflow.h \
-		  bits/termios-misc.h
+		  bits/termios-misc.h sys/rseq.h bits/rseq.h
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index f1e12d9c69..cc3574f39e 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -174,6 +174,9 @@ libc {
   GLIBC_2.29 {
     getcpu;
   }
+  GLIBC_2.30 {
+    __rseq_abi;
+  }
   GLIBC_PRIVATE {
     # functions used in other libraries
     __syscall_rt_sigqueueinfo;
@@ -185,3 +188,10 @@ libc {
     __netlink_assert_response;
   }
 }
+
+ld {
+  # Symbols required by dynamic linker
+  GLIBC_2.30 {
+    __rseq_handled;
+  }
+}
diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
new file mode 100644
index 0000000000..35fcc41f1e
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
@@ -0,0 +1,43 @@
+/* Restartable Sequences Linux aarch64 architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   aarch64 -mbig-endian generates mixed endianness code vs data:
+   little-endian code and big-endian data. Ensure the RSEQ_SIG signature
+   matches code endianness.  */
+
+#define RSEQ_SIG_CODE	0xd428bc00	/* BRK #0x45E0.  */
+
+#ifdef __AARCH64EB__
+#define RSEQ_SIG_DATA	0x00bc28d4	/* BRK #0x45E0.  */
+#else
+#define RSEQ_SIG_DATA	RSEQ_SIG_CODE
+#endif
+
+#define RSEQ_SIG	RSEQ_SIG_DATA
diff --git a/sysdeps/unix/sysv/linux/aarch64/ld.abilist b/sysdeps/unix/sysv/linux/aarch64/ld.abilist
index 4ffe688649..2ec8010601 100644
--- a/sysdeps/unix/sysv/linux/aarch64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/ld.abilist
@@ -7,3 +7,4 @@ GLIBC_2.17 calloc F
 GLIBC_2.17 free F
 GLIBC_2.17 malloc F
 GLIBC_2.17 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 9c330f325e..16bd739d05 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2141,3 +2141,4 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
diff --git a/sysdeps/unix/sysv/linux/alpha/ld.abilist b/sysdeps/unix/sysv/linux/alpha/ld.abilist
index 98b66edabf..8513bf0532 100644
--- a/sysdeps/unix/sysv/linux/alpha/ld.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x8
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x8
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f630fa4c6f..64e803ea4b 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2204,6 +2204,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arm/ld.abilist b/sysdeps/unix/sysv/linux/arm/ld.abilist
index a301c6ebc4..f3a53e3bd9 100644
--- a/sysdeps/unix/sysv/linux/arm/ld.abilist
+++ b/sysdeps/unix/sysv/linux/arm/ld.abilist
@@ -1,3 +1,4 @@
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index b96f45590f..1c3e25f347 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -126,6 +126,7 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
new file mode 100644
index 0000000000..a3c023f5c7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/rseq.h
@@ -0,0 +1,29 @@
+/* Restartable Sequences architecture header. Stub version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.  */
diff --git a/sysdeps/unix/sysv/linux/csky/ld.abilist b/sysdeps/unix/sysv/linux/csky/ld.abilist
index 71576160ed..2a583e7cb6 100644
--- a/sysdeps/unix/sysv/linux/csky/ld.abilist
+++ b/sysdeps/unix/sysv/linux/csky/ld.abilist
@@ -7,3 +7,4 @@ GLIBC_2.29 calloc F
 GLIBC_2.29 free F
 GLIBC_2.29 malloc F
 GLIBC_2.29 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 019044c3cd..6642edb46a 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2085,3 +2085,4 @@ GLIBC_2.29 xdrstdio_create F
 GLIBC_2.29 xencrypt F
 GLIBC_2.29 xprt_register F
 GLIBC_2.29 xprt_unregister F
+GLIBC_2.30 __rseq_abi T 0x20
diff --git a/sysdeps/unix/sysv/linux/hppa/ld.abilist b/sysdeps/unix/sysv/linux/hppa/ld.abilist
index 0387614d8f..3614f0c684 100644
--- a/sysdeps/unix/sysv/linux/hppa/ld.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 088a8ee369..d273c980f5 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2037,6 +2037,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/ld.abilist b/sysdeps/unix/sysv/linux/i386/ld.abilist
index edb7307228..1e901fbedf 100644
--- a/sysdeps/unix/sysv/linux/i386/ld.abilist
+++ b/sysdeps/unix/sysv/linux/i386/ld.abilist
@@ -7,3 +7,4 @@ GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 ___tls_get_addr F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index f7ff2c57b9..b8e7ca09b4 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2203,6 +2203,7 @@ GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 vm86 F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/ld.abilist b/sysdeps/unix/sysv/linux/ia64/ld.abilist
index 82042472c3..3eb3449014 100644
--- a/sysdeps/unix/sysv/linux/ia64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index becd8b1033..b0ea90e8c1 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2069,6 +2069,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
index a301c6ebc4..f3a53e3bd9 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/ld.abilist
@@ -1,3 +1,4 @@
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __libc_stack_end D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
 GLIBC_2.4 __tls_get_addr F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 74e42a5209..3fbcfe1240 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -127,6 +127,7 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
index c9ec45cf1c..a864c76a45 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4af5a74e8a..c1f3209530 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2146,6 +2146,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/ld.abilist b/sysdeps/unix/sysv/linux/microblaze/ld.abilist
index aa0d71150a..06db3011f3 100644
--- a/sysdeps/unix/sysv/linux/microblaze/ld.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/ld.abilist
@@ -7,3 +7,4 @@ GLIBC_2.18 calloc F
 GLIBC_2.18 free F
 GLIBC_2.18 malloc F
 GLIBC_2.18 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index ccef673fd2..47f223e203 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2133,3 +2133,4 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
diff --git a/sysdeps/unix/sysv/linux/mips/bits/rseq.h b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
new file mode 100644
index 0000000000..8c75f107e7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/bits/rseq.h
@@ -0,0 +1,62 @@
+/* Restartable Sequences Linux mips architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   RSEQ_SIG uses the break instruction. The instruction pattern is:
+
+   On MIPS:
+        0350000d        break     0x350
+
+   On nanoMIPS:
+        00100350        break     0x350
+
+   On microMIPS:
+        0000d407        break     0x350
+
+   For nanoMIPS32 and microMIPS, the instruction stream is encoded as
+   16-bit halfwords, so the signature halfwords need to be swapped
+   accordingly for little-endian.  */
+
+#if defined(__nanomips__)
+# ifdef __MIPSEL__
+#  define RSEQ_SIG	0x03500010
+# else
+#  define RSEQ_SIG	0x00100350
+# endif
+#elif defined(__mips_micromips)
+# ifdef __MIPSEL__
+#  define RSEQ_SIG	0xd4070000
+# else
+#  define RSEQ_SIG	0x0000d407
+# endif
+#elif defined(__mips__)
+# define RSEQ_SIG	0x0350000d
+#else
+/* Unknown MIPS architecture. */
+#endif
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 1054bb599e..1eaee3e730 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2120,6 +2120,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
index 55d48868e8..87ccdf89e6 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.0 realloc F
 GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 4f5b5ffebf..bfb3a66f93 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2118,6 +2118,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
index 55d48868e8..87ccdf89e6 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.0 realloc F
 GLIBC_2.2 __libc_stack_end D 0x4
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 943aee58d4..56685efb3f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2126,6 +2126,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
index 44b345b7cf..c781ba9ca1 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.0 realloc F
 GLIBC_2.2 __libc_stack_end D 0x8
 GLIBC_2.2 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x8
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 17a5d17ef9..ae307ce071 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2120,6 +2120,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/ld.abilist b/sysdeps/unix/sysv/linux/nios2/ld.abilist
index 110f1039fa..0e5a157412 100644
--- a/sysdeps/unix/sysv/linux/nios2/ld.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/ld.abilist
@@ -7,3 +7,4 @@ GLIBC_2.21 calloc F
 GLIBC_2.21 free F
 GLIBC_2.21 malloc F
 GLIBC_2.21 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 4d62a540fd..410bb2148f 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2174,3 +2174,4 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
diff --git a/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
new file mode 100644
index 0000000000..bae8f4aaa1
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/powerpc/bits/rseq.h
@@ -0,0 +1,37 @@
+/* Restartable Sequences Linux powerpc architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   RSEQ_SIG uses the following trap instruction:
+
+   powerpc-be:    0f e5 00 0b           twui   r5,11
+   powerpc64-le:  0b 00 e5 0f           twui   r5,11
+   powerpc64-be:  0f e5 00 0b           twui   r5,11  */
+
+#define RSEQ_SIG	0x0fe5000b
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index ecc2d6fa13..bdde750166 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2164,6 +2164,7 @@ GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
index e8b0ea3a9b..86f8dbac1c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist
@@ -8,3 +8,4 @@ GLIBC_2.1 _dl_mcount F
 GLIBC_2.22 __tls_get_addr_opt F
 GLIBC_2.23 __parse_hwcap_and_convert_at_platform F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index f5830f9c33..f661965420 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2197,6 +2197,7 @@ GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
index edfc9ca56f..37ab788373 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/ld.abilist
@@ -8,3 +8,4 @@ GLIBC_2.3 calloc F
 GLIBC_2.3 free F
 GLIBC_2.3 malloc F
 GLIBC_2.3 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 633d8f4792..568a117cd3 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2027,6 +2027,7 @@ GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
index 37c8f6684b..2a4ab2e318 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/ld.abilist
@@ -8,3 +8,4 @@ GLIBC_2.17 malloc F
 GLIBC_2.17 realloc F
 GLIBC_2.22 __tls_get_addr_opt F
 GLIBC_2.23 __parse_hwcap_and_convert_at_platform F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 2c712636ef..57084175ff 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2231,3 +2231,4 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
index b411871d06..307511e435 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist
@@ -7,3 +7,4 @@ GLIBC_2.27 calloc F
 GLIBC_2.27 free F
 GLIBC_2.27 malloc F
 GLIBC_2.27 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 195bc8b2cf..05f1041ec1 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2103,3 +2103,4 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
new file mode 100644
index 0000000000..edb31b1c3c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -0,0 +1,88 @@
+/* Restartable Sequences internal API. Linux implementation.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+#include <sysdep.h>
+#include <errno.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+
+static inline int
+rseq_register_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED)
+    return -1;
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              0, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY)
+    __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+  ret = -1;
+end:
+  return ret;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  ret = -1;
+end:
+  return ret;
+}
+
+static inline void
+rseq_init (void)
+{
+  __rseq_handled = 1;
+}
+#else
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+static inline void
+rseq_init (void)
+{
+}
+#endif
+
+#endif /* rseq-internal.h */
diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c
new file mode 100644
index 0000000000..f86869a380
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-sym.c
@@ -0,0 +1,43 @@
+/* Restartable Sequences exported symbols. Linux Implementation.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <stdint.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#else
+
+enum rseq_cpu_id_state {
+  RSEQ_CPU_ID_UNINITIALIZED = -1,
+  RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
+};
+
+/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI
+   size is 20 bytes.  */
+struct rseq {
+  uint32_t cpu_id_start;
+  uint32_t cpu_id;
+  uint64_t rseq_cs;
+  uint32_t flags;
+} __attribute__ ((aligned(4 * sizeof(uint64_t))));
+
+#endif
+
+__thread struct rseq __rseq_abi = {
+  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
new file mode 100644
index 0000000000..453250d761
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
@@ -0,0 +1,37 @@
+/* Restartable Sequences Linux s390 architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   RSEQ_SIG uses the trap4 instruction. As Linux does not make use of the
+   access-register mode nor the linkage stack this instruction will always
+   cause a special-operation exception (the trap-enabled bit in the DUCT
+   is and will stay 0). The instruction pattern is
+       b2 ff 0f ff        trap4   4095(%r0)  */
+
+#define RSEQ_SIG	0xB2FF0FFF
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
index 0576c9575e..ee723c3d36 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_offset F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 334def033c..1a5f4b4b70 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2159,6 +2159,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
index 1fbb890d1d..422c82bd43 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_offset F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 536f4c4ced..eb76ca2cb8 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2063,6 +2063,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/ld.abilist b/sysdeps/unix/sysv/linux/sh/ld.abilist
index 0387614d8f..3614f0c684 100644
--- a/sysdeps/unix/sysv/linux/sh/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sh/ld.abilist
@@ -6,4 +6,5 @@ GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __stack_chk_guard D 0x4
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index 30ae3b6ebb..26d15fcedb 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -2041,6 +2041,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
index fd0b33f86d..364c8adba1 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.0 realloc F
 GLIBC_2.1 __libc_stack_end D 0x4
 GLIBC_2.1 _dl_mcount F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 68b107d080..211c0aa202 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2153,6 +2153,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
index 82042472c3..3eb3449014 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.2 free F
 GLIBC_2.2 malloc F
 GLIBC_2.2 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index e5b6a4da50..00ff39b04d 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2092,6 +2092,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
new file mode 100644
index 0000000000..56a2b1ebbe
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -0,0 +1,52 @@
+/* Restartable Sequences exported symbols. Linux header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+#define _SYS_RSEQ_H	1
+
+/* We use the structures declarations from the kernel headers.  */
+#include <linux/rseq.h>
+/* Architecture-specific rseq signature.  */
+#include <bits/rseq.h>
+#include <stdint.h>
+
+extern __thread struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.
+
+   The fact that a library handles rseq registration is orthogonal to whether
+   the running kernel implements the rseq system call or not.  */
+
+extern int __rseq_handled;
+
+#endif /* sys/rseq.h */
diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
new file mode 100644
index 0000000000..a2918c4617
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
@@ -0,0 +1,30 @@
+/* Restartable Sequences Linux x86 architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   RSEQ_SIG is used with the following reserved undefined instructions, which
+   trap in user-space:
+
+   x86-32:    0f b9 3d 53 30 05 53      ud1    0x53053053,%edi
+   x86-64:    0f b9 3d 53 30 05 53      ud1    0x53053053(%rip),%edi  */
+
+#define RSEQ_SIG	0x53053053
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist b/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
index 0dc9430611..68eb0c0ced 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.2.5 free F
 GLIBC_2.2.5 malloc F
 GLIBC_2.2.5 realloc F
 GLIBC_2.3 __tls_get_addr F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 86dfb0c94d..0fb1c0048f 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2050,6 +2050,7 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
index 80f3161586..89ce06504b 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist
@@ -6,3 +6,4 @@ GLIBC_2.16 calloc F
 GLIBC_2.16 free F
 GLIBC_2.16 malloc F
 GLIBC_2.16 realloc F
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index dd688263aa..f692d27035 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2149,3 +2149,4 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t
From: Dmitry Safonov @ 2019-06-14 14:39 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86
In-Reply-To: <alpine.DEB.2.21.1906141511440.1722@nanos.tec.linutronix.de>

Hi Thomas,

Thanks much for the review,

On 6/14/19 2:32 PM, Thomas Gleixner wrote:
> Dmitry,
> 
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
>> From: Andrei Vagin <avagin@gmail.com>
>>
>> The callsite in common_timer_get() has already a comment:
>>         /*
>>          * The timespec64 based conversion is suboptimal, but it's not
>>          * worth to implement yet another callback.
>>          */
>>         kc->clock_get(timr->it_clock, &ts64);
>>         now = timespec64_to_ktime(ts64);
>>
>> Now we are going to add time namespaces and we need to be able to get:
> 
> Please avoid 'we' and try to describe the changes in a neutral technical
> form, e.g.:
> 
>  The upcoming support for time namespaces requires to have access to:
> 
>> * clock value in a task time namespace to return it from the clock_gettime
>>   syscall.
> 
>   - The time in a tasks time namespace for sys_clock_gettime()
> 
>> * clock valuse in the root time namespace to use it in
>>   common_timer_get().
> 
>   - The time in the root name space for common_timer_get()
> 
>> It looks like another reason why we need a separate callback to return
>> clock value in ktime_t.
> 
>  That adds a valid reason to finally implement a separate callback which
>  returns the time in ktime_t format.
> 
> Hmm?

Agree, the patch has become bigger than wanted and the message could
have been better in technical sense. Will split, add kernel doc and fix
the commit message(s).

[..]
> TBH, this patch is way to big. It changes too many things at once. Can you
> please structure it this way:
> 
>  1) Rename k_clock::clock_get to k_clock::clock_get_timespec and fix up all
>     struct initializers
> 
>  2) Rename the clock_get_timespec functions per instance
> 
>  3) Add the new callback
> 
>  4) Add the new functions per instance and add them to the corresponding
>     struct initializers
> 
>  5) Use the new callback
> 
Thanks,
          Dima

^ permalink raw reply

* Re: [PATCHv4 02/28] timens: Add timens_offsets
From: Dmitry Safonov @ 2019-06-14 14:32 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86
In-Reply-To: <alpine.DEB.2.21.1906141510100.1722@nanos.tec.linutronix.de>

On 6/14/19 2:11 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
>> From: Andrei Vagin <avagin@openvz.org>
>>
>> Introduce offsets for time namespace. They will contain an adjustment
>> needed to convert clocks to/from host's.
>>
>> Allocate one page for each time namespace that will be premapped into
>> userspace among vvar pages.
>> index 000000000000..7d7cb68ea778
>> --- /dev/null
>> +++ b/include/linux/timens_offsets.h
>> @@ -0,0 +1,8 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_TIME_OFFSETS_H
>> +#define _LINUX_TIME_OFFSETS_H
>> +
>> +struct timens_offsets {
>> +};
> 
> That empty struct which is nowhere used looks odd. Can you move that to the
> patch which actually makes use of it?

Sure, makes sense.

Thanks,
          Dmitry

^ permalink raw reply

* Re: [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line
From: Thomas Gleixner @ 2019-06-14 14:13 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86
In-Reply-To: <20190612192628.23797-27-dima@arista.com>

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

> From: Andrei Vagin <avagin@gmail.com>
> 
> After performance testing VDSO patches a noticeable 20% regression was
> found on gettime_perf selftest with a cold cache.
> As it turns to be, before time namespaces introduction, VDSO functions
> were quite aligned to cache lines, but adding a new code to adjust
> timens offset inside namespace created a small shift and vdso functions
> become unaligned on cache lines.
> 
> Add align to vdso functions with gcc option to fix performance drop.
> 
> Coping the resulting numbers from cover letter:
> 
> Hot CPU cache (more gettime_perf.c cycles - the better):
>         | before     | CONFIG_TIME_NS=n | host        | inside timens
> --------|------------|------------------|-------------|-------------
> cycles  | 139887013  | 139453003        | 139899785   | 128792458
> diff (%)| 100        | 99.7             | 100         | 92

Why is CONFIG_TIME_NS=n behaving worse than current mainline and
worse than 'host' mode?

> Cold cache (lesser tsc per gettime_perf_cold.c cycle - the better):
>         | before     | CONFIG_TIME_NS=n | host        | inside timens
> --------|------------|------------------|-------------|-------------
> tsc     | 6748       | 6718             | 6862        | 12682
> diff (%)| 100        | 99.6             | 101.7       | 188

Weird, now CONFIG_TIME_NS=n is better than current mainline and 'host' mode
drops.

Either I'm misreading the numbers or missing something or I'm just confused
as usual :)

Thanks,

	tglx

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox