From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Chris Wright <chrisw@sous-sol.org>
Cc: linux-kernel@vger.kernel.org, stable@kernel.org, jejb@kernel.org,
	Justin Forbes <jmforbes@linuxtx.org>,
	Zwane Mwaikambo <zwane@arm.linux.org.uk>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Dave Jones <davej@redhat.com>,
	Chuck Wolber <chuckw@quantumlinux.com>,
	Chris Wedgwood <reviews@ml.cw.f00f.org>,
	Michael Krufky <mkrufky@linuxtv.org>,
	Chuck Ebbert <cebbert@redhat.com>,
	Domenico Andreoli <cavokz@gmail.com>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@elte.hu>,
	Lennert Buytenhek <buytenh@wantstofly.org>,
	Riku Voipio <riku.voipio@movial.fi>,
	Greg Kroah-Hartman <gregkh@suse.de>
Subject: Re: [patch 33/76] futex: runtime enable pi and robust functionality
Date: Thu, 27 Mar 2008 14:28:33 +1100
Message-ID: <1206588514.6926.76.camel@pasglop>
In-Reply-To: <20080321224400.674465387@sous-sol.org>


On Fri, 2008-03-21 at 15:43 -0700, Chris Wright wrote:
> plain text document attachment
> (futex-runtime-enable-pi-and-robust-functionality.patch)
> -stable review patch.  If anyone has any objections, please let us know.
> ---------------------
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Not all architectures implement futex_atomic_cmpxchg_inatomic().  The default
> implementation returns -ENOSYS, which is currently not handled inside of the
> futex guts.

I think this breaks some embedded PowerPC platforms. It's also very
fishy: this early in boot we haven't activated an mm yet, so it's like
trying to access user memory from a kernel thread. It might happen to
give you -EFAULT on x86, but you are making pretty big assumptions
about how the low-level mm works across architectures.

Ben.

> Futex PI calls and robust list exits with a held futex result in an endless
> loop in the futex code on architectures which have no support.
> 
> Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
> add a fair amount of extra if/else constructs to the already complex code.  It
> is also not possible to disable the robust feature before user space tries to
> register robust lists.
> 
> Compile time disabling is not a good idea either, as there are already
> architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.
> 
> Detect the functionality at runtime instead by calling
> cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
> code.  This is guaranteed to fail, but the call of
> futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.
> 
> On architectures which use the asm-generic implementation or have runtime
> CPU feature detection, a -ENOSYS return value disables the PI/robust features.
> 
> On architectures with a working implementation the call returns -EFAULT and
> the PI/robust features are enabled.
> 
> When the detection fails, the relevant syscalls return -ENOSYS and the
> robust list exit code is skipped.
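
For reference, a minimal user-space sketch of the probe described in the
quoted text, not the kernel code itself. The names cmpxchg_stub,
cmpxchg_working and probe_cmpxchg_enabled are hypothetical stand-ins.
Note that the "working" stand-in checks for NULL explicitly, whereas the
real probe relies on the architecture's fault handling, which is exactly
the assumption questioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the asm-generic stub: not implemented. */
static int cmpxchg_stub(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
	(void)uaddr;
	(void)oldval;
	(void)newval;
	return -ENOSYS;
}

/* Hypothetical stand-in for a working arch implementation. */
static int cmpxchg_working(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
	uint32_t cur;

	if (uaddr == NULL)
		return -EFAULT;	/* models the fault fixup path */
	cur = *uaddr;
	if (cur == oldval)
		*uaddr = newval;
	return (int)cur;	/* previous value, as the old int-returning API did */
}

typedef int (*cmpxchg_fn)(uint32_t *, uint32_t, uint32_t);

/* Enabled only if the deliberately failing NULL call yields -EFAULT. */
static int probe_cmpxchg_enabled(cmpxchg_fn cmpxchg)
{
	uint32_t curval;

	assert(cmpxchg != NULL);
	curval = (uint32_t)cmpxchg(NULL, 0, 0);
	/* -EFAULT converts to the same unsigned value, as in the patch. */
	return curval == (uint32_t)-EFAULT;
}
```

With these stand-ins, probing cmpxchg_stub leaves the feature disabled
and probing cmpxchg_working enables it.
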
> 
> Fixes http://lkml.org/lkml/2008/2/11/149
> Originally reported by: Lennert Buytenhek
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Acked-by: Ingo Molnar <mingo@elte.hu>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Riku Voipio <riku.voipio@movial.fi>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> ---
>  include/linux/futex.h |    1 +
>  kernel/futex.c        |   38 ++++++++++++++++++++++++++++++++++----
>  kernel/futex_compat.c |    9 +++++++++
>  3 files changed, 44 insertions(+), 4 deletions(-)
> 
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -153,6 +153,7 @@ union futex_key {
>  #ifdef CONFIG_FUTEX
>  extern void exit_robust_list(struct task_struct *curr);
>  extern void exit_pi_state_list(struct task_struct *curr);
> +extern int futex_cmpxchg_enabled;
>  #else
>  static inline void exit_robust_list(struct task_struct *curr)
>  {
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -60,6 +60,8 @@
>  
>  #include "rtmutex_common.h"
>  
> +int __read_mostly futex_cmpxchg_enabled;
> +
>  #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
>  
>  /*
> @@ -466,6 +468,8 @@ void exit_pi_state_list(struct task_stru
>  	struct futex_hash_bucket *hb;
>  	union futex_key key;
>  
> +	if (!futex_cmpxchg_enabled)
> +		return;
>  	/*
>  	 * We are a ZOMBIE and nobody can enqueue itself on
>  	 * pi_state_list anymore, but we have to be careful
> @@ -1854,6 +1858,8 @@ asmlinkage long
>  sys_set_robust_list(struct robust_list_head __user *head,
>  		    size_t len)
>  {
> +	if (!futex_cmpxchg_enabled)
> +		return -ENOSYS;
>  	/*
>  	 * The kernel knows only one size for now:
>  	 */
> @@ -1878,6 +1884,9 @@ sys_get_robust_list(int pid, struct robu
>  	struct robust_list_head __user *head;
>  	unsigned long ret;
>  
> +	if (!futex_cmpxchg_enabled)
> +		return -ENOSYS;
> +
>  	if (!pid)
>  		head = current->robust_list;
>  	else {
> @@ -1980,6 +1989,9 @@ void exit_robust_list(struct task_struct
>  	unsigned long futex_offset;
>  	int rc;
>  
> +	if (!futex_cmpxchg_enabled)
> +		return;
> +
>  	/*
>  	 * Fetch the list head (which was registered earlier, via
>  	 * sys_set_robust_list()):
> @@ -2034,7 +2046,7 @@ void exit_robust_list(struct task_struct
>  long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
>  		u32 __user *uaddr2, u32 val2, u32 val3)
>  {
> -	int ret;
> +	int ret = -ENOSYS;
>  	int cmd = op & FUTEX_CMD_MASK;
>  	struct rw_semaphore *fshared = NULL;
>  
> @@ -2062,13 +2074,16 @@ long do_futex(u32 __user *uaddr, int op,
>  		ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
>  		break;
>  	case FUTEX_LOCK_PI:
> -		ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
> +		if (futex_cmpxchg_enabled)
> +			ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
>  		break;
>  	case FUTEX_UNLOCK_PI:
> -		ret = futex_unlock_pi(uaddr, fshared);
> +		if (futex_cmpxchg_enabled)
> +			ret = futex_unlock_pi(uaddr, fshared);
>  		break;
>  	case FUTEX_TRYLOCK_PI:
> -		ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
> +		if (futex_cmpxchg_enabled)
> +			ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
>  		break;
>  	default:
>  		ret = -ENOSYS;
> @@ -2123,8 +2138,23 @@ static struct file_system_type futex_fs_
>  
>  static int __init init(void)
>  {
> +	u32 curval;
>  	int i;
>  
> +	/*
> +	 * This will fail and we want it. Some arch implementations do
> +	 * runtime detection of the futex_atomic_cmpxchg_inatomic()
> +	 * functionality. We want to know that before we call in any
> +	 * of the complex code paths. Also we want to prevent
> +	 * registration of robust lists in that case. NULL is
> +	 * guaranteed to fault and we get -EFAULT on functional
> +	 * implementation, the non functional ones will return
> +	 * -ENOSYS.
> +	 */
> +	curval = cmpxchg_futex_value_locked(NULL, 0, 0);
> +	if (curval == -EFAULT)
> +		futex_cmpxchg_enabled = 1;
> +
>  	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
>  		plist_head_init(&futex_queues[i].chain, &futex_queues[i].lock);
>  		spin_lock_init(&futex_queues[i].lock);
> --- a/kernel/futex_compat.c
> +++ b/kernel/futex_compat.c
> @@ -54,6 +54,9 @@ void compat_exit_robust_list(struct task
>  	compat_long_t futex_offset;
>  	int rc;
>  
> +	if (!futex_cmpxchg_enabled)
> +		return;
> +
>  	/*
>  	 * Fetch the list head (which was registered earlier, via
>  	 * sys_set_robust_list()):
> @@ -115,6 +118,9 @@ asmlinkage long
>  compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
>  			   compat_size_t len)
>  {
> +	if (!futex_cmpxchg_enabled)
> +		return -ENOSYS;
> +
>  	if (unlikely(len != sizeof(*head)))
>  		return -EINVAL;
>  
> @@ -130,6 +136,9 @@ compat_sys_get_robust_list(int pid, comp
>  	struct compat_robust_list_head __user *head;
>  	unsigned long ret;
>  
> +	if (!futex_cmpxchg_enabled)
> +		return -ENOSYS;
> +
>  	if (!pid)
>  		head = current->compat_robust_list;
>  	else {
> 
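
The gating pattern the patch applies in do_futex() can be sketched in
isolation as follows. This is an illustrative user-space reduction under
assumed names (OP_WAKE, OP_LOCK_PI, OP_UNLOCK_PI, dispatch and the do_*
helpers are all hypothetical): the result defaults to -ENOSYS, so any PI
branch that is skipped because the probe failed returns "not
implemented" without extra else arms.

```c
#include <errno.h>

/* Hypothetical op codes standing in for the FUTEX_* commands. */
enum { OP_WAKE, OP_LOCK_PI, OP_UNLOCK_PI };

/* Set once by the init-time probe; 0 means cmpxchg is unavailable. */
static int cmpxchg_enabled;

/* Hypothetical operation bodies; each just reports success. */
static int do_wake(void) { return 0; }
static int do_lock_pi(void) { return 0; }
static int do_unlock_pi(void) { return 0; }

static long dispatch(int cmd)
{
	int ret = -ENOSYS;	/* default covers every skipped PI branch */

	switch (cmd) {
	case OP_WAKE:
		ret = do_wake();	/* does not depend on cmpxchg */
		break;
	case OP_LOCK_PI:
		if (cmpxchg_enabled)
			ret = do_lock_pi();
		break;
	case OP_UNLOCK_PI:
		if (cmpxchg_enabled)
			ret = do_unlock_pi();
		break;
	default:
		ret = -ENOSYS;
	}
	return ret;
}
```

Non-PI operations are unaffected by the flag; only the PI operations
fall through to the -ENOSYS default when it is clear.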

