* [patch 0/4] ipc sem improvements
@ 2009-08-11 11:09 npiggin
2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel
Hi, this patchset contains some improvements I've been trying to make to the
sysv semaphore implementation, which is pretty suboptimal (O(n^2) behaviour
with n waiting processes, among other things).
I've been quite careful to try to retain FIFO ordering and avoid starvation,
even for complex operations.
Comments?
Thanks,
Nick
^ permalink raw reply [flat|nested] 18+ messages in thread* [patch 1/4] ipc: sem optimise undo list search 2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin @ 2009-08-11 11:09 ` npiggin 2009-08-16 13:17 ` Manfred Spraul 2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw) To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel [-- Attachment #1: ipc-sem-undo-optimise.patch --] [-- Type: text/plain, Size: 1766 bytes --] Move last looked up sem_undo struct to the head of the task's undo list. Attempt to move common entries to the front of the list so search time is reduced. This reduces lookup_undo on oprofile of problematic SAP workload by 30% (see patch 4 for a description of SAP workload). Signed-off-by: Nick Piggin <npiggin@suse.de> --- ipc/sem.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) Index: linux-2.6/ipc/sem.c =================================================================== --- linux-2.6.orig/ipc/sem.c +++ linux-2.6/ipc/sem.c @@ -961,17 +961,31 @@ static inline int get_undo_list(struct s return 0; } -static struct sem_undo *lookup_undo(struct sem_undo_list *ulp, int semid) +static struct sem_undo *__lookup_undo(struct sem_undo_list *ulp, int semid) { - struct sem_undo *walk; + struct sem_undo *un; - list_for_each_entry_rcu(walk, &ulp->list_proc, list_proc) { - if (walk->semid == semid) - return walk; + list_for_each_entry_rcu(un, &ulp->list_proc, list_proc) { + if (un->semid == semid) + return un; } return NULL; } +static struct sem_undo *lookup_undo(struct sem_undo_list *ulp, int semid) +{ + struct sem_undo *un; + + assert_spin_locked(&ulp->lock); + + un = __lookup_undo(ulp, semid); + if (un) { + list_del_rcu(&un->list_proc); + list_add_rcu(&un->list_proc, &ulp->list_proc); + } + return un; +} + /** * find_alloc_undo - Lookup (and if not present 
create) undo array * @ns: namespace @@ -1307,7 +1321,7 @@ void exit_sem(struct task_struct *tsk) if (IS_ERR(sma)) continue; - un = lookup_undo(ulp, semid); + un = __lookup_undo(ulp, semid); if (un == NULL) { /* exit_sem raced with IPC_RMID+semget() that created * exactly the same semid. Nothing to do. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 1/4] ipc: sem optimise undo list search 2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin @ 2009-08-16 13:17 ` Manfred Spraul 0 siblings, 0 replies; 18+ messages in thread From: Manfred Spraul @ 2009-08-16 13:17 UTC (permalink / raw) To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On 08/11/2009 01:09 PM, npiggin@suse.de wrote: > Move last looked up sem_undo struct to the head of the task's undo list. > Attempt to move common entries to the front of the list so search time is > reduced. This reduces lookup_undo on oprofile of problematic SAP workload > by 30% (see patch 4 for a description of SAP workload). > > Signed-off-by: Nick Piggin<npiggin@suse.de> > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch 2/4] ipc: sem use list operations 2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin 2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin @ 2009-08-11 11:09 ` npiggin 2009-08-16 13:18 ` Manfred Spraul 2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin 2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin 3 siblings, 1 reply; 18+ messages in thread From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw) To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel [-- Attachment #1: ipc-sem-use-list-ops.patch --] [-- Type: text/plain, Size: 2838 bytes --] Signed-off-by: Nick Piggin <npiggin@suse.de> --- ipc/sem.c | 79 +++++++++++++++++++++++++------------------------------------- 1 file changed, 33 insertions(+), 46 deletions(-) Index: linux-2.6/ipc/sem.c =================================================================== --- linux-2.6.orig/ipc/sem.c +++ linux-2.6/ipc/sem.c @@ -402,58 +402,45 @@ undo: */ static void update_queue (struct sem_array * sma) { - int error; - struct sem_queue * q; + struct sem_queue *q, *tq; + +again: + list_for_each_entry_safe(q, tq, &sma->sem_pending, list) { + int error; + int alter; - q = list_entry(sma->sem_pending.next, struct sem_queue, list); - while (&q->list != &sma->sem_pending) { error = try_atomic_semop(sma, q->sops, q->nsops, q->undo, q->pid); /* Does q->sleeper still need to sleep? */ - if (error <= 0) { - struct sem_queue *n; + if (error > 0) + continue; + + list_del(&q->list); + + /* + * The next operation that must be checked depends on the type + * of the completed operation: + * - if the operation modified the array, then restart from the + * head of the queue and check for threads that might be + * waiting for semaphore values to become 0. + * - if the operation didn't modify the array, then just + * continue. 
+ */ + alter = q->alter; + + /* wake up the waiting thread */ + q->status = IN_WAKEUP; + + wake_up_process(q->sleeper); + /* hands-off: q will disappear immediately after + * writing q->status. + */ + smp_wmb(); + q->status = error; - /* - * Continue scanning. The next operation - * that must be checked depends on the type of the - * completed operation: - * - if the operation modified the array, then - * restart from the head of the queue and - * check for threads that might be waiting - * for semaphore values to become 0. - * - if the operation didn't modify the array, - * then just continue. - * The order of list_del() and reading ->next - * is crucial: In the former case, the list_del() - * must be done first [because we might be the - * first entry in ->sem_pending], in the latter - * case the list_del() must be done last - * [because the list is invalid after the list_del()] - */ - if (q->alter) { - list_del(&q->list); - n = list_entry(sma->sem_pending.next, - struct sem_queue, list); - } else { - n = list_entry(q->list.next, struct sem_queue, - list); - list_del(&q->list); - } - - /* wake up the waiting thread */ - q->status = IN_WAKEUP; - - wake_up_process(q->sleeper); - /* hands-off: q will disappear immediately after - * writing q->status. - */ - smp_wmb(); - q->status = error; - q = n; - } else { - q = list_entry(q->list.next, struct sem_queue, list); - } + if (alter) + goto again; } } ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 2/4] ipc: sem use list operations 2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin @ 2009-08-16 13:18 ` Manfred Spraul 0 siblings, 0 replies; 18+ messages in thread From: Manfred Spraul @ 2009-08-16 13:18 UTC (permalink / raw) To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On 08/11/2009 01:09 PM, npiggin@suse.de wrote: > Signed-off-by: Nick Piggin<npiggin@suse.de> > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch 3/4] ipc: sem preempt improve 2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin 2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin 2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin @ 2009-08-11 11:09 ` npiggin 2009-08-16 13:20 ` Manfred Spraul 2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin 3 siblings, 1 reply; 18+ messages in thread From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw) To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel [-- Attachment #1: ipc-sem-preempt-improve.patch --] [-- Type: text/plain, Size: 2573 bytes --] The strange sysv semaphore wakeup scheme has a kind of busy-wait lock involved, which could deadlock if preemption is enabled during the "lock". It is an implementation detail (due to a spinlock being held) that this is actually the case. However if "spinlocks" are made preemptible, or if the sem lock is changed to a sleeping lock for example, then the wakeup would become buggy. So this might be a bugfix for -rt kernels. Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- if wakee has higher RT priority then there is a priority inversion deadlock. Even if there is not a priority inversion to cause a deadlock, then there is still time wasted spinning. Signed-off-by: Nick Piggin <npiggin@suse.de> --- ipc/sem.c | 38 +++++++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 15 deletions(-) Index: linux-2.6/ipc/sem.c =================================================================== --- linux-2.6.orig/ipc/sem.c +++ linux-2.6/ipc/sem.c @@ -397,6 +397,27 @@ undo: return result; } +/* + * Wake up a process waiting on the sem queue with a given error. + * The queue is invalid (may not be accessed) after the function returns. 
+ */ +static void wake_up_sem_queue(struct sem_queue *q, int error) +{ + /* + * Hold preempt off so that we don't get preempted and have the + * wakee busy-wait until we're scheduled back on. We're holding + * locks here so it may not strictly be needed, however if the + * locks become preemptible then this prevents such a problem. + */ + preempt_disable(); + q->status = IN_WAKEUP; + wake_up_process(q->sleeper); + /* hands-off: q can disappear immediately after writing q->status. */ + smp_wmb(); + q->status = error; + preempt_enable(); +} + /* Go through the pending queue for the indicated semaphore * looking for tasks that can be completed. */ @@ -428,17 +449,7 @@ again: * continue. */ alter = q->alter; - - /* wake up the waiting thread */ - q->status = IN_WAKEUP; - - wake_up_process(q->sleeper); - /* hands-off: q will disappear immediately after - * writing q->status. - */ - smp_wmb(); - q->status = error; - + wake_up_sem_queue(q, error); if (alter) goto again; } @@ -522,10 +533,7 @@ static void freeary(struct ipc_namespace list_for_each_entry_safe(q, tq, &sma->sem_pending, list) { list_del(&q->list); - q->status = IN_WAKEUP; - wake_up_process(q->sleeper); /* doesn't sleep */ - smp_wmb(); - q->status = -EIDRM; /* hands-off q */ + wake_up_sem_queue(q, -EIDRM); } /* Remove the semaphore set from the IDR */ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 3/4] ipc: sem preempt improve 2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin @ 2009-08-16 13:20 ` Manfred Spraul 0 siblings, 0 replies; 18+ messages in thread From: Manfred Spraul @ 2009-08-16 13:20 UTC (permalink / raw) To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On 08/11/2009 01:09 PM, npiggin@suse.de wrote: > The strange sysv semaphore wakeup scheme has a kind of busy-wait lock > involved, which could deadlock if preemption is enabled during the > "lock". > > It is an implementation detail (due to a spinlock being held) that this > is actually the case. However if "spinlocks" are made preemptible, or if > the sem lock is changed to a sleeping lock for example, then the wakeup > would become buggy. So this might be a bugfix for -rt kernels. > > Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- > if wakee has higher RT priority then there is a priority inversion deadlock. > Even if there is not a priority inversion to cause a deadlock, then there > is still time wasted spinning. > > Signed-off-by: Nick Piggin<npiggin@suse.de> > Signed-off-by: Manfred Spraul <manfred@colorfullife.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [patch 4/4] ipc: sem optimise simple operations 2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin ` (2 preceding siblings ...) 2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin @ 2009-08-11 11:09 ` npiggin 2009-08-11 18:19 ` Manfred Spraul ` (2 more replies) 3 siblings, 3 replies; 18+ messages in thread From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw) To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel [-- Attachment #1: ipc-sem-simple-lists.patch --] [-- Type: text/plain, Size: 9587 bytes --] When sysv semaphores are used as a simple semaphore/mutex (a single semop operation on a single sem in the sem-array), the current implementation is suboptimal because it has to handle the complex cases of multiple operations applied atomically to multiple semaphores. To optimise the simple case, add pending-operation lists to struct sem, on which simple operations are queued. When there are complex operations pending, we ignore the simple lists and continue to use the old algorithm; when there are no complex operations, subsequent simple operations need only look at the simple lists, which can be done in a much simpler way (no need to search the list multiple times). Timed with a simple test program that forks a number of threads which each lock/unlock a sysv semaphore a number of times (with a 1us busyloop in each critical section and a 10us busyloop outside). Run on a 2s8c Opteron.

procs   time per loop (vanilla)   time per loop (patched)
   1            11986ns                   11837ns
  32            10650ns                    7746ns
 128            17103ns                    8472ns
1024           385885ns                    9699ns

We have a report of an SAP application with high idle time caused by very high wait-time on a sysv semaphore, on which it is performing these "simple" operations. Semaphore wait times are somewhat improved with this patch, and sysv semaphore kernel functions are reduced by 40% in the profile (with this and the previous patches combined). That is quite significant considering it is measured on quite a small system (4s16c) and only with 64 processes.
Signed-off-by: Nick Piggin <npiggin@suse.de> --- include/linux/sem.h | 6 ++ ipc/sem.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 123 insertions(+), 13 deletions(-) Index: linux-2.6/ipc/sem.c =================================================================== --- linux-2.6.orig/ipc/sem.c +++ linux-2.6/ipc/sem.c @@ -240,6 +240,7 @@ static int newary(struct ipc_namespace * key_t key = params->key; int nsems = params->u.nsems; int semflg = params->flg; + int i; if (!nsems) return -EINVAL; @@ -272,6 +273,15 @@ static int newary(struct ipc_namespace * ns->used_sems += nsems; sma->sem_base = (struct sem *) &sma[1]; + + for (i = 0; i < nsems; i++) { + struct sem *s = &sma->sem_base[i]; + + INIT_LIST_HEAD(&s->negv_pending); + INIT_LIST_HEAD(&s->zero_pending); + } + + sma->complex_count = 0; INIT_LIST_HEAD(&sma->sem_pending); INIT_LIST_HEAD(&sma->list_id); sma->sem_nsems = nsems; @@ -335,7 +345,7 @@ SYSCALL_DEFINE3(semget, key_t, key, int, * all at once. Return 0 if yes, 1 if need to sleep, else return error code. */ -static int try_atomic_semop (struct sem_array * sma, struct sembuf * sops, +static int try_atomic_semop(struct sem_array * sma, struct sembuf * sops, int nsops, struct sem_undo *un, int pid) { int result, sem_op; @@ -421,7 +431,7 @@ static void wake_up_sem_queue(struct sem /* Go through the pending queue for the indicated semaphore * looking for tasks that can be completed. 
*/ -static void update_queue (struct sem_array * sma) +static void update_queue(struct sem_array *sma) { struct sem_queue *q, *tq; @@ -438,23 +448,94 @@ again: continue; list_del(&q->list); + if (q->nsops == 1) + list_del(&q->simple_list); + else + sma->complex_count--; + + alter = q->alter; + wake_up_sem_queue(q, error); /* * The next operation that must be checked depends on the type * of the completed operation: * - if the operation modified the array, then restart from the * head of the queue and check for threads that might be - * waiting for semaphore values to become 0. + * waiting for this particular semaphore value. * - if the operation didn't modify the array, then just * continue. */ - alter = q->alter; - wake_up_sem_queue(q, error); - if (alter) + if (alter && !error) goto again; } } +static void update_zero_queue(struct sem_array *sma, struct sem *sem) +{ + struct sem_queue *q, *tq; + + list_for_each_entry_safe(q, tq, &sem->zero_pending, simple_list) { + int error; + + BUG_ON(q->nsops != 1); + BUG_ON(q->alter); + error = try_atomic_semop(sma, q->sops, q->nsops, + q->undo, q->pid); + BUG_ON(error); + + list_del(&q->list); + list_del(&q->simple_list); + wake_up_sem_queue(q, error); + } +} + +static void update_negv_queue(struct sem_array *sma, struct sem *sem) +{ + struct sem_queue *q, *tq; + + list_for_each_entry_safe(q, tq, &sem->negv_pending, simple_list) { + int error; + + BUG_ON(q->nsops != 1); + BUG_ON(!q->alter); + error = try_atomic_semop(sma, q->sops, q->nsops, + q->undo, q->pid); + + /* Does q->sleeper still need to sleep? */ + if (error > 0) + continue; + + /* Continue scanning as in update_queue */ + list_del(&q->list); + list_del(&q->simple_list); + wake_up_sem_queue(q, error); + + /* + * negv queue should only decrease queue value, so there is + * no point in retrying failed ops on the negv queue after + * altering the sem value here. If the semval is not greater + * than zero, then there will be nothing else to do either. 
+ */ + if (sem->semval <= 0) + break; + } +} + +static void update_queue_simple(struct sem_array *sma, ushort semnum) +{ + if (unlikely(sma->complex_count)) { + update_queue(sma); + } else { + struct sem *sem; + + sem = &sma->sem_base[semnum]; + if (sem->semval > 0) + update_negv_queue(sma, sem); + if (sem->semval == 0) + update_zero_queue(sma, sem); + } +} + /* The following counts are associated to each semaphore: * semncnt number of tasks waiting on semval being nonzero * semzcnt number of tasks waiting on semval being zero @@ -532,6 +613,9 @@ static void freeary(struct ipc_namespace /* Wake up all pending processes and let them fail with EIDRM. */ list_for_each_entry_safe(q, tq, &sma->sem_pending, list) { list_del(&q->list); + list_del(&q->simple_list); + if (q->nsops > 1) + sma->complex_count--; wake_up_sem_queue(q, -EIDRM); } @@ -796,7 +880,7 @@ static int semctl_main(struct ipc_namesp curr->sempid = task_tgid_vnr(current); sma->sem_ctime = get_seconds(); /* maybe some queued-up processes were waiting for this */ - update_queue(sma); + update_queue_simple(sma, semnum); err = 0; goto out_unlock; } @@ -1169,10 +1253,14 @@ SYSCALL_DEFINE4(semtimedop, int, semid, if (error) goto out_unlock_free; - error = try_atomic_semop (sma, sops, nsops, un, task_tgid_vnr(current)); + error = try_atomic_semop(sma, sops, nsops, un, task_tgid_vnr(current)); if (error <= 0) { - if (alter && error == 0) - update_queue (sma); + if (alter && !error) { + if (nsops == 1) + update_queue_simple(sma, sops->sem_num); + else + update_queue(sma); + } goto out_unlock_free; } @@ -1189,6 +1277,17 @@ SYSCALL_DEFINE4(semtimedop, int, semid, list_add_tail(&queue.list, &sma->sem_pending); else list_add(&queue.list, &sma->sem_pending); + if (nsops == 1) { + struct sem *curr; + curr = &sma->sem_base[sops->sem_num]; + if (alter) + list_add_tail(&queue.simple_list, &curr->negv_pending); + else + list_add_tail(&queue.simple_list, &curr->zero_pending); + } else { + INIT_LIST_HEAD(&queue.simple_list); + 
sma->complex_count++; + } queue.status = -EINTR; queue.sleeper = current; @@ -1232,6 +1331,10 @@ SYSCALL_DEFINE4(semtimedop, int, semid, if (timeout && jiffies_left == 0) error = -EAGAIN; list_del(&queue.list); + if (nsops == 1) + list_del(&queue.simple_list); + else + sma->complex_count--; out_unlock_free: sem_unlock(sma); @@ -1356,11 +1459,14 @@ void exit_sem(struct task_struct *tsk) if (semaphore->semval > SEMVMX) semaphore->semval = SEMVMX; semaphore->sempid = task_tgid_vnr(current); + if (likely(!sma->complex_count)) + update_queue_simple(sma, i); } } sma->sem_otime = get_seconds(); /* maybe some queued-up processes were waiting for this */ - update_queue(sma); + if (unlikely(sma->complex_count)) + update_queue(sma); sem_unlock(sma); call_rcu(&un->rcu, free_un); @@ -1374,7 +1480,7 @@ static int sysvipc_sem_proc_show(struct struct sem_array *sma = it; return seq_printf(s, - "%10d %10d %4o %10lu %5u %5u %5u %5u %10lu %10lu\n", + "%10d %10d %4o %10u %5u %5u %5u %5u %10lu %10lu\n", sma->sem_perm.key, sma->sem_perm.id, sma->sem_perm.mode, Index: linux-2.6/include/linux/sem.h =================================================================== --- linux-2.6.orig/include/linux/sem.h +++ linux-2.6/include/linux/sem.h @@ -86,6 +86,8 @@ struct task_struct; struct sem { int semval; /* current value */ int sempid; /* pid of last operation */ + struct list_head negv_pending; + struct list_head zero_pending; }; /* One sem_array data structure for each set of semaphores in the system. */ @@ -96,11 +98,13 @@ struct sem_array { struct sem *sem_base; /* ptr to first semaphore in array */ struct list_head sem_pending; /* pending operations to be processed */ struct list_head list_id; /* undo requests on this array */ - unsigned long sem_nsems; /* no. of semaphores in array */ + int sem_nsems; /* no. of semaphores in array */ + int complex_count; /* pending complex operations */ }; /* One queue for each sleeping process in the system. 
*/ struct sem_queue { + struct list_head simple_list; /* queue of pending operations */ struct list_head list; /* queue of pending operations */ struct task_struct *sleeper; /* this process */ struct sem_undo *undo; /* undo structure */ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin @ 2009-08-11 18:19 ` Manfred Spraul 2009-08-11 18:23 ` Zach Brown 2009-08-11 20:07 ` Cyrill Gorcunov 2009-08-14 18:48 ` Manfred Spraul 2 siblings, 1 reply; 18+ messages in thread From: Manfred Spraul @ 2009-08-11 18:19 UTC (permalink / raw) To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On 08/11/2009 01:09 PM, npiggin@suse.de wrote: > Index: linux-2.6/include/linux/sem.h > =================================================================== > --- linux-2.6.orig/include/linux/sem.h > +++ linux-2.6/include/linux/sem.h > @@ -86,6 +86,8 @@ struct task_struct; > struct sem { > int semval; /* current value */ > int sempid; /* pid of last operation */ > + struct list_head negv_pending; > + struct list_head zero_pending; > }; > struct sem is increased from 8 to 24 bytes. Is that still ok? Are there any apps that use really lots of semaphores (i.e.: far more than processes in the system?) Postgres uses lots of small semaphore sets (each with 17 entries), I never had access to other apps. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-11 18:19 ` Manfred Spraul @ 2009-08-11 18:23 ` Zach Brown 2009-08-12 4:07 ` Nick Piggin 0 siblings, 1 reply; 18+ messages in thread From: Zach Brown @ 2009-08-11 18:23 UTC (permalink / raw) To: Manfred Spraul Cc: npiggin, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel Manfred Spraul wrote: > On 08/11/2009 01:09 PM, npiggin@suse.de wrote: >> Index: linux-2.6/include/linux/sem.h >> =================================================================== >> --- linux-2.6.orig/include/linux/sem.h >> +++ linux-2.6/include/linux/sem.h >> @@ -86,6 +86,8 @@ struct task_struct; >> struct sem { >> int semval; /* current value */ >> int sempid; /* pid of last operation */ >> + struct list_head negv_pending; >> + struct list_head zero_pending; >> }; >> > struct sem is increased from 8 to 24 bytes. And larger still with 64bit pointers. If it's a problem, this can be scaled back. You can have pointers to lists and you can have fewer lists. Hopefully it won't be a problem, though. We can close our eyes and pretend that the size of the semaphore sets scale with the size of the system and that it's such a relatively small consumer of memory that no one will notice :). - z ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-11 18:23 ` Zach Brown @ 2009-08-12 4:07 ` Nick Piggin 2009-08-12 18:35 ` Manfred Spraul 0 siblings, 1 reply; 18+ messages in thread From: Nick Piggin @ 2009-08-12 4:07 UTC (permalink / raw) To: Zach Brown Cc: Manfred Spraul, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On Tue, Aug 11, 2009 at 11:23:36AM -0700, Zach Brown wrote: > Manfred Spraul wrote: > > On 08/11/2009 01:09 PM, npiggin@suse.de wrote: > >> Index: linux-2.6/include/linux/sem.h > >> =================================================================== > >> --- linux-2.6.orig/include/linux/sem.h > >> +++ linux-2.6/include/linux/sem.h > >> @@ -86,6 +86,8 @@ struct task_struct; > >> struct sem { > >> int semval; /* current value */ > >> int sempid; /* pid of last operation */ > >> + struct list_head negv_pending; > >> + struct list_head zero_pending; > >> }; > >> > > struct sem is increased from 8 to 24 bytes. > > And larger still with 64bit pointers. Yes it is a significant growth. To answer Manfred's question, I don't know if there are applications using large numbers of semaphores per set. A Google search for increasing SEMMSL mostly turns up Oracle, which says to use 250 (which is our current default). A semaphore set with 250 will use 2K before, and 10K afterward. I don't know that it is a huge amount really, given that they also have to presumably be *protecting* stuff. We can convert them to hlists (I was going to send a patch to do everything in hlists, but hlists are missing some _rcu variants... maybe I should just convert the pending lists to start with). > If it's a problem, this can be scaled back. You can have pointers to > lists and you can have fewer lists. > > Hopefully it won't be a problem, though. We can close our eyes and > pretend that the size of the semaphore sets scale with the size of the > system and that it's such a relatively small consumer of memory that no > one will notice :).
The other thing is that using semaphores as sets really won't scale well at all. It will scale better now that there are per-sem lists, but there is still a per-set lock. They really should be discouraged. It's not trivial to remove shared cachelines completely. Possible, I think, but it would further increase complexity without a proven need at this point. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-12 4:07 ` Nick Piggin @ 2009-08-12 18:35 ` Manfred Spraul 2009-08-14 8:58 ` Nick Piggin 0 siblings, 1 reply; 18+ messages in thread From: Manfred Spraul @ 2009-08-12 18:35 UTC (permalink / raw) To: Nick Piggin Cc: Zach Brown, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On 08/12/2009 06:07 AM, Nick Piggin wrote: > A semaphore set with 250 will use 2K before, and 10K afterward. I > don't know that it is a huge amount really, given that they also > have to presumably be *protecting* stuff. > > The allocation uses vmalloc for larger allocations, thus 10k should not be an issue. > We can convert them to hlists (I was going to send a patch to do > everything in hlists, but hlists are missing some _rcu variants... > maybe I should just convert the pending lists to start with). > > Is it possible to use list_add and list_add_tail instead? Add the "waiting for zero" to the beginning and "waiting for nonzero" to the end. Basically, the same approach for "simple" and "complex" operations: - if all operations are simple operations, then all operations affect only individual semaphores --> use per semaphore lists - if there is one complex operation, then the global, per semaphore-array lists are used. -- Manfred ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-12 18:35 ` Manfred Spraul @ 2009-08-14 8:58 ` Nick Piggin 2009-08-14 17:53 ` Manfred Spraul 0 siblings, 1 reply; 18+ messages in thread From: Nick Piggin @ 2009-08-14 8:58 UTC (permalink / raw) To: Manfred Spraul Cc: Zach Brown, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On Wed, Aug 12, 2009 at 08:35:56PM +0200, Manfred Spraul wrote: > On 08/12/2009 06:07 AM, Nick Piggin wrote: > >A semaphore set with 250 will use 2K before, and 10K afterward. I > >don't know that it is a huge amount really, given that they also > >have to presumably be *protecting* stuff. > > > > > The allocation uses vmalloc for larger allocations, thus 10k should not > be an issue. > >We can convert them to hlists (I was going to send a patch to do > >everything in hlists, but hlists are missing some _rcu variants... > >maybe I should just convert the pending lists to start with). > > > > > Is it possible to use list_add and list_add_tail instead? > Add the "waiting for zero" to the beginning and "waiting for nonzero" to > the end. The only problem with this is that it means we have to walk through the wait-for-zero list for every increment operation, before getting to the list of negative ops. The wait-for-zero list should be empty in the case of simple modify operations which are strictly +1/-1, I guess, though. I do like how it cleanly splits the modify and non-modify operations though. But if you feel strongly about saving the space, I will do as you suggest. Thanks, Nick ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-14 8:58 ` Nick Piggin @ 2009-08-14 17:53 ` Manfred Spraul 0 siblings, 0 replies; 18+ messages in thread From: Manfred Spraul @ 2009-08-14 17:53 UTC (permalink / raw) To: Nick Piggin Cc: Zach Brown, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel On 08/14/2009 10:58 AM, Nick Piggin wrote: > I do like how it cleanly splits the modify and non-modify operations > though. But if you feel strongly about saving the sapce, I will > do as you suggest. > > No, saving space is secondary. IMHO the global and the local lists should use the same algorithm, that's the main point. Perhaps even the same function could be used: One global struct sem_waiters {struct list_head zero; struct list_head decrease}. One struct sem_waiters for each semaphore. Then something like update_queue(sma, sma->complex_ops ? &sma->global_waiters : &sma->sem_base[i].local_waiters); -- Manfred ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin 2009-08-11 18:19 ` Manfred Spraul @ 2009-08-11 20:07 ` Cyrill Gorcunov 2009-08-12 4:48 ` Nick Piggin 2009-08-14 18:48 ` Manfred Spraul 2 siblings, 1 reply; 18+ messages in thread From: Cyrill Gorcunov @ 2009-08-11 20:07 UTC (permalink / raw) To: npiggin Cc: Andrew Morton, Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel [npiggin@suse.de - Tue, Aug 11, 2009 at 09:09:06PM +1000] ... | +static void update_queue_simple(struct sem_array *sma, ushort semnum) | +{ | + if (unlikely(sma->complex_count)) { | + update_queue(sma); | + } else { | + struct sem *sem; | + | + sem = &sma->sem_base[semnum]; | + if (sem->semval > 0) | + update_negv_queue(sma, sem); | + if (sem->semval == 0) | + update_zero_queue(sma, sem); | + } | +} | + ... Hi Nick, most probably I'm missing something, but can't we trigger the BUG_ON when updating the zero queue if the semaphore was created with an undo list and, via a new operation, reached -ERANGE on the undo value? Again, I could be missing something or plain wrong. Just a thought. -- Cyrill ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations 2009-08-11 20:07 ` Cyrill Gorcunov @ 2009-08-12 4:48 ` Nick Piggin 2009-08-12 5:43 ` Cyrill Gorcunov 0 siblings, 1 reply; 18+ messages in thread From: Nick Piggin @ 2009-08-12 4:48 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Andrew Morton, Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel On Wed, Aug 12, 2009 at 12:07:11AM +0400, Cyrill Gorcunov wrote: > [npiggin@suse.de - Tue, Aug 11, 2009 at 09:09:06PM +1000] > ... > | +static void update_queue_simple(struct sem_array *sma, ushort semnum) > | +{ > | + if (unlikely(sma->complex_count)) { > | + update_queue(sma); > | + } else { > | + struct sem *sem; > | + > | + sem = &sma->sem_base[semnum]; > | + if (sem->semval > 0) > | + update_negv_queue(sma, sem); > | + if (sem->semval == 0) > | + update_zero_queue(sma, sem); > | + } > | +} > | + > ... > > Hi Nick, > > mostly probably miss something but can't we trgigger BUG_ON at updating > zero queue if semaphore was created with undo list and via new operation > reached -ERANGE on undo value? > > Again, I could be missing something or plain wrong. Just a thought. Hi Cyrill, Thanks for looking... Hmm, you mean BUG_ON(error) due to try_atomic_semop returning -ERANGE? I think it should not be possible because it should prevent any operation from bringing the undo list to -ERANGE so then any operation which does not modify the sem value should not go out of range I think. (I think it would be a bug if we ever return -ERANGE for a wait-for-zero operation). Thanks, Nick ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-12  4:48 ` Nick Piggin
@ 2009-08-12  5:43 ` Cyrill Gorcunov
  0 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2009-08-12 5:43 UTC (permalink / raw)
To: Nick Piggin
Cc: Andrew Morton, Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel

On 8/12/09, Nick Piggin <npiggin@suse.de> wrote:
> On Wed, Aug 12, 2009 at 12:07:11AM +0400, Cyrill Gorcunov wrote:
>> [npiggin@suse.de - Tue, Aug 11, 2009 at 09:09:06PM +1000]
>> ...
>> | +static void update_queue_simple(struct sem_array *sma, ushort semnum)
>> | +{
>> | +	if (unlikely(sma->complex_count)) {
>> | +		update_queue(sma);
>> | +	} else {
>> | +		struct sem *sem;
>> | +
>> | +		sem = &sma->sem_base[semnum];
>> | +		if (sem->semval > 0)
>> | +			update_negv_queue(sma, sem);
>> | +		if (sem->semval == 0)
>> | +			update_zero_queue(sma, sem);
>> | +	}
>> | +}
>> | +
>> ...
>>
>> Hi Nick,
>>
>> most probably I miss something, but can't we trigger a BUG_ON when
>> updating the zero queue if the semaphore was created with an undo list
>> and, via a new operation, reached -ERANGE on the undo value?
>>
>> Again, I could be missing something or plain wrong. Just a thought.
>
> Hi Cyrill,
>
> Thanks for looking... Hmm, you mean BUG_ON(error) due to try_atomic_semop
> returning -ERANGE? I think it should not be possible: we already prevent
> any operation from bringing the undo value to -ERANGE, so an operation
> which does not modify the sem value should not be able to go out of range.
>
> (I think it would be a bug if we ever returned -ERANGE for a wait-for-zero
> operation.)
>
> Thanks,
> Nick

Thanks for the explanation, Nick! I meant exactly that.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
  2009-08-11 18:19 ` Manfred Spraul
  2009-08-11 20:07 ` Cyrill Gorcunov
@ 2009-08-14 18:48 ` Manfred Spraul
  2 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2009-08-14 18:48 UTC (permalink / raw)
To: npiggin
Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> @@ -438,23 +448,94 @@ again:
>  			continue;
>
>  		list_del(&q->list);
> +		if (q->nsops == 1)
> +			list_del(&q->simple_list);
> +		else
> +			sma->complex_count--;
> +
> +		alter = q->alter;
> +		wake_up_sem_queue(q, error);
>
Unlink approach one: list_del() only if q->nsops != 1.

> @@ -532,6 +613,9 @@ static void freeary(struct ipc_namespace
>  	/* Wake up all pending processes and let them fail with EIDRM. */
>  	list_for_each_entry_safe(q, tq, &sma->sem_pending, list) {
>  		list_del(&q->list);
> +		list_del(&q->simple_list);
> +		if (q->nsops > 1)
> +			sma->complex_count--;
>
Unlink approach two: list_del() even if q->nsops == 1 [i.e. relying on
list_del() on an empty list entry being permitted].

I would merge that into a helper function.

--
    Manfred

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads:[~2009-08-16 13:19 UTC | newest]

Thread overview: 18+ messages:
2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin
2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin
2009-08-16 13:17   ` Manfred Spraul
2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin
2009-08-16 13:18   ` Manfred Spraul
2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin
2009-08-16 13:20   ` Manfred Spraul
2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
2009-08-11 18:19   ` Manfred Spraul
2009-08-11 18:23     ` Zach Brown
2009-08-12  4:07     ` Nick Piggin
2009-08-12 18:35       ` Manfred Spraul
2009-08-14  8:58         ` Nick Piggin
2009-08-14 17:53           ` Manfred Spraul
2009-08-11 20:07   ` Cyrill Gorcunov
2009-08-12  4:48     ` Nick Piggin
2009-08-12  5:43       ` Cyrill Gorcunov
2009-08-14 18:48   ` Manfred Spraul