All of lore.kernel.org
 help / color / mirror / Atom feed
* 2048 CPUs [was: Re: New filesystem for Linux]
       [not found] <787b0d920611041154l69db46abv4c8c467809ada57c@mail.gmail.com>
@ 2006-11-04 22:36 ` Mikulas Patocka
       [not found]   ` <787b0d920611050802o460a4000r5ce154e589732a02@mail.gmail.com>
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Mikulas Patocka @ 2006-11-04 22:36 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel

>> Does one Linux kernel run on system with 1024 cpus? I guess it
>> must fry spinlocks... (or even lockup due to spinlock livelocks)
>
> The SGI Altix can have 2048 CPUs.

And does it run one image of Linux? Or more images each on few cpus?

How do they solve problem with spinlock livelocks?

If time-spent-outside-spinlock/time-spent-in-spinlock < number-of-cpus, 
the spinlock livelock may happen --- this condition is not true normally 
with 2 or 4 cpus, but for that high amount of cpus, there is a danger.

Or do they have some special hardware spinlock instruction that guarantees 
fairness?

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
       [not found]   ` <787b0d920611050802o460a4000r5ce154e589732a02@mail.gmail.com>
@ 2006-11-06  1:47     ` Mikulas Patocka
  0 siblings, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2006-11-06  1:47 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel



On Sun, 5 Nov 2006, Albert Cahalan wrote:

> On 11/4/06, Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> wrote:
>> >> Does one Linux kernel run on system with 1024 cpus? I guess it
>> >> must fry spinlocks... (or even lockup due to spinlock livelocks)
>> >
>> > The SGI Altix can have 2048 CPUs.
>> 
>> And does it run one image of Linux? Or more images each on few cpus?
>
> Yes, of course it runs one image of Linux. It's SMP, not a cluster.
> If I remember right:
>
> 2 CPUs per chip
> 2 chips per board (with local RAM; this is NUMA)
> 512 boards per machine
>
>> How do they solve problem with spinlock livelocks?
>> 
>> If time-spent-outside-spinlock/time-spent-in-spinlock < number-of-cpus,
>> the spinlock livelock may happen --- this condition is not true normally
>> with 2 or 4 cpus, but for that high amount of cpus, there is a danger.
>> 
>> Or do they have some special hardware spinlock instruction that guarantees
>> fairness?
>
> Well first of all it's using IA-64 processors. These do atomic operations
> via instructions that are essentially "take lock" and "release lock".
> The non-CPU hardware probably recognizes these operations by the
> special bus cycles and does whatever is needed to ensure fairness.

It looks like a normal looping spinlock with rep nop and cmpxchg on i386. 
Not that I understand IA64 assembler --- but look at 
ia64_spinlock_contention --- it looks like a loop with pause and nonatomic 
test and a branch to acquire spinlock atomically with cmpxchg --- nothing 
that would prevent starving there.

> Second of all, I think we just try to avoid having long spinlock hold
> times. The SGI people love using RCU. As I recall, SGI also caused
> Linux to get a sequence lock for jiffies.

That is right, that you should avoid contention --- but the kernel should 
be stable no matter what userspace programs you run. Or does it mean that 
scientists can run only trusted programs on Altix that do not perform too 
many certain syscalls concurently in order to not jam spinlocks?

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-04 22:36 ` 2048 CPUs [was: Re: New filesystem for Linux] Mikulas Patocka
       [not found]   ` <787b0d920611050802o460a4000r5ce154e589732a02@mail.gmail.com>
@ 2006-11-07 21:26   ` Pavel Machek
  2006-11-07 22:36     ` Mikulas Patocka
  2006-11-08  3:04   ` David Chinner
  2 siblings, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2006-11-07 21:26 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Albert Cahalan, linux-kernel

Hi!

> >The SGI Altix can have 2048 CPUs.
> 
> And does it run one image of Linux? Or more images each 
> on few cpus?
> 
> How do they solve problem with spinlock livelocks?
> 
> If time-spent-outside-spinlock/time-spent-in-spinlock < 
> number-of-cpus, the spinlock livelock may happen --- 
> this condition is not true normally with 2 or 4 cpus, 
> but for that high amount of cpus, there is a danger.

Lets say time-spent-outside-spinlock == time-spent-in-spinlock and
number-of-cpus == 2.

1 < 2 , so it should livelock according to you...

...but afaict this should work okay. Even if spinlocks are very
unfair, as long as time-outside and time-inside comes in big chunks,
it should work.

If you are unlucky, one cpu may stall for a while, but... I see no
livelock.

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-07 21:26   ` Pavel Machek
@ 2006-11-07 22:36     ` Mikulas Patocka
  2006-11-07 23:14       ` Pavel Machek
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2006-11-07 22:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Albert Cahalan, linux-kernel

Hi!

> Hi!
>
>>> The SGI Altix can have 2048 CPUs.
>>
>> And does it run one image of Linux? Or more images each
>> on few cpus?
>>
>> How do they solve problem with spinlock livelocks?
>>
>> If time-spent-outside-spinlock/time-spent-in-spinlock <
>> number-of-cpus, the spinlock livelock may happen ---
>> this condition is not true normally with 2 or 4 cpus,
>> but for that high amount of cpus, there is a danger.
>
> Lets say time-spent-outside-spinlock == time-spent-in-spinlock and
> number-of-cpus == 2.
>
> 1 < 2 , so it should livelock according to you...

There is off-by-one bug in the condition. It should be:
(time_spent_in_spinlock + time_spent_outside_spinlock) / 
time_spent_in_spinlock < number_of_cpus

... or if you divide it by time_spent_in_spinlock:
time_spent_outside_spinlock / time_spent_in_spinlock + 1 < number_of_cpus

> ...but afaict this should work okay. Even if spinlocks are very
> unfair, as long as time-outside and time-inside comes in big chunks,
> it should work.
>
> If you are unlucky, one cpu may stall for a while, but... I see no
> livelock.

If some rogue threads (and it may not even be intetional) call the same 
syscall stressing the one spinlock all the time, other syscalls needing 
the same spinlock may stall.

Maybe there are so few Altices in the world that no one has yet observed 
it...

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-07 22:36     ` Mikulas Patocka
@ 2006-11-07 23:14       ` Pavel Machek
  2006-11-08 18:26         ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2006-11-07 23:14 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Albert Cahalan, linux-kernel

Hi!

> >Lets say time-spent-outside-spinlock == time-spent-in-spinlock and
> >number-of-cpus == 2.
> >
> >1 < 2 , so it should livelock according to you...
> 
> There is off-by-one bug in the condition. It should be:
> (time_spent_in_spinlock + time_spent_outside_spinlock) / 
> time_spent_in_spinlock < number_of_cpus
> 
> ... or if you divide it by time_spent_in_spinlock:
> time_spent_outside_spinlock / time_spent_in_spinlock + 1 < number_of_cpus
> 
> >...but afaict this should work okay. Even if spinlocks are very
> >unfair, as long as time-outside and time-inside comes in big chunks,
> >it should work.
> >
> >If you are unlucky, one cpu may stall for a while, but... I see no
> >livelock.
> 
> If some rogue threads (and it may not even be intetional) call the same 
> syscall stressing the one spinlock all the time, other syscalls needing 
> the same spinlock may stall.

Fortunately, they'll unstall with probability of 1... so no, I do not
think this is real problem.

If someone takes semaphore in syscall (we do), same problem may
happen, right...? Without need for 2048 cpus. Maybe semaphores/mutexes
are fair (or mostly fair) these days, but rwlocks may not be or
something.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-04 22:36 ` 2048 CPUs [was: Re: New filesystem for Linux] Mikulas Patocka
       [not found]   ` <787b0d920611050802o460a4000r5ce154e589732a02@mail.gmail.com>
  2006-11-07 21:26   ` Pavel Machek
@ 2006-11-08  3:04   ` David Chinner
  2 siblings, 0 replies; 13+ messages in thread
From: David Chinner @ 2006-11-08  3:04 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Albert Cahalan, linux-kernel

On Sat, Nov 04, 2006 at 11:36:49PM +0100, Mikulas Patocka wrote:
> >>Does one Linux kernel run on system with 1024 cpus? I guess it
> >>must fry spinlocks... (or even lockup due to spinlock livelocks)
> >
> >The SGI Altix can have 2048 CPUs.
> 
> And does it run one image of Linux? Or more images each on few cpus?

One image.

> How do they solve problem with spinlock livelocks?

By replacing contended spinlocks withsleeping locks, using no-lock
techniques (e.g. per-cpu) or changing the algorithm to remove the
contention point.

w.r.t filesystem locking scalability, you should read this paper:

http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-07 23:14       ` Pavel Machek
@ 2006-11-08 18:26         ` Mikulas Patocka
  2006-11-10  9:03           ` Pavel Machek
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2006-11-08 18:26 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Albert Cahalan, linux-kernel

Hi!

>>> Lets say time-spent-outside-spinlock == time-spent-in-spinlock and
>>> number-of-cpus == 2.
>>>
>>> 1 < 2 , so it should livelock according to you...
>>
>> There is off-by-one bug in the condition. It should be:
>> (time_spent_in_spinlock + time_spent_outside_spinlock) /
>> time_spent_in_spinlock < number_of_cpus
>>
>> ... or if you divide it by time_spent_in_spinlock:
>> time_spent_outside_spinlock / time_spent_in_spinlock + 1 < number_of_cpus
>>
>>> ...but afaict this should work okay. Even if spinlocks are very
>>> unfair, as long as time-outside and time-inside comes in big chunks,
>>> it should work.
>>>
>>> If you are unlucky, one cpu may stall for a while, but... I see no
>>> livelock.
>>
>> If some rogue threads (and it may not even be intetional) call the same
>> syscall stressing the one spinlock all the time, other syscalls needing
>> the same spinlock may stall.
>
> Fortunately, they'll unstall with probability of 1... so no, I do not
> think this is real problem.

You can't tell that CPUs behave exactly probabilistically --- it may 
happen that one gets out of the wait loop always too late.

SMP buses have complex protocols to prevent starvation in case all CPUs 
are writing to the same cache line and similar --- however it is unusable 
againt spinlock starvation.

> If someone takes semaphore in syscall (we do), same problem may
> happen, right...? Without need for 2048 cpus. Maybe semaphores/mutexes
> are fair (or mostly fair) these days, but rwlocks may not be or
> something.

Scheduler increases priority of sleeping process, so starving process 
should be waken up first. But if there are so many processes, that process 
that passed the semaphore, sleeps and tries to take the semaphore has 
already increased priority to the level of process that waited on the 
semaphor, livelock can happen too.

Mikulas

> 									Pavel
>
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-08 18:26         ` Mikulas Patocka
@ 2006-11-10  9:03           ` Pavel Machek
  2006-11-10 15:20             ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2006-11-10  9:03 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Albert Cahalan, linux-kernel

Hi!

> >>If some rogue threads (and it may not even be intetional) call the same
> >>syscall stressing the one spinlock all the time, other syscalls needing
> >>the same spinlock may stall.
> >
> >Fortunately, they'll unstall with probability of 1... so no, I do not
> >think this is real problem.
> 
> You can't tell that CPUs behave exactly probabilistically --- it may 
> happen that one gets out of the wait loop always too late.

Well,  I don't need them to be _exactly_ probabilistical.

Anyway, if you have 2048 CPUs... you can perhaps get some non-broken
ones.

> >If someone takes semaphore in syscall (we do), same problem may
> >happen, right...? Without need for 2048 cpus. Maybe semaphores/mutexes
> >are fair (or mostly fair) these days, but rwlocks may not be or
> >something.
> 
> Scheduler increases priority of sleeping process, so starving process 
> should be waken up first. But if there are so many processes, that
>process

I do not think this is how Linux scheduler works.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-10  9:03           ` Pavel Machek
@ 2006-11-10 15:20             ` Mikulas Patocka
  2006-11-10 16:20               ` Alan Cox
  2006-11-12 14:28               ` Pavel Machek
  0 siblings, 2 replies; 13+ messages in thread
From: Mikulas Patocka @ 2006-11-10 15:20 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Albert Cahalan, linux-kernel

Hi!

>>>> If some rogue threads (and it may not even be intetional) call the same
>>>> syscall stressing the one spinlock all the time, other syscalls needing
>>>> the same spinlock may stall.
>>>
>>> Fortunately, they'll unstall with probability of 1... so no, I do not
>>> think this is real problem.
>>
>> You can't tell that CPUs behave exactly probabilistically --- it may
>> happen that one gets out of the wait loop always too late.
>
> Well,  I don't need them to be _exactly_ probabilistical.
>
> Anyway, if you have 2048 CPUs... you can perhaps get some non-broken
> ones.

No intel document guarantees you that if more CPUs simultaneously execute 
locked cmpxchg in a loop that a CPU will see compare success in a finite 
time. In fact, CPUs can't guarantee this at all, because they don't know 
that they're executing a spinlock --- for them its just an instruction 
stream like anything else.

Intel only guarantees that cmpxchg (or any other instruction) completes in 
finite time, but it doesn't say anything about the result of it.

>>> If someone takes semaphore in syscall (we do), same problem may
>>> happen, right...? Without need for 2048 cpus. Maybe semaphores/mutexes
>>> are fair (or mostly fair) these days, but rwlocks may not be or
>>> something.
>>
>> Scheduler increases priority of sleeping process, so starving process
>> should be waken up first. But if there are so many processes, that
>> process
>
> I do not think this is how Linux scheduler works.
> 								Pavel

<= 2.4 scheduler worked exactly like this. 2.6 has it more complicated, 
but does similar thing. But you are right that starvation on semaphore can 
happen, if the process has too high nice value, it will never be risen 
above other processes.

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-10 15:20             ` Mikulas Patocka
@ 2006-11-10 16:20               ` Alan Cox
  2006-11-12 14:28               ` Pavel Machek
  1 sibling, 0 replies; 13+ messages in thread
From: Alan Cox @ 2006-11-10 16:20 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Pavel Machek, Albert Cahalan, linux-kernel


> Intel only guarantees that cmpxchg (or any other instruction) completes in 
> finite time, but it doesn't say anything about the result of it.

They don't even guarantee that... (hlt for example) or instructions SMM trapping into stuff that doesn't come back
 
</pedant>

Alan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-10 15:20             ` Mikulas Patocka
  2006-11-10 16:20               ` Alan Cox
@ 2006-11-12 14:28               ` Pavel Machek
  2006-11-12 20:07                 ` Mikulas Patocka
  1 sibling, 1 reply; 13+ messages in thread
From: Pavel Machek @ 2006-11-12 14:28 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Albert Cahalan, linux-kernel

Hi!

> >>You can't tell that CPUs behave exactly 
> >>probabilistically --- it may
> >>happen that one gets out of the wait loop always too 
> >>late.
> >
> >Well,  I don't need them to be _exactly_ 
> >probabilistical.
> >
> >Anyway, if you have 2048 CPUs... you can perhaps get 
> >some non-broken
> >ones.
> 
> No intel document guarantees you that if more CPUs 
> simultaneously execute locked cmpxchg in a loop that a 

If we are talking 2048 cpus, we are talking ia64.

> CPU will see compare success in a finite time. In fact, 
> CPUs can't guarantee this at all, because they don't 
> know that they're executing a spinlock --- for them its 
> just an instruction stream like anything else.

...even i386 has monitor/mwait these days.
							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-12 14:28               ` Pavel Machek
@ 2006-11-12 20:07                 ` Mikulas Patocka
  2006-11-14  6:24                   ` Albert Cahalan
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2006-11-12 20:07 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Albert Cahalan, linux-kernel

Hi!

> Hi!
>
>>>> You can't tell that CPUs behave exactly
>>>> probabilistically --- it may
>>>> happen that one gets out of the wait loop always too
>>>> late.
>>>
>>> Well,  I don't need them to be _exactly_
>>> probabilistical.
>>>
>>> Anyway, if you have 2048 CPUs... you can perhaps get
>>> some non-broken
>>> ones.
>>
>> No intel document guarantees you that if more CPUs
>> simultaneously execute locked cmpxchg in a loop that a
>
> If we are talking 2048 cpus, we are talking ia64.

IA64 spinlock is locked cmpxchg, if failed than pause (i386 equivalent of 
rep nop) read the value, and if unlocked, try cmpxchg again.

There is no fairness in it.

>> CPU will see compare success in a finite time. In fact,
>> CPUs can't guarantee this at all, because they don't
>> know that they're executing a spinlock --- for them its
>> just an instruction stream like anything else.
>
> ...even i386 has monitor/mwait these days.

It also doesn't guarantee that subsequent locked instruction will take the 
lock after finite number of loops.

Mikulas

> 							Pavel
> -- 
> Thanks for all the (sleeping) penguins.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2048 CPUs [was: Re: New filesystem for Linux]
  2006-11-12 20:07                 ` Mikulas Patocka
@ 2006-11-14  6:24                   ` Albert Cahalan
  0 siblings, 0 replies; 13+ messages in thread
From: Albert Cahalan @ 2006-11-14  6:24 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Pavel Machek, linux-kernel

> >> No intel document guarantees you that if more CPUs
> >> simultaneously execute locked cmpxchg in a loop that a
> >
> > If we are talking 2048 cpus, we are talking ia64.
>
> IA64 spinlock is locked cmpxchg, if failed than pause (i386 equivalent of
> rep nop) read the value, and if unlocked, try cmpxchg again.
>
> There is no fairness in it.

I suppose we could use something better.

There is the MCS lock, the related CLH lock, and IBM's
improvement on the MCS lock. As with RCU, we'd need
to get IBM's permission to use their lock. (so, how did we
get permission for RCU?) The basic MCS lock is also
patented I think.

http://www.cs.rochester.edu/~scott/professional/Dijkstra/presentation.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2006-11-14  6:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <787b0d920611041154l69db46abv4c8c467809ada57c@mail.gmail.com>
2006-11-04 22:36 ` 2048 CPUs [was: Re: New filesystem for Linux] Mikulas Patocka
     [not found]   ` <787b0d920611050802o460a4000r5ce154e589732a02@mail.gmail.com>
2006-11-06  1:47     ` Mikulas Patocka
2006-11-07 21:26   ` Pavel Machek
2006-11-07 22:36     ` Mikulas Patocka
2006-11-07 23:14       ` Pavel Machek
2006-11-08 18:26         ` Mikulas Patocka
2006-11-10  9:03           ` Pavel Machek
2006-11-10 15:20             ` Mikulas Patocka
2006-11-10 16:20               ` Alan Cox
2006-11-12 14:28               ` Pavel Machek
2006-11-12 20:07                 ` Mikulas Patocka
2006-11-14  6:24                   ` Albert Cahalan
2006-11-08  3:04   ` David Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.