* Re: btrfs_tree_lock & trylock
2008-09-08 15:07 ` Stephen Hemminger
@ 2008-09-08 15:28 ` Chris Mason
2008-09-08 23:26 ` Steve Long
2008-09-08 15:47 ` Andi Kleen
2008-09-08 17:16 ` adaptive mutexes, was " Christoph Hellwig
2 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2008-09-08 15:28 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Andi Kleen, linux-btrfs
On Mon, 2008-09-08 at 08:07 -0700, Stephen Hemminger wrote:
> On Mon, 8 Sep 2008 16:20:52 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > On Mon, Sep 08, 2008 at 10:02:30AM -0400, Chris Mason wrote:
> > > On Mon, 2008-09-08 at 15:54 +0200, Andi Kleen wrote:
> > > > > The idea is to try to spin for a bit to avoid scheduling away, which is
> > > > > especially important for the high levels. Most holders of the mutex
> > > > > let it go very quickly.
> > > >
> > > > Ok but that surely should be implemented in the general mutex code then
> > > > or at least in a standard adaptive mutex wrapper?
> > >
> > > That depends, am I the only one crazy enough to think its a good idea?
> >
> > Adaptive mutexes are classic, a lot of other OS have it.
>
> The problem is that they are a nuisance. It is impossible to choose
> the right trade off between spin an no-spin, also they optimize for
> a case that doesn't occur often enough to be justified.
>
> People seem to repeatedly come up with adaptive mutex based on intuitive
> hunch, and never do much analysis before or afterwards.
>
In my case, it very easy to measure. Just watch the context switch rate
on any metadata intensive workload. The current code scores higher on
benchmarks and uses less system time overall.
-chris
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 15:28 ` Chris Mason
@ 2008-09-08 23:26 ` Steve Long
0 siblings, 0 replies; 18+ messages in thread
From: Steve Long @ 2008-09-08 23:26 UTC (permalink / raw)
To: linux-btrfs
On Monday 08 September 2008 16:28:34 Chris Mason wrote:
> > People seem to repeatedly come up with adaptive mutex based on intuitive
> > hunch, and never do much analysis before or afterwards.
>
> In my case, it very easy to measure. Just watch the context switch rate
> on any metadata intensive workload. The current code scores higher on
> benchmarks and uses less system time overall.
>
Given that it's system-dependent, perhaps making the 512 a #define'd value
(defaulting to 512) while it's current, might be worth doing? That would
allow users to test via different values passed to make, and so gather more
data, or just use the setting they prefer, if they're bothered.
At least until the adaptive mutex API is worked out (which might well take the
number of attempts to spin/relax as a parameter. I'd add a shouldYield [for
before final acquisition] and keep the ifSleeping as implicit in another fn,
based on what I've read, but thankfully that's not my problem.)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 15:07 ` Stephen Hemminger
2008-09-08 15:28 ` Chris Mason
@ 2008-09-08 15:47 ` Andi Kleen
2008-09-08 15:50 ` Stephen Hemminger
2008-09-08 17:16 ` adaptive mutexes, was " Christoph Hellwig
2 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2008-09-08 15:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Andi Kleen, Chris Mason, linux-btrfs
On Mon, Sep 08, 2008 at 08:07:51AM -0700, Stephen Hemminger wrote:
> On Mon, 8 Sep 2008 16:20:52 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > On Mon, Sep 08, 2008 at 10:02:30AM -0400, Chris Mason wrote:
> > > On Mon, 2008-09-08 at 15:54 +0200, Andi Kleen wrote:
> > > > > The idea is to try to spin for a bit to avoid scheduling away, which is
> > > > > especially important for the high levels. Most holders of the mutex
> > > > > let it go very quickly.
> > > >
> > > > Ok but that surely should be implemented in the general mutex code then
> > > > or at least in a standard adaptive mutex wrapper?
> > >
> > > That depends, am I the only one crazy enough to think its a good idea?
> >
> > Adaptive mutexes are classic, a lot of other OS have it.
>
> The problem is that they are a nuisance. It is impossible to choose
> the right trade off between spin an no-spin, also they optimize for
> a case that doesn't occur often enough to be justified.
At least the numbers done by Gregory et.al. were dramatic improvements.
Given that was an extreme case in that the rt kernel does everything
with mutexes, but it was still a very clear win on a wide range
of workloads.
-Andi
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 15:47 ` Andi Kleen
@ 2008-09-08 15:50 ` Stephen Hemminger
2008-09-08 15:55 ` Chris Mason
0 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2008-09-08 15:50 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andi Kleen, Chris Mason, linux-btrfs
On Mon, 8 Sep 2008 17:47:14 +0200
Andi Kleen <andi@firstfloor.org> wrote:
> On Mon, Sep 08, 2008 at 08:07:51AM -0700, Stephen Hemminger wrote:
> > On Mon, 8 Sep 2008 16:20:52 +0200
> > Andi Kleen <andi@firstfloor.org> wrote:
> >
> > > On Mon, Sep 08, 2008 at 10:02:30AM -0400, Chris Mason wrote:
> > > > On Mon, 2008-09-08 at 15:54 +0200, Andi Kleen wrote:
> > > > > > The idea is to try to spin for a bit to avoid scheduling away, which is
> > > > > > especially important for the high levels. Most holders of the mutex
> > > > > > let it go very quickly.
> > > > >
> > > > > Ok but that surely should be implemented in the general mutex code then
> > > > > or at least in a standard adaptive mutex wrapper?
> > > >
> > > > That depends, am I the only one crazy enough to think its a good idea?
> > >
> > > Adaptive mutexes are classic, a lot of other OS have it.
> >
> > The problem is that they are a nuisance. It is impossible to choose
> > the right trade off between spin an no-spin, also they optimize for
> > a case that doesn't occur often enough to be justified.
>
> At least the numbers done by Gregory et.al. were dramatic improvements.
> Given that was an extreme case in that the rt kernel does everything
> with mutexes, but it was still a very clear win on a wide range
> of workloads.
>
> -Andi
My guess is that the improvement happens mostly from the first couple of tries,
not from repeated spinning. And since it is a mutex, you could even do:
if (mutex_trylock(&eb->mutex))
return 0;
cpu_relax();
if (mutex_trylock(&eb->mutex))
return 0;
yield();
return mutex_lock(&eb->mutex);
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: btrfs_tree_lock & trylock
2008-09-08 15:50 ` Stephen Hemminger
@ 2008-09-08 15:55 ` Chris Mason
2008-09-08 16:13 ` jim owens
0 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2008-09-08 15:55 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Andi Kleen, linux-btrfs
On Mon, 2008-09-08 at 08:50 -0700, Stephen Hemminger wrote:
> On Mon, 8 Sep 2008 17:47:14 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > On Mon, Sep 08, 2008 at 08:07:51AM -0700, Stephen Hemminger wrote:
> > > On Mon, 8 Sep 2008 16:20:52 +0200
> > > Andi Kleen <andi@firstfloor.org> wrote:
> > >
> > > > On Mon, Sep 08, 2008 at 10:02:30AM -0400, Chris Mason wrote:
> > > > > On Mon, 2008-09-08 at 15:54 +0200, Andi Kleen wrote:
> > > > > > > The idea is to try to spin for a bit to avoid scheduling away, which is
> > > > > > > especially important for the high levels. Most holders of the mutex
> > > > > > > let it go very quickly.
> > > > > >
> > > > > > Ok but that surely should be implemented in the general mutex code then
> > > > > > or at least in a standard adaptive mutex wrapper?
> > > > >
> > > > > That depends, am I the only one crazy enough to think its a good idea?
> > > >
> > > > Adaptive mutexes are classic, a lot of other OS have it.
> > >
> > > The problem is that they are a nuisance. It is impossible to choose
> > > the right trade off between spin an no-spin, also they optimize for
> > > a case that doesn't occur often enough to be justified.
> >
> > At least the numbers done by Gregory et.al. were dramatic improvements.
> > Given that was an extreme case in that the rt kernel does everything
> > with mutexes, but it was still a very clear win on a wide range
> > of workloads.
> >
> > -Andi
>
> My guess is that the improvement happens mostly from the first couple of tries,
> not from repeated spinning. And since it is a mutex, you could even do:
I started with lower spin counts, I really didn't want to spin at all
but the current values came from trial and error.
-chris
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 15:55 ` Chris Mason
@ 2008-09-08 16:13 ` jim owens
2008-09-08 16:20 ` Chris Mason
0 siblings, 1 reply; 18+ messages in thread
From: jim owens @ 2008-09-08 16:13 UTC (permalink / raw)
To: Chris Mason; +Cc: Stephen Hemminger, Andi Kleen, linux-btrfs
Chris Mason wrote:
>> My guess is that the improvement happens mostly from the first couple of tries,
>> not from repeated spinning. And since it is a mutex, you could even do:
>
> I started with lower spin counts, I really didn't want to spin at all
> but the current values came from trial and error.
Exactly the problem Steven is saying about adaptive locking.
Using benchmarks (or any test), on a small sample of systems
leads you to conclude "this design/tuning combination is better".
I've been burned repeatedly by that... ugly things happen
as you move away from your design testing center.
I'm not saying your code does not work, just that we need
a lot more proof with different configurations and loads
to see that is "at least no worse".
jim
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 16:13 ` jim owens
@ 2008-09-08 16:20 ` Chris Mason
2008-09-08 16:49 ` Stephen Hemminger
0 siblings, 1 reply; 18+ messages in thread
From: Chris Mason @ 2008-09-08 16:20 UTC (permalink / raw)
To: jim owens; +Cc: Stephen Hemminger, Andi Kleen, linux-btrfs
On Mon, 2008-09-08 at 12:13 -0400, jim owens wrote:
> Chris Mason wrote:
> >> My guess is that the improvement happens mostly from the first couple of tries,
> >> not from repeated spinning. And since it is a mutex, you could even do:
> >
> > I started with lower spin counts, I really didn't want to spin at all
> > but the current values came from trial and error.
>
> Exactly the problem Steven is saying about adaptive locking.
>
> Using benchmarks (or any test), on a small sample of systems
> leads you to conclude "this design/tuning combination is better".
>
> I've been burned repeatedly by that... ugly things happen
> as you move away from your design testing center.
>
> I'm not saying your code does not work, just that we need
> a lot more proof with different configurations and loads
> to see that is "at least no worse".
Oh, I completely agree. This is tuned on just one CPU in a handful of
workloads. In general, it makes sense to spin for about as long as it
takes someone to do a btree search in the block, which we could
benchmark up front at mount time.
I could also get better results from an API where the holder of the lock
indicates it is going to hold on to things for a while, which might
happen right before doing an IO.
Over the long term these are important issues, but for today I'm focused
on the disk format ;)
-chris
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 16:20 ` Chris Mason
@ 2008-09-08 16:49 ` Stephen Hemminger
2008-09-08 17:17 ` Christoph Hellwig
0 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2008-09-08 16:49 UTC (permalink / raw)
To: Chris Mason; +Cc: jim owens, Andi Kleen, linux-btrfs
On Mon, 08 Sep 2008 12:20:32 -0400
Chris Mason <chris.mason@oracle.com> wrote:
> On Mon, 2008-09-08 at 12:13 -0400, jim owens wrote:
> > Chris Mason wrote:
> > >> My guess is that the improvement happens mostly from the first couple of tries,
> > >> not from repeated spinning. And since it is a mutex, you could even do:
> > >
> > > I started with lower spin counts, I really didn't want to spin at all
> > > but the current values came from trial and error.
> >
> > Exactly the problem Steven is saying about adaptive locking.
> >
> > Using benchmarks (or any test), on a small sample of systems
> > leads you to conclude "this design/tuning combination is better".
> >
> > I've been burned repeatedly by that... ugly things happen
> > as you move away from your design testing center.
> >
> > I'm not saying your code does not work, just that we need
> > a lot more proof with different configurations and loads
> > to see that is "at least no worse".
>
> Oh, I completely agree. This is tuned on just one CPU in a handful of
> workloads. In general, it makes sense to spin for about as long as it
> takes someone to do a btree search in the block, which we could
> benchmark up front at mount time.
>
> I could also get better results from an API where the holder of the lock
> indicates it is going to hold on to things for a while, which might
> happen right before doing an IO.
>
> Over the long term these are important issues, but for today I'm focused
> on the disk format ;)
>
> -chris
>
>
Not to mention the problem that developers seem to have faster machines than
average user, but slower than the enterprise and future generation CPU's.
So any tuning value seems to get out of date fast.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 16:49 ` Stephen Hemminger
@ 2008-09-08 17:17 ` Christoph Hellwig
2008-09-08 17:32 ` Ric Wheeler
0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2008-09-08 17:17 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Chris Mason, jim owens, Andi Kleen, linux-btrfs
On Mon, Sep 08, 2008 at 09:49:42AM -0700, Stephen Hemminger wrote:
> Not to mention the problem that developers seem to have faster machines than
> average user, but slower than the enterprise and future generation CPU's.
> So any tuning value seems to get out of date fast.
So where do my fellow developers get these fast systems from? :)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 17:17 ` Christoph Hellwig
@ 2008-09-08 17:32 ` Ric Wheeler
2008-09-08 23:28 ` Steve Long
0 siblings, 1 reply; 18+ messages in thread
From: Ric Wheeler @ 2008-09-08 17:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Stephen Hemminger, Chris Mason, jim owens, Andi Kleen,
linux-btrfs
Christoph Hellwig wrote:
> On Mon, Sep 08, 2008 at 09:49:42AM -0700, Stephen Hemminger wrote:
>
>> Not to mention the problem that developers seem to have faster machines than
>> average user, but slower than the enterprise and future generation CPU's.
>> So any tuning value seems to get out of date fast.
>>
>
> So where do my fellow developers get these fast systems from? :)
>
>
Actually, my experience is that most linux file system developers have
really poking, aged hardware with little to no cutting edge storage (or
even cutting edge commercial class storage). Testing file systems on lap
tops and desktops with old CPUs, no DRAM and 40GB drives does not even
reflect a typical home user these days :-)
ric
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs_tree_lock & trylock
2008-09-08 17:32 ` Ric Wheeler
@ 2008-09-08 23:28 ` Steve Long
0 siblings, 0 replies; 18+ messages in thread
From: Steve Long @ 2008-09-08 23:28 UTC (permalink / raw)
To: linux-btrfs
On Monday 08 September 2008 18:32:14 Ric Wheeler wrote:
> Christoph Hellwig wrote:
> > On Mon, Sep 08, 2008 at 09:49:42AM -0700, Stephen Hemminger wrote:
> >> Not to mention the problem that developers seem to have faster machines
> >> than average user, but slower than the enterprise and future generation
> >> CPU's. So any tuning value seems to get out of date fast.
> >
> > So where do my fellow developers get these fast systems from? :)
>
> Actually, my experience is that most linux file system developers have
> really poking, aged hardware with little to no cutting edge storage (or
> even cutting edge commercial class storage). Testing file systems on lap
> tops and desktops with old CPUs, no DRAM and 40GB drives does not even
> reflect a typical home user these days :-)
>
Yeah but it means they really notice when the algorithm has been improved.
It's character-building, or summat. ;-)
^ permalink raw reply [flat|nested] 18+ messages in thread
* adaptive mutexes, was Re: btrfs_tree_lock & trylock
2008-09-08 15:07 ` Stephen Hemminger
2008-09-08 15:28 ` Chris Mason
2008-09-08 15:47 ` Andi Kleen
@ 2008-09-08 17:16 ` Christoph Hellwig
2 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2008-09-08 17:16 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Andi Kleen, Chris Mason, linux-btrfs, linux-kernel
On Mon, Sep 08, 2008 at 08:07:51AM -0700, Stephen Hemminger wrote:
> The problem is that they are a nuisance. It is impossible to choose
> the right trade off between spin an no-spin, also they optimize for
> a case that doesn't occur often enough to be justified.
>
> People seem to repeatedly come up with adaptive mutex based on intuitive
> hunch, and never do much analysis before or afterwards.
>
> You need some facts to come up with a useful model:
> % of time lock is contended
> average lock hold time
> overhead of entry-exit for lock primitive (spin time)
> overhead of the adaptive version either pure spin or pure mutex
>
> Also, adaptive locks have even worse unfairness than spin locks under
> contention.
Well, the traditional wisdom in kernel land is that you want a spinlock
if the contention phases are short. But we grow more an more places
where we might do sleep under the lock. One optimization would be
to spin, but only if the mutex holder is not sleeping. Or we make the
spinning a completely different API, mutex_lock_adaptive()
^ permalink raw reply [flat|nested] 18+ messages in thread