* kernels 3.4 slower due to allocation workqueue
@ 2013-04-15 9:39 Yann Dupont
2013-04-15 13:45 ` Mark Tinguely
0 siblings, 1 reply; 8+ messages in thread
From: Yann Dupont @ 2013-04-15 9:39 UTC (permalink / raw)
To: xfs
Hello,
last week we received new machines (Dell R720xd) for an extension of our
Ceph cluster
(64 GB RAM, 2x Xeon E5-2650, PERC H710P (really an LSI MegaRAID), and
12x 3 TB disks + 2 SSDs (not used as CacheCade)).
I was testing the RAID card with kernel 3.4.38 to find out what
I can get out of this beast with RAID5, when I noticed unusually slow
results on compilebench. The difference is very visible in the initial
create tests (I can give more detail if needed).
I eventually observed that ONLY 3.4 kernels exhibit this behaviour;
3.3.x and earlier are OK, and 3.5.x and later are back to good values.
I bisected the problem to this commit:
c999a223c2f0d31c64ef7379814cea1378b2b800 is the first bad commit
commit c999a223c2f0d31c64ef7379814cea1378b2b800
Author: Dave Chinner <dchinner@redhat.com>
Date: Thu Mar 22 05:15:07 2012 +0000
xfs: introduce an allocation workqueue
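For readers who have not run a bisect before, here is a minimal sketch of the idea in Python. It assumes a linear history with exactly one good-to-bad transition, which is a simplification: git bisect itself works on the real commit graph. The history list and the slow_on() check are hypothetical placeholders; only the abbreviated culprit hash comes from the output above.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and there is a single good-to-bad transition.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad]


# Hypothetical linear history; "a", "b", "d", "e" are made-up placeholders.
history = ["v3.3", "a", "b", "c999a223", "d", "e", "v3.4"]
culprit_index = history.index("c999a223")
tested = []


def slow_on(commit: str) -> bool:
    """Stand-in for 'build this kernel, run compilebench, judge the result'."""
    tested.append(commit)
    return history.index(commit) >= culprit_index


result = first_bad_commit(history, slow_on)
print(result, "found after", len(tested), "test builds")
```

Each step halves the remaining range, so the number of kernels to build and benchmark grows only with the logarithm of the number of commits between the good and bad versions.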
I understand this regression is not a bug, and is probably just a corner
case of the new code that was corrected later during 3.5
development (I didn't try to bisect that one; maybe Dave knows which
patch is the fix?)
The problem is that 3.4 is the latest long-term kernel for the moment, and
it's unfortunate that it shows this regression.
Maybe a backport of the fix (if a backport is possible AND not very
intrusive) would be a good idea?
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-15 9:39 kernels 3.4 slower due to allocation workqueue Yann Dupont
@ 2013-04-15 13:45 ` Mark Tinguely
2013-04-16 7:24 ` Yann Dupont
0 siblings, 1 reply; 8+ messages in thread
From: Mark Tinguely @ 2013-04-15 13:45 UTC (permalink / raw)
To: Yann Dupont; +Cc: xfs
On 04/15/13 04:39, Yann Dupont wrote:
> Hello,
> last week we received new machines (Dell R720xd) for an extension of our
> Ceph cluster
> (64 GB RAM, 2x Xeon E5-2650, PERC H710P (really an LSI MegaRAID), and
> 12x 3 TB disks + 2 SSDs (not used as CacheCade)).
>
> I was testing the RAID card with kernel 3.4.38 to find out what
> I can get out of this beast with RAID5, when I noticed unusually slow
> results on compilebench. The difference is very visible in the initial
> create tests (I can give more detail if needed).
>
> I eventually observed that ONLY 3.4 kernels exhibit this behaviour;
> 3.3.x and earlier are OK, and 3.5.x and later are back to good values.
>
> I bisected the problem to this commit:
>
> c999a223c2f0d31c64ef7379814cea1378b2b800 is the first bad commit
> commit c999a223c2f0d31c64ef7379814cea1378b2b800
> Author: Dave Chinner <dchinner@redhat.com>
> Date: Thu Mar 22 05:15:07 2012 +0000
>
> xfs: introduce an allocation workqueue
>
> I understand this regression is not a bug, and is probably just a corner
> case of the new code that was corrected later during 3.5
> development (I didn't try to bisect that one; maybe Dave knows which
> patch is the fix?)
>
> The problem is that 3.4 is the latest long-term kernel for the moment, and
> it's unfortunate that it shows this regression.
>
> Maybe a backport of the fix (if a backport is possible AND not very
> intrusive) would be a good idea?
>
> Cheers,
>
Here are the allocation worker changes.
The biggest performance commit should be aa292847, which limits the
callers to the worker.
commit 3b876c8f2a361ceeed3fed894980c69066f903a0
Author: Jeff Liu <jeff.liu@oracle.com>
Date: Thu Jun 7 15:44:32 2012 +0800
xfs: fix debug_object WARN at xfs_alloc_vextent()
commit aa292847b9fc6e187547110de833a7d3131bbddf
Author: Dave Chinner <dchinner@redhat.com>
Date: Thu Jul 12 07:40:43 2012 +1000
xfs: don't defer metadata allocation to the workqueue
commit 2455881c0b52f87be539c4c7deab1afff4d8a560
Author: Dave Chinner <dchinner@redhat.com>
Date: Fri Oct 5 11:06:58 2012 +1000
xfs: introduce XFS_BMAPI_STACK_SWITCH
commit e04426b9202bccd4cfcbc70b2fa2aeca1c86d8f5
Author: Dave Chinner <dchinner@redhat.com>
Date: Fri Oct 5 11:06:59 2012 +1000
xfs: move allocation stack switch up to xfs_bmapi_allocate
commit 9e96fe6df44425b69ed89f6ac20352cec1f127d7
Author: Brian Foster <bfoster@redhat.com>
Date: Thu Jan 17 13:11:29 2013 -0500
xfs: pull up stack_switch check into xfs_bmapi_write
The last 3 patches address an AGF buffer hang with the allocation worker.
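As a rough user-space illustration of what "limits the callers to the worker" means, the Python sketch below contrasts handing every allocation to a worker pool with running metadata allocations inline. It is a conceptual model only, not the kernel code: AllocRequest, do_alloc and both dispatch functions are hypothetical names, and the thread pool merely stands in for the allocation workqueue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Stand-in for the allocation workqueue; the name is illustrative only.
alloc_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="alloc-worker")


@dataclass
class AllocRequest:
    length: int
    userdata: bool  # True for file data, False for metadata


def do_alloc(req: AllocRequest) -> str:
    """Pretend to allocate and report which thread did the work."""
    return threading.current_thread().name


def alloc_always_deferred(req: AllocRequest) -> str:
    """Every request is queued to a worker and the caller waits for it."""
    return alloc_pool.submit(do_alloc, req).result()


def alloc_metadata_inline(req: AllocRequest) -> str:
    """Only user-data requests pay for the handoff; metadata runs in the caller."""
    if not req.userdata:
        return do_alloc(req)
    return alloc_pool.submit(do_alloc, req).result()


meta = AllocRequest(length=8, userdata=False)
data = AllocRequest(length=8, userdata=True)
print("always deferred, metadata :", alloc_always_deferred(meta))
print("inline variant,  metadata :", alloc_metadata_inline(meta))
print("inline variant,  user data:", alloc_metadata_inline(data))
```

In the first variant each request costs a queue-and-wait round trip even when the caller could have done the work itself, which is the kind of overhead a metadata-heavy create workload would notice.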
--Mark.
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-15 13:45 ` Mark Tinguely
@ 2013-04-16 7:24 ` Yann Dupont
2013-04-16 8:37 ` Yann Dupont
2013-04-16 13:26 ` Mark Tinguely
0 siblings, 2 replies; 8+ messages in thread
From: Yann Dupont @ 2013-04-16 7:24 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On 15/04/2013 15:45, Mark Tinguely wrote:
>
> Here are the allocation worker changes.
>
Hello Mark, thanks for your answer.
> The biggest performance commit should be aa292847, which limits the
> callers to the worker.
>
> commit 3b876c8f2a361ceeed3fed894980c69066f903a0
> Author: Jeff Liu <jeff.liu@oracle.com>
> Date: Thu Jun 7 15:44:32 2012 +0800
>
> xfs: fix debug_object WARN at xfs_alloc_vextent()
>
> commit aa292847b9fc6e187547110de833a7d3131bbddf
> Author: Dave Chinner <dchinner@redhat.com>
> Date: Thu Jul 12 07:40:43 2012 +1000
>
> xfs: don't defer metadata allocation to the workqueue
>
Only these 2 commits are candidates; the others are post-3.5.
I'll try patching a 3.4 kernel with each of them.
> commit 2455881c0b52f87be539c4c7deab1afff4d8a560
> Author: Dave Chinner <dchinner@redhat.com>
> Date: Fri Oct 5 11:06:58 2012 +1000
>
> xfs: introduce XFS_BMAPI_STACK_SWITCH
>
> commit e04426b9202bccd4cfcbc70b2fa2aeca1c86d8f5
> Author: Dave Chinner <dchinner@redhat.com>
> Date: Fri Oct 5 11:06:59 2012 +1000
>
> xfs: move allocation stack switch up to xfs_bmapi_allocate
>
> commit 9e96fe6df44425b69ed89f6ac20352cec1f127d7
> Author: Brian Foster <bfoster@redhat.com>
> Date: Thu Jan 17 13:11:29 2013 -0500
>
> xfs: pull up stack_switch check into xfs_bmapi_write
>
>
> The last 3 patches address an AGF buffer hang with the allocation worker.
>
> --Mark.
As 3.4 kernels don't have those patches, is there a risk associated
with 3.4 kernels?
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-16 7:24 ` Yann Dupont
@ 2013-04-16 8:37 ` Yann Dupont
2013-04-16 13:26 ` Mark Tinguely
1 sibling, 0 replies; 8+ messages in thread
From: Yann Dupont @ 2013-04-16 8:37 UTC (permalink / raw)
To: xfs
On 16/04/2013 09:24, Yann Dupont wrote:
> On 15/04/2013 15:45, Mark Tinguely wrote:
>>
>> Here are the allocation worker changes.
>>
>
> Hello Mark, thanks for your answer.
>
>> The biggest performance commit should be aa292847, which limits the
>> callers to the worker.
>>
Yes, this is the case:
3.4.39 with aa292847b9fc6e187547110de833a7d3131bbddf applied gives full
speed again, at least on this very specific test. I will try to get a
broader view of the effects on other tests.
Thanks,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-16 7:24 ` Yann Dupont
2013-04-16 8:37 ` Yann Dupont
@ 2013-04-16 13:26 ` Mark Tinguely
2013-04-17 13:44 ` Yann Dupont
1 sibling, 1 reply; 8+ messages in thread
From: Mark Tinguely @ 2013-04-16 13:26 UTC (permalink / raw)
To: Yann Dupont; +Cc: xfs
On 04/16/13 02:24, Yann Dupont wrote:
> On 15/04/2013 15:45, Mark Tinguely wrote:
>>
>> Here are the allocation worker changes.
>>
>
> Hello Mark, thanks for your answer.
>
>> The biggest performance commit should be aa292847, which limits the
>> callers to the worker.
>>
>> commit 3b876c8f2a361ceeed3fed894980c69066f903a0
>> Author: Jeff Liu <jeff.liu@oracle.com>
>> Date: Thu Jun 7 15:44:32 2012 +0800
>>
>> xfs: fix debug_object WARN at xfs_alloc_vextent()
>>
>> commit aa292847b9fc6e187547110de833a7d3131bbddf
>> Author: Dave Chinner <dchinner@redhat.com>
>> Date: Thu Jul 12 07:40:43 2012 +1000
>>
>> xfs: don't defer metadata allocation to the workqueue
>>
>
> Only these 2 commits are candidates; the others are post-3.5.
> I'll try patching a 3.4 kernel with each of them.
>
>> commit 2455881c0b52f87be539c4c7deab1afff4d8a560
>> Author: Dave Chinner <dchinner@redhat.com>
>> Date: Fri Oct 5 11:06:58 2012 +1000
>>
>> xfs: introduce XFS_BMAPI_STACK_SWITCH
>>
>> commit e04426b9202bccd4cfcbc70b2fa2aeca1c86d8f5
>> Author: Dave Chinner <dchinner@redhat.com>
>> Date: Fri Oct 5 11:06:59 2012 +1000
>>
>> xfs: move allocation stack switch up to xfs_bmapi_allocate
>>
>> commit 9e96fe6df44425b69ed89f6ac20352cec1f127d7
>> Author: Brian Foster <bfoster@redhat.com>
>> Date: Thu Jan 17 13:11:29 2013 -0500
>>
>> xfs: pull up stack_switch check into xfs_bmapi_write
>>
>>
>> The last 3 patches address an AGF buffer hang with the allocation worker.
>>
>> --Mark.
>
> As 3.4 kernels don't have those patches, is there a risk associated
> with 3.4 kernels?
>
> Cheers,
>
The filesystem can hang but only if the OS cannot allocate another
worker when doing certain calls.
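A loose user-space analogy for that failure mode, sketched in Python under heavy assumptions: a pool capped at one worker, where the job running on that worker queues more work to the same pool and waits for it. This only models "waiting on a pool that cannot provide another worker"; it does not reproduce the AGF buffer locking involved in XFS, and the names pool, inner and outer are hypothetical. The timeout exists solely so the example terminates.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

pool = ThreadPoolExecutor(max_workers=1)  # a pool that cannot grow


def inner() -> str:
    return "inner done"


def outer() -> str:
    # Runs on the only worker, then waits for more work on the same pool.
    nested = pool.submit(inner)
    try:
        return nested.result(timeout=0.2)
    except FutureTimeout:
        # Without the timeout this wait would never return.
        return "stalled: no free worker"


outcome = pool.submit(outer).result()
print(outcome)
pool.shutdown()  # inner() finally runs once outer() has released the worker
```

With max_workers raised to 2 the nested job gets its own worker and the wait completes, which mirrors the condition above: the stall needs the pool to be unable to supply another worker.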
--Mark.
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-16 13:26 ` Mark Tinguely
@ 2013-04-17 13:44 ` Yann Dupont
2013-04-17 14:11 ` Mark Tinguely
0 siblings, 1 reply; 8+ messages in thread
From: Yann Dupont @ 2013-04-17 13:44 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On 16/04/2013 15:26, Mark Tinguely wrote:
>
> The filesystem can hang but only if the OS cannot allocate another
> worker when doing certain calls.
>
> --Mark.
OK, any chance of seeing these fixes backported to 3.4?
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-17 13:44 ` Yann Dupont
@ 2013-04-17 14:11 ` Mark Tinguely
2013-04-17 14:35 ` Yann Dupont
0 siblings, 1 reply; 8+ messages in thread
From: Mark Tinguely @ 2013-04-17 14:11 UTC (permalink / raw)
To: Yann Dupont; +Cc: xfs
On 04/17/13 08:44, Yann Dupont wrote:
> On 16/04/2013 15:26, Mark Tinguely wrote:
>>
>> The filesystem can hang but only if the OS cannot allocate another
>> worker when doing certain calls.
>>
>> --Mark.
>
> OK, any chance of seeing these fixes backported to 3.4?
>
> Cheers,
>
All of the patches apply to Linux 3.4 without modification.
It does not sound like the patches will be pushed back to stable at this
time.
--Mark.
* Re: kernels 3.4 slower due to allocation workqueue
2013-04-17 14:11 ` Mark Tinguely
@ 2013-04-17 14:35 ` Yann Dupont
0 siblings, 0 replies; 8+ messages in thread
From: Yann Dupont @ 2013-04-17 14:35 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On 17/04/2013 16:11, Mark Tinguely wrote:
> On 04/17/13 08:44, Yann Dupont wrote:
>> On 16/04/2013 15:26, Mark Tinguely wrote:
>>>
>>> The filesystem can hang but only if the OS cannot allocate another
>>> worker when doing certain calls.
>>>
>>> --Mark.
>>
>> OK, any chance of seeing these fixes backported to 3.4?
>>
>> Cheers,
>>
>
> All of the patches apply to Linux 3.4 without modification.
Yes, sorry, I should have said "incorporated/pushed back".
>
> It does not sound like the patches will be pushed back to stable at
> this time.
If you are not considering pushing the patches to 3.4 for the moment
(because of stability concerns, because you judge these patches intrusive,
or for whatever good reason), I won't apply them either, because I have no
view of the potential problems with the patched kernel. I always try to
stay on pure vanilla kernels, long-term if possible, because stability
matters most.
My best bet is probably 3.8 kernels right now (even if they are not
long-term), or maybe simply going back to 3.2.
Thanks for your answers,
Cheers
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr