* [BUG] Lockless patches cause hardlock under heavy IO
@ 2008-06-18 21:15 Ryan Hope
  2008-06-18 21:28 ` Arjan van de Ven
  2008-06-19  8:12 ` Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Ryan Hope @ 2008-06-18 21:15 UTC (permalink / raw)
  To: linux-mm, LKML

I applied the following patches from 2.6.26-rc5-mm3 to 2.6.26-rc6 and
they caused a hardlock under heavy IO:

x86-implement-pte_special.patch
mm-introduce-get_user_pages_fast.patch
mm-introduce-get_user_pages_fast-fix.patch
mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast.patch
x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast-fix.patch
x86-lockless-get_user_pages_fast-fix-2.patch
x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
x86-lockless-get_user_pages_fast-fix-warning.patch
dio-use-get_user_pages_fast.patch
splice-use-get_user_pages_fast.patch
x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
#
mm-readahead-scan-lockless.patch
radix-tree-add-gang_lookup_slot-gang_lookup_slot_tag.patch
#mm-speculative-page-references.patch: clameter saw bustage
mm-speculative-page-references.patch
mm-speculative-page-references-fix.patch
mm-speculative-page-references-fix-fix.patch
mm-speculative-page-references-hugh-fix3.patch
mm-lockless-pagecache.patch
mm-spinlock-tree_lock.patch
powerpc-implement-pte_special.patch

I am on an x86_64. I don't know what other info you need...

-Ryan

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-18 21:15 [BUG] Lockless patches cause hardlock under heavy IO Ryan Hope
@ 2008-06-18 21:28 ` Arjan van de Ven
  2008-06-19 14:45   ` Ryan Hope
  2008-06-19  8:12 ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Arjan van de Ven @ 2008-06-18 21:28 UTC
  To: Ryan Hope; +Cc: linux-mm, LKML

On Wed, 18 Jun 2008 17:15:08 -0400
"Ryan Hope" <rmh3093@gmail.com> wrote:

> I applied the following patches from 2.6.26-rc5-mm3
>
> I am on an x86_64. I don't know what other info you need...
>

If it's locking related, enabling LOCKDEP is a good first test to do:

CONFIG_PROVE_LOCKING=y

as well as the various spinlock/mutex lock debug questions.

--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-18 21:28 ` Arjan van de Ven
@ 2008-06-19 14:45   ` Ryan Hope
  2008-06-20  0:05     ` Arjan van de Ven
  0 siblings, 1 reply; 25+ messages in thread
From: Ryan Hope @ 2008-06-19 14:45 UTC
  To: Arjan van de Ven; +Cc: linux-mm, LKML

Well, enabling these debug options is sorta useless because once it
hardlocks I can't see or do anything...
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-19 14:45   ` Ryan Hope
@ 2008-06-20  0:05     ` Arjan van de Ven
  0 siblings, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2008-06-20 0:05 UTC
  To: Ryan Hope; +Cc: linux-mm, LKML

On Thu, 19 Jun 2008 10:45:43 -0400
"Ryan Hope" <rmh3093@gmail.com> wrote:

> Well, enabling these debug options is sorta useless because once it
> hardlocks I can't see or do anything...
>

The nice thing about lockdep is that it spots *potential* deadlocks,
often well before the actual deadlock happens...
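A .config fragment along the lines Arjan describes might look like the following. Only CONFIG_PROVE_LOCKING is named in the thread; the other option names are assumptions about which "spinlock/mutex lock debug questions" are meant, and should be checked against the "Kernel hacking" menu of the tree being built.

```
CONFIG_PROVE_LOCKING=y
# Assumed companions, not named in the thread:
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
```

With these set, any lockdep report would appear in the kernel log, so it is worth checking dmesg after a short run of the workload, before the machine actually locks up.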
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-18 21:15 [BUG] Lockless patches cause hardlock under heavy IO Ryan Hope
  2008-06-18 21:28 ` Arjan van de Ven
@ 2008-06-19  8:12 ` Peter Zijlstra
  2008-06-19  8:19   ` Nick Piggin
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2008-06-19 8:12 UTC
  To: Ryan Hope; +Cc: linux-mm, LKML, Nick Piggin

On Wed, 2008-06-18 at 17:15 -0400, Ryan Hope wrote:
> I applied the following patches from 2.6.26-rc5-mm3 to 2.6.26-rc6 and
> they caused a hardlock under heavy IO:

What kind of machine, how much memory, how many spindles, what
filesystem, and what is heavy load?

Furthermore, try the NMI watchdog with serial/net-console to capture its
output.
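The NMI watchdog and a remote console are usually set up on the kernel command line. The fragment below is only a sketch: the watchdog mode, serial port, baud rate, interface name, ports, IP addresses and MAC address are all placeholder assumptions to adapt to the machine under test, and the netconsole variant assumes netconsole is built into the kernel.

```
# Serial console plus NMI watchdog (mode 1 assumed; some machines need mode 2)
nmi_watchdog=1 console=tty0 console=ttyS0,115200n8

# Or netconsole: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
nmi_watchdog=1 netconsole=6665@192.168.0.10/eth0,6666@192.168.0.20/00:11:22:33:44:55
```

The comment lines are for the reader only and are not part of the command line. A second machine then needs to log whatever arrives on the serial line or on the chosen UDP port.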
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-19  8:12 ` Peter Zijlstra
@ 2008-06-19  8:19   ` Nick Piggin
  2008-06-19 14:52     ` Ryan Hope
                       ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Nick Piggin @ 2008-06-19 8:19 UTC
  To: Peter Zijlstra; +Cc: Ryan Hope, linux-mm, LKML

On Thursday 19 June 2008 18:12, Peter Zijlstra wrote:
> On Wed, 2008-06-18 at 17:15 -0400, Ryan Hope wrote:
> > I applied the following patches from 2.6.26-rc5-mm3 to 2.6.26-rc6 and
> > they caused a hardlock under heavy IO:
>
> What kind of machine, how much memory, how many spindles, what
> filesystem, and what is heavy load?
>
> Furthermore, try the NMI watchdog with serial/net-console to capture its
> output.

Good suggestions. A trace would be really helpful.

As Arjan suggested, debug options, especially CONFIG_DEBUG_VM, would be
a good idea to turn on if you haven't already.

BTW, what was the reason for applying those patches? Did you hit the
problem with -mm also, and hope to narrow it down?

> > x86-implement-pte_special.patch
> > mm-introduce-get_user_pages_fast.patch
> > mm-introduce-get_user_pages_fast-fix.patch
> > mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
> > x86-lockless-get_user_pages_fast.patch
> > x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
> > x86-lockless-get_user_pages_fast-fix.patch
> > x86-lockless-get_user_pages_fast-fix-2.patch
> > x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
> > x86-lockless-get_user_pages_fast-fix-warning.patch
> > dio-use-get_user_pages_fast.patch
> > splice-use-get_user_pages_fast.patch
> > x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
> > #
> > mm-readahead-scan-lockless.patch
> > radix-tree-add-gang_lookup_slot-gang_lookup_slot_tag.patch
> > #mm-speculative-page-references.patch: clameter saw bustage
> > mm-speculative-page-references.patch
> > mm-speculative-page-references-fix.patch
> > mm-speculative-page-references-fix-fix.patch
> > mm-speculative-page-references-hugh-fix3.patch
> > mm-lockless-pagecache.patch
> > mm-spinlock-tree_lock.patch
> > powerpc-implement-pte_special.patch
> >
> > I am on an x86_64. I don't know what other info you need...

Can you isolate it to one of the two groups of patches? I suspect it
might be the latter, so you might try that first -- this version of
speculative page references is very nice in theory, but it is a little
more complex to implement the slowpaths, so it could be an error there.
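To make the "two groups" concrete, the patch list can be split mechanically. The sketch below assumes that the bare `#` line in Ryan's list marks the boundary Nick is referring to, and that `#` lines carrying text are comments; the helper name `split_series` is hypothetical and is not part of any kernel or quilt tooling.

```python
SERIES = """\
x86-implement-pte_special.patch
mm-introduce-get_user_pages_fast.patch
mm-introduce-get_user_pages_fast-fix.patch
mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast.patch
x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
x86-lockless-get_user_pages_fast-fix.patch
x86-lockless-get_user_pages_fast-fix-2.patch
x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
x86-lockless-get_user_pages_fast-fix-warning.patch
dio-use-get_user_pages_fast.patch
splice-use-get_user_pages_fast.patch
x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
#
mm-readahead-scan-lockless.patch
radix-tree-add-gang_lookup_slot-gang_lookup_slot_tag.patch
#mm-speculative-page-references.patch: clameter saw bustage
mm-speculative-page-references.patch
mm-speculative-page-references-fix.patch
mm-speculative-page-references-fix-fix.patch
mm-speculative-page-references-hugh-fix3.patch
mm-lockless-pagecache.patch
mm-spinlock-tree_lock.patch
powerpc-implement-pte_special.patch
"""


def split_series(text):
    """Split a series listing into groups at bare '#' separator lines.

    Lines that start with '#' but carry text are treated as comments
    and skipped. The grouping rule itself is an assumption.
    """
    groups, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#":
            # A bare '#' closes the current group and starts a new one.
            if current:
                groups.append(current)
            current = []
        elif line and not line.startswith("#"):
            current.append(line)
    if current:
        groups.append(current)
    return groups


gup_group, pagecache_group = split_series(SERIES)
print(len(gup_group), "patches in the first (get_user_pages_fast) group")
print(len(pagecache_group), "patches in the second (lockless pagecache) group")
```

Testing each group on its own against a plain 2.6.26-rc6 tree would then show which half is enough to trigger the hang.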
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-19  8:19   ` Nick Piggin
@ 2008-06-19 14:52     ` Ryan Hope
  2008-06-19 20:31     ` Ryan Hope
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Ryan Hope @ 2008-06-19 14:52 UTC
  To: Nick Piggin; +Cc: Peter Zijlstra, linux-mm, LKML

The reason for applying these patches is that users of my patchset have
been wanting me to include lockless again. It was pretty popular among
the users, but we removed it because it would cause hardlocks. I thought
I would try it out again now that it's in -mm. I guess I could start
reverting patches and see if the issue goes away.
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-19  8:19   ` Nick Piggin
  2008-06-19 14:52     ` Ryan Hope
@ 2008-06-19 20:31     ` Ryan Hope
  2008-06-20 14:33     ` Ryan Hope
  2008-06-22 14:37     ` Ryan Hope
  3 siblings, 0 replies; 25+ messages in thread
From: Ryan Hope @ 2008-06-19 20:31 UTC
  To: Nick Piggin; +Cc: Peter Zijlstra, linux-mm, LKML

This seems to be hardlocking for everyone who has a 64-bit processor;
the only user for whom it has not locked up yet has a 32-bit processor.
I hope that helps; it's the best I can come up with so far.

-Ryan
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-19  8:19   ` Nick Piggin
  2008-06-19 14:52     ` Ryan Hope
  2008-06-19 20:31     ` Ryan Hope
@ 2008-06-20 14:33     ` Ryan Hope
  2008-06-22 14:37     ` Ryan Hope
  3 siblings, 0 replies; 25+ messages in thread
From: Ryan Hope @ 2008-06-20 14:33 UTC
  To: Nick Piggin; +Cc: Peter Zijlstra, LKML

Well, if there are no more suggestions we are going to have to abandon
testing lockless for now, because it is causing hardlocks on the box of
everyone who uses it. I hope the next round of patches has better luck.
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-19  8:19   ` Nick Piggin
                       ` (2 preceding siblings ...)
  2008-06-20 14:33     ` Ryan Hope
@ 2008-06-22 14:37     ` Ryan Hope
  2008-06-22 15:07       ` Peter Zijlstra
  3 siblings, 1 reply; 25+ messages in thread
From: Ryan Hope @ 2008-06-22 14:37 UTC
  To: Nick Piggin; +Cc: Peter Zijlstra, linux-mm, LKML

Well, I couldn't stop playing with this... I am pretty sure the cause
of the hardlocks is in the second half of the patches (the speculative
page ref patches). I reversed all of those patches so that just the
GUP patches were included, and no more hardlocks... Then I applied the
concurrent page cache patches from the -rt branch, including 1 OLD
speculative page ref patch, and this caused hardlocks for people again.
However, enabling heap randomization fixed the hardlocks for one of the
users, and disabling swap fixed the issue for the other user. I hope
this helps.

-Ryan
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-22 14:37     ` Ryan Hope
@ 2008-06-22 15:07       ` Peter Zijlstra
  2008-06-22 15:18         ` Ryan Hope
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2008-06-22 15:07 UTC
  To: Ryan Hope; +Cc: Nick Piggin, linux-mm, LKML

On Sun, 2008-06-22 at 10:37 -0400, Ryan Hope wrote:
> Well, I couldn't stop playing with this... I am pretty sure the cause
> of the hardlocks is in the second half of the patches (the speculative
> page ref patches). I reversed all of those patches so that just the
> GUP patches were included, and no more hardlocks... Then I applied the
> concurrent page cache patches from the -rt branch, including 1 OLD
> speculative page ref patch, and this caused hardlocks for people again.
> However, enabling heap randomization fixed the hardlocks for one of the
> users, and disabling swap fixed the issue for the other user. I hope
> this helps.

What are people doing to make it hang?
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-22 15:07       ` Peter Zijlstra
@ 2008-06-22 15:18         ` Ryan Hope
  2008-06-23  2:29           ` Nick Piggin
  0 siblings, 1 reply; 25+ messages in thread
From: Ryan Hope @ 2008-06-22 15:18 UTC
  To: Peter Zijlstra; +Cc: Nick Piggin, linux-mm, LKML

Well, in the current version of the patchset we are using, one user
would start playing some game (disabling "Disable Heap Randomization"
fixed the hardlocks for him); the other user got hardlocks when
copying an ISO from a reiser4 partition to a reiserfs partition
(disabling swap fixed the issue for him).
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-22 15:18         ` Ryan Hope
@ 2008-06-23  2:29           ` Nick Piggin
  2008-06-23  3:51             ` Ryan Hope
  2008-06-23 23:48             ` Zan Lynx
  0 siblings, 2 replies; 25+ messages in thread
From: Nick Piggin @ 2008-06-23 2:29 UTC
  To: Ryan Hope; +Cc: Peter Zijlstra, linux-mm, LKML

On Monday 23 June 2008 01:18, Ryan Hope wrote:
> Well, in the current version of the patchset we are using, one user
> would start playing some game (disabling "Disable Heap Randomization"
> fixed the hardlocks for him); the other user got hardlocks when
> copying an ISO from a reiser4 partition to a reiserfs partition
> (disabling swap fixed the issue for him).

Hmm, nobody has reported such a hang with -mm yet, so maybe it
is another interaction in the patchset. OTOH, probably nobody
much uses -mm and reiser4, and reiser4 does lots of weird fiddling
with pagecache, so it could even be broken in -mm.
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-23  2:29           ` Nick Piggin
@ 2008-06-23  3:51             ` Ryan Hope
  2008-06-23  3:56               ` Nick Piggin
  2008-06-23 11:54               ` Nick Piggin
  2008-06-23 23:48             ` Zan Lynx
  1 sibling, 2 replies; 25+ messages in thread
From: Ryan Hope @ 2008-06-23 3:51 UTC
  To: Nick Piggin; +Cc: Peter Zijlstra, linux-mm, LKML

Well, I get the hardlock on -mm without using reiser4; I am pretty
sure it is swap related.
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-23  3:51             ` Ryan Hope
@ 2008-06-23  3:56               ` Nick Piggin
  2008-06-23 11:54               ` Nick Piggin
  1 sibling, 0 replies; 25+ messages in thread
From: Nick Piggin @ 2008-06-23 3:56 UTC
  To: Ryan Hope; +Cc: Peter Zijlstra, linux-mm, LKML

On Monday 23 June 2008 13:51, Ryan Hope wrote:
> Well, I get the hardlock on -mm without using reiser4; I am pretty
> sure it is swap related.

Oh you do, OK good. It would be nice if I were able to reproduce it
here. Does any particular thing trigger it? Preferably without running X
or any proprietary software (e.g. if you run a make -j128 kernel compile
or something that forces a lot of swapping, does that lock up?). What
filesystem?

Can you also attach your .config?

No luck getting a backtrace out of the NMI watchdog?

Thanks,
Nick
* Re: [BUG] Lockless patches cause hardlock under heavy IO 2008-06-23 3:51 ` Ryan Hope 2008-06-23 3:56 ` Nick Piggin @ 2008-06-23 11:54 ` Nick Piggin 2008-06-23 13:05 ` Paul E. McKenney 1 sibling, 1 reply; 25+ messages in thread From: Nick Piggin @ 2008-06-23 11:54 UTC (permalink / raw) To: Ryan Hope, Paul E. McKenney; +Cc: Peter Zijlstra, linux-mm, LKML On Monday 23 June 2008 13:51, Ryan Hope wrote: > well i get the hardlock on -mm with out using reiser4, i am pretty > sure is swap related The guys seeing hangs don't use PREEMPT_RCU, do they? In my swapping tests, I found -mm3 to be stable with classic RCU, but on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather quickly. First crash was in find_get_pages so I suspected lockless pagecache doing something subtly wrong with the RCU API, but I just got another crash in __d_lookup: BUG: unable to handle kernel paging request at ffff81004a139f38 IP: [<ffffffff802bb82c>] __d_lookup+0x8c/0x160 PGD 8063 PUD 7fc3f163 PMD 7df50163 PTE 800000004a139160 Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map CPU 0 Modules linked in: brd Pid: 29563, comm: cc1 Not tainted 2.6.26-rc5-mm3 #467 RIP: 0010:[<ffffffff802bb82c>] [<ffffffff802bb82c>] __d_lookup+0x8c/0x160 RSP: 0018:ffff81004bf7dba8 EFLAGS: 00010282 RAX: 0000000000000007 RBX: ffff81004a139f38 RCX: 0000000000000000 RDX: ffff810028057808 RSI: 0000000000000000 RDI: ffff81004bf7a880 RBP: ffff81004bf7dbf8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffff81004a139ef8 R13: 0000000073885cf7 R14: ffff810070f53ef8 R15: ffff81004bf7dca8 FS: 00002abe0a1decf0(0000) GS:ffffffff80779dc0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff81004a139f38 CR3: 0000000057569000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Process cc1 (pid: 29563, threadinfo ffff81004bf7c000, task ffff81004bf7a880)
Stack: 0000000100000001 0000000000000007 ffff810070f53f00 00000007000041ed
ffff810001ce2013 ffff81004bf7dca8 00000000000041ed ffff81004bf7de48
ffff81004bf7dca8 ffff81004bf7dcb8 ffff81004bf7dc48 ffffffff802af2b5
Call Trace:
[<ffffffff802af2b5>] do_lookup+0x35/0x230
[<ffffffff80312d60>] ? ext3_permission+0x10/0x20
[<ffffffff802b0cbb>] __link_path_walk+0x39b/0x10a0
[<ffffffff802b1a26>] path_walk+0x66/0xd0
[<ffffffff802b1cde>] do_path_lookup+0x9e/0x240
[<ffffffff802b21d7>] __path_lookup_intent_open+0x67/0xd0
[<ffffffff802b224c>] path_lookup_open+0xc/0x10
[<ffffffff802b31ba>] do_filp_open+0xaa/0x9f0
[<ffffffff805445f0>] ? _spin_unlock+0x30/0x60
[<ffffffff802a467d>] ? get_unused_fd_flags+0xed/0x140
[<ffffffff802a4746>] do_sys_open+0x76/0x100
[<ffffffff802a47fb>] sys_open+0x1b/0x20
[<ffffffff8020b90b>] system_call_after_swapgs+0x7b/0x80

This path is completely independent of the pagecache, but it does
also use RCU, so I suspect PREEMPT_RCU is freeing things before
the proper grace period. These are showing up as oopses for me
because I have DEBUG_PAGEALLOC set, but if you don't have that set
then you'll get much more subtle corruption.
Here is the find_get_pages bug FYI:

BUG: unable to handle kernel paging request at ffff8100c7997de0
IP: [<ffffffff802732ee>] find_get_pages+0xce/0x130
PGD 8063 PUD 7fa6e163 PMD cfa64163 PTE 80000000c7997163
Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
CPU 0
Modules linked in: brd
Pid: 446, comm: kswapd0 Not tainted 2.6.26-rc5-mm3 #465
RIP: 0010:[<ffffffff802732ee>] [<ffffffff802732ee>] find_get_pages+0xce/0x130
RSP: 0000:ffff81007e4cbbf0 EFLAGS: 00010246
RAX: ffff8100c7997de0 RBX: ffff81007e4cbc90 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffe2000447f080
RBP: ffff81007e4cbc30 R08: ffffe2000447f088 R09: 0000000000000004
R10: 0000000000000040 R11: 0000000000000040 R12: 0000000000000000
R13: ffff81007e4cbc90 R14: ffff8100c7996e18 R15: 0000000000000000
FS: 00002b774a14ccf0(0000) GS:ffffffff807e5dc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffff8100c7997de0 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 446, threadinfo ffff81007e4ca000, task ffff81007e4d2a00)
ffff81007e4cbcb0 0000000e00000000000000437 65 35 0 0 ffff81007e4cbc80
0000000000000080 0000000000000052 0000000000000000 ffffffffffffffff
ffff81007e4cbc50 ffffffff8027dcdf 0000000000000000 ffff8100c7996c28
Call Trace:
[<ffffffff8027dcdf>] pagevec_lookup+0x1f/0x30
[<ffffffff8027ef43>] __invalidate_mapping_pages+0x83/0x1b0
[<ffffffff8027f07b>] invalidate_mapping_pages+0xb/0x10
[<ffffffff802be2a3>] shrink_icache_memory+0x293/0x2a0
[<ffffffff80281632>] ? shrink_slab+0x32/0x220
[<ffffffff8028172d>] shrink_slab+0x12d/0x220
[<ffffffff8028202a>] kswapd+0x53a/0x670
[<ffffffff8027f830>] ? isolate_pages_global+0x0/0x280
[<ffffffff805a1ada>] ? thread_return+0xa6/0x3bc
[<ffffffff802513f0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff80281af0>] ? kswapd+0x0/0x670
[<ffffffff80251059>] kthread+0x49/0x80
[<ffffffff8020c878>] child_rip+0xa/0x12
[<ffffffff8020bf63>] ? restore_args+0x0/0x30
[<ffffffff80251010>] ? kthread+0x0/0x80
[<ffffffff8020c86e>] ? child_rip+0x0/0x12

If you're not using PREEMPT_RCU, then I'm stumped for the moment. You'll
have to send .configs over...
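Since the RCU flavour and DEBUG_PAGEALLOC both affect whether these oopses show up at all, it can help to check a .config for them before comparing reports. What follows is a minimal sketch in Python, not something posted in this thread: the helper names parse_kconfig and rcu_debug_summary are hypothetical, the code assumes the usual "CONFIG_FOO=y" / "# CONFIG_FOO is not set" line format, and it only looks at options that are named in this thread.

```python
import re

# Options named in this thread; the tuple is an illustrative default.
OPTIONS = ("CONFIG_PREEMPT_RCU", "CONFIG_DEBUG_PAGEALLOC", "CONFIG_PROVE_LOCKING")

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value string; 'is not set' lines map to 'n'."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            values[m.group(1)] = "n"
    return values


def rcu_debug_summary(text, options=OPTIONS):
    """Return {option: bool}; an option missing from the file counts as off."""
    values = parse_kconfig(text)
    return {opt: values.get(opt, "n") == "y" for opt in options}
```

For example, calling rcu_debug_summary on the contents of a .config returns a dict of booleans, so two reporters can compare the RCU and debug settings of their builds at a glance.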
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-23 11:54 ` Nick Piggin
@ 2008-06-23 13:05 ` Paul E. McKenney
2008-06-24 0:13 ` Nick Piggin
0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2008-06-23 13:05 UTC (permalink / raw)
To: Nick Piggin; +Cc: Ryan Hope, Peter Zijlstra, linux-mm, LKML

On Mon, Jun 23, 2008 at 09:54:52PM +1000, Nick Piggin wrote:
> On Monday 23 June 2008 13:51, Ryan Hope wrote:
> > well i get the hardlock on -mm with out using reiser4, i am pretty
> > sure is swap related
>
> The guys seeing hangs don't use PREEMPT_RCU, do they?
>
> In my swapping tests, I found -mm3 to be stable with classic RCU, but
> on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
> quickly. First crash was in find_get_pages so I suspected lockless
> pagecache doing something subtly wrong with the RCU API, but I just got
> another crash in __d_lookup:

Could you please send me a repeat-by? (At least Alexey is no longer
alone!)

Thanx, Paul
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-23 13:05 ` Paul E. McKenney
@ 2008-06-24 0:13 ` Nick Piggin
2008-06-24 15:12 ` Ryan Hope
0 siblings, 1 reply; 25+ messages in thread
From: Nick Piggin @ 2008-06-24 0:13 UTC (permalink / raw)
To: paulmck; +Cc: Ryan Hope, Peter Zijlstra, linux-mm, LKML

On Monday 23 June 2008 23:05, Paul E. McKenney wrote:
> On Mon, Jun 23, 2008 at 09:54:52PM +1000, Nick Piggin wrote:
> > On Monday 23 June 2008 13:51, Ryan Hope wrote:
> > > well i get the hardlock on -mm with out using reiser4, i am pretty
> > > sure is swap related
> >
> > The guys seeing hangs don't use PREEMPT_RCU, do they?
> >
> > In my swapping tests, I found -mm3 to be stable with classic RCU, but
> > on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
> > quickly. First crash was in find_get_pages so I suspected lockless
> > pagecache doing something subtly wrong with the RCU API, but I just got
> > another crash in __d_lookup:
>
> Could you please send me a repeat-by? (At least Alexey is no longer
> alone!)

OK, I had DEBUG_PAGEALLOC in the .config, which I think is probably
important to reproduce it (but the fact that I'm reproducing oopses
with << PAGE_SIZE objects like dentries and radix tree nodes indicates
that there is even more free-before-grace activity going undetected --
if you construct a test case using full pages, it might become even
easier to detect with DEBUG_PAGEALLOC).

2 socket, 8 core x86 system.

I mounted two tmpfs filesystems, one contains a single large file
which is formatted as 1K block size ext3 and mounted loopback, the
other is used directly. Linux kernel source is unpacked on each mount
and concurrent make -j128 on each. This pushes it pretty hard into
swap. Classic RCU survived another 5 hours of this last night.

But that's a fairly convoluted test for an RCU problem. I expect it
should be easier to trigger with something more targeted...
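The setup described above can be written down as an explicit command plan. This is a rough sketch in Python under stated assumptions, not Nick's actual script: the mount points, tmpfs size, image size and tarball name are placeholders, the names repro_plan and render are hypothetical, and the code only builds the command lists without running anything. The real steps need root, each tree needs a kernel .config before make, and the two builds are meant to be started concurrently.

```python
import shlex


def repro_plan(tarball, direct="/mnt/direct", backing="/mnt/backing",
               loop="/mnt/loop", tmpfs_size="4g", image_mb=3072, jobs=128):
    """Return (setup, builds) as lists of argv lists; nothing is executed.

    All paths and sizes are placeholders, not values from the report.
    """
    image = f"{backing}/ext3.img"
    setup = [
        ["mkdir", "-p", direct, backing, loop],
        # Two tmpfs mounts: one used directly, one holding the ext3 image.
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", direct],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", backing],
        # A single large file, formatted as 1K-block ext3, mounted via loop.
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={image_mb}"],
        ["mkfs.ext3", "-F", "-b", "1024", image],
        ["mount", "-o", "loop", image, loop],
    ]
    builds = []
    for tree in (direct, loop):
        src = f"{tree}/linux"
        # Unpack the kernel source on each mount.
        setup.append(["mkdir", "-p", src])
        setup.append(["tar", "-xf", tarball, "-C", src, "--strip-components=1"])
        # One heavily parallel build per tree, to be run concurrently.
        builds.append(["make", "-C", src, f"-j{jobs}"])
    return setup, builds


def render(commands):
    """Format argv lists as shell-quoted lines for review before running."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Printing render(setup) and render(builds) gives a reviewable list of commands; keeping planning separate from execution is a deliberate choice here, since the steps mount filesystems and should be checked by hand first.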
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-24 0:13 ` Nick Piggin
@ 2008-06-24 15:12 ` Ryan Hope
2008-06-24 15:32 ` Paul E. McKenney
0 siblings, 1 reply; 25+ messages in thread
From: Ryan Hope @ 2008-06-24 15:12 UTC (permalink / raw)
To: Nick Piggin; +Cc: paulmck, Peter Zijlstra, linux-mm, LKML

Well, I tried to run pure -mm this weekend. It locked as soon as I got
into gnome, so I applied a couple of the bug fixes from lkml and -mm
seems to be running stable now. I can't seem to get it to hard lock
now, at least not doing the simple stuff that was causing it to hard
lock on my other patchset. Either the lockless patches expose some bug
that is in -rc6, or lockless requires some other patches further up in
the -mm series file.
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-24 15:12 ` Ryan Hope
@ 2008-06-24 15:32 ` Paul E. McKenney
2008-06-24 15:57 ` Ryan Hope
0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2008-06-24 15:32 UTC (permalink / raw)
To: Ryan Hope; +Cc: Nick Piggin, Peter Zijlstra, linux-mm, LKML

On Tue, Jun 24, 2008 at 11:12:03AM -0400, Ryan Hope wrote:
> Well i tried to run pure -mm this weekend, it locked as soon as I got
> into gnome so I applied a couple of the bug fixes from lkml and -mm
> seems to be running stable now. I cant seem to get it to hard lock
> now, at least not doing the simple stuff that was causing it to hard
> lock on my other patchset, either the lockless patches expose some bug
> that in -rc6 or lockless requires some other patches further up in the
> -mm series file.

Cool!!! Any guess as to which of the bug fixes did the trick?
Failing that, a list of the bug fixes that you applied?

Thanx, Paul
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-24 15:32 ` Paul E. McKenney
@ 2008-06-24 15:57 ` Ryan Hope
2008-06-24 16:12 ` Paul E. McKenney
0 siblings, 1 reply; 25+ messages in thread
From: Ryan Hope @ 2008-06-24 15:57 UTC (permalink / raw)
To: paulmck; +Cc: Nick Piggin, Peter Zijlstra, linux-mm, LKML

I can give you a list of patches that should correspond to the thread
name (for the most part):

fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch

fix_munlock-page-table-walk.patch

migration_entry_wait-fix.patch

PATCH collect lru meminfo statistics from correct offset

Mlocked field of /proc/meminfo display silly number.
because trivial mistake exist in meminfo_read_proc().

You can also look in our git repo to see the code that changed with
these patches if you can't track them down in LKML:
http://zen-sources.org/cgi-bin/gitweb.cgi?p=kernel-mm.git;a=shortlog;h=refs/heads/lkml
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-24 15:57 ` Ryan Hope
@ 2008-06-24 16:12 ` Paul E. McKenney
2008-06-24 16:23 ` Ryan Hope
0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2008-06-24 16:12 UTC (permalink / raw)
To: Ryan Hope; +Cc: Nick Piggin, Peter Zijlstra, linux-mm, LKML

On Tue, Jun 24, 2008 at 11:57:05AM -0400, Ryan Hope wrote:
> I can give you a list of patches that should correspond to the thread
> name (for the most part):
>
> fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch
>
> fix_munlock-page-table-walk.patch
>
> migration_entry_wait-fix.patch
>
> PATCH collect lru meminfo statistics from correct offset
>
> Mlocked field of /proc/meminfo display silly number.
> because trivial mistake exist in meminfo_read_proc().
>
> You can also look in our git repo to see the code that changed with
> these patches if you cant track them down in LKML:
> http://zen-sources.org/cgi-bin/gitweb.cgi?p=kernel-mm.git;a=shortlog;h=refs/heads/lkml

Thank you! And is this using Classic RCU or Preemptable RCU?

Thanx, Paul
* Re: [BUG] Lockless patches cause hardlock under heavy IO
2008-06-24 16:12 ` Paul E. McKenney
@ 2008-06-24 16:23 ` Ryan Hope
2008-06-24 18:01 ` Ryan Hope
0 siblings, 1 reply; 25+ messages in thread
From: Ryan Hope @ 2008-06-24 16:23 UTC (permalink / raw)
To: paulmck; +Cc: Nick Piggin, Peter Zijlstra, linux-mm, LKML

I have been using CONFIG_PREEMPT_RCU=Y
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-24 16:23 ` Ryan Hope
@ 2008-06-24 18:01 ` Ryan Hope
  0 siblings, 0 replies; 25+ messages in thread
From: Ryan Hope @ 2008-06-24 18:01 UTC (permalink / raw)
  To: paulmck; +Cc: Nick Piggin, Peter Zijlstra, linux-mm, LKML

I just got a report of someone getting a hardlock while building boost;
he was using classic RCU and no swap.

On Tue, Jun 24, 2008 at 12:23 PM, Ryan Hope <rmh3093@gmail.com> wrote:
> I have been using CONFIG_PREEMPT_RCU=Y
>
> On Tue, Jun 24, 2008 at 12:12 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
>> On Tue, Jun 24, 2008 at 11:57:05AM -0400, Ryan Hope wrote:
>>> I can give you a list of patches that should correspond to the thread
>>> name (for the most part):
>>>
>>> fix-double-unlock_page-in-2626-rc5-mm3-kernel-bug-at-mm-filemapc-575.patch
>>>
>>> fix_munlock-page-table-walk.patch
>>>
>>> migration_entry_wait-fix.patch
>>>
>>> PATCH collect lru meminfo statistics from correct offset
>>>
>>> Mlocked field of /proc/meminfo display silly number.
>>> because trivial mistake exist in meminfo_read_proc().
>>>
>>> You can also look in our git repo to see the code that changed with
>>> these patches if you can't track them down in LKML:
>>> http://zen-sources.org/cgi-bin/gitweb.cgi?p=kernel-mm.git;a=shortlog;h=refs/heads/lkml
>>
>> Thank you! And is this using Classic RCU or Preemptable RCU?
>>
>> Thanx, Paul
>>
>>> On Tue, Jun 24, 2008 at 11:32 AM, Paul E. McKenney
>>> <paulmck@linux.vnet.ibm.com> wrote:
>>> > On Tue, Jun 24, 2008 at 11:12:03AM -0400, Ryan Hope wrote:
>>> >> Well, I tried to run pure -mm this weekend; it locked as soon as I got
>>> >> into gnome, so I applied a couple of the bug fixes from lkml and -mm
>>> >> seems to be running stable now. I can't seem to get it to hard lock
>>> >> now, at least not doing the simple stuff that was causing it to hard
>>> >> lock on my other patchset; either the lockless patches expose some bug
>>> >> that is in -rc6 or lockless requires some other patches further up in the
>>> >> -mm series file.
>>> >
>>> > Cool!!! Any guess as to which of the bug fixes did the trick?
>>> > Failing that, a list of the bug fixes that you applied?
>>> >
>>> > Thanx, Paul
>>> >
>>> >> On Mon, Jun 23, 2008 at 8:13 PM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>> >> > On Monday 23 June 2008 23:05, Paul E. McKenney wrote:
>>> >> >> On Mon, Jun 23, 2008 at 09:54:52PM +1000, Nick Piggin wrote:
>>> >> >> > On Monday 23 June 2008 13:51, Ryan Hope wrote:
>>> >> >> > > well, I get the hardlock on -mm without using reiser4; I am pretty
>>> >> >> > > sure it is swap related
>>> >> >> >
>>> >> >> > The guys seeing hangs don't use PREEMPT_RCU, do they?
>>> >> >> >
>>> >> >> > In my swapping tests, I found -mm3 to be stable with classic RCU, but
>>> >> >> > on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
>>> >> >> > quickly. First crash was in find_get_pages so I suspected lockless
>>> >> >> > pagecache doing something subtly wrong with the RCU API, but I just got
>>> >> >> > another crash in __d_lookup:
>>> >> >>
>>> >> >> Could you please send me a repeat-by? (At least Alexey is no longer
>>> >> >> alone!)
>>> >> >
>>> >> > OK, I had DEBUG_PAGEALLOC in the .config, which I think is probably
>>> >> > important to reproduce it (but the fact that I'm reproducing oopses
>>> >> > with << PAGE_SIZE objects like dentries and radix tree nodes indicates
>>> >> > that there is even more free-before-grace activity going undetected --
>>> >> > if you construct a test case using full pages, it might become even
>>> >> > easier to detect with DEBUG_PAGEALLOC).
>>> >> >
>>> >> > 2 socket, 8 core x86 system.
>>> >> >
>>> >> > I mounted two tmpfs filesystems, one contains a single large file
>>> >> > which is formatted as 1K block size ext3 and mounted loopback, the
>>> >> > other is used directly. Linux kernel source is unpacked on each mount
>>> >> > and concurrent make -j128 on each. This pushes it pretty hard into
>>> >> > swap. Classic RCU survived another 5 hours of this last night.
>>> >> >
>>> >> > But that's a fairly convoluted test for an RCU problem. I expect it
>>> >> > should be easier to trigger with something more targeted...
>>> >> >
>>> >
>> >

^ permalink raw reply [flat|nested] 25+ messages in thread
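Editorial sketch (not part of the original message): the reproduction Nick Piggin describes in the quoted text above could be laid out roughly as follows. This is a dry run that only builds and prints the commands; nothing is executed. The mount points, image size, tarball path, and the `defconfig` step are assumptions made for illustration and were not stated in the thread. The thread does say the kernel under test had DEBUG_PAGEALLOC enabled and that the crashes appeared with PREEMPT_RCU rather than classic RCU.

```python
import shlex


def repro_commands(tarball, src_dir, size_mb=4096, jobs=128,
                   backing="/mnt/tmpfs-backing",
                   loop="/mnt/ext3-loop",
                   direct="/mnt/tmpfs-direct"):
    """Return (setup, builds) command lists for the two-mount build test.

    All paths and sizes are hypothetical defaults. To push the machine
    into swap, the tmpfs sizes would have to exceed free RAM.
    """
    image = f"{backing}/ext3.img"
    setup = [
        ["mkdir", "-p", backing, loop, direct],
        # First tmpfs: holds one large file used as an ext3 image.
        ["mount", "-t", "tmpfs", "-o", f"size={size_mb + 64}m", "tmpfs", backing],
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mb}"],
        # 1K block size ext3, as described; -F because the target is a regular file.
        ["mkfs.ext3", "-F", "-b", "1024", image],
        ["mount", "-o", "loop", image, loop],
        # Second tmpfs: used directly.
        ["mount", "-t", "tmpfs", "-o", f"size={size_mb}m", "tmpfs", direct],
    ]
    for mnt in (loop, direct):
        setup.append(["tar", "-xf", tarball, "-C", mnt])
        # Assumption: some kernel config is needed before building.
        setup.append(["make", "-C", f"{mnt}/{src_dir}", "defconfig"])
    # One heavily parallel build per mount, meant to run concurrently.
    builds = [["make", "-C", f"{mnt}/{src_dir}", f"-j{jobs}"]
              for mnt in (loop, direct)]
    return setup, builds


setup, builds = repro_commands("/tmp/linux-2.6.26-rc6.tar.bz2", "linux-2.6.26-rc6")
for cmd in setup:
    print(shlex.join(cmd))
# Background both builds, then wait for them.
for cmd in builds:
    print(shlex.join(cmd) + " &")
print("wait")
```

Running the printed commands would require root and enough swap configured to absorb the overflow from both tmpfs mounts.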
* Re: [BUG] Lockless patches cause hardlock under heavy IO
  2008-06-23  2:29 ` Nick Piggin
  2008-06-23  3:51 ` Ryan Hope
@ 2008-06-23 23:48 ` Zan Lynx
  1 sibling, 0 replies; 25+ messages in thread
From: Zan Lynx @ 2008-06-23 23:48 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ryan Hope, Peter Zijlstra, linux-mm, LKML

Nick Piggin wrote:
> On Monday 23 June 2008 01:18, Ryan Hope wrote:
>> Well in the current version of the patchset we are using, one user
>> would start playing some game (disabling "Disable Heap Randomization"
>> fixed the hardlocks for him... the other user got hardlocks when
>> copying an ISO from a reiser4 partition to a reiserfs partition
>> (disabling swap fixed the issue for him).
>
> Hmm, nobody has reported such a hang with -mm yet, so maybe it
> is another interaction in the patchset. OTOH, probably nobody
> much uses -mm and reiser4, and reiser4 does lots of weird fiddling
> with pagecache so it could be broken in -mm even.

I have been running and testing -mm with reiser4 for a couple years now.

I haven't been able to run -mm recently because I've been hitting the
copy_user bug Linus fixed for AMD64 but Andrew hasn't updated yet (no
complaints, Andrew) and I'm too lazy to manually patch.

^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2008-06-24 18:01 UTC | newest]

Thread overview: 25+ messages
2008-06-18 21:15 [BUG] Lockless patches cause hardlock under heavy IO Ryan Hope
2008-06-18 21:28 ` Arjan van de Ven
2008-06-19 14:45 ` Ryan Hope
2008-06-20 0:05 ` Arjan van de Ven
2008-06-19 8:12 ` Peter Zijlstra
2008-06-19 8:19 ` Nick Piggin
2008-06-19 14:52 ` Ryan Hope
2008-06-19 20:31 ` Ryan Hope
2008-06-20 14:33 ` Ryan Hope
2008-06-22 14:37 ` Ryan Hope
2008-06-22 15:07 ` Peter Zijlstra
2008-06-22 15:18 ` Ryan Hope
2008-06-23 2:29 ` Nick Piggin
2008-06-23 3:51 ` Ryan Hope
2008-06-23 3:56 ` Nick Piggin
2008-06-23 11:54 ` Nick Piggin
2008-06-23 13:05 ` Paul E. McKenney
2008-06-24 0:13 ` Nick Piggin
2008-06-24 15:12 ` Ryan Hope
2008-06-24 15:32 ` Paul E. McKenney
2008-06-24 15:57 ` Ryan Hope
2008-06-24 16:12 ` Paul E. McKenney
2008-06-24 16:23 ` Ryan Hope
2008-06-24 18:01 ` Ryan Hope
2008-06-23 23:48 ` Zan Lynx