* Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-02-25 2:10 UTC
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, linux-mm

I've plagiarized the subject from Paulus's "Problems with THP" mail
last weekend; but my similar problems are on PowerMac G5 baremetal,
with 4kB pages, not capable of THP and no THP configured in.

Under heavily swapping load, running kernel builds on tmpfs in limited
memory, I've been seeing random segfaults too, internal compiler errors
etc. Not easily reproduced: sometimes happens in minutes, sometimes
not for several hours.

I tried and failed to construct a reproducer for you: my lack of a good
recipe has deterred me from reporting it, and seeing Paulus's mail on
THP gave me hope that the answer would come up in that thread; but no,
that was quickly resolved as a THP issue, since fixed.

(Mine had appeared to be fixed in v4.5-rc4 anyway; but I guess I
just didn't try hard enough, it resurfaced on -rc5 immediately.)

I've seen no sign of such problems on x86. And I saw no sign of such
problems on v4.4-rc8-mm1, when I included the fixes to the _PAGE_PTE
and _PAGE_SWP_SOFT_DIRTY swapoff issues we discussed back then (in
33 hours of load, should be good enough; but did see such problems
a couple of times before including those fixes - I took them to be
a side-effect of the page flags issue, but now rather doubt that).

The minutes or hours thing: I wonder if that indicates a missing
initialization somewhere: that can easily show up soon after booting,
but then the machine settles into a steady state of reusing the same
structures, now initialized; until much later something disturbs the
state and it has to allocate more. Sheer speculation, but I wonder.

Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Problems with swapping in v4.5-rc on POWER
From: Michael Ellerman @ 2016-02-25 4:12 UTC
To: Hugh Dickins, Aneesh Kumar K.V; +Cc: linux-mm, linuxppc-dev

On Wed, 2016-02-24 at 18:10 -0800, Hugh Dickins via Linuxppc-dev wrote:
> I've plagiarized the subject from Paulus's "Problems with THP" mail
> last weekend; but my similar problems are on PowerMac G5 baremetal,
> with 4kB pages, not capable of THP and no THP configured in.
>
> Under heavily swapping load, running kernel builds on tmpfs in limited
> memory, I've been seeing random segfaults too, internal compiler errors
> etc. Not easily reproduced: sometimes happens in minutes, sometimes
> not for several hours.
>
> [...]

Thanks Hugh.

I do run tests on G5, but obviously not rigorously enough. I kicked off a few
kernel builds on mine and it survived, though once it hits swap it's almost
unusably slow. I'll leave it running overnight and see if I hit anything.

cheers
* Re: Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-02-25 5:36 UTC
To: Michael Ellerman; +Cc: Hugh Dickins, Aneesh Kumar K.V, linux-mm, linuxppc-dev

On Thu, 25 Feb 2016, Michael Ellerman wrote:
>
> I do run tests on G5, but obviously not rigorously enough. I kicked off a few
> kernel builds on mine and it survived, though once it hits swap it's almost
> unusably slow. I'll leave it running overnight and see if I hit anything.

Oh yes, I'd forgotten how unusably slow: I tend to forget that I slipped
an SSD in there some while back, just for the swapping: slow, but not
unusable.

Thanks, I'm hoping you will be able to reproduce it yourself.

Hugh
* Re: Problems with swapping in v4.5-rc on POWER
From: Aneesh Kumar K.V @ 2016-02-25 4:52 UTC
To: Hugh Dickins; +Cc: Paul Mackerras, linuxppc-dev, linux-mm

Hugh Dickins <hughd@google.com> writes:

> I've plagiarized the subject from Paulus's "Problems with THP" mail
> [...]
>
> I've seen no sign of such problems on x86. And I saw no sign of such
> problems on v4.4-rc8-mm1, when I included the fixes to the _PAGE_PTE
> and _PAGE_SWP_SOFT_DIRTY swapoff issues we discussed back then (in
> 33 hours of load, should be good enough; but did see such problems
> a couple of times before including those fixes - I took them to be
> a side-effect of the page flags issue, but now rather doubt that).
>

Can you test the impact of the merge listed below? (i.e. revert the merge and see if
we can reproduce, and also verify with the merge applied). This will give us a
set of commits to look closer. We had quite a lot of page table
related changes going in this merge window.

f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:")

That is the merge commit that added _PAGE_PTE.

-aneesh
* Re: Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-02-25 5:43 UTC
To: Aneesh Kumar K.V; +Cc: Hugh Dickins, Paul Mackerras, linuxppc-dev, linux-mm

On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote:
>
> Can you test the impact of the merge listed below? (i.e. revert the merge and see if
> we can reproduce, and also verify with the merge applied). This will give us a
> set of commits to look closer. We had quite a lot of page table
> related changes going in this merge window.
>
> f689b742f217b2ffe7 ("Pull powerpc updates from Michael Ellerman:")
>
> That is the merge commit that added _PAGE_PTE.

Another experiment running on it at the moment, I'd like to give that
a few more hours, and then will try the revert you suggest. But does
that merge revert cleanly, did you try? I'm afraid of interactions,
whether obvious or subtle, with the THP refcounting rework. Oh, since
I don't have THP configured on, maybe I can ignore any issues from that.

Hugh
* Re: Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-02-25 21:35 UTC
To: Aneesh Kumar K.V; +Cc: Michael Ellerman, Paul Mackerras, linuxppc-dev, linux-mm

On Wed, 24 Feb 2016, Hugh Dickins wrote:
> On Thu, 25 Feb 2016, Aneesh Kumar K.V wrote:
> > [...]
>
> Another experiment running on it at the moment, I'd like to give that
> a few more hours, and then will try the revert you suggest. But does
> that merge revert cleanly, did you try? I'm afraid of interactions,
> whether obvious or subtle, with the THP refcounting rework. Oh, since
> I don't have THP configured on, maybe I can ignore any issues from that.

That revert worked painlessly, only a very few and simple conflicts,
I ran that under load for 12 hours, no problem seen.

I've now checked out an f689b742 tree and started on that, just to
confirm that it fails fairly quickly I hope; and will then proceed
to git bisect, giving that as bad and 37cea93b as good.

Given the uncertainty of whether 12 hours is really long enough to be
sure, and perhaps difficulties along the way, I don't rate my chances
of a reliable bisection higher than 60%, but we'll see.

Hugh
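The worry about whether 12 hours is "long enough" comes from how errors compound across bisection steps. A rough sketch of that arithmetic, under loudly stated assumptions that are not measurements from this thread: time to failure on a kernel containing the bug is exponentially distributed with some mean time to failure (MTTF); a kernel without the bug never fails; and an observed failure is always a trustworthy "bad". The function names, the number of buggy steps and the MTTF values below are hypothetical.

```python
import math


def p_survive(test_hours: float, mttf_hours: float) -> float:
    """Probability that a kernel which DOES contain the bug runs for
    test_hours without failing, under an assumed exponential
    time-to-failure model with mean mttf_hours."""
    return math.exp(-test_hours / mttf_hours)


def p_bisection_correct(buggy_steps: int, test_hours: float,
                        mttf_hours: float) -> float:
    """Probability that every bisection step which tests a buggy kernel
    sees a failure within test_hours.  Bug-free kernels are assumed never
    to fail, so only false "good" verdicts can derail the bisection."""
    p_catch = 1.0 - p_survive(test_hours, mttf_hours)
    return p_catch ** buggy_steps


# Hypothetical inputs: 7 of the steps test a buggy kernel, 12-hour cutoff.
for mttf in (3.0, 6.0, 12.0):
    print(f"MTTF {mttf:4.1f} h -> P(bisection correct) = "
          f"{p_bisection_correct(7, 12.0, mttf):.2f}")
```

Under this model a single 12-hour "good" is a fairly safe call when the MTTF is a few hours, but the per-step error grows quickly as the MTTF approaches the cutoff, and it is multiplied over every step that happens to land on a buggy kernel.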
* Re: Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-02-26 10:04 UTC
To: Aneesh Kumar K.V; +Cc: Michael Ellerman, Paul Mackerras, linuxppc-dev, linux-mm

On Thu, 25 Feb 2016, Hugh Dickins wrote:
> On Wed, 24 Feb 2016, Hugh Dickins wrote:
> > [...]
>
> That revert worked painlessly, only a very few and simple conflicts,
> I ran that under load for 12 hours, no problem seen.
>
> I've now checked out an f689b742 tree and started on that, just to
> confirm that it fails fairly quickly I hope; and will then proceed
> to git bisect, giving that as bad and 37cea93b as good.
>
> Given the uncertainty of whether 12 hours is really long enough to be
> sure, and perhaps difficulties along the way, I don't rate my chances
> of a reliable bisection higher than 60%, but we'll see.

I'm sure you won't want a breathless report from me on each bisection
step, but I ought to report that: contrary to our expectations, the
f689b742 survived without error for 12 hours, so appears to be good.
I'll bisect between there and v4.5-rc1.

Hugh
* Re: Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-03-02 20:49 UTC
To: Aneesh Kumar K.V
Cc: Hugh Dickins, Michael Ellerman, Paul Mackerras, linuxppc-dev, linux-mm

On Fri, 26 Feb 2016, Hugh Dickins wrote:
> On Thu, 25 Feb 2016, Hugh Dickins wrote:
> > [...]
>
> I'm sure you won't want a breathless report from me on each bisection
> step, but I ought to report that: contrary to our expectations, the
> f689b742 survived without error for 12 hours, so appears to be good.
> I'll bisect between there and v4.5-rc1.

The bisection completed this morning (log appended below):
not a satisfactory conclusion, it's pointing to a davem/net merge.

I was uncomfortable when I marked that point bad in the first place:
it ran for 9 hours before hitting a compiler error, which was nearly
twice as long as the longest I'd seen before (5 hours), and
uncomfortably close to the 12 hours I've been taking as good.

My current thinking is that the powerpc merge that you indicated,
that I found to be "good", is the one that contains the bad commit;
but that the bug is very rare to manifest in that kernel, and my test
of the davem/net merge happened to be unusually unlucky to hit it.

Then some other later change makes it significantly easier to hit;
and identifying that change may make it much easier to pin down
what the original bug is.

So I've replayed the bisection up to that point, marked the davem/net
merge as good this time, and set off again in the hope that it will
lead somewhere more enlightening. But prepared for disappointment.

Hugh

git bisect start
# good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good f689b742f217b2ffe7925f8a6521b208ee995309
# bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1
git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d
# bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015
# bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog
git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb
# good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next
git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043
# good: [e3315b439c30c208582ac64e58f0c0d36b83181e] ALSA: oxfw: allocate own address region for SCS.1 series
git bisect good e3315b439c30c208582ac64e58f0c0d36b83181e
# good: [3da834e3e5a4a5d26882955298b55a9ed37a00bc] clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
git bisect good 3da834e3e5a4a5d26882955298b55a9ed37a00bc
# bad: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux
git bisect bad e535d74bc50df2357d3253f8f3ca48c66d0d892a
# bad: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 4e5448a31d73d0e944b7adb9049438a09bc332cb
# good: [b70ce2ab41cb67ab3d661eda078f7c4029bbca95] dts: hisi: fixes no syscon fault when init mdio
git bisect good b70ce2ab41cb67ab3d661eda078f7c4029bbca95
# good: [4a658527271bce43afb1cf4feec89afe6716ca59] xen-netback: delete NAPI instance when queue fails to initialize
git bisect good 4a658527271bce43afb1cf4feec89afe6716ca59
# good: [c6894dec8ea9ae05747124dce98b3b5c2e69b168] bridge: fix lockdep addr_list_lock false positive splat
git bisect good c6894dec8ea9ae05747124dce98b3b5c2e69b168
# good: [36beca6571c941b28b0798667608239731f9bc3a] sparc64: Fix numa node distance initialization
git bisect good 36beca6571c941b28b0798667608239731f9bc3a
# good: [750afbf8ee9c6a1c74a1fe5fc9852146b1d72687] bgmac: Fix reversed test of build_skb() return value.
git bisect good 750afbf8ee9c6a1c74a1fe5fc9852146b1d72687
# good: [5a18d263f8d27418c98b8e8551dadfe975c054e3] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 5a18d263f8d27418c98b8e8551dadfe975c054e3
# first bad commit: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
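Replaying a bisection up to a suspect step with that one verdict reversed can be done with git's own `git bisect log` and `git bisect replay <file>` subcommands: save the log, cut it off at the suspect verdict, flip the verdict, and replay. A minimal sketch of the editing step in Python; `flip_verdict` is a hypothetical helper, and it relies on `git bisect replay` skipping lines that are not `git bisect` commands (such as the `#` comments that `git bisect log` writes).

```python
def flip_verdict(log_text: str, commit_prefix: str) -> str:
    """Return a `git bisect log` transcript cut off at the good/bad verdict
    for the commit whose hash starts with commit_prefix, with that verdict
    flipped.  Everything after it is dropped, because the later steps were
    chosen on the strength of the verdict being replaced."""
    kept: list[str] = []
    for line in log_text.splitlines():
        words = line.split()
        is_verdict = (len(words) == 4 and words[:2] == ["git", "bisect"]
                      and words[2] in ("good", "bad"))
        if is_verdict and words[3].startswith(commit_prefix):
            # Drop the "# good: [...]" / "# bad: [...]" comment for this step,
            # which would otherwise contradict the flipped verdict.
            if kept and kept[-1].startswith("#") and commit_prefix in kept[-1]:
                kept.pop()
            flipped = "bad" if words[2] == "good" else "good"
            kept.append(f"git bisect {flipped} {words[3]}")
            return "\n".join(kept) + "\n"
        kept.append(line)
    raise ValueError(f"no good/bad verdict for {commit_prefix!r} in the log")
```

Usage, assuming the log was saved with `git bisect log > bisect.log`: write `flip_verdict(open("bisect.log").read(), "4e5448a3")` to a new file, then run `git bisect reset` followed by `git bisect replay` on that file, and git resumes bisecting from the flipped step.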
* Re: Problems with swapping in v4.5-rc on POWER
From: Michael Ellerman @ 2016-03-03 5:51 UTC
To: Hugh Dickins, Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, linux-mm

On Wed, 2016-03-02 at 12:49 -0800, Hugh Dickins wrote:
> On Fri, 26 Feb 2016, Hugh Dickins wrote:
> > [...]
>
> The bisection completed this morning (log appended below):
> not a satisfactory conclusion, it's pointing to a davem/net merge.
>
> I was uncomfortable when I marked that point bad in the first place:
> it ran for 9 hours before hitting a compiler error, which was nearly
> twice as long as the longest I'd seen before (5 hours), and
> uncomfortably close to the 12 hours I've been taking as good.
>
> My current thinking is that the powerpc merge that you indicated,
> that I found to be "good", is the one that contains the bad commit;
> but that the bug is very rare to manifest in that kernel, and my test
> of the davem/net merge happened to be unusually unlucky to hit it.
>
> Then some other later change makes it significantly easier to hit;
> and identifying that change may make it much easier to pin down
> what the original bug is.
>
> So I've replayed the bisection up to that point, marked the davem/net
> merge as good this time, and set off again in the hope that it will
> lead somewhere more enlightening. But prepared for disappointment.

Thanks Hugh. That logic sounds reasonable, I doubt we can blame davem :)

I've set up another box here to try and reproduce it. It's running with 4k
pages, no THP, and it's going well into swap. Hopefully I can hit the same bug,
but we'll see in 12 hours I guess.

cheers
* Re: Problems with swapping in v4.5-rc on POWER
From: Hugh Dickins @ 2016-03-04 17:58 UTC
To: Michael Ellerman
Cc: Hugh Dickins, Aneesh Kumar K.V, Paul Mackerras, linuxppc-dev, linux-mm

On Thu, 3 Mar 2016, Michael Ellerman wrote:
> On Wed, 2016-03-02 at 12:49 -0800, Hugh Dickins wrote:
> > [...]
>
> Thanks Hugh. That logic sounds reasonable, I doubt we can blame davem :)
>
> I've set up another box here to try and reproduce it. It's running with 4k
> pages, no THP, and it's going well into swap. Hopefully I can hit the same bug,
> but we'll see in 12 hours I guess.

The alternative bisection was as unsatisfactory as the first:
again it fingered an irrelevant merge (rather than any commit
pulled in by that merge) as the bad commit.

It seems this issue is too intermittent for bisection to be useful,
on my load anyway.

The best I can do now is try v4.4 for a couple of days, to verify that
still comes out good (rather than the machine going bad coincident with
v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad.

I'll report back on those; but beyond that, I'll have to leave it to you.

Hugh
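How much confidence "a couple of days" buys can be estimated under the same kind of assumption: if failures on a buggy kernel arrive as a Poisson process (exponential time to failure, an assumed model, not something established in this thread), the maximum-likelihood MTTF from a set of runs is the total hours run divided by the number of failures, and a clean run of t hours still leaves a false-good probability of exp(-t/MTTF). A sketch; the function names and the sample hours are hypothetical, and since the thread suggests the failure rate differs from kernel to kernel, the estimate only means something for runs of one and the same kernel.

```python
import math


def mttf_estimate(failure_hours: list[float], clean_hours: list[float]) -> float:
    """Maximum-likelihood mean time to failure for an exponential model with
    right-censored data: total hours run divided by the number of failures.
    All runs must be of the same (buggy) kernel; clean_hours are runs that
    were stopped before any failure was seen."""
    if not failure_hours:
        raise ValueError("no failures observed: the MLE of the MTTF is unbounded")
    return (sum(failure_hours) + sum(clean_hours)) / len(failure_hours)


def hours_for_confidence(mttf_hours: float, max_false_good: float) -> float:
    """Length of a clean run after which a kernel that still has the bug
    would have survived with probability at most max_false_good."""
    return mttf_hours * math.log(1.0 / max_false_good)


# Hypothetical run times in hours, for illustration only.
mttf = mttf_estimate(failure_hours=[0.5, 2.0, 5.0], clean_hours=[12.0])
print(f"estimated MTTF: {mttf:.1f} h")
print(f"clean run needed for <= 5% false good: {hours_for_confidence(mttf, 0.05):.0f} h")
```

The clean (censored) runs still count towards the total hours, so a long survival on a buggy kernel pushes the MTTF estimate, and with it the soak time needed for a trustworthy "good", upwards.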
* Re: Problems with swapping in v4.5-rc on POWER
From: Michael Ellerman @ 2016-03-07 3:00 UTC
To: Hugh Dickins; +Cc: Aneesh Kumar K.V, Paul Mackerras, linuxppc-dev, linux-mm

On Fri, 2016-03-04 at 09:58 -0800, Hugh Dickins wrote:
>
> The alternative bisection was as unsatisfactory as the first:
> again it fingered an irrelevant merge (rather than any commit
> pulled in by that merge) as the bad commit.
>
> It seems this issue is too intermittent for bisection to be useful,
> on my load anyway.

Darn. Thanks for trying.

> The best I can do now is try v4.4 for a couple of days, to verify that
> still comes out good (rather than the machine going bad coincident with
> v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad.

Thanks, that would still be helpful.

> I'll report back on those; but beyond that, I'll have to leave it to you.

I haven't had any luck here :/

Can you give us a more verbose description of your test setup?

- G5, which exact model?
- 4k pages, no THP.
- how much ram & swap?
- building Linus' tree, make -j ?
- source and output on tmpfs? (how big?)
- what device is the swap device? (you said SSD I think?)
- anything else I've forgotten?

Oh and can you send us your bisect logs, we can at least trust the bad results
I think.

cheers
* Re: Problems with swapping in v4.5-rc on POWER 2016-03-07 3:00 ` Michael Ellerman @ 2016-03-08 11:49 ` Hugh Dickins 0 siblings, 0 replies; 12+ messages in thread From: Hugh Dickins @ 2016-03-08 11:49 UTC (permalink / raw) To: Michael Ellerman Cc: Hugh Dickins, Aneesh Kumar K.V, Paul Mackerras, linuxppc-dev, linux-mm On Mon, 7 Mar 2016, Michael Ellerman wrote: > On Fri, 2016-03-04 at 09:58 -0800, Hugh Dickins wrote: > > > > The alternative bisection was as unsatisfactory as the first: > > again it fingered an irrelevant merge (rather than any commit > > pulled in by that merge) as the bad commit. > > > > It seems this issue is too intermittent for bisection to be useful, > > on my load anyway. > > Darn. Thanks for trying. > > > The best I can do now is try v4.4 for a couple of days, to verify that > > still comes out good (rather than the machine going bad coincident with > > v4.5-rc), then try v4.5-rc7 to verify that that still comes out bad. > > Thanks, that would still be helpful. v4.4 ran under load for 56 hours without any trouble, before I stopped it to switch kernels. v4.5-rc7 ran for 19.5 hours, then hit the problem (sigsegv in "as" on this occasion). > > > I'll report back on those; but beyond that, I'll have to leave it to you. > > I haven't had any luck here :/ > > Can you give us a more verbose description of your test setup? I'll be a lot more terse than you'd like, not much time to spare. If I had a good reproducer, then of course I should specify it exactly to you; but no, 19.5 hours or 5 hours or a few minutes, that does not amount to a good reproducer. > > - G5, which exact model? 
/proc/cpuinfo says:

processor : 0
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

processor : 1
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

processor : 2
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

processor : 3
cpu : PPC970MP, altivec supported
clock : 2500.000000MHz
revision : 1.1 (pvr 0044 0101)

timebase : 33333333
platform : PowerMac
model : PowerMac11,2
machine : PowerMac11,2
motherboard : PowerMac11,2 MacRISC4 Power Macintosh
detected as : 337 (PowerMac G5 Dual Core)
pmac flags : 00000000
L2 cache : 1024K unified
pmac-generation : NewWorld

>  - 4k pages, no THP.

Yes.

>  - how much ram & swap?

I boot with mem=700M, and use 1.5G swap.

>  - building linus' tree, make -j ?

Building an old 2.6.24 tree (which had a higher source to built ratio
than nowadays; with patches to get it to build with more recent toolchain,
from openSUSE 13.1); building some config I used to run on that machine.
Building two of them, each make -j20, concurrently: one in tmpfs, one
in 4kB-blocksize ext4 on loop on tmpfs file. But I doubt that complication
is relevant here: sometimes it's the build in tmpfs that collapses,
sometimes the build in ext4, it's fairly even which.

(Do not bother to attempt such a load on linux-next, only on v4.5: the OOM
rework in mmotm has an unsolved problem with order=2 allocations, which
means that such a load will be OOM-killed very quickly.)

>  - source and output on tmpfs? (how big?)

One source and output in ext4 on loop on file filling 470M tmpfs.
Other source and output in tmpfs on /tmp which I happen to size at 1300M
(but could be half that). Sizes of course fitted to that source tree
and config I happen to be building.

>  - what device is the swap device? (you said SSD I think?)

Old 75G Intel SSD:
ata2.00: ATA-7: INTEL SSDSA2M080G2GN, 2CV102HD, max UDMA/133

>  - anything else I've forgotten?
I happen to run with /proc/sys/vm/swappiness 100, merely because it's
swapping that I'm trying to exercise.

I doubt that any of the details above are important: plenty of swapping
is probably the only message (and doing everything in tmpfs in limited
memory is a good way to force plenty of swapping).

>
> Oh and can you send us your bisect logs, we can at least trust the bad results
> I think.

Remember that both of these bisections started from 4.5-rc1 as bad,
and f689b742f217, the powerpc merge, as good - since I didn't see a
problem at that commit in 12 hours. But we all suspect that in fact
something in that powerpc merge was actually the bad.

git bisect start
# good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good f689b742f217b2ffe7925f8a6521b208ee995309
# bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1
git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d
# bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015
# bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog
git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb
# good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next
git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043
# good: [e3315b439c30c208582ac64e58f0c0d36b83181e] ALSA: oxfw: allocate own address region for SCS.1 series
git bisect good e3315b439c30c208582ac64e58f0c0d36b83181e
# good: [3da834e3e5a4a5d26882955298b55a9ed37a00bc] clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
git bisect good 3da834e3e5a4a5d26882955298b55a9ed37a00bc
# bad: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux
git bisect bad e535d74bc50df2357d3253f8f3ca48c66d0d892a
# bad: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect bad 4e5448a31d73d0e944b7adb9049438a09bc332cb
# good: [b70ce2ab41cb67ab3d661eda078f7c4029bbca95] dts: hisi: fixes no syscon fault when init mdio
git bisect good b70ce2ab41cb67ab3d661eda078f7c4029bbca95
# good: [4a658527271bce43afb1cf4feec89afe6716ca59] xen-netback: delete NAPI instance when queue fails to initialize
git bisect good 4a658527271bce43afb1cf4feec89afe6716ca59
# good: [c6894dec8ea9ae05747124dce98b3b5c2e69b168] bridge: fix lockdep addr_list_lock false positive splat
git bisect good c6894dec8ea9ae05747124dce98b3b5c2e69b168
# good: [36beca6571c941b28b0798667608239731f9bc3a] sparc64: Fix numa node distance initialization
git bisect good 36beca6571c941b28b0798667608239731f9bc3a
# good: [750afbf8ee9c6a1c74a1fe5fc9852146b1d72687] bgmac: Fix reversed test of build_skb() return value.
git bisect good 750afbf8ee9c6a1c74a1fe5fc9852146b1d72687
# good: [5a18d263f8d27418c98b8e8551dadfe975c054e3] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 5a18d263f8d27418c98b8e8551dadfe975c054e3
# first bad commit: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

And then I replayed, taking the davem/net merge as good instead,
on the basis that it had taken longer than usual to hit the issue:

git bisect start
# good: [f689b742f217b2ffe7925f8a6521b208ee995309] Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good f689b742f217b2ffe7925f8a6521b208ee995309
# bad: [92e963f50fc74041b5e9e744c330dca48e04f08d] Linux 4.5-rc1
git bisect bad 92e963f50fc74041b5e9e744c330dca48e04f08d
# bad: [7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
git bisect bad 7f36f1b2a8c4f55f8226ed6c8bb4ed6de11c4015
# bad: [6606b342febfd470b4a33acb73e360eeaca1d9bb] Merge git://www.linux-watchdog.org/linux-watchdog
git bisect bad 6606b342febfd470b4a33acb73e360eeaca1d9bb
# good: [d0021d3bdfe9d551859bca1f58da0e6be8e26043] Merge remote-tracking branch 'asoc/topic/wm8960' into asoc-next
git bisect good d0021d3bdfe9d551859bca1f58da0e6be8e26043
# good: [e3315b439c30c208582ac64e58f0c0d36b83181e] ALSA: oxfw: allocate own address region for SCS.1 series
git bisect good e3315b439c30c208582ac64e58f0c0d36b83181e
# good: [3da834e3e5a4a5d26882955298b55a9ed37a00bc] clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
git bisect good 3da834e3e5a4a5d26882955298b55a9ed37a00bc
# bad: [e535d74bc50df2357d3253f8f3ca48c66d0d892a] Merge tag 'docs-4.5' of git://git.lwn.net/linux
git bisect bad e535d74bc50df2357d3253f8f3ca48c66d0d892a
# good: [4e5448a31d73d0e944b7adb9049438a09bc332cb] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 4e5448a31d73d0e944b7adb9049438a09bc332cb
# good: [aa13a960fc1bd28cfd8b3aef43e523ade1817a2c] Documentation: cpu-hotplug: Fix sysfs mount instructions
git bisect good aa13a960fc1bd28cfd8b3aef43e523ade1817a2c
# good: [afd8c08446d6503adc1ccd2726a8e27f35d95b79] Documentation: Explain pci=conf1,conf2 more verbosely
git bisect good afd8c08446d6503adc1ccd2726a8e27f35d95b79
# good: [e5b6c1518878e157df4121c1caf70d9c470a6d31] firmware: dmi_scan: Save SMBIOS Type 9 System Slots
git bisect good e5b6c1518878e157df4121c1caf70d9c470a6d31
# good: [ec3fc58b1e7a32cc9f552b306f8dbb4454e83798] thermal: add description for integral_cutoff unit
git bisect good ec3fc58b1e7a32cc9f552b306f8dbb4454e83798
# bad: [ece6267878aed4eadff766112f1079984315d8c8] Merge tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad ece6267878aed4eadff766112f1079984315d8c8
# bad: [d45187aaf0e256d23da2f7694a7826524499aa31] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
git bisect bad d45187aaf0e256d23da2f7694a7826524499aa31
# first bad commit: [d45187aaf0e256d23da2f7694a7826524499aa31] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Hugh
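Both bisections above started from the same endpoints, so one quick check when comparing them is to find the first step at which the two runs disagree; everything recorded before that step is shared. The minimal Python sketch below is an illustration, not a git feature: `verdicts` and `first_divergence` are hypothetical helper names. It reads only the `git bisect good <sha>` / `git bisect bad <sha>` lines that `git bisect log` records, and skips `git bisect start` and the `#` comment lines.

```python
import re

# Verdict lines as written by `git bisect log`, e.g.
#   git bisect good f689b742f217b2ffe7925f8a6521b208ee995309
# Comment lines ("# good: [...] ...") and "git bisect start" do not match.
VERDICT = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")


def verdicts(log_text: str) -> list[tuple[str, str]]:
    """Return the (verdict, sha) pairs of a bisect log, in order."""
    pairs = []
    for line in log_text.splitlines():
        m = VERDICT.match(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs


def first_divergence(log_a: str, log_b: str):
    """Return (step, entry_a, entry_b) for the first step at which two
    bisect runs recorded a different verdict or tested a different
    commit, or None if one run's verdicts are a prefix of the other's."""
    for step, (a, b) in enumerate(zip(verdicts(log_a), verdicts(log_b))):
        if a != b:
            return step, a, b
    return None
```

Given the two logs quoted above, this should report the davem/net merge 4e5448a31d73 as the step where the runs part ways: marked bad in the first run and good in the replay. After a divergence the runs test different commits, so comparing further steps says little.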
Thread overview: 12+ messages
2016-02-25 2:10 Problems with swapping in v4.5-rc on POWER Hugh Dickins
2016-02-25 4:12 ` Michael Ellerman
2016-02-25 5:36 ` Hugh Dickins
2016-02-25 4:52 ` Aneesh Kumar K.V
2016-02-25 5:43 ` Hugh Dickins
2016-02-25 21:35 ` Hugh Dickins
2016-02-26 10:04 ` Hugh Dickins
2016-03-02 20:49 ` Hugh Dickins
2016-03-03 5:51 ` Michael Ellerman
2016-03-04 17:58 ` Hugh Dickins
2016-03-07 3:00 ` Michael Ellerman
2016-03-08 11:49 ` Hugh Dickins