* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure [not found] ` <c638ec9fdee2954ec5a7a2bd405aa2ba@tauceti.net> @ 2010-04-22 10:03 ` Michael S. Tsirkin 2010-04-23 5:26 ` Robert Wimmer 0 siblings, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2010-04-22 10:03 UTC (permalink / raw) To: kernel Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: > Maybe some comments to my former mail about what I've done: > I started with a fresh clone (deleted the old /usr/src/linux > of course). > > git clone > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > > Then I started bisect > > git bisect start 'v2.6.31' 'v2.6.30' > > and build the first kernel and then marked kernels which > "crashed" with "soft lockup" or "swapper page allocation failure" > as bad and the other ones as good. Before I've compiled > a new kernel I've always done a "make mrproper". I don't know > if this is needed but thought it wouldn't hurt. > > For me it was not clear that maybe I should have had stopped > testing after the first commit that came up with a "swapper > page allocation failure". It was only one commit which cased > the allocation failure. All the other commits marked as bad > came up with a soft lockup. But I thought it is important to > find the earliest commit which crashes. So should I find out > the commit with the allocation failure? I think you did the right thing. We'll have to figure out soft lockup thing, then if page allocation failure turns out to be a different issue, look at it. > As you requested I've now done now a > > git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 > > which ended with a soft lockup within 3 min. after starting > the VM (see > https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) > with this kernel. 
I'm not sure why the lockup backtrace does not show function names - is the kernel stripped? > > Then I've done a > > git checkout cf8d2c11cb77f129675478792122f50827e5b0ae > > compiled and restarted the VM with this kernel version > (BTW: Of course I've always used the same .config for > all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae > is running fine. > > Thanks! > Robert Well, so the soft lockup issue seems NFS-related? Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to be causing problems on some old kernels (See bisect below). Any idea why? > On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com> > wrote: > > On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: > >> So after the compiler was running hot I've now the following result: > >> > >> server10:/usr/src/linux # git bisect log > >> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 > >> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 > >> git bisect start 'v2.6.31' 'v2.6.30' > >> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): > >> videobuf: modify return value of VIDIOC_REQBUFS ioctl > >> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 > >> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device > >> capabilities of 82599 single speed fiber NICs. 
> >> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e > >> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: > >> lowmemorykiller: fix up remaining checkpatch warnings > >> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 > >> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch > >> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 > >> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 > >> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch > >> 'for-linus' > >> of git://www.jni.nu/cris > >> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d > >> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge > >> git://git.infradead.org/mtd-2.6 > >> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 > >> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): > >> gspca_sonixj: Add light frequency control > >> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb > >> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge > >> git://git.infradead.org/~dwmw2/iommu-2.6.31 > >> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b > >> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch > >> 'for-linus' > >> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 > >> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a > >> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix > >> card > >> driver reloading > >> git bisect good b01b4babbf204443b5a846a7494546501614cefc > >> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace > >> nfs4_path_walk() with VFS path lookup in a private namespace > >> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 > >> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the > >> function put_mnt_ns() > >> git bisect good 616511d039af402670de8500d0e24495113a9cab > >> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper > >> functions for setting 
up private namespaces > >> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae > >> > >> > >> The last "git bisect good" prints out: > >> > >> server10:/usr/src/linux # git bisect good > >> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit > >> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 > >> Author: Trond Myklebust <Trond.Myklebust@netapp.com> > >> Date: Mon Jun 22 15:09:14 2009 -0400 > >> > >> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private > >> namespace > >> > >> As noted in the previous patch, the NFSv4 client mount code > currently > >> has several limitations. If the mount path contains symlinks, or > >> referrals, or even if it just contains a '..', then the client code > >> in > >> nfs4_path_walk() will fail with an error. > >> > >> This patch replaces the nfs4_path_walk()-based lookup with a helper > >> function that sets up a private namespace to represent the > namespace > >> on the > >> server, then uses the ordinary VFS and NFS path lookup code to walk > >> down the > >> mount path in that namespace. > >> > >> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > >> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > >> > >> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc > >> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs > >> > >> Does this help you any further? > >> > >> Thanks! > >> Robert > > > > Looks suspiciously like some error in testing. > > Could you pls retest and verify again that > > cf8d2c11cb77f129675478792122f50827e5b0ae > > is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 19+ messages in thread
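The marking rule used for the bisect above (a kernel counts as bad if it hits a soft lockup or a page allocation failure, otherwise good) could be scripted roughly as follows. This is only a sketch under assumptions: the function name classify_log and the idea of saving the guest's dmesg to a file are not from this thread, and the grep patterns are simply the two phrases quoted in the mails rather than verified kernel message formats. The exit codes follow the "git bisect run" convention (0 = good, 1 = bad, 125 = cannot be tested).

```shell
#!/bin/sh
# classify_log FILE
# Hypothetical helper: decide good/bad for one tested kernel from a saved
# dmesg/console log. Exit status follows the "git bisect run" convention:
#   0   -> good
#   1   -> bad
#   125 -> skip (log missing, so this kernel could not be judged)
classify_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "skip"
        return 125
    fi
    # The two symptoms treated as "bad" in this thread (assumed wording).
    if grep -q -i -e 'soft lockup' -e 'page allocation failure' "$log"; then
        echo "bad"
        return 1
    fi
    echo "good"
    return 0
}
```

Called as, say, classify_log dmesg.txt (the file name is a placeholder), it prints the verdict and returns a status that a wrapper script could hand straight back to git bisect run.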
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-22 10:03 ` [Bugme-new] [Bug 15709] New: swapper page allocation failure Michael S. Tsirkin @ 2010-04-23 5:26 ` Robert Wimmer 2010-04-25 9:18 ` Michael S. Tsirkin 0 siblings, 1 reply; 19+ messages in thread From: Robert Wimmer @ 2010-04-23 5:26 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel > I'm not sure why the lockup backtrace does not show function names - > is the kernel stripped? I always build the kernels with "genkernel", a Gentoo helper program for kernel building. I've looked into the genkernel log file and nothing in it mentions stripping the kernel. A future release of genkernel is supposed to support this, but that is currently not the case. Since I haven't stripped the kernel, I would answer no. Is there maybe a kernel option that should be enabled? Thanks! Robert On 04/22/10 12:03, Michael S. Tsirkin wrote: > On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: > >> Maybe some comments to my former mail about what I've done: >> I started with a fresh clone (deleted the old /usr/src/linux >> of course). >> >> git clone >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux >> >> Then I started bisect >> >> git bisect start 'v2.6.31' 'v2.6.30' >> >> and build the first kernel and then marked kernels which >> "crashed" with "soft lockup" or "swapper page allocation failure" >> as bad and the other ones as good. Before I've compiled >> a new kernel I've always done a "make mrproper". I don't know >> if this is needed but thought it wouldn't hurt. >> >> For me it was not clear that maybe I should have had stopped >> testing after the first commit that came up with a "swapper >> page allocation failure". It was only one commit which cased >> the allocation failure. 
All the other commits marked as bad >> came up with a soft lockup. But I thought it is important to >> find the earliest commit which crashes. So should I find out >> the commit with the allocation failure? >> > I think you did the right thing. We'll have to > figure out soft lockup thing, then if page allocation failure > turns out to be a different issue, look at it. > > >> As you requested I've now done now a >> >> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 >> >> which ended with a soft lockup within 3 min. after starting >> the VM (see >> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) >> with this kernel. >> > I'm not sure why the lockup backtrace does not show function names - > is the kernel stripped? > > >> Then I've done a >> >> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae >> >> compiled and restarted the VM with this kernel version >> (BTW: Of course I've always used the same .config for >> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae >> is running fine. >> >> Thanks! >> Robert >> > Well, so the soft lockup issue seems NFS-related? > Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to > be causing problems on some old kernels (See bisect below). Any idea why? > > > >> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. 
Tsirkin" <mst@redhat.com> >> wrote: >> >>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: >>> >>>> So after the compiler was running hot I've now the following result: >>>> >>>> server10:/usr/src/linux # git bisect log >>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 >>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 >>>> git bisect start 'v2.6.31' 'v2.6.30' >>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): >>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl >>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 >>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device >>>> capabilities of 82599 single speed fiber NICs. >>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e >>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: >>>> lowmemorykiller: fix up remaining checkpatch warnings >>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 >>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch >>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 >>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 >>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch >>>> 'for-linus' >>>> of git://www.jni.nu/cris >>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d >>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge >>>> git://git.infradead.org/mtd-2.6 >>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 >>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): >>>> gspca_sonixj: Add light frequency control >>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb >>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge >>>> git://git.infradead.org/~dwmw2/iommu-2.6.31 >>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b >>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch >>>> 'for-linus' >>>> of 
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 >>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a >>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix >>>> card >>>> driver reloading >>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc >>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace >>>> nfs4_path_walk() with VFS path lookup in a private namespace >>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the >>>> function put_mnt_ns() >>>> git bisect good 616511d039af402670de8500d0e24495113a9cab >>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper >>>> functions for setting up private namespaces >>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae >>>> >>>> >>>> The last "git bisect good" prints out: >>>> >>>> server10:/usr/src/linux # git bisect good >>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit >>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com> >>>> Date: Mon Jun 22 15:09:14 2009 -0400 >>>> >>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private >>>> namespace >>>> >>>> As noted in the previous patch, the NFSv4 client mount code >>>> >> currently >> >>>> has several limitations. If the mount path contains symlinks, or >>>> referrals, or even if it just contains a '..', then the client code >>>> in >>>> nfs4_path_walk() will fail with an error. >>>> >>>> This patch replaces the nfs4_path_walk()-based lookup with a helper >>>> function that sets up a private namespace to represent the >>>> >> namespace >> >>>> on the >>>> server, then uses the ordinary VFS and NFS path lookup code to walk >>>> down the >>>> mount path in that namespace. 
>>>> >>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> >>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> >>>> >>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc >>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs >>>> >>>> Does this help you any further? >>>> >>>> Thanks! >>>> Robert >>>> >>> Looks suspiciously like some error in testing. >>> Could you pls retest and verify again that >>> cf8d2c11cb77f129675478792122f50827e5b0ae >>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? >>>
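The two commits that were asked to be retested can be read straight off the bisect log quoted above. As a rough sketch (the helper name bisect_endpoints is made up for illustration and is not part of git), the most recently marked bad and good commits can be pulled out of "git bisect log" output like this. It only looks at the "git bisect bad/good <hash>" command lines and ignores the "#" comment lines and the "git bisect start" line.

```shell
#!/bin/sh
# bisect_endpoints
# Hypothetical helper: read "git bisect log" output on stdin and print the
# most recent commit marked bad and the most recent commit marked good.
# Comment lines ("# bad: [...]") and "git bisect start" are not matched.
bisect_endpoints() {
    awk '
        $1 == "git" && $2 == "bisect" && $3 == "bad"  { bad  = $4 }
        $1 == "git" && $2 == "bisect" && $3 == "good" { good = $4 }
        END {
            print "bad " bad
            print "good " good
        }
    '
}
```

A typical call would be "git bisect log | bisect_endpoints". For the log in this thread the last "git bisect bad" line names c02d7adf8c5429727a98bad1d039bccad4c61c50 and the last "git bisect good" line names cf8d2c11cb77f129675478792122f50827e5b0ae, which is the pair to re-verify. Note that the last good commit is not necessarily the direct parent of the first bad one in general.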
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-23 5:26 ` Robert Wimmer @ 2010-04-25 9:18 ` Michael S. Tsirkin 2010-04-25 20:41 ` Robert Wimmer 0 siblings, 1 reply; 19+ messages in thread From: Michael S. Tsirkin @ 2010-04-25 9:18 UTC (permalink / raw) To: Robert Wimmer Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote: > > I'm not sure why the lockup backtrace does not show function names - > > is the kernel stripped? > > I'm building the kernels always with "genkernel" a Gentoo > helper programm for kernel building. But I've looked into > the log file of genkernel and there is nothing mentioned about > striping the kernel. There will be a future release of genkernel > which supports this but this is currently not the case. Since > I haven't stripped the kernel I would answer no. Maybe a > kernel option which should be enabled? > > Thanks! > Robert > Hmm. I have these CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_EXTRA_PASS=y # CONFIG_STRIP_ASM_SYMS is not set > > > On 04/22/10 12:03, Michael S. Tsirkin wrote: > > On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: > > > >> Maybe some comments to my former mail about what I've done: > >> I started with a fresh clone (deleted the old /usr/src/linux > >> of course). > >> > >> git clone > >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > >> > >> Then I started bisect > >> > >> git bisect start 'v2.6.31' 'v2.6.30' > >> > >> and build the first kernel and then marked kernels which > >> "crashed" with "soft lockup" or "swapper page allocation failure" > >> as bad and the other ones as good. Before I've compiled > >> a new kernel I've always done a "make mrproper". I don't know > >> if this is needed but thought it wouldn't hurt. 
> >> > >> For me it was not clear that maybe I should have had stopped > >> testing after the first commit that came up with a "swapper > >> page allocation failure". It was only one commit which cased > >> the allocation failure. All the other commits marked as bad > >> came up with a soft lockup. But I thought it is important to > >> find the earliest commit which crashes. So should I find out > >> the commit with the allocation failure? > >> > > I think you did the right thing. We'll have to > > figure out soft lockup thing, then if page allocation failure > > turns out to be a different issue, look at it. > > > > > >> As you requested I've now done now a > >> > >> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 > >> > >> which ended with a soft lockup within 3 min. after starting > >> the VM (see > >> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) > >> with this kernel. > >> > > I'm not sure why the lockup backtrace does not show function names - > > is the kernel stripped? > > > > > >> Then I've done a > >> > >> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae > >> > >> compiled and restarted the VM with this kernel version > >> (BTW: Of course I've always used the same .config for > >> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae > >> is running fine. > >> > >> Thanks! > >> Robert > >> > > Well, so the soft lockup issue seems NFS-related? > > Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to > > be causing problems on some old kernels (See bisect below). Any idea why? > > > > > > > >> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. 
Tsirkin" <mst@redhat.com> > >> wrote: > >> > >>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: > >>> > >>>> So after the compiler was running hot I've now the following result: > >>>> > >>>> server10:/usr/src/linux # git bisect log > >>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 > >>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 > >>>> git bisect start 'v2.6.31' 'v2.6.30' > >>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): > >>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl > >>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 > >>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device > >>>> capabilities of 82599 single speed fiber NICs. > >>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e > >>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: > >>>> lowmemorykiller: fix up remaining checkpatch warnings > >>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 > >>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch > >>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 > >>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 > >>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch > >>>> 'for-linus' > >>>> of git://www.jni.nu/cris > >>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d > >>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge > >>>> git://git.infradead.org/mtd-2.6 > >>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 > >>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): > >>>> gspca_sonixj: Add light frequency control > >>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb > >>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge > >>>> git://git.infradead.org/~dwmw2/iommu-2.6.31 > >>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b > >>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] 
Merge branch > >>>> 'for-linus' > >>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 > >>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a > >>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix > >>>> card > >>>> driver reloading > >>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc > >>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace > >>>> nfs4_path_walk() with VFS path lookup in a private namespace > >>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the > >>>> function put_mnt_ns() > >>>> git bisect good 616511d039af402670de8500d0e24495113a9cab > >>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper > >>>> functions for setting up private namespaces > >>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae > >>>> > >>>> > >>>> The last "git bisect good" prints out: > >>>> > >>>> server10:/usr/src/linux # git bisect good > >>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit > >>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com> > >>>> Date: Mon Jun 22 15:09:14 2009 -0400 > >>>> > >>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private > >>>> namespace > >>>> > >>>> As noted in the previous patch, the NFSv4 client mount code > >>>> > >> currently > >> > >>>> has several limitations. If the mount path contains symlinks, or > >>>> referrals, or even if it just contains a '..', then the client code > >>>> in > >>>> nfs4_path_walk() will fail with an error. > >>>> > >>>> This patch replaces the nfs4_path_walk()-based lookup with a helper > >>>> function that sets up a private namespace to represent the > >>>> > >> namespace > >> > >>>> on the > >>>> server, then uses the ordinary VFS and NFS path lookup code to walk > >>>> down the > >>>> mount path in that namespace. 
> >>>> > >>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > >>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > >>>> > >>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc > >>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs > >>>> > >>>> Does this help you any further? > >>>> > >>>> Thanks! > >>>> Robert > >>>> > >>> Looks suspiciously like some error in testing. > >>> Could you pls retest and verify again that > >>> cf8d2c11cb77f129675478792122f50827e5b0ae > >>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? > >>>
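Whether a given .config has the symbol options listed in this message can be checked with a couple of greps. The sketch below is illustrative only: the function names has_kallsyms and report_symbol_options are invented here, and the path to the .config is whatever the build actually used. CONFIG_KALLSYMS is the option the kernel needs in order to print symbol names in backtraces.

```shell
#!/bin/sh
# has_kallsyms CONFIG_FILE
# Hypothetical helper: succeed only if the given kernel .config enables
# CONFIG_KALLSYMS (needed for symbol names in backtraces).
has_kallsyms() {
    grep -q '^CONFIG_KALLSYMS=y$' "$1"
}

# report_symbol_options CONFIG_FILE
# Print the state of the four options mentioned in the mail above.
report_symbol_options() {
    for opt in CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL \
               CONFIG_KALLSYMS_EXTRA_PASS CONFIG_STRIP_ASM_SYMS; do
        if grep -q "^${opt}=y\$" "$1"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
        fi
    done
}
```

For example, "has_kallsyms /usr/src/linux/.config || echo 'backtraces will lack symbol names'" (the path is assumed from the prompts shown earlier in the thread) gives a quick check before rebuilding and retesting.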
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-25 9:18 ` Michael S. Tsirkin @ 2010-04-25 20:41 ` Robert Wimmer 2010-04-25 20:49 ` Michael S. Tsirkin 0 siblings, 1 reply; 19+ messages in thread From: Robert Wimmer @ 2010-04-25 20:41 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL to my .config. I've uploaded the dmesg output. Maybe it helps a little bit: https://bugzilla.kernel.org/attachment.cgi?id=26138 - Robert On 04/25/10 11:18, Michael S. Tsirkin wrote: > On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote: > >>> I'm not sure why the lockup backtrace does not show function names - >>> is the kernel stripped? >>> >> I'm building the kernels always with "genkernel" a Gentoo >> helper programm for kernel building. But I've looked into >> the log file of genkernel and there is nothing mentioned about >> striping the kernel. There will be a future release of genkernel >> which supports this but this is currently not the case. Since >> I haven't stripped the kernel I would answer no. Maybe a >> kernel option which should be enabled? >> >> Thanks! >> Robert >> >> > Hmm. I have these > CONFIG_KALLSYMS=y > CONFIG_KALLSYMS_ALL=y > CONFIG_KALLSYMS_EXTRA_PASS=y > # CONFIG_STRIP_ASM_SYMS is not set > > > >> >> On 04/22/10 12:03, Michael S. Tsirkin wrote: >> >>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: >>> >>> >>>> Maybe some comments to my former mail about what I've done: >>>> I started with a fresh clone (deleted the old /usr/src/linux >>>> of course). 
>>>> >>>> git clone >>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux >>>> >>>> Then I started bisect >>>> >>>> git bisect start 'v2.6.31' 'v2.6.30' >>>> >>>> and build the first kernel and then marked kernels which >>>> "crashed" with "soft lockup" or "swapper page allocation failure" >>>> as bad and the other ones as good. Before I've compiled >>>> a new kernel I've always done a "make mrproper". I don't know >>>> if this is needed but thought it wouldn't hurt. >>>> >>>> For me it was not clear that maybe I should have had stopped >>>> testing after the first commit that came up with a "swapper >>>> page allocation failure". It was only one commit which cased >>>> the allocation failure. All the other commits marked as bad >>>> came up with a soft lockup. But I thought it is important to >>>> find the earliest commit which crashes. So should I find out >>>> the commit with the allocation failure? >>>> >>>> >>> I think you did the right thing. We'll have to >>> figure out soft lockup thing, then if page allocation failure >>> turns out to be a different issue, look at it. >>> >>> >>> >>>> As you requested I've now done now a >>>> >>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>> >>>> which ended with a soft lockup within 3 min. after starting >>>> the VM (see >>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) >>>> with this kernel. >>>> >>>> >>> I'm not sure why the lockup backtrace does not show function names - >>> is the kernel stripped? >>> >>> >>> >>>> Then I've done a >>>> >>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae >>>> >>>> compiled and restarted the VM with this kernel version >>>> (BTW: Of course I've always used the same .config for >>>> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae >>>> is running fine. >>>> >>>> Thanks! >>>> Robert >>>> >>>> >>> Well, so the soft lockup issue seems NFS-related? 
>>> Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to >>> be causing problems on some old kernels (See bisect below). Any idea why? >>> >>> >>> >>> >>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com> >>>> wrote: >>>> >>>> >>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: >>>>> >>>>> >>>>>> So after the compiler was running hot I've now the following result: >>>>>> >>>>>> server10:/usr/src/linux # git bisect log >>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 >>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 >>>>>> git bisect start 'v2.6.31' 'v2.6.30' >>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): >>>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl >>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 >>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device >>>>>> capabilities of 82599 single speed fiber NICs. >>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e >>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: >>>>>> lowmemorykiller: fix up remaining checkpatch warnings >>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 >>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch >>>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 >>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 >>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch >>>>>> 'for-linus' >>>>>> of git://www.jni.nu/cris >>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d >>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge >>>>>> git://git.infradead.org/mtd-2.6 >>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 >>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): >>>>>> gspca_sonixj: Add light frequency control >>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb >>>>>> # 
bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge >>>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31 >>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b >>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch >>>>>> 'for-linus' >>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 >>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a >>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix >>>>>> card >>>>>> driver reloading >>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc >>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace >>>>>> nfs4_path_walk() with VFS path lookup in a private namespace >>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the >>>>>> function put_mnt_ns() >>>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab >>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper >>>>>> functions for setting up private namespaces >>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae >>>>>> >>>>>> >>>>>> The last "git bisect good" prints out: >>>>>> >>>>>> server10:/usr/src/linux # git bisect good >>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit >>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com> >>>>>> Date: Mon Jun 22 15:09:14 2009 -0400 >>>>>> >>>>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private >>>>>> namespace >>>>>> >>>>>> As noted in the previous patch, the NFSv4 client mount code >>>>>> >>>>>> >>>> currently >>>> >>>> >>>>>> has several limitations. If the mount path contains symlinks, or >>>>>> referrals, or even if it just contains a '..', then the client code >>>>>> in >>>>>> nfs4_path_walk() will fail with an error. 
>>>>>> >>>>>> This patch replaces the nfs4_path_walk()-based lookup with a helper >>>>>> function that sets up a private namespace to represent the >>>>>> >>>>>> >>>> namespace >>>> >>>> >>>>>> on the >>>>>> server, then uses the ordinary VFS and NFS path lookup code to walk >>>>>> down the >>>>>> mount path in that namespace. >>>>>> >>>>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> >>>>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> >>>>>> >>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc >>>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs >>>>>> >>>>>> Does this help you any further? >>>>>> >>>>>> Thanks! >>>>>> Robert >>>>>> >>>>>> >>>>> Looks suspiciously like some error in testing. >>>>> Could you pls retest and verify again that >>>>> cf8d2c11cb77f129675478792122f50827e5b0ae >>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? >>>>> >>>>>
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25 20:41 ` Robert Wimmer
@ 2010-04-25 20:49   ` Michael S. Tsirkin
  2010-04-26 12:15     ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2010-04-25 20:49 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
      Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
      linux-kernel

So, it's an NFS-related regression, which is consistent with the bisect
results. I guess someone who knows about NFS will have to look at it...
BTW, you probably want to label the bug as regression.

On Sun, Apr 25, 2010 at 10:41:59PM +0200, Robert Wimmer wrote:
> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
> to my .config. I've uploaded the dmesg output. Maybe it
> helps a little bit:
>
> https://bugzilla.kernel.org/attachment.cgi?id=26138
>
> - Robert
>
>
> On 04/25/10 11:18, Michael S. Tsirkin wrote:
> > On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
> >
> >>> I'm not sure why the lockup backtrace does not show function names -
> >>> is the kernel stripped?
> >>>
> >> I'm always building the kernels with "genkernel", a Gentoo
> >> helper program for kernel building. But I've looked into
> >> the log file of genkernel and there is nothing mentioned about
> >> stripping the kernel. There will be a future release of genkernel
> >> which supports this but this is currently not the case. Since
> >> I haven't stripped the kernel I would answer no. Maybe a
> >> kernel option which should be enabled?
> >>
> >> Thanks!
> >> Robert
> >>
> >>
> > Hmm. I have these
> > CONFIG_KALLSYMS=y
> > CONFIG_KALLSYMS_ALL=y
> > CONFIG_KALLSYMS_EXTRA_PASS=y
> > # CONFIG_STRIP_ASM_SYMS is not set
> >
> >
> >
> >>
> >> On 04/22/10 12:03, Michael S.
Tsirkin wrote: > >> > >>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: > >>> > >>> > >>>> Maybe some comments to my former mail about what I've done: > >>>> I started with a fresh clone (deleted the old /usr/src/linux > >>>> of course). > >>>> > >>>> git clone > >>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > >>>> > >>>> Then I started bisect > >>>> > >>>> git bisect start 'v2.6.31' 'v2.6.30' > >>>> > >>>> and build the first kernel and then marked kernels which > >>>> "crashed" with "soft lockup" or "swapper page allocation failure" > >>>> as bad and the other ones as good. Before I've compiled > >>>> a new kernel I've always done a "make mrproper". I don't know > >>>> if this is needed but thought it wouldn't hurt. > >>>> > >>>> For me it was not clear that maybe I should have had stopped > >>>> testing after the first commit that came up with a "swapper > >>>> page allocation failure". It was only one commit which cased > >>>> the allocation failure. All the other commits marked as bad > >>>> came up with a soft lockup. But I thought it is important to > >>>> find the earliest commit which crashes. So should I find out > >>>> the commit with the allocation failure? > >>>> > >>>> > >>> I think you did the right thing. We'll have to > >>> figure out soft lockup thing, then if page allocation failure > >>> turns out to be a different issue, look at it. > >>> > >>> > >>> > >>>> As you requested I've now done now a > >>>> > >>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>> > >>>> which ended with a soft lockup within 3 min. after starting > >>>> the VM (see > >>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) > >>>> with this kernel. > >>>> > >>>> > >>> I'm not sure why the lockup backtrace does not show function names - > >>> is the kernel stripped? 
> >>> > >>> > >>> > >>>> Then I've done a > >>>> > >>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae > >>>> > >>>> compiled and restarted the VM with this kernel version > >>>> (BTW: Of course I've always used the same .config for > >>>> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae > >>>> is running fine. > >>>> > >>>> Thanks! > >>>> Robert > >>>> > >>>> > >>> Well, so the soft lockup issue seems NFS-related? > >>> Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to > >>> be causing problems on some old kernels (See bisect below). Any idea why? > >>> > >>> > >>> > >>> > >>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com> > >>>> wrote: > >>>> > >>>> > >>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: > >>>>> > >>>>> > >>>>>> So after the compiler was running hot I've now the following result: > >>>>>> > >>>>>> server10:/usr/src/linux # git bisect log > >>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 > >>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 > >>>>>> git bisect start 'v2.6.31' 'v2.6.30' > >>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): > >>>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl > >>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 > >>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device > >>>>>> capabilities of 82599 single speed fiber NICs. 
> >>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e > >>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: > >>>>>> lowmemorykiller: fix up remaining checkpatch warnings > >>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 > >>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch > >>>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 > >>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 > >>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch > >>>>>> 'for-linus' > >>>>>> of git://www.jni.nu/cris > >>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d > >>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge > >>>>>> git://git.infradead.org/mtd-2.6 > >>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 > >>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): > >>>>>> gspca_sonixj: Add light frequency control > >>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb > >>>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge > >>>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31 > >>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b > >>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch > >>>>>> 'for-linus' > >>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 > >>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a > >>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix > >>>>>> card > >>>>>> driver reloading > >>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc > >>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace > >>>>>> nfs4_path_walk() with VFS path lookup in a private namespace > >>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the > >>>>>> function put_mnt_ns() > >>>>>> git bisect good 
616511d039af402670de8500d0e24495113a9cab > >>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper > >>>>>> functions for setting up private namespaces > >>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae > >>>>>> > >>>>>> > >>>>>> The last "git bisect good" prints out: > >>>>>> > >>>>>> server10:/usr/src/linux # git bisect good > >>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit > >>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com> > >>>>>> Date: Mon Jun 22 15:09:14 2009 -0400 > >>>>>> > >>>>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private > >>>>>> namespace > >>>>>> > >>>>>> As noted in the previous patch, the NFSv4 client mount code > >>>>>> > >>>>>> > >>>> currently > >>>> > >>>> > >>>>>> has several limitations. If the mount path contains symlinks, or > >>>>>> referrals, or even if it just contains a '..', then the client code > >>>>>> in > >>>>>> nfs4_path_walk() will fail with an error. > >>>>>> > >>>>>> This patch replaces the nfs4_path_walk()-based lookup with a helper > >>>>>> function that sets up a private namespace to represent the > >>>>>> > >>>>>> > >>>> namespace > >>>> > >>>> > >>>>>> on the > >>>>>> server, then uses the ordinary VFS and NFS path lookup code to walk > >>>>>> down the > >>>>>> mount path in that namespace. > >>>>>> > >>>>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > >>>>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > >>>>>> > >>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc > >>>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs > >>>>>> > >>>>>> Does this help you any further? > >>>>>> > >>>>>> Thanks! > >>>>>> Robert > >>>>>> > >>>>>> > >>>>> Looks suspiciously like some error in testing. 
> >>>>> Could you pls retest and verify again that
> >>>>> cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
> >>>>>
> >>>>>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25 20:49 ` Michael S. Tsirkin
@ 2010-04-26 12:15   ` Trond Myklebust
  2010-04-26 20:25     ` Robert Wimmer
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2010-04-26 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Robert Wimmer, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell,
      Mel Gorman, linux-nfs, linux-kernel

On Sun, 2010-04-25 at 23:49 +0300, Michael S. Tsirkin wrote:
> So, it's an NFS-related regression, which is consistent with the bisect
> results. I guess someone who knows about NFS will have to look at it...
> BTW, you probably want to label the bug as regression.
>
> On Sun, Apr 25, 2010 at 10:41:59PM +0200, Robert Wimmer wrote:
> > I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
> > to my .config. I've uploaded the dmesg output. Maybe it
> > helps a little bit:
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=26138
> >
> > - Robert
> >

That last trace is just saying that the NFSv4 reboot recovery code is
crashing (which is hardly surprising if the memory management is hosed).

The initial bisection makes little sense to me: it is basically blaming
a page allocation problem on a change to the NFSv4 mount code. The only
way I can see that possibly happen is if you are hitting a stack
overflow.
So 2 questions:

  - Are you able to reproduce the bug when using NFSv3 instead?
  - Have you tried running with stack tracing enabled?

Cheers
  Trond

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 12:15 ` Trond Myklebust
@ 2010-04-26 20:25   ` Robert Wimmer
       [not found]     ` <4BD5F6C5.8080605-PAwl83ecUlHR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Wimmer @ 2010-04-26 20:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
      linux-kernel

>>> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
>>> to my .config. I've uploaded the dmesg output. Maybe it
>>> helps a little bit:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=26138
>>>
>>> - Robert
>>>
>>>
> That last trace is just saying that the NFSv4 reboot recovery code is
> crashing (which is hardly surprising if the memory management is hosed).
>
> The initial bisection makes little sense to me: it is basically blaming
> a page allocation problem on a change to the NFSv4 mount code. The only
> way I can see that possibly happen is if you are hitting a stack
> overflow.
> So 2 questions:
>
> - Are you able to reproduce the bug when using NFSv3 instead?
>

I've tried with NFSv3 now. With v4 the error normally occurs
within 5 minutes. The VM has now been running for one hour with
no soft lockup so far. So I would say it can't be reproduced with
v3.

> - Have you tried running with stack tracing enabled?
>

Can you explain this a little bit more please? CONFIG_STACKTRACE=y
was already enabled. I've now enabled

CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_STACK_TRACER=y
CONFIG_KMEMTRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y

and run

echo 1 > /proc/sys/kernel/stack_tracer_enabled

But the output in dmesg and /var/log/messages is mostly
the same. Can you please guide me how I can
enable the stack tracing you need?

Thanks!
Robert

^ permalink raw reply [flat|nested] 19+ messages in thread
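The NFSv3 versus NFSv4 comparison asked for above comes down to mounting the same export once per protocol version and watching for the lockup each time. A minimal sketch follows, not taken from the thread: the server name `nfsserver`, the export path `/export/vm` and the mount point `/mnt/test` are hypothetical, and `-t nfs4` is assumed to be how the client's mount helper selects NFSv4 (newer nfs-utils also accept `-o vers=4`).

```shell
#!/bin/sh
# Hypothetical values; substitute the real server, export and mount point.
MOUNT="${MOUNT:-mount}"      # overridable, e.g. MOUNT=echo for a dry run
SERVER="nfsserver"
EXPORT="/export/vm"          # may differ for v4 if the server uses an fsid=0 pseudo-root
MNT="/mnt/test"

# Mount the export with NFSv3.
mount_v3() {
    "$MOUNT" -t nfs -o vers=3 "$SERVER:$EXPORT" "$MNT"
}

# Mount the same export with NFSv4.
mount_v4() {
    "$MOUNT" -t nfs4 "$SERVER:$EXPORT" "$MNT"
}
```

Running `mount_v3`, exercising the VM for a while, unmounting, and repeating with `mount_v4` keeps everything but the protocol version constant between the two runs.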
[parent not found: <4BD5F6C5.8080605-PAwl83ecUlHR7s880joybQ@public.gmane.org>]
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
       [not found]   ` <4BD5F6C5.8080605-PAwl83ecUlHR7s880joybQ@public.gmane.org>
@ 2010-04-26 21:04     ` Trond Myklebust
  2010-04-26 22:18       ` Robert Wimmer
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2010-04-26 21:04 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell,
      Mel Gorman, linux-nfs, linux-kernel

On Mon, 2010-04-26 at 22:25 +0200, Robert Wimmer wrote:
> I've tried with NFSv3 now. With v4 the error normally occur
> within 5 minutes. The VM is now running for one hour and no
> soft lockup so far. So I would say it can't be reproduced with
> v3.

Thanks! That's useful info.

> > - Have you tried running with stack tracing enabled?
> >
>
> Can you explain this a little bit more please? CONFIG_STACKTRACE=y
> was already enabled. I've now enabled
>
> CONFIG_USER_STACKTRACE_SUPPORT=y
> CONFIG_NOP_TRACER=y
> CONFIG_HAVE_FTRACE_NMI_ENTER=y
> CONFIG_HAVE_FUNCTION_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_FTRACE_SYSCALLS=y
> CONFIG_FTRACE_NMI_ENTER=y
> CONFIG_CONTEXT_SWITCH_TRACER=y
> CONFIG_GENERIC_TRACER=y
> CONFIG_FTRACE=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_FTRACE_SYSCALLS=y
> CONFIG_STACK_TRACER=y
> CONFIG_KMEMTRACE=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_MMIOTRACE_SUPPORT=y
>
> and run
>
> echo 1 > /proc/sys/kernel/stack_tracer_enabled
>
> But the output is mostly the same in dmesg/
> var/log/messages. Can you please guide me how I can
> enable the stack tracing you need?

Sure. In addition to what you did above, please do

mount -t debugfs none /sys/kernel/debug

and then cat the contents of the pseudofile at

/sys/kernel/debug/tracing/stack_trace

Please do this more or less immediately after you've finished mounting
the NFSv4 client.

Does your server have the 'crossmnt' or 'nohide' flags set, or does it
use the 'refer' export option anywhere? If so, then we might have to
test further, since those may trigger the NFSv4 submount feature.

Cheers
  Trond

^ permalink raw reply [flat|nested] 19+ messages in thread
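The stack tracer steps described above can be collected into one small script. This is only a sketch under stated assumptions: the kernel is built with CONFIG_STACK_TRACER=y, the script runs as root, and the saved trace uses the usual "index) depth size location" entry layout of the ftrace stack tracer. The function names `capture_stack_trace` and `max_stack_depth` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes CONFIG_STACK_TRACER=y and root privileges.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Enable the stack tracer and print the deepest stack seen so far.
capture_stack_trace() {
    # Mount debugfs only if the tracing directory is not visible yet.
    [ -d "$TRACING_DIR" ] || mount -t debugfs none /sys/kernel/debug
    # Same knob that was used earlier in the thread.
    echo 1 > /proc/sys/kernel/stack_tracer_enabled
    cat "$TRACING_DIR/stack_max_size"
    cat "$TRACING_DIR/stack_trace"
}

# Print the largest "Depth" value (in bytes) found in a saved trace file.
# Entry lines are assumed to look like "  0)     5568     112   func+0x1/0x2".
max_stack_depth() {
    awk '$1 ~ /^[0-9]+\)$/ && $2 > max { max = $2 } END { print max + 0 }' "$1"
}
```

For example, `capture_stack_trace > after-mount.txt` right after mounting the NFSv4 client, followed by `max_stack_depth after-mount.txt`, gives a single number that can be compared between kernels.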
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 21:04 ` Trond Myklebust
@ 2010-04-26 22:18   ` Robert Wimmer
  2010-04-26 23:28     ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Wimmer @ 2010-04-26 22:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
      linux-kernel

> Sure. In addition to what you did above, please do
>
> mount -t debugfs none /sys/kernel/debug
>
> and then cat the contents of the pseudofile at
>
> /sys/kernel/debug/tracing/stack_trace
>
> Please do this more or less immediately after you've finished mounting
> the NFSv4 client.
>

I've uploaded the stack trace. It was generated
directly after mounting. Here are the stacks:

After mounting:
https://bugzilla.kernel.org/attachment.cgi?id=26153
After the soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26154
The dmesg output of the soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26155

> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> use the 'refer' export option anywhere? If so, then we might have to
> test further, since those may trigger the NFSv4 submount feature.
>

The server has the following settings:
rw,nohide,insecure,async,no_subtree_check,no_root_squash

Thanks!
Robert

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 22:18 ` Robert Wimmer
@ 2010-04-26 23:28   ` Trond Myklebust
  2010-04-27 22:56     ` Robert Wimmer
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2010-04-26 23:28 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
      linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1532 bytes --]

On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
> > Sure. In addition to what you did above, please do
> >
> > mount -t debugfs none /sys/kernel/debug
> >
> > and then cat the contents of the pseudofile at
> >
> > /sys/kernel/debug/tracing/stack_trace
> >
> > Please do this more or less immediately after you've finished mounting
> > the NFSv4 client.
> >
>
> I've uploaded the stack trace. It was generated
> directly after mounting. Here are the stacks:
>
> After mounting:
> https://bugzilla.kernel.org/attachment.cgi?id=26153
> After the soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26154
> The dmesg output of the soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26155
>
> > Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> > use the 'refer' export option anywhere? If so, then we might have to
> > test further, since those may trigger the NFSv4 submount feature.
> >
> The server has the following settings:
> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>
> Thanks!
> Robert
>
>

That second trace is more than 5.5K deep, more than half of which is
socket overhead :-(((.

The process stack does not appear to have overflowed, however that trace
doesn't include any IRQ stack overhead.

OK... So what happens if we get rid of half of that trace by forcing
asynchronous tasks such as this to run entirely in rpciod instead of
first trying to run in the process context?

See the attachment...

[-- Attachment #2: linux-2.6.34-000-reduce_async_rpc_stack_usage.dif --]
[-- Type: text/plain, Size: 856 bytes --]

SUNRPC: Reduce asynchronous RPC task stack usage

From: Trond Myklebust <Trond.Myklebust@netapp.com>

We should just farm out asynchronous RPC tasks immediately to rpciod...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/sched.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index c8979ce..22a097f 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -720,7 +720,12 @@ void rpc_execute(struct rpc_task *task)
 {
 	rpc_set_active(task);
 	rpc_set_running(task);
-	__rpc_execute(task);
+	if (RPC_IS_ASYNC(task)) {
+		INIT_WORK(&task->u.tk_work, rpc_async_schedule);
+		queue_work(rpciod_workqueue, &task->u.tk_work);
+
+	} else
+		__rpc_execute(task);
 }
 
 static void rpc_async_schedule(struct work_struct *work)

^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 23:28 ` Trond Myklebust
@ 2010-04-27 22:56   ` Robert Wimmer
       [not found]     ` <be8a0f012ebb2ae02522998591e6f1a5@tauceti.net>
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Wimmer @ 2010-04-27 22:56 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
      linux-kernel

I've applied the patch against the kernel which I got
from "git clone ....", which resulted in a kernel 2.6.34-rc5.

The stack trace after mounting NFS is here:
https://bugzilla.kernel.org/attachment.cgi?id=26166
/var/log/messages after soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26167

I hope there is some useful information in there.

Thanks!
Robert

On 04/27/10 01:28, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>
>>> Sure. In addition to what you did above, please do
>>>
>>> mount -t debugfs none /sys/kernel/debug
>>>
>>> and then cat the contents of the pseudofile at
>>>
>>> /sys/kernel/debug/tracing/stack_trace
>>>
>>> Please do this more or less immediately after you've finished mounting
>>> the NFSv4 client.
>>>
>>>
>> I've uploaded the stack trace. It was generated
>> directly after mounting. Here are the stacks:
>>
>> After mounting:
>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>> After the soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>> The dmesg output of the soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>
>>
>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>> use the 'refer' export option anywhere? If so, then we might have to
>>> test further, since those may trigger the NFSv4 submount feature.
>>>
>>>
>> The server has the following settings:
>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>
>> Thanks!
>> Robert
>>
>>
>>
> That second trace is more than 5.5K deep, more than half of which is
> socket overhead :-(((.
>
> The process stack does not appear to have overflowed, however that trace
> doesn't include any IRQ stack overhead.
>
> OK... So what happens if we get rid of half of that trace by forcing
> asynchronous tasks such as this to run entirely in rpciod instead of
> first trying to run in the process context?
>
> See the attachment...
>

^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <be8a0f012ebb2ae02522998591e6f1a5@tauceti.net>]
[parent not found: <be8a0f012ebb2ae02522998591e6f1a5-PAwl83ecUlHR7s880joybQ@public.gmane.org>]
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
       [not found]   ` <be8a0f012ebb2ae02522998591e6f1a5-PAwl83ecUlHR7s880joybQ@public.gmane.org>
@ 2010-05-06 21:19     ` Robert Wimmer
       [not found]       ` <4BE33259.3000609-PAwl83ecUlHR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Wimmer @ 2010-05-06 21:19 UTC (permalink / raw)
  To: Trond Myklebust, mst
  Cc: Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell,
      Mel Gorman, linux-nfs, linux-kernel

I don't know if someone is still interested in this,
but I think Trond isn't interested any further because
the last error was of course a "page allocation
failure" and not the "soft lockup" which Trond was
trying to solve. But the patch was for 2.6.34, and
the "soft lockup" comes up only with some 2.6.30 and
maybe some 2.6.31 kernel versions. But the first error
I reported was a "page allocation failure", which
all kernels >= 2.6.32 produce with the configuration
I use (NFSv4).

Michael suggested to first solve the "soft lockup"
before further investigating the "page allocation
failure". We know that the "soft lockup" only
pops up with NFSv4 and not v3. I really want to
use v4, but since I'm not a kernel hacker someone
must guide me on what to try next.

I know that you all have a lot of other work to
do, but if there are no ideas left on what to do next
it's maybe best to close the bug for now. I'll stay with
kernel 2.6.30 for now, or go back to NFSv3 if I
upgrade to a newer kernel. Maybe the error will
be fixed "by accident" in >= 2.6.35 ;-)

Thanks!
Robert

On 05/03/10 10:11, kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org wrote:
> Anything we can do to investigate this further?
>
> Thanks!
> Robert
>
>
> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org>
> wrote:
>
>> I've applied the patch against the kernel which I got
>> from "git clone ...." resulted in a kernel 2.6.34-rc5.
>> >> The stack trace after mounting NFS is here: >> https://bugzilla.kernel.org/attachment.cgi?id=26166 >> /var/log/messages after soft lockup: >> https://bugzilla.kernel.org/attachment.cgi?id=26167 >> >> I hope that there is any usefull information in there. >> >> Thanks! >> Robert >> >> On 04/27/10 01:28, Trond Myklebust wrote: >> >>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: >>> >>> >>>>> Sure. In addition to what you did above, please do >>>>> >>>>> mount -t debugfs none /sys/kernel/debug >>>>> >>>>> and then cat the contents of the pseudofile at >>>>> >>>>> /sys/kernel/debug/tracing/stack_trace >>>>> >>>>> Please do this more or less immediately after you've finished >>>>> > mounting > >>>>> the NFSv4 client. >>>>> >>>>> >>>>> >>>> I've uploaded the stack trace. It was generated >>>> directly after mounting. Here are the stacks: >>>> >>>> After mounting: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26153 >>>> After the soft lockup: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26154 >>>> The dmesg output of the soft lockup: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26155 >>>> >>>> >>>> >>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does >>>>> > it > >>>>> use the 'refer' export option anywhere? If so, then we might have to >>>>> test further, since those may trigger the NFSv4 submount feature. >>>>> >>>>> >>>>> >>>> The server has the following settings: >>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash >>>> >>>> Thanks! >>>> Robert >>>> >>>> >>>> >>>> >>> That second trace is more than 5.5K deep, more than half of which is >>> socket overhead :-(((. >>> >>> The process stack does not appear to have overflowed, however that >>> > trace > >>> doesn't include any IRQ stack overhead. >>> >>> OK... So what happens if we get rid of half of that trace by forcing >>> asynchronous tasks such as this to run entirely in rpciod instead of >>> first trying to run in the process context? 
>>> >>> See the attachment... >>> >>> ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4BE33259.3000609-PAwl83ecUlHR7s880joybQ@public.gmane.org>]
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
       [not found]   ` <4BE33259.3000609-PAwl83ecUlHR7s880joybQ@public.gmane.org>
@ 2010-05-06 21:30     ` Trond Myklebust
  2010-05-13 21:08       ` Robert Wimmer
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2010-05-06 21:30 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell,
      Mel Gorman, linux-nfs, linux-kernel

Sorry. I've been caught up in work in the past few days.

I can certainly help with the soft lockup if you are able to supply
either a dump that includes all threads stuck in the NFS, or a (binary)
wireshark dump that shows the NFSv4 traffic between the client and
server around the time of the hang.

Cheers
  Trond

On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote:
> I don't know if someone is still interested in this
> but I think Trond isn't further interested because
> the last error was of cource a "page allocation
> failure" and not a "soft lookup" which Trond was
> trying to solve. But the patch was for 2.6.34 and
> the "soft lookup" comes up only with some 2.6.30 and
> maybe some 2.6.31 kernel versions. But the first error
> I reported was a "page allocation failure" which
> all kernels >= 2.6.32 produces with this configuration
> I use (NFSv4).
>
> Michael suggested to first solve the "soft lookup"
> before further investigating the "page allocation
> failure". We know that the "soft lookup" only
> pop's up with NFSv4 and not v3. I really want to
> use v4 but since I'm not a kernel hacker someone
> must guide me what to try next.
>
> I know that you're all have a lot of other work to
> do but if there're no ideas left what to do next
> it's maybe best to close the bug for now and I stay with
> kernel 2.6.30 for now or go back to NFS v3 if I
> upgrade to a newer kernel. Maybe the error will
> be fixed "by accident" in >= 2.6.35 ;-)
>
> Thanks!
> Robert > > > > On 05/03/10 10:11, kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org wrote: > > Anything we can do to investigate this further? > > > > Thanks! > > Robert > > > > > > On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org> > > wrote: > > > >> I've applied the patch against the kernel which I got > >> from "git clone ...." resulted in a kernel 2.6.34-rc5. > >> > >> The stack trace after mounting NFS is here: > >> https://bugzilla.kernel.org/attachment.cgi?id=26166 > >> /var/log/messages after soft lockup: > >> https://bugzilla.kernel.org/attachment.cgi?id=26167 > >> > >> I hope that there is any usefull information in there. > >> > >> Thanks! > >> Robert > >> > >> On 04/27/10 01:28, Trond Myklebust wrote: > >> > >>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: > >>> > >>> > >>>>> Sure. In addition to what you did above, please do > >>>>> > >>>>> mount -t debugfs none /sys/kernel/debug > >>>>> > >>>>> and then cat the contents of the pseudofile at > >>>>> > >>>>> /sys/kernel/debug/tracing/stack_trace > >>>>> > >>>>> Please do this more or less immediately after you've finished > >>>>> > > mounting > > > >>>>> the NFSv4 client. > >>>>> > >>>>> > >>>>> > >>>> I've uploaded the stack trace. It was generated > >>>> directly after mounting. Here are the stacks: > >>>> > >>>> After mounting: > >>>> https://bugzilla.kernel.org/attachment.cgi?id=26153 > >>>> After the soft lockup: > >>>> https://bugzilla.kernel.org/attachment.cgi?id=26154 > >>>> The dmesg output of the soft lockup: > >>>> https://bugzilla.kernel.org/attachment.cgi?id=26155 > >>>> > >>>> > >>>> > >>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does > >>>>> > > it > > > >>>>> use the 'refer' export option anywhere? If so, then we might have to > >>>>> test further, since those may trigger the NFSv4 submount feature. 
> >>>>> > >>>>> > >>>>> > >>>> The server has the following settings: > >>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash > >>>> > >>>> Thanks! > >>>> Robert > >>>> > >>>> > >>>> > >>>> > >>> That second trace is more than 5.5K deep, more than half of which is > >>> socket overhead :-(((. > >>> > >>> The process stack does not appear to have overflowed, however that > >>> > > trace > > > >>> doesn't include any IRQ stack overhead. > >>> > >>> OK... So what happens if we get rid of half of that trace by forcing > >>> asynchronous tasks such as this to run entirely in rpciod instead of > >>> first trying to run in the process context? > >>> > >>> See the attachment... > >>> > >>> > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-06 21:30 ` Trond Myklebust
@ 2010-05-13 21:08   ` Robert Wimmer
       [not found]     ` <4BEC6A5D.5070304-PAwl83ecUlHR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Robert Wimmer @ 2010-05-13 21:08 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
      Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Finally I've had some time to do the next test.
Here is a wireshark dump (~750 MByte):
http://213.252.12.93/2.6.34-rc5.cap.gz

dmesg output after page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26371

stack trace before page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26369

stack trace after page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26370

I hope the wireshark dump is not too big to download.
It was created with

tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap

Thanks!
Robert

On 05/06/10 23:30, Trond Myklebust wrote:
> Sorry. I've been caught up in work in the past few days.
>
> I can certainly help with the soft lockup if you are able to supply
> either a dump that includes all threads stuck in the NFS, or a (binary)
> wireshark dump that shows the NFSv4 traffic between the client and
> server around the time of the hang.
>
> Cheers
> Trond
>
> On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote:
>
>> I don't know if someone is still interested in this
>> but I think Trond isn't further interested because
>> the last error was of cource a "page allocation
>> failure" and not a "soft lookup" which Trond was
>> trying to solve. But the patch was for 2.6.34 and
>> the "soft lookup" comes up only with some 2.6.30 and
>> maybe some 2.6.31 kernel versions. But the first error
>> I reported was a "page allocation failure" which
>> all kernels >= 2.6.32 produces with this configuration
>> I use (NFSv4).
>> >> Michael suggested to first solve the "soft lookup" >> before further investigating the "page allocation >> failure". We know that the "soft lookup" only >> pop's up with NFSv4 and not v3. I really want to >> use v4 but since I'm not a kernel hacker someone >> must guide me what to try next. >> >> I know that you're all have a lot of other work to >> do but if there're no ideas left what to do next >> it's maybe best to close the bug for now and I stay with >> kernel 2.6.30 for now or go back to NFS v3 if I >> upgrade to a newer kernel. Maybe the error will >> be fixed "by accident" in >= 2.6.35 ;-) >> >> Thanks! >> Robert >> >> >> >> On 05/03/10 10:11, kernel@tauceti.net wrote: >> >>> Anything we can do to investigate this further? >>> >>> Thanks! >>> Robert >>> >>> >>> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net> >>> wrote: >>> >>> >>>> I've applied the patch against the kernel which I got >>>> from "git clone ...." resulted in a kernel 2.6.34-rc5. >>>> >>>> The stack trace after mounting NFS is here: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26166 >>>> /var/log/messages after soft lockup: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26167 >>>> >>>> I hope that there is any usefull information in there. >>>> >>>> Thanks! >>>> Robert >>>> >>>> On 04/27/10 01:28, Trond Myklebust wrote: >>>> >>>> >>>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: >>>>> >>>>> >>>>> >>>>>>> Sure. In addition to what you did above, please do >>>>>>> >>>>>>> mount -t debugfs none /sys/kernel/debug >>>>>>> >>>>>>> and then cat the contents of the pseudofile at >>>>>>> >>>>>>> /sys/kernel/debug/tracing/stack_trace >>>>>>> >>>>>>> Please do this more or less immediately after you've finished >>>>>>> >>>>>>> >>> mounting >>> >>> >>>>>>> the NFSv4 client. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> I've uploaded the stack trace. It was generated >>>>>> directly after mounting. 
Here are the stacks: >>>>>> >>>>>> After mounting: >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153 >>>>>> After the soft lockup: >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154 >>>>>> The dmesg output of the soft lockup: >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does >>>>>>> >>>>>>> >>> it >>> >>> >>>>>>> use the 'refer' export option anywhere? If so, then we might have to >>>>>>> test further, since those may trigger the NFSv4 submount feature. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> The server has the following settings: >>>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash >>>>>> >>>>>> Thanks! >>>>>> Robert >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> That second trace is more than 5.5K deep, more than half of which is >>>>> socket overhead :-(((. >>>>> >>>>> The process stack does not appear to have overflowed, however that >>>>> >>>>> >>> trace >>> >>> >>>>> doesn't include any IRQ stack overhead. >>>>> >>>>> OK... So what happens if we get rid of half of that trace by forcing >>>>> asynchronous tasks such as this to run entirely in rpciod instead of >>>>> first trying to run in the process context? >>>>> >>>>> See the attachment... >>>>> >>>>> >>>>> >> > > ^ permalink raw reply [flat|nested] 19+ messages in thread
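For readers who want to repeat the stack-depth check requested above, the steps might be scripted roughly as follows. This is a sketch only: it assumes a kernel built with CONFIG_STACK_TRACER, and the helper name `dump_stack_trace` and its optional directory argument are hypothetical, not something used in the thread. Enabling the tracer through /proc/sys/kernel/stack_tracer_enabled is also an addition here; the thread itself only mentions mounting debugfs and reading the pseudofile.

```shell
#!/bin/sh
# Sketch: print the deepest kernel stack recorded by the ftrace stack tracer.
# dump_stack_trace is a hypothetical helper; the optional argument overrides
# the tracing directory (the default is the path given in the thread).
dump_stack_trace() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    if [ ! -r "$tracing_dir/stack_trace" ]; then
        echo "stack_trace not readable in $tracing_dir" >&2
        return 1
    fi
    # stack_max_size holds the deepest stack usage seen so far, in bytes.
    if [ -r "$tracing_dir/stack_max_size" ]; then
        echo "max stack depth: $(cat "$tracing_dir/stack_max_size") bytes"
    fi
    cat "$tracing_dir/stack_trace"
}

# Typical use on the NFSv4 client, as root (commented out on purpose):
#   mount -t debugfs none /sys/kernel/debug
#   echo 1 > /proc/sys/kernel/stack_tracer_enabled
#   ... mount the NFSv4 export ...
#   dump_stack_trace > stack-after-mount.txt
```

Running the helper once right after mounting and once after the failure gives two files that can be attached to the bug for comparison.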
[parent not found: <4BEC6A5D.5070304-PAwl83ecUlHR7s880joybQ@public.gmane.org>]
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure [not found] ` <4BEC6A5D.5070304-PAwl83ecUlHR7s880joybQ@public.gmane.org> @ 2010-05-13 21:13 ` Trond Myklebust 2010-05-14 5:42 ` Robert Wimmer [not found] ` <1273785234.22932.14.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> 0 siblings, 2 replies; 19+ messages in thread From: Trond Myklebust @ 2010-05-13 21:13 UTC (permalink / raw) To: Robert Wimmer Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: > Finally I've had some time to do the next test. > Here is a wireshark dump (~750 MByte): > http://213.252.12.93/2.6.34-rc5.cap.gz > > dmesg output after page allocation failure: > https://bugzilla.kernel.org/attachment.cgi?id=26371 > > stack trace before page allocation failure: > https://bugzilla.kernel.org/attachment.cgi?id=26369 > > stack trace after page allocation failure: > https://bugzilla.kernel.org/attachment.cgi?id=26370 > > I hope the wireshark dump is not to big to download. > It was created with > tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap > > Thanks! > Robert Hi Robert, I tried the above wireshark dump URL, but it appears to point to an empty file. Cheers Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-05-13 21:13 ` Trond Myklebust @ 2010-05-14 5:42 ` Robert Wimmer [not found] ` <1273785234.22932.14.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> 1 sibling, 0 replies; 19+ messages in thread From: Robert Wimmer @ 2010-05-14 5:42 UTC (permalink / raw) To: Trond Myklebust Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel Hi Trond, I'm sorry. There was a Varnish in front of that webserver which doesn't like such big files ;-) Please try this URL: http://213.252.12.34/2.6.34-rc5.cap.gz It works for me. Thanks! Robert On 05/13/10 23:13, Trond Myklebust wrote: > On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: > >> Finally I've had some time to do the next test. >> Here is a wireshark dump (~750 MByte): >> http://213.252.12.93/2.6.34-rc5.cap.gz >> >> dmesg output after page allocation failure: >> https://bugzilla.kernel.org/attachment.cgi?id=26371 >> >> stack trace before page allocation failure: >> https://bugzilla.kernel.org/attachment.cgi?id=26369 >> >> stack trace after page allocation failure: >> https://bugzilla.kernel.org/attachment.cgi?id=26370 >> >> I hope the wireshark dump is not to big to download. >> It was created with >> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap >> >> Thanks! >> Robert >> > Hi Robert, > > I tried the above wireshark dump URL, but it appears to point to an > empty file. > > Cheers > Trond > ^ permalink raw reply [flat|nested] 19+ messages in thread
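The empty download reported above is the kind of problem a quick local check can catch before a link is posted. A possible sketch, assuming the capture was compressed with gzip as the `.cap.gz` name suggests; `check_capture` is a hypothetical helper name, not something used in the thread.

```shell
#!/bin/sh
# Sketch: refuse to share a capture that is empty or not a valid gzip stream.
# check_capture is a hypothetical helper taking one file path as its argument.
check_capture() {
    file="$1"
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$file" ]; then
        echo "$file: missing or empty" >&2
        return 1
    fi
    # gzip -t tests the integrity of the compressed data without unpacking it.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "$file: not a valid gzip file" >&2
        return 1
    fi
    echo "$file: OK"
}
```

Fetching the file once through the public URL and running `check_capture` on the downloaded copy would show whether a proxy in front of the web server has truncated it.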
[parent not found: <1273785234.22932.14.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure [not found] ` <1273785234.22932.14.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org> @ 2010-05-20 7:39 ` kernel [not found] ` <a133ef4ed022a00afd40b505719ae3d2-PAwl83ecUlHR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: kernel @ 2010-05-20 7:39 UTC (permalink / raw) To: Trond Myklebust Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel Hi Trond, have you had some time to download the wireshark dump? Thanks! Robert On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: >> Finally I've had some time to do the next test. >> Here is a wireshark dump (~750 MByte): >> http://213.252.12.93/2.6.34-rc5.cap.gz >> >> dmesg output after page allocation failure: >> https://bugzilla.kernel.org/attachment.cgi?id=26371 >> >> stack trace before page allocation failure: >> https://bugzilla.kernel.org/attachment.cgi?id=26369 >> >> stack trace after page allocation failure: >> https://bugzilla.kernel.org/attachment.cgi?id=26370 >> >> I hope the wireshark dump is not to big to download. >> It was created with >> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap >> >> Thanks! >> Robert > > Hi Robert, > > I tried the above wireshark dump URL, but it appears to point to an > empty file. > > Cheers > Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <a133ef4ed022a00afd40b505719ae3d2-PAwl83ecUlHR7s880joybQ@public.gmane.org>]
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure [not found] ` <a133ef4ed022a00afd40b505719ae3d2-PAwl83ecUlHR7s880joybQ@public.gmane.org> @ 2010-05-25 20:01 ` Robert Wimmer 2010-06-02 11:56 ` kernel 0 siblings, 1 reply; 19+ messages in thread From: Robert Wimmer @ 2010-05-25 20:01 UTC (permalink / raw) To: Trond Myklebust Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel Hi Trond, just a little reminder ;-) Thanks! Robert On 05/20/10 09:39, kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org wrote: > Hi Trond, > > have you had some time to download the wireshark dump? > > Thanks! > Robert > > On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust > <Trond.Myklebust@netapp.com> wrote: > >> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: >> >>> Finally I've had some time to do the next test. >>> Here is a wireshark dump (~750 MByte): >>> http://213.252.12.93/2.6.34-rc5.cap.gz >>> >>> dmesg output after page allocation failure: >>> https://bugzilla.kernel.org/attachment.cgi?id=26371 >>> >>> stack trace before page allocation failure: >>> https://bugzilla.kernel.org/attachment.cgi?id=26369 >>> >>> stack trace after page allocation failure: >>> https://bugzilla.kernel.org/attachment.cgi?id=26370 >>> >>> I hope the wireshark dump is not to big to download. >>> It was created with >>> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap >>> >>> Thanks! >>> Robert >>> >> Hi Robert, >> >> I tried the above wireshark dump URL, but it appears to point to an >> empty file. >> >> Cheers >> Trond >> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-05-25 20:01 ` Robert Wimmer @ 2010-06-02 11:56 ` kernel 0 siblings, 0 replies; 19+ messages in thread From: kernel @ 2010-06-02 11:56 UTC (permalink / raw) To: Robert Wimmer Cc: Trond Myklebust, mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel Hi Trond, currently it seems that the problem was fixed by accident... ;-) Since 2.6.34 is now in Gentoo portage I thought I should give it a try. Using my 2.6.35-r5 .config the 2.6.34 release has now been running for 4 hours (instead of 5-10 minutes before). Hmmm... Hopefully it will run for some more hours and days now. Since I've definitely changed nothing besides the kernel it must have been fixed (hopefully) in one of the 2.6.34-rc's. If it's still running tomorrow I'll close the bug. Greetings Robert On Tue, 25 May 2010 22:01:54 +0200, Robert Wimmer <kernel@tauceti.net> wrote: > Hi Trond, > > just a little reminder ;-) > > Thanks! > Robert > > On 05/20/10 09:39, kernel@tauceti.net wrote: >> Hi Trond, >> >> have you had some time to download the wireshark dump? >> >> Thanks! >> Robert >> >> On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust >> <Trond.Myklebust@netapp.com> wrote: >> >>> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: >>> >>>> Finally I've had some time to do the next test. >>>> Here is a wireshark dump (~750 MByte): >>>> http://213.252.12.93/2.6.34-rc5.cap.gz >>>> >>>> dmesg output after page allocation failure: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26371 >>>> >>>> stack trace before page allocation failure: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26369 >>>> >>>> stack trace after page allocation failure: >>>> https://bugzilla.kernel.org/attachment.cgi?id=26370 >>>> >>>> I hope the wireshark dump is not to big to download. >>>> It was created with >>>> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap >>>> >>>> Thanks! 
>>>> Robert >>>> >>> Hi Robert, >>> >>> I tried the above wireshark dump URL, but it appears to point to an >>> empty file. >>> >>> Cheers >>> Trond >>> ^ permalink raw reply [flat|nested] 19+ messages in thread
Thread overview: 19+ messages
[not found] <4BC2E706.7010108@tauceti.net>
[not found] ` <20100412112330.GA16908@redhat.com>
[not found] ` <4BC32527.9090301@tauceti.net>
[not found] ` <20100412135223.GA17887@redhat.com>
[not found] ` <4BC43097.3060000@tauceti.net>
[not found] ` <4BCC52B9.8070200@tauceti.net>
[not found] ` <20100419131718.GB16918@redhat.com>
[not found] ` <dbf86fc1c370496138b3a74a3c74ec18@tauceti.net>
[not found] ` <20100421094249.GC30855@redhat.com>
[not found] ` <c638ec9fdee2954ec5a7a2bd405aa2ba@tauceti.net>
2010-04-22 10:03 ` [Bugme-new] [Bug 15709] New: swapper page allocation failure Michael S. Tsirkin
2010-04-23 5:26 ` Robert Wimmer
2010-04-25 9:18 ` Michael S. Tsirkin
2010-04-25 20:41 ` Robert Wimmer
2010-04-25 20:49 ` Michael S. Tsirkin
2010-04-26 12:15 ` Trond Myklebust
2010-04-26 20:25 ` Robert Wimmer
[not found] ` <4BD5F6C5.8080605-PAwl83ecUlHR7s880joybQ@public.gmane.org>
2010-04-26 21:04 ` Trond Myklebust
2010-04-26 22:18 ` Robert Wimmer
2010-04-26 23:28 ` Trond Myklebust
2010-04-27 22:56 ` Robert Wimmer
[not found] ` <be8a0f012ebb2ae02522998591e6f1a5@tauceti.net>
[not found] ` <be8a0f012ebb2ae02522998591e6f1a5-PAwl83ecUlHR7s880joybQ@public.gmane.org>
2010-05-06 21:19 ` Robert Wimmer
[not found] ` <4BE33259.3000609-PAwl83ecUlHR7s880joybQ@public.gmane.org>
2010-05-06 21:30 ` Trond Myklebust
2010-05-13 21:08 ` Robert Wimmer
[not found] ` <4BEC6A5D.5070304-PAwl83ecUlHR7s880joybQ@public.gmane.org>
2010-05-13 21:13 ` Trond Myklebust
2010-05-14 5:42 ` Robert Wimmer
[not found] ` <1273785234.22932.14.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-05-20 7:39 ` kernel
[not found] ` <a133ef4ed022a00afd40b505719ae3d2-PAwl83ecUlHR7s880joybQ@public.gmane.org>
2010-05-25 20:01 ` Robert Wimmer
2010-06-02 11:56 ` kernel