From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Robert Wimmer <kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org>
Cc: mst@redhat.com, Avi Kivity <avi@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,
	bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org,
	Rusty Russell <rusty@rustcorp.com.au>,
	Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org>,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
Date: Thu, 06 May 2010 17:30:38 -0400
Message-ID: <1273181438.22155.26.camel@localhost.localdomain>
In-Reply-To: <4BE33259.3000609-PAwl83ecUlHR7s880joybQ@public.gmane.org>

Sorry. I've been caught up in work in the past few days.

I can certainly help with the soft lockup if you are able to supply
either a dump that includes all threads stuck in the NFS, or a (binary)
wireshark dump that shows the NFSv4 traffic between the client and
server around the time of the hang.
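
One possible way to collect either piece of data is sketched below. The
function names, the output paths and the server name "nfs-server" are
placeholders for illustration and do not come from this thread. The sketch
assumes root access, a kernel with magic SysRq support, tcpdump installed,
and NFSv4 traffic on the default port 2049. It only prints the commands, so
they can be reviewed and then run by hand.

```shell
#!/bin/sh
# Sketch only: prints the commands rather than running them.
# NFS_SERVER and PCAP_FILE are placeholder values, not taken from the thread.
NFS_SERVER="${NFS_SERVER:-nfs-server}"
PCAP_FILE="${PCAP_FILE:-/tmp/nfs4-hang.pcap}"

# Commands that ask the kernel to log a backtrace of every task
# (SysRq 't') and then save the kernel log to a file.
task_dump_cmds() {
    printf '%s\n' 'echo t > /proc/sysrq-trigger'
    printf '%s\n' 'dmesg > /tmp/all-tasks.txt'
}

# Command that records full, untruncated packets to a binary capture
# file that wireshark can open.
capture_cmd() {
    printf 'tcpdump -s 0 -w %s host %s and port 2049\n' "$PCAP_FILE" "$NFS_SERVER"
}

task_dump_cmds
capture_cmd
```

The capture would need to be started before reproducing the hang, and the
task dump taken while the threads are still stuck.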

Cheers
  Trond

On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote: 
> I don't know if anyone is still interested in this,
> but I think Trond isn't interested any further because
> the last error was of course a "page allocation
> failure" and not the "soft lockup" that Trond was
> trying to solve. But the patch was for 2.6.34, and
> the "soft lockup" shows up only with some 2.6.30 and
> maybe some 2.6.31 kernel versions. The first error
> I reported was a "page allocation failure", which
> all kernels >= 2.6.32 produce with the configuration
> I use (NFSv4).
> 
> Michael suggested solving the "soft lockup" first,
> before investigating the "page allocation
> failure" further. We know that the "soft lockup" only
> pops up with NFSv4 and not v3. I really want to
> use v4, but since I'm not a kernel hacker someone
> must guide me on what to try next.
> 
> I know that you all have a lot of other work to
> do, but if there are no ideas left about what to do next,
> it's maybe best to close the bug for now. I'll stay with
> kernel 2.6.30 or go back to NFS v3 if I
> upgrade to a newer kernel. Maybe the error will
> be fixed "by accident" in >= 2.6.35 ;-)
> 
> Thanks!
> Robert
> 
> 
> 
> On 05/03/10 10:11, kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org wrote:
> > Anything we can do to investigate this further?
> >
> > Thanks!
> > Robert
> >
> >
> > On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org>
> > wrote:
> >   
> >> I've applied the patch to the kernel I got from
> >> "git clone ....", which gave me a 2.6.34-rc5 kernel.
> >>
> >> The stack trace after mounting NFS is here:
> >> https://bugzilla.kernel.org/attachment.cgi?id=26166
> >> /var/log/messages after soft lockup:
> >> https://bugzilla.kernel.org/attachment.cgi?id=26167
> >>
> >> I hope that there is some useful information in there.
> >>
> >> Thanks!
> >> Robert
> >>
> >> On 04/27/10 01:28, Trond Myklebust wrote:
> >>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
> >>>>> Sure. In addition to what you did above, please do
> >>>>>
> >>>>> mount -t debugfs none /sys/kernel/debug
> >>>>>
> >>>>> and then cat the contents of the pseudofile at
> >>>>>
> >>>>> /sys/kernel/debug/tracing/stack_trace
> >>>>>
> >>>>> Please do this more or less immediately after you've finished mounting
> >>>>> the NFSv4 client.
> >>>>>
> >>>> I've uploaded the stack trace. It was generated
> >>>> directly after mounting. Here are the stacks:
> >>>>
> >>>> After mounting:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
> >>>> After the soft lockup:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
> >>>> The dmesg output of the soft lockup:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
> >>>>
> >>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> >>>>> use the 'refer' export option anywhere? If so, then we might have to
> >>>>> test further, since those may trigger the NFSv4 submount feature.
> >>>>>
> >>>> The server has the following settings:
> >>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
> >>>>
> >>>> Thanks!
> >>>> Robert
> >>>>
> >>>>
> >>> That second trace is more than 5.5K deep, more than half of which is
> >>> socket overhead :-(((.
> >>>
> >>> The process stack does not appear to have overflowed, however that trace
> >>> doesn't include any IRQ stack overhead.
> >>>
> >>> OK... So what happens if we get rid of half of that trace by forcing
> >>> asynchronous tasks such as this to run entirely in rpciod instead of
> >>> first trying to run in the process context?
> >>>
> >>> See the attachment...
> >>>
> >>>       
> 
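
The debugfs steps quoted above can be collected into a short script. This is
a sketch under stated assumptions: the kernel is built with
CONFIG_STACK_TRACER, "what you did above" refers to enabling the tracer
through /proc/sys/kernel/stack_tracer_enabled (the thread does not say so
explicitly), and the commands are run as root. The function name is
illustrative. The script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the stack-tracer commands instead of running them.
TRACING_DIR=/sys/kernel/debug/tracing

stack_trace_cmds() {
    # Mount debugfs (command as given in the thread).
    printf '%s\n' 'mount -t debugfs none /sys/kernel/debug'
    # Assumption: the tracer is switched on through this sysctl file.
    printf '%s\n' 'echo 1 > /proc/sys/kernel/stack_tracer_enabled'
    # Deepest stack seen so far, in bytes, and the call chain that reached it.
    printf 'cat %s/stack_max_size\n' "$TRACING_DIR"
    printf 'cat %s/stack_trace\n' "$TRACING_DIR"
}

stack_trace_cmds
```

The tracer would have to be enabled before the NFSv4 mount, with the two
files read right after the mount and again after the soft lockup.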



