public inbox for linux-kernel@vger.kernel.org
* SUNRPC problem with 2.6.26 and beyond
@ 2008-10-22 15:35 Harry Edmon
  2008-10-22 22:51 ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Harry Edmon @ 2008-10-22 15:35 UTC
  To: linux-kernel

I have a dual quad-core Xeon system running software 
(http://www.unidata.ucar.edu/software/ldm) that relays and processes 
weather data through RPC calls, keeping a queue of data in a memory 
mapped file.  Up until 2.6.26 the system has run just fine (for example 
2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
into a problem after approximately 24 hours.  The symptom is that the 
processing slows down to a crawl.  Using "top" I can see that the System 
time is up over 90%, with almost no User and Wait time.  If I stop and 
restart the software, most of the time it gets better - but sometimes it 
takes a reboot to fix the problem.  I have an identical system that does 
just processing and ingesting data from remote systems, and it does not 
have this problem.  I have tried a number of different kernel 
configurations, but they all show the same problem.
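
The User/System/Wait split that "top" reports can also be logged over time, so that the moment the system time climbs is recorded. What follows is only a minimal sketch, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal as the first eight counters of the aggregate "cpu" line); the function names are made up for illustration.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the first eight counters of the aggregate 'cpu' line as ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:9]]
    raise RuntimeError("no aggregate cpu line found in " + path)


def cpu_split(before, after):
    """Percentage of elapsed CPU time per category between two samples."""
    names = ["user", "nice", "system", "idle",
             "iowait", "irq", "softirq", "steal"]
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas) or 1  # avoid dividing by zero on identical samples
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}
```

Taking one sample with read_cpu_times(), sleeping for a few seconds, taking a second sample and passing both to cpu_split() gives the split for that interval. A "system" value above 90 would correspond to the symptom described above.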

I suspect a problem with SUNRPC.  I notice that there were a large 
number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
to pin down which patches are causing the problem.  Are there ways to 
figure out where in the kernel the time is being spent?  I am willing to work 
on isolating the problem, but I need some suggestions on the best way to 
do it given the large number of SUNRPC patches in 2.6.26 and the fact 
that each experiment takes a day.
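
Because each experiment takes a day, the cost of a bisection (for example with "git bisect" between a known-good and a known-bad kernel) can be estimated up front: halving a range of N commits needs about log2(N) test kernels. The sketch below only does that arithmetic. The figure of 10,000 commits is an assumption chosen for illustration, not a number taken from this thread.

```python
def bisect_steps(n_commits):
    """Approximate number of kernels to test when bisecting n_commits."""
    if n_commits <= 1:
        return 0
    # Integer form of ceil(log2(n_commits)), without floating point.
    return (n_commits - 1).bit_length()


ASSUMED_COMMITS = 10_000   # hypothetical size of the good..bad range
DAYS_PER_EXPERIMENT = 1    # from the message above: one run takes a day

steps = bisect_steps(ASSUMED_COMMITS)
print(f"about {steps} experiments, roughly {steps * DAYS_PER_EXPERIMENT} days")
```

Under that assumption the search needs on the order of two weeks of runs, which is slow but bounded.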
-- 

 Dr. Harry Edmon			E-MAIL: harry@atmos.washington.edu
 206-543-0547				harry@washington.edu
 Dept of Atmospheric Sciences		FAX:	206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640



* Re: SUNRPC problem with 2.6.26 and beyond
  2008-10-22 15:35 SUNRPC problem with 2.6.26 and beyond Harry Edmon
@ 2008-10-22 22:51 ` Trond Myklebust
  2008-10-22 22:54   ` Harry Edmon
  2008-10-22 22:55   ` SUNRPC problem with 2.6.26 and beyond - try again with response in correct place Harry Edmon
  0 siblings, 2 replies; 6+ messages in thread
From: Trond Myklebust @ 2008-10-22 22:51 UTC
  To: Harry Edmon; +Cc: linux-kernel

On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
> I have a dual quad-core Xeon system running software 
> (http://www.unidata.ucar.edu/software/ldm) that relays and processes 
> weather data through RPC calls, keeping a queue of data in a memory 
> mapped file.  Up until 2.6.26 the system has run just fine (for example 
> 2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
> into a problem after approximately 24 hours.  The symptom is that the 
> processing slows down to a crawl.  Using "top" I can see that the System 
> time is up over 90%, with almost no User and Wait time.  If I stop and 
> restart the software, most of the time it gets better - but sometimes it 
> takes a reboot to fix the problem.  I have an identical system that does 
> just processing and ingesting data from remote systems, and it does not 
> have this problem.  I have tried a number of different kernel 
> configurations, but they all show the same problem.
> 
> I suspect a problem with SUNRPC.  I notice that there were a large 
> number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
> to pin down which patches are causing the problem.  Are there ways to 
> figure out where in the kernel the time is being spent?  I am willing to work 
> on isolating the problem, but I need some suggestions on the best way to 
> do it given the large number of SUNRPC patches in 2.6.26 and the fact 
> that each experiment takes a day.

The kernel sunrpc interface is not exported to user land: the glibc code
uses its own, entirely separate implementation of sunrpc.

I therefore cannot see how your application's RPC calls can be affected
by kernel sunrpc changes.

Cheers
  Trond



* Re: SUNRPC problem with 2.6.26 and beyond
  2008-10-22 22:51 ` Trond Myklebust
@ 2008-10-22 22:54   ` Harry Edmon
  2008-10-22 22:55   ` SUNRPC problem with 2.6.26 and beyond - try again with response in correct place Harry Edmon
  1 sibling, 0 replies; 6+ messages in thread
From: Harry Edmon @ 2008-10-22 22:54 UTC
  To: Trond Myklebust; +Cc: linux-kernel

Then how do you explain the large system time used with 2.6.26 and 
beyond?  Is it some other patch I should be looking at?

Trond Myklebust wrote:
> On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
>   
>> I have a dual quad-core Xeon system running software 
>> (http://www.unidata.ucar.edu/software/ldm) that relays and processes 
>> weather data through RPC calls, keeping a queue of data in a memory 
>> mapped file.  Up until 2.6.26 the system has run just fine (for example 
>> 2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
>> into a problem after approximately 24 hours.  The symptom is that the 
>> processing slows down to a crawl.  Using "top" I can see that the System 
>> time is up over 90%, with almost no User and Wait time.  If I stop and 
>> restart the software, most of the time it gets better - but sometimes it 
>> takes a reboot to fix the problem.  I have an identical system that does 
>> just processing and ingesting data from remote systems, and it does not 
>> have this problem.  I have tried a number of different kernel 
>> configurations, but they all show the same problem.
>>
>> I suspect a problem with SUNRPC.  I notice that there were a large 
>> number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
>> to pin down which patches are causing the problem.  Are there ways to 
>> figure out where in the kernel the time is being spent?  I am willing to work 
>> on isolating the problem, but I need some suggestions on the best way to 
>> do it given the large number of SUNRPC patches in 2.6.26 and the fact 
>> that each experiment takes a day.
>>     
>
> The kernel sunrpc interface is not exported to user land: the glibc code
> uses its own, entirely separate implementation of sunrpc.
>
> I cannot therefore see, how your application's RPC calls can be affected
> by kernel sunrpc changes.
>
> Cheers
>   Trond
>
>   


-- 
 Dr. Harry Edmon			E-MAIL: harry@atmos.washington.edu
 206-543-0547				harry@washington.edu
 Dept of Atmospheric Sciences		FAX:	206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640



* Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.
  2008-10-22 22:51 ` Trond Myklebust
  2008-10-22 22:54   ` Harry Edmon
@ 2008-10-22 22:55   ` Harry Edmon
  2008-10-22 23:37     ` Trond Myklebust
  1 sibling, 1 reply; 6+ messages in thread
From: Harry Edmon @ 2008-10-22 22:55 UTC
  To: Trond Myklebust; +Cc: linux-kernel

Trond Myklebust wrote:
> On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
>   
>> I have a dual quad-core Xeon system running software 
>> (http://www.unidata.ucar.edu/software/ldm) that relays and processes 
>> weather data through RPC calls, keeping a queue of data in a memory 
>> mapped file.  Up until 2.6.26 the system has run just fine (for example 
>> 2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
>> into a problem after approximately 24 hours.  The symptom is that the 
>> processing slows down to a crawl.  Using "top" I can see that the System 
>> time is up over 90%, with almost no User and Wait time.  If I stop and 
>> restart the software, most of the time it gets better - but sometimes it 
>> takes a reboot to fix the problem.  I have an identical system that does 
>> just processing and ingesting data from remote systems, and it does not 
>> have this problem.  I have tried a number of different kernel 
>> configurations, but they all show the same problem.
>>
>> I suspect a problem with SUNRPC.  I notice that there were a large 
>> number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
>> to pin down which patches are causing the problem.  Are there ways to 
>> figure out where in the kernel the time is being spent?  I am willing to work 
>> on isolating the problem, but I need some suggestions on the best way to 
>> do it given the large number of SUNRPC patches in 2.6.26 and the fact 
>> that each experiment takes a day.
>>     
>
> The kernel sunrpc interface is not exported to user land: the glibc code
> uses its own, entirely separate implementation of sunrpc.
>
> I cannot therefore see, how your application's RPC calls can be affected
> by kernel sunrpc changes.
>
> Cheers
>   Trond
>
>   
Then how do you explain the large system time used with 2.6.26 and 
beyond?  Is it some other patch I should be looking at?
-- 

 Dr. Harry Edmon			E-MAIL: harry@atmos.washington.edu
 206-543-0547				harry@washington.edu
 Dept of Atmospheric Sciences		FAX:	206-543-0308
 University of Washington, Box 351640, Seattle, WA 98195-1640



* Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.
  2008-10-22 22:55   ` SUNRPC problem with 2.6.26 and beyond - try again with response in correct place Harry Edmon
@ 2008-10-22 23:37     ` Trond Myklebust
  2008-10-23  1:27       ` Harry Edmon
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2008-10-22 23:37 UTC
  To: Harry Edmon; +Cc: linux-kernel

On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
> Trond Myklebust wrote:
> > On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
> >   
> >> I have a dual quad-core Xeon system running software 
> >> (http://www.unidata.ucar.edu/software/ldm) that relays and processes 
> >> weather data through RPC calls, keeping a queue of data in a memory 
> >> mapped file.  Up until 2.6.26 the system has run just fine (for example 
> >> 2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
> >> into a problem after approximately 24 hours.  The symptom is that the 
> >> processing slows down to a crawl.  Using "top" I can see that the System 
> >> time is up over 90%, with almost no User and Wait time.  If I stop and 
> >> restart the software, most of the time it gets better - but sometimes it 
> >> takes a reboot to fix the problem.  I have an identical system that does 
> >> just processing and ingesting data from remote systems, and it does not 
> >> have this problem.  I have tried a number of different kernel 
> >> configurations, but they all show the same problem.
> >>
> >> I suspect a problem with SUNRPC.  I notice that there were a large 
> >> number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
> >> to pin down which patches are causing the problem.  Are there ways to 
> >> figure out where in the kernel the time is being spent?  I am willing to work 
> >> on isolating the problem, but I need some suggestions on the best way to 
> >> do it given the large number of SUNRPC patches in 2.6.26 and the fact 
> >> that each experiment takes a day.
> >>     
> >
> > The kernel sunrpc interface is not exported to user land: the glibc code
> > uses its own, entirely separate implementation of sunrpc.
> >
> > I cannot therefore see, how your application's RPC calls can be affected
> > by kernel sunrpc changes.
> >
> > Cheers
> >   Trond
> >
> >   
> Then how do you explain the large system time used with 2.6.26 and 
> beyond?  Is it some other patch I should be looking at?

I'm not explaining it. I'm saying that nothing outside the kernel NFS
and NLM code uses the kernel sunrpc implementation. Your userland RPC
calls are using glibc's implementation of sunrpc. Those are unaffected
by patches to the kernel sunrpc layer.

If you are seeing a hang, then I suggest you start by using the strace
utility to figure out which system call is actually involved.

Cheers
  Trond



* Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.
  2008-10-22 23:37     ` Trond Myklebust
@ 2008-10-23  1:27       ` Harry Edmon
  0 siblings, 0 replies; 6+ messages in thread
From: Harry Edmon @ 2008-10-23  1:27 UTC
  To: Trond Myklebust; +Cc: linux-kernel

Trond Myklebust wrote:
> On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
>   
>> Trond Myklebust wrote:
>>     
>>> On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
>>>   
>>>       
>>>> I have a dual quad-core Xeon system running software 
>>>> (http://www.unidata.ucar.edu/software/ldm) that relays and processes 
>>>> weather data through RPC calls, keeping a queue of data in a memory 
>>>> mapped file.  Up until 2.6.26 the system has run just fine (for example 
>>>> 2.6.25.17).  But starting with 2.6.26 through 2.6.27.2 the system runs 
>>>> into a problem after approximately 24 hours.  The symptom is that the 
>>>> processing slows down to a crawl.  Using "top" I can see that the System 
>>>> time is up over 90%, with almost no User and Wait time.  If I stop and 
>>>> restart the software, most of the time it gets better - but sometimes it 
>>>> takes a reboot to fix the problem.  I have an identical system that does 
>>>> just processing and ingesting data from remote systems, and it does not 
>>>> have this problem.  I have tried a number of different kernel 
>>>> configurations, but they all show the same problem.
>>>>
>>>> I suspect a problem with SUNRPC.  I notice that there were a large 
>>>> number of SUNRPC patches in 2.6.26.  I am looking for suggestions on how 
>>>> to pin down which patches are causing the problem.  Are there ways to 
>>>> figure out where in the kernel the time is being spent?  I am willing to work 
>>>> on isolating the problem, but I need some suggestions on the best way to 
>>>> do it given the large number of SUNRPC patches in 2.6.26 and the fact 
>>>> that each experiment takes a day.
>>>>     
>>>>         
>>> The kernel sunrpc interface is not exported to user land: the glibc code
>>> uses its own, entirely separate implementation of sunrpc.
>>>
>>> I cannot therefore see, how your application's RPC calls can be affected
>>> by kernel sunrpc changes.
>>>
>>> Cheers
>>>   Trond
>>>
>>>   
>>>       
>> Then how do you explain the large system time used with 2.6.26 and 
>> beyond?  Is it some other patch I should be looking at?
>>     
>
> I'm not explaining it. I'm saying that nothing outside the kernel NFS
> and NLM code uses the kernel sunrpc implementation. Your userland RPC
> calls are using glibc's implementation of sunrpc. Those are unaffected
> by patches to the kernel sunrpc layer.
>
> If you are seeing a hang, then I suggest you start by using the strace
> utility to figure out which system call is actually involved.
>
> Cheers
>   Trond
>
>   
The problem is that it is not hanging.  The processes are running 
through a lot of system calls.  It is just that the system time jumps 
up to over 95% on all 8 processors with 2.6.26 and beyond.  I never see 
that with 2.6.25.17.  I will try looking again and see if there are 
certain calls that are taking a lot of time.
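
One way to see which calls take the time is to capture a trace with strace's -T option, which appends the time spent in each call as <seconds> at the end of the line (-f follows child processes, -p attaches to a running PID), and then total the times per system call. The script below is a rough sketch of that totalling step, not a finished tool. The regular expression, the function name and the sample lines in the usage note are illustrative assumptions, and lines split by strace into "unfinished"/"resumed" halves are simply skipped.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional pid prefix when tracing children
    r"(?P<name>\w+)\(.*\)\s+=\s+.*"    # syscall name, arguments, return value
    r"<(?P<secs>\d+\.\d+)>\s*$"        # per-call time appended by -T
)


def summarize(lines):
    """Return (syscall, count, total_seconds) rows, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # signals, exit notices, unfinished/resumed fragments
        entry = totals[m.group("name")]
        entry[0] += 1
        entry[1] += float(m.group("secs"))
    rows = [(name, count, secs) for name, (count, secs) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Feeding it the lines of a saved trace file, for example summarize(open("trace.txt")) with a hypothetical file name, lists the calls that dominate. Note that -T reports elapsed time per call rather than CPU time, so a call that merely sleeps will also rank high. strace's -c option prints a similar per-call summary directly.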


end of thread, other threads:[~2008-10-23  1:27 UTC | newest]

Thread overview: 6+ messages
2008-10-22 15:35 SUNRPC problem with 2.6.26 and beyond Harry Edmon
2008-10-22 22:51 ` Trond Myklebust
2008-10-22 22:54   ` Harry Edmon
2008-10-22 22:55   ` SUNRPC problem with 2.6.26 and beyond - try again with response in correct place Harry Edmon
2008-10-22 23:37     ` Trond Myklebust
2008-10-23  1:27       ` Harry Edmon
