public inbox for linux-kernel@vger.kernel.org
* Re: [kernel.org users] cannot ssh to master.kernel.org
       [not found] <871v8bgpgb.fsf@deeprootsystems.com>
@ 2010-09-30 23:21 ` J.H.
  2010-10-01  1:24   ` Dave Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: J.H. @ 2010-09-30 23:21 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: users, linux-kernel

Hey everyone,

So this morning we found 'master' basically spinning off the hook, with
a consistently high load [READ: around 160ish].  Needless to say, mail
was getting deferred due to the load and I'm sure ssh was having
similar issues.

Several things were noticed on the system:

- Several cron jobs had spun up multiple copies of themselves, despite
locking being present in the jobs.  I'm not sure whether they were
blocked checking the existing locks or had actually run past them, but
multiple copies were present.
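
For reference, the usual way to make a cron job self-excluding is an
flock(1) wrapper; a minimal sketch (paths and job body hypothetical,
not our actual jobs):

```shell
#!/bin/sh
# Hypothetical cron wrapper: fd 9 holds an exclusive lock on $LOCK for
# the lifetime of the subshell, and -n makes a second copy bail out
# immediately instead of queueing up behind the first.
# (In a real cron job the lock file would live under /var/lock.)
LOCK=/tmp/mirror-job.lock
(
    flock -n 9 || { echo "previous run still active, exiting" >&2; exit 1; }
    # ... actual job body runs here while the lock is held ...
) 9>"$LOCK"
```

A job that instead blocks waiting for the lock (flock without -n) will
happily stack up copies behind a stuck run, which looks a lot like what
we saw.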

- The loads were relatively high, but seemingly not because of disk
I/O.  There were a great number of processes in the run state.

- Attempting to explicitly kill many of the processes left them in a
zombie state, yet they were still consuming CPU resources.  Killing
them again did not result in the death of the processes, or the
relinquishing of the CPU resources.  Attempting to strace the processes
yielded nothing of interest / use.  lsof on a zombie process did
return its currently open file handles, including TCP connections.
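
An aside on the above: a true zombie is already dead and can't consume
CPU, so tasks that keep burning cycles after SIGKILL are more likely
stuck inside the kernel.  A generic way to tell the two apart (nothing
here is specific to this box):

```shell
# Generic triage of a process that won't die.  Substitute the stuck
# PID; $$ (this shell) is used here just so the commands run.
pid=$$
# STAT column: Z = zombie (dead, holds nothing), D = uninterruptible
# sleep in the kernel (unkillable until whatever it waits on completes).
ps -o pid,stat,wchan:30,comm -p "$pid"
# For D-state tasks the in-kernel stack usually names the culprit
# (root-only on many kernels, hence the error suppression):
cat /proc/"$pid"/stack 2>/dev/null || true
```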

- Disks all seemed to be readable and writeable.

- A sysrq+l dump of the system in this messed-up state can be found at
http://pastebin.osuosl.org/35126 (this was originally requested by
Johannes Weiner)

- Perf was available in the kernel and userspace; however, attempting
to run 'perf top' resulted in a stalled process sitting, seemingly
forever, in D+ state. (originally requested by Thomas Gleixner)

Considering that at one point the CPU-eating zombies outnumbered the
still-running processes, that we couldn't get the load below 120, and
the general mess the machine was in, we finally bounced it and let
everything come back up.

One additional note, not necessarily related to the mess today, but
something we've been noticing:

- kswapd0 has been using a lot of cpu time.  I wouldn't be concerned
about this if it was, say, 10% of a cpu, or maybe even 20% of a cpu for
a short time.  It has, however, been running in some cases at 100% cpu
on a single core for hours on end.  This seems to happen under slightly
higher loads, particularly when there are a number of rsyncs going on
simultaneously.
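
One generic way to quantify this from /proc (nothing kernel.org
specific):

```shell
# CPU consumed by kswapd0 so far: fields 14/15 of /proc/<pid>/stat are
# utime/stime in clock ticks since boot.
pid=$(pgrep -x kswapd0 || true)
if [ -n "$pid" ]; then
    awk '{print "kswapd0 utime=" $14 " stime=" $15}' /proc/"$pid"/stat
fi
# What kswapd has been scanning, from the VM event counters:
grep '^pgscan_kswapd' /proc/vmstat
```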

Replicating the rsyncs on a nearly identical box running an older
2.6.30 kernel did not show this much kswapd0 cpu usage.  Johannes
Weiner was helping me look into it yesterday, but I don't think
anything was conclusive.

Anyway, thought I'd let everyone know what happened with the unexpected
outage this morning.  Things seem to have settled somewhat and the
machine is up.

- John 'Warthog9' Hawley

On 09/30/2010 09:34 AM, Kevin Hilman wrote:
> As of this morning, I can no longer ssh to master.kernel.org to push git
> trees.
> 
> Anyone else having ssh problems?
> 
> Kevin
> 
> _______________________________________________
> Users mailing list
> Users@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/users


* Re: [kernel.org users] cannot ssh to master.kernel.org
  2010-09-30 23:21 ` [kernel.org users] cannot ssh to master.kernel.org J.H.
@ 2010-10-01  1:24   ` Dave Chinner
  0 siblings, 0 replies; 2+ messages in thread
From: Dave Chinner @ 2010-10-01  1:24 UTC (permalink / raw)
  To: J.H.; +Cc: Kevin Hilman, users, linux-kernel

On Thu, Sep 30, 2010 at 04:21:05PM -0700, J.H. wrote:
> Hey everyone,
> 
> So this morning we found 'master' basically spinning off the hook, with
> a consistently high load [READ: around 160ish].  Needless to say, mail
> was getting deferred due to the load and I'm sure ssh was having
> similar issues.
> 
> Several things were noticed on the system:
> 
> - Several cron jobs had spun up multiple copies of themselves, despite
> locking being present in the jobs.  I'm not sure whether they were
> blocked checking the existing locks or had actually run past them, but
> multiple copies were present.
> 
> - The loads were relatively high, but seemingly not because of disk
> I/O.  There were a great number of processes in the run state.
> 
> - Attempting to explicitly kill many of the processes left them in a
> zombie state, yet they were still consuming CPU resources.  Killing
> them again did not result in the death of the processes, or the
> relinquishing of the CPU resources.  Attempting to strace the processes
> yielded nothing of interest / use.  lsof on a zombie process did
> return its currently open file handles, including TCP connections.
> 
> - Disks all seemed to be readable and writeable.
> 
> - A sysrq+l dump of the system in this messed-up state can be found at
> http://pastebin.osuosl.org/35126 (this was originally requested by
> Johannes Weiner)

I'd say the machine had a severe low memory condition, followed by
severe lock contention in the XFS inode cache shrinker caused by
direct reclaim running in every process trying to allocate memory
and all trying to shrink the same cache.

That is, every CPU has traces like:

Sep 30 18:31:00 hera klogd:  [<ffffffffa0196d14>] spin_lock+0xe/0x10 [xfs]
Sep 30 18:31:00 hera klogd:  [<ffffffffa0198718>] xfs_perag_get+0x2b/0x9d [xfs]
Sep 30 18:31:00 hera klogd:  [<ffffffffa01add78>] xfs_inode_ag_iterator+0x50/0xd8 [xfs]
Sep 30 18:31:00 hera klogd:  [<ffffffffa01ad176>] ?  xfs_reclaim_inode+0x0/0x206 [xfs]
Sep 30 18:31:00 hera klogd:  [<ffffffffa01ade66>] xfs_reclaim_inode_shrink+0x66/0x12a [xfs]
Sep 30 18:31:00 hera klogd:  [<ffffffff810d6d45>] shrink_slab+0xdb/0x15f
Sep 30 18:31:00 hera klogd:  [<ffffffff810d7a99>] do_try_to_free_pages+0x21a/0x342
Sep 30 18:31:00 hera klogd:  [<ffffffff810d7d72>] try_to_free_pages+0x6f/0x71
Sep 30 18:31:00 hera klogd:  [<ffffffff810d5543>] ? isolate_pages_global+0x0/0x22a
Sep 30 18:31:00 hera klogd:  [<ffffffff810d1685>] __alloc_pages_nodemask+0x3eb/0x636
Sep 30 18:31:00 hera klogd:  [<ffffffff810f83e7>] alloc_pages_current+0x95/0x9e
Sep 30 18:31:00 hera klogd:  [<ffffffff810ffe2b>] alloc_slab_page+0x1b/0x28
Sep 30 18:31:00 hera klogd:  [<ffffffff810ffe8c>] new_slab+0x54/0x1d4
Sep 30 18:31:00 hera klogd:  [<ffffffff811004a3>] __slab_alloc+0x1f3/0x3a7
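
For what it's worth, a dump like the one above can be captured from a
shell when there's no console handy (generic SysRq usage, needs root):

```shell
# Check SysRq is enabled (1, or a bitmask allowing backtraces):
cat /proc/sys/kernel/sysrq
# As root, 'l' backtraces all active CPUs into the kernel ring buffer:
#   echo l > /proc/sysrq-trigger
# then read the traces back out (may be restricted when unprivileged):
dmesg 2>/dev/null | tail -n 20 || true
```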

> - kswapd0 has been using a lot of cpu time.  I wouldn't be concerned
> about this if it was, say, 10% of a cpu, or maybe even 20% of a cpu for
> a short time.  It has, however, been running in some cases at 100% cpu
> on a single core for hours on end.  This seems to happen under slightly
> higher loads, particularly when there are a number of rsyncs going on
> simultaneously.

Yes, this is definitely related to the hang. The rsyncs are
generating xfs inode cache pressure, and the kswapd is running the
shrinker to clean it up.  master.kernel.org is running
2.6.34.7-56.fc11.x86_64, which has this commit:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.34.y.git;a=commit;h=9bf729c0af67897ea8498ce17c29b0683f7f2028

(xfs: add a shrinker to background inode reclaim)

This was necessary to avoid a far worse OOM panic, but until about a
month ago nobody had any idea there were major lock contention
problems with this shrinker.

The fundamental problem here is not the XFS shrinker - it is the
fact that direct reclaim allows shrinkers that act on global caches
to be called concurrently on every CPU at once. Hence situations
like this can occur where parallel access to a global lock/list is
the limiting factor.
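
This is also visible from userspace: when every allocating task falls
into direct reclaim, the direct reclaim counters in /proc/vmstat climb
in step with the load (generic counters, nothing XFS specific):

```shell
# allocstall* counts entries into direct reclaim; pgscan_direct*
# counts pages scanned by the allocating tasks themselves (as opposed
# to pgscan_kswapd*).  Sample twice and diff to get a rate:
grep -E '^(allocstall|pgscan_direct)' /proc/vmstat
sleep 2
grep -E '^(allocstall|pgscan_direct)' /proc/vmstat
```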

However, it's only been in the last month that I've really managed
to get the lock contention problem sorted out (because I found a
reproducible test case) and the shrinkers actually scaling.  The
series they are in is:

http://git.kernel.org/?p=linux/kernel/git/dgc/xfsdev.git;a=shortlog;h=refs/heads/metadata-scale

which is queued up in the XFS tree for merging in the .37 merge window.
The specific commits involved are:

6c97772 xfs: serialise inode reclaim within an AG
e1a48db xfs: batch inode reclaim lookup
c727163 xfs: implement batched inode lookups for AG walking
7227905 xfs: split out inode walk inode grabbing
fa78a91 xfs: split inode AG walking into separate code for reclaim
ec9cb17 xfs: lockless per-ag lookups

This is a major rework of the entire inode cache walking code: it
converts AG lookups to be lockless (RCU), it batches inode lookups
to reduce lock traffic and hold times, and it reworks the "can we
reclaim this inode" logic to avoid all locking until we are mostly
certain it can be reclaimed.

I'm not about to ask anyone to push these changes into .36 because
they need wider testing first, so it's going to be a bit of time
before stable kernel backports are even possible.

> Replicating the rsyncs on a nearly identical box running an older
> 2.6.30 kernel did not show this much kswapd0 cpu usage.  Johannes
> Weiner was helping me look into it yesterday, but I don't think
> anything was conclusive.

Yes, these issues started appearing around 2.6.34, when XFS inode
cache reclaim was pushed into a background thread for scalability
reasons.  I'd suggest that upgrading to the latest 2.6.35.y kernel
would be a good idea....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
