public inbox for linux-kernel@vger.kernel.org
* howto combat highly pathologic latencies on a server?
@ 2010-03-10 17:17 Hans-Peter Jansen
  2010-03-10 18:15 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hans-Peter Jansen @ 2010-03-10 17:17 UTC (permalink / raw)
  To: linux-kernel

in a commercial setting, with all those evil elements at work like VMware, 
NFS, XFS, openSUSE, diskless fat clients, you name it...

System description:

Dual socket board: Tyan S2892, 2 * AMD Opteron 285 @ 2.6 GHz, 8 GB RAM, 
PRO/1000 MT Dual Port Server NIC, Areca ARC-1261 16 channel RAID 
controller, with 3 sets of RAID 5 arrays attached:
System is running from: 4 * WD Raptor 150GB (WDC WD1500ADFD-00NLR5)
VMware (XP-) images used via NFS: 6 * WD Raptor 74 GB (WDC WD740GD-00FLA0)
Homes, diskless clients, appl. data: 4 * Hitachi 1 TB (HDE721010SLA330).

All filesystems are xfs. The server serves about 20 diskless PCs; most use 
an Intel Pro/1000 GT NIC, and all are attached to a 3com 3870 48-port 
10/100/1000 switch.

OS is openSUSE 11.1/i586 with kernel 2.6.27.45 (the same kernel as SLE 11).

It serves mostly NFS and SMB, and does mild database (MySQL) and email 
processing (Cyrus IMAP, Postfix...). It also drives an ancient (but very 
important) terminal-based transport order management system that frequently 
syncs its data. Unfortunately, it is also used for running a VMware Server 
(1.0.10) XP guest, which itself does simple database work (employee time 
registration).

Users generally describe this system as slow, although the load on the 
server is below 1.5 most of the time. Interestingly, the former system, 
running an ancient kernel (2.6.11, SuSE 9.3), was perceived as significantly 
quicker (but not fast..).

The diskless clients are started once in the morning (taking 60-90 sec), use 
an aufs2-layered NFS mount for their openSUSE 11.1 system, and plain 
NFS-mounted homes and shared folders. Two thirds of them also need to run a 
VMware XP client (also NFS mounted). Their CPUs range from an Athlon 64 
3000+ up to a Phenom X4 955, with 2 or 4 GB RAM.

While this system usually operates fine, it suffers from delays that show 
up in latencytop as: "Writing page to disk:     8425,5 ms": 
ftp://urpla.net/lat-8.4sec.png, but we also see them in the 1.7-4.8 sec 
range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png, 
ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.

From other observations, this issue "feels" like it is induced by single 
synchronisation points in the block layer, e.g. if I create heavy IO load on 
one RAID array, say by resizing a VMware disk image, it can take up to a 
minute to log in by ssh, although the ssh login does not touch this area at 
all (different RAID arrays). Note that the latencytop snapshots above were 
taken during normal operation, not under this kind of load..

The network side looks fine, as its main interface rarely exceeds 40 MiB/s, 
and usually stays in the 1 KiB/s - 5 MiB/s range. 

The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota 
(yes, I do have a BBU on the areca, and disk write cache is effectively 
turned off). 

The clients mount their system:
/:ro/rw,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nointr,nolock,proto=tcp,
timeo=600,retrans=2,sec=sys,mountvers=3,mountproto=udp
/home: similar
/shared: without nolock

Might later kernels mitigate this problem? As this is a production system 
that is used 6.5 days a week, I cannot do dangerous experiments; switching 
to 64 bit is also a problem due to the legacy stuff described above...
OTOH, my users suffer from this, and anything helping in this respect is 
highly appreciated.

Thanks in advance,
Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 17:17 howto combat highly pathologic latencies on a server? Hans-Peter Jansen
@ 2010-03-10 18:15 ` Christoph Hellwig
  2010-03-11  0:15   ` Hans-Peter Jansen
  2010-03-10 23:29 ` Dave Chinner
  2010-03-10 23:44 ` David Rees
  2 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2010-03-10 18:15 UTC (permalink / raw)
  To: Hans-Peter Jansen; +Cc: linux-kernel

On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> While this system usually operates fine, it suffers from delays, that are 
> displayed in latencytop as: "Writing page to disk:     8425,5 ms": 
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec 
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png, 
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> 
> From other observations, this issue "feels" like it is induced by single 
> syncronisation points in the block layer, eg. if I create heavy IO load on 
> one RAID array, say resizing a VMware disk image, it can take up to a 
> minute to log in by ssh, although the ssh login does not touch this area at 
> all (different RAID arrays). Note, that the latencytop snapshots above are 
> made during normal operation, not this kind of load..

I had very similar issues on various systems (mostly using xfs, but some
with ext3, too) using kernels before ~ 2.6.30 when using the cfq I/O
scheduler.  Switching to noop fixed that for me, or upgrading to a
recent kernel where cfq behaves better again.
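[For reference, the scheduler can be inspected and switched at runtime via 
sysfs; "sda" below is a placeholder for whichever device node backs the 
busy array.]

```shell
# Show the available schedulers for a device; the active one is the
# entry in square brackets (e.g. "noop anticipatory deadline [cfq]").
cat /sys/block/sda/queue/scheduler

# Switch to noop at runtime (root required). Takes effect immediately,
# but does not survive a reboot.
echo noop > /sys/block/sda/queue/scheduler
```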


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 17:17 howto combat highly pathologic latencies on a server? Hans-Peter Jansen
  2010-03-10 18:15 ` Christoph Hellwig
@ 2010-03-10 23:29 ` Dave Chinner
  2010-03-11  0:27   ` Hans-Peter Jansen
  2010-03-11 16:58   ` Hans-Peter Jansen
  2010-03-10 23:44 ` David Rees
  2 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2010-03-10 23:29 UTC (permalink / raw)
  To: Hans-Peter Jansen; +Cc: linux-kernel

On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> in a commercial setting, with all those evil elements at work like VMware, 
> NFS, XFS, openSUSE, diskless fat clients, you name it...
> 
> System description:
> 
> Dual socket board: Tyan S2892, 2 * AMD Opteron 285 @ 2.6 GHz, 8 GB RAM, 
> PRO/1000 MT Dual Port Server NIC, Areca ARC-1261 16 channel RAID 
> controller, with 3 sets of RAID 5 arrays attached:
> System is running from: 4 * WD Raptor 150GB (WDC WD1500ADFD-00NLR5)
> VMware (XP-) images used via NFS: 6 * WD Raptor 74 GB (WDC WD740GD-00FLA0)
> Homes, diskless clients, appl. data: 4 * Hitachi 1 GB (HDE721010SLA330).
> 
> All filesystems are xfs. The server serves about 20 diskless PC's, most use 
> an Intel Pro/1000 GT NIC, all attached on a 3com 3870 48-port 10/100/1000 
> switch.
> 
> OS is openSUSE 11.1/i586 with kernel 2.6.27.45 (the same kernel as SLE 11).
> 
> It serves mostly NFS, SMB, and does mild database (MySQL) and email 
> processing (Cyrus IMAP, Postfix...). It also drives an ancient (but very 
> important) terminal based transport order mgmt system, that often syncs 
> it's data. Unfortunately, it's also used for running a VMware-Server 
> (1.0.10) XP-client, that itself does simple database stuff (employers time 
> registration).
> 
> Users generally describe this system as slow, although the load on the 
> server is less than 1.5 most of the time. Interestingly, the former system, 
> using ancient kernels (2.6.11, SuSE 9.3) was perceived significantly 
> quicker (but not fast..).
> 
> The diskless clients are started once in the morning (taking 60-90 sec), use 
> an aufs2 layered NFS mount for their openSUSE 11.1 system, and simple NFS 
> mounted homes and shared folders. 2/3th also need running a VMware XP 
> client (also NFS mounted). Their CPUs range from Athlon 64 3000+ up to 
> Phenom X4 955, with 2 or 4 GB RAM.
> 
> While this system usually operates fine, it suffers from delays, that are 
> displayed in latencytop as: "Writing page to disk:     8425,5 ms": 
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec 
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png, 
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> 
> From other observations, this issue "feels" like it is induced by single 
> syncronisation points in the block layer, eg. if I create heavy IO load on 
> one RAID array, say resizing a VMware disk image, it can take up to a 
> minute to log in by ssh, although the ssh login does not touch this area at 
> all (different RAID arrays). Note, that the latencytop snapshots above are 
> made during normal operation, not this kind of load..
> 
> The network side looks fine, as its main interface rarely passes 40MiB/s, 
> and usually keeps in the 1 Kib/s - 5 MiB/s range. 
> 
> The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota 
> (yes, I do have a BBU on the areca, and disk write cache is effectively 
> turned off). 

Make sure the filesystem has the "lazy-count=1" attribute set (use
xfs_info to check, xfs_admin to change). That will remove the
superblock from most transactions and significantly reduce the latency
of transactions as they serialise while locking it...
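[A minimal sketch of the check and the change; the mount point and device 
node are placeholders, and note that xfs_admin can only change this on an 
unmounted filesystem.]

```shell
# Check whether lazy-count is already enabled on a filesystem.
xfs_info /srv/data | grep lazy-count

# Enable it; the filesystem must be unmounted first.
umount /srv/data
xfs_admin -c 1 /dev/sdc1
mount /srv/data
```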

Cheers,

Dave
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 17:17 howto combat highly pathologic latencies on a server? Hans-Peter Jansen
  2010-03-10 18:15 ` Christoph Hellwig
  2010-03-10 23:29 ` Dave Chinner
@ 2010-03-10 23:44 ` David Rees
  2010-03-11  1:20   ` Hans-Peter Jansen
  2 siblings, 1 reply; 10+ messages in thread
From: David Rees @ 2010-03-10 23:44 UTC (permalink / raw)
  To: Hans-Peter Jansen; +Cc: linux-kernel

On Wed, Mar 10, 2010 at 9:17 AM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> While this system usually operates fine, it suffers from delays, that are
> displayed in latencytop as: "Writing page to disk:     8425,5 ms":
> ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
>
> From other observations, this issue "feels" like it is induced by single
> syncronisation points in the block layer, eg. if I create heavy IO load on
> one RAID array, say resizing a VMware disk image, it can take up to a
> minute to log in by ssh, although the ssh login does not touch this area at
> all (different RAID arrays). Note, that the latencytop snapshots above are
> made during normal operation, not this kind of load..
>
> Might later kernels mitigate this problem? As this is a production system,
> that is used 6.5 days a week, I cannot do dangerous experiments, also
> switching to 64 bit is a problem due to the legacy stuff described above...
> OTOH, my users suffer from this, and anything helping in this respect is
> highly appreciated.

Seems like a 2.6.32 based kernel, which has the per-BDI writeback and "CFQ
low latency mode" changes, might help a good deal.  I know that on one
of my bigger machines (similar in specs to yours), which has a lot of
processes doing a decent amount of IO, latency and load average have
gone down after going from a 2.6.31 kernel to a 2.6.32 kernel (Fedora
11 system).

Like Christoph suggested, I've also heard that using the noop IO scheduler
can work well on Areca controllers on some kernels and workloads.
It's worth a shot, and you can even try changing it at run-time.
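[To make the choice persistent across reboots, the usual options are the
elevator= kernel boot parameter or a boot-time script; a sketch, with
example paths and device globs to adjust to your setup:]

```shell
# Option 1: append elevator=noop to the kernel line in the boot loader,
# e.g. in /boot/grub/menu.lst:
#   kernel /boot/vmlinuz-2.6.27.45 root=/dev/sda2 elevator=noop

# Option 2: set it per device from an init script:
for q in /sys/block/sd[a-z]/queue/scheduler; do
    echo noop > "$q"
done
```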

-Dave

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 18:15 ` Christoph Hellwig
@ 2010-03-11  0:15   ` Hans-Peter Jansen
  2010-03-16 14:54     ` Hans-Peter Jansen
  0 siblings, 1 reply; 10+ messages in thread
From: Hans-Peter Jansen @ 2010-03-11  0:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel

On Wednesday 10 March 2010, 19:15:48 Christoph Hellwig wrote:
> On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> > While this system usually operates fine, it suffers from delays, that
> > are displayed in latencytop as: "Writing page to disk:     8425,5 ms":
> > ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> > range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> > ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> >
> > From other observations, this issue "feels" like it is induced by single
> > syncronisation points in the block layer, eg. if I create heavy IO load
> > on one RAID array, say resizing a VMware disk image, it can take up to
> > a minute to log in by ssh, although the ssh login does not touch this
> > area at all (different RAID arrays). Note, that the latencytop
> > snapshots above are made during normal operation, not this kind of
> > load..
>
> I had very similar issues on various systems (mostly using xfs, but some
> with ext3, too) using kernels before ~ 2.6.30 when using the cfq I/O
> scheduler.  Switching to noop fixed that for me, or upgrading to a
> recent kernel where cfq behaves better again.

Christoph, thanks for this valuable suggestion: I've switched to noop 
right away, and also set:

vm.dirty_ratio = 20
vm.dirty_background_ratio = 1

since the defaults of 40 and 10 also do not seem to fit my needs. Even 20 
might still be oversized with 8 GB of total memory. 
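[For reference, these can be applied at runtime with sysctl and persisted 
in /etc/sysctl.conf:]

```shell
# Apply immediately (root required):
sysctl -w vm.dirty_ratio=20
sysctl -w vm.dirty_background_ratio=1

# Persist across reboots by adding the same keys to /etc/sysctl.conf:
#   vm.dirty_ratio = 20
#   vm.dirty_background_ratio = 1
```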

Thanks,
Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 23:29 ` Dave Chinner
@ 2010-03-11  0:27   ` Hans-Peter Jansen
  2010-03-11 16:58   ` Hans-Peter Jansen
  1 sibling, 0 replies; 10+ messages in thread
From: Hans-Peter Jansen @ 2010-03-11  0:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel

On Thursday 11 March 2010, 00:29:40 Dave Chinner wrote:
> On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> >
> > The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota
> > (yes, I do have a BBU on the areca, and disk write cache is effectively
> > turned off).
>
> Make sure the filesystem has the "lazy-count=1" attribute set (use
> xfs_info to check, xfs_admin to change). That will remove the
> superblock from most transactions and significant reduce latency of
> transactions as they serialise while locking it...

Dave, this modification sounds promising. I will make the change over the 
weekend. Christoph also mentioned some pending patches for fdatasync and NFS 
metadata updates in his XFS status report from February, which sounded 
_really_ exciting. 

Happily awaiting these bits in the stable universe ;-)

Thanks,
Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 23:44 ` David Rees
@ 2010-03-11  1:20   ` Hans-Peter Jansen
  0 siblings, 0 replies; 10+ messages in thread
From: Hans-Peter Jansen @ 2010-03-11  1:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Rees

On Thursday 11 March 2010, 00:44:54 David Rees wrote:
> On Wed, Mar 10, 2010 at 9:17 AM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> > While this system usually operates fine, it suffers from delays, that
> > are displayed in latencytop as: "Writing page to disk:     8425,5 ms":
> > ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> > range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> > ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> >
> > From other observations, this issue "feels" like it is induced by
> > single syncronisation points in the block layer, eg. if I create heavy
> > IO load on one RAID array, say resizing a VMware disk image, it can
> > take up to a minute to log in by ssh, although the ssh login does not
> > touch this area at all (different RAID arrays). Note, that the
> > latencytop snapshots above are made during normal operation, not this
> > kind of load..
> >
> > Might later kernels mitigate this problem? As this is a production
> > system, that is used 6.5 days a week, I cannot do dangerous
> > experiments, also switching to 64 bit is a problem due to the legacy
> > stuff described above... OTOH, my users suffer from this, and anything
> > helping in this respect is highly appreciated.
>
> Seems like a 2.6.32 based kernel which has per-BDI writeback and "CFQ
> low latency mode" changes might help a good deal.  I know that on one
> of my bigger machines (similar in specs to yours) which has a lot of
> processes which do a decent amount of IO, latency and load average has
> gone down after going to a 2.6.32 kernel from a 2.6.31 kernel (Fedora
> 11 system).
>
> Like Chris suggested, I've also heard that using the noop IO scheduler
> can work well on Areca controllers on some kernels and workloads.
> It's worth a shot and you can even try changing it at run-time.

Yes, already done. Hopefully my users will notice.. As I upgraded this 
server and the clients only two weeks ago, calming things down has the 
highest priority.

Switching kernel versions on a production system is always painful, so I 
try to avoid it, but this time I already had to roll my own kernel for the 
clients due to some aufs2 vs. AppArmor disharmony. That cost me the latter 
- I can live without AppArmor, but certainly not without a reliable layered 
filesystem¹.
 
Anyway, thanks for your suggestion and confirmation, David. It is 
appreciated.

Cheers,
Pete

¹) In a way, this is my primary justification to also use Linux on the 
desktops²! Install one, and get the rest (nearly) free.. 
http://download.opensuse.org/repositories/home:/frispete:/aufs2 and below..
²) Don't tell anybody, that I don't like the other OS ;-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-10 23:29 ` Dave Chinner
  2010-03-11  0:27   ` Hans-Peter Jansen
@ 2010-03-11 16:58   ` Hans-Peter Jansen
  2010-03-13 13:16     ` Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Hans-Peter Jansen @ 2010-03-11 16:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dave Chinner

On Thursday 11 March 2010, 00:29:40 Dave Chinner wrote:
> On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> >
> > The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota
> > (yes, I do have a BBU on the areca, and disk write cache is effectively
> > turned off).
>
> Make sure the filesystem has the "lazy-count=1" attribute set (use
> xfs_info to check, xfs_admin to change). That will remove the
> superblock from most transactions and significant reduce latency of
> transactions as they serialise while locking it...

Done that now on my local test system, but on one of its filesystems, 
xfs_admin -c1 didn't succeed; it simply stopped (waiting for a futex):

Famous last syscall:
6750  futex(0x868330c8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>

Consequently, xfs_repair behaved similarly, hanging in phase 6, traversing 
filesystem... I have a huge strace from this run, if someone is interested.

It's a 3 TB RAID 5 array (4 * 1 TB disks) with one FS, also driven by the 
areca:

meta-data=/dev/sdb1              isize=256    agcount=4, agsize=183105406 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732421623, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Luckily, xfs_repair -P finally did succeed. Phuah.. 

This is with: xfs_repair version 2.10.1.

After calling xfs_admin -c1, all filesystems showed differences in 
superblock features (from an xfs_repair -n run). Is xfs_repair mandatory, or 
does the initial mount fix this automatically? 

Thanks,
Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-11 16:58   ` Hans-Peter Jansen
@ 2010-03-13 13:16     ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2010-03-13 13:16 UTC (permalink / raw)
  To: Hans-Peter Jansen; +Cc: linux-kernel

On Thu, Mar 11, 2010 at 05:58:49PM +0100, Hans-Peter Jansen wrote:
> On Thursday 11 March 2010, 00:29:40 Dave Chinner wrote:
> > On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> > >
> > > The xfs filesystems are mounted with rw,noatime,attr2,nobarrier,noquota
> > > (yes, I do have a BBU on the areca, and disk write cache is effectively
> > > turned off).
> >
> > Make sure the filesystem has the "lazy-count=1" attribute set (use
> > xfs_info to check, xfs_admin to change). That will remove the
> > superblock from most transactions and significant reduce latency of
> > transactions as they serialise while locking it...
> 
> Done that now on my local test system, but on one of its filesystems, 
> xfs_admin -c1 didn't succeed, it simply stopped (waiting for a futex):
> 
> Famous last syscall:
> 6750  futex(0x868330c8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
> 
> Consequently, xfs_repair behaved similar, hanging in phase 6, traversing 
> filesystem... I have a huge strace from this run, if someone is interested.
> 
> It's an 3 TB Raid 5 array (4 * 1 TB hd) with one FS also driven by areca:
> 
> meta-data=/dev/sdb1              isize=256    agcount=4, agsize=183105406 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=732421623, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Luckily, xfs_repair -P finally did succeed. Phuah.. 
> 
> This is with: xfs_repair version 2.10.1.
> 
> After calling xfs_admin -c1, all filesystems showed differences in 
> superblock features (from a xfs_repair -n run). Is xfs_repair mandatory, or 
> does the initial mount fix this automatically? 

Mandatory - there are extra fields in the AGF headers that track
free space btree block usage (the tree itself) that need to be
calculated correctly. This allows the block usage in the filesystem
to be tracked from the AGFs rather than the superblock, hence
removing the single point of contention in the allocation path...

xfs_repair does this calculation for us - putting that code into the
kernel to avoid running repair is a lot of work for a relatively
rare operation....
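[So the per-filesystem conversion sequence would look roughly like this; 
device node and mount point are placeholders, and the filesystem must be 
unmounted throughout.]

```shell
umount /srv/data
xfs_admin -c 1 /dev/sdb1   # enable lazy-count in the superblock
xfs_repair /dev/sdb1       # recompute the AGF free space counters
mount /srv/data
```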

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: howto combat highly pathologic latencies on a server?
  2010-03-11  0:15   ` Hans-Peter Jansen
@ 2010-03-16 14:54     ` Hans-Peter Jansen
  0 siblings, 0 replies; 10+ messages in thread
From: Hans-Peter Jansen @ 2010-03-16 14:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christoph Hellwig

On Thursday 11 March 2010, 01:15:14 Hans-Peter Jansen wrote:
> On Wednesday 10 March 2010, 19:15:48 Christoph Hellwig wrote:
> > On Wed, Mar 10, 2010 at 06:17:42PM +0100, Hans-Peter Jansen wrote:
> > > While this system usually operates fine, it suffers from delays, that
> > > are displayed in latencytop as: "Writing page to disk:     8425,5
> > > ms": ftp://urpla.net/lat-8.4sec.png, but we see them also in the
> > > 1.7-4.8 sec range: ftp://urpla.net/lat-1.7sec.png,
> > > ftp://urpla.net/lat-2.9sec.png, ftp://urpla.net/lat-4.6sec.png and
> > > ftp://urpla.net/lat-4.8sec.png.
> > >
> > > From other observations, this issue "feels" like it is induced by single
> > > syncronisation points in the block layer, eg. if I create heavy IO
> > > load on one RAID array, say resizing a VMware disk image, it can take
> > > up to a minute to log in by ssh, although the ssh login does not
> > > touch this area at all (different RAID arrays). Note, that the
> > > latencytop snapshots above are made during normal operation, not this
> > > kind of load..
> >
> > I had very similar issues on various systems (mostly using xfs, but
> > some with ext3, too) using kernels before ~ 2.6.30 when using the cfq
> > I/O scheduler.  Switching to noop fixed that for me, or upgrading to a
> > recent kernel where cfq behaves better again.
>
> Christoph, thanks for this valuable suggestion: I've changed it to noop
> right away, and also:
>
> vm.dirty_ratio = 20
> vm.dirty_background_ratio = 1
>
> since the defaults of 40 and 10 seem to also not fit my needs. Even the
> 20 might be still oversized with 8GB total mem.

That was a bad idea. I've reverted the vm tweaks, as they made things even 
worse.

After switching to noop and activating lazy-count on all filesystems, the 
pathologic behavior under heavy IO load seems to be relieved, but the 
latency due to VMware Server persists: 

Cause                                                Maximum     Percentage
Writing a page to disk                            435.8 msec          9.9 %
Writing buffer to disk (synchronous)              295.3 msec          1.6 %
Scheduler: waiting for cpu                         80.1 msec         11.7 %
Reading from a pipe                                 9.3 msec          0.0 %
Waiting for event (poll)                            5.0 msec         76.2 %
Waiting for event (select)                          4.8 msec          0.4 %
Waiting for event (epoll)                           4.7 msec          0.0 %
Truncating file                                     4.3 msec          0.0 %
Userspace lock contention                           3.3 msec          0.0 %

Process vmware-vmx (7907)                  Total: 7635.8 msec                                                           
Writing a page to disk                            435.8 msec         43.8 %
Scheduler: waiting for cpu                          9.1 msec         52.7 %
Waiting for event (poll)                            5.0 msec          3.5 %
[HostIF_SemaphoreWait]                              0.2 msec          0.0 %

Although I set writeThrough to "FALSE" on that VM, it operates on a 
monolithic flat 24 GB "drive" file, it is not allowed to swap, and it is 
itself only lightly used, it always writes (? whatever) synchronously and 
trashes the latency of the whole system. (It's nearly always the one that 
latencytop shows, with combined latencies ranging from one to eight secs.)

I would love to migrate that stuff to a saner VM technology (e.g. kvm), but 
unfortunately the Opteron 285 CPUs are socket 940 based and lack hardware 
virtualisation extensions, and are thus not supported by any current 
virtualisation solution that depends on them. Correct me if I'm wrong, 
please.

This VMware Server 1.0.* stuff also gets in the way of upgrading to a newer 
kernel. The only way up the kernel stairs might be VMware Server 2, but 
without serious indications that it works much better, I won't take that 
route. Hints welcome.

Upgrading the hardware combined with using SSD drives seems the only really 
feasible approach, but given the economic pressure in the transport 
industry, that's currently not possible either. 

Anyway thanks for your suggestions,
Pete

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-03-16 14:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-10 17:17 howto combat highly pathologic latencies on a server? Hans-Peter Jansen
2010-03-10 18:15 ` Christoph Hellwig
2010-03-11  0:15   ` Hans-Peter Jansen
2010-03-16 14:54     ` Hans-Peter Jansen
2010-03-10 23:29 ` Dave Chinner
2010-03-11  0:27   ` Hans-Peter Jansen
2010-03-11 16:58   ` Hans-Peter Jansen
2010-03-13 13:16     ` Dave Chinner
2010-03-10 23:44 ` David Rees
2010-03-11  1:20   ` Hans-Peter Jansen
