public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* Writeback Stalls
@ 2012-09-19 18:36 Markus Stockhausen
  2012-09-19 21:34 ` Dave Chinner
  2012-09-20  7:12 ` Emmanuel Florac
  0 siblings, 2 replies; 5+ messages in thread
From: Markus Stockhausen @ 2012-09-19 18:36 UTC (permalink / raw)
  To: xfs@oss.sgi.com

[-- Attachment #1: Type: text/plain, Size: 2326 bytes --]

Hello,

I'm not sure if you can help me with the following problem, but I must start somewhere.

We have a small VMware infrastructure: 8 hosts with roughly 50 VMs.
Their images are hosted on 2 Ubuntu 12.04 NFS storage servers, each with
a 14-disk RAID 6 array. On top of each array runs a single XFS filesystem with
roughly 10 TB of disk space.

From time to time we see stalls in the infrastructure. The machines become unresponsive
and hang for a few seconds. The controller was the first suspect,
but after much examination we created a very artificial setup that exposes the real cause
of the stalls: writeback handling. The parameters are:

- set NFS to async
- disable controller and disk writeback cache
- enable cgroup
- set dirty_background_bytes to some very high value (4GB)
- set dirty_bytes to some very high value (4GB)
- set dirty_expire_centisecs to a very high value (60 seconds)
- set blkio.throttle.write_bps_device to a very low value (2 MB/s)
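For reference, the settings above can be applied roughly as follows. The cgroup name "nfsd" and the "8:16" major:minor device number are hypothetical placeholders for our RAID array; everything needs root:

```shell
# Demonstration values from the test setup above -- not a recommendation.
# Defer writeback by allowing up to 4GB of dirty data in memory.
sysctl -w vm.dirty_background_bytes=$((4 * 1024 * 1024 * 1024))
sysctl -w vm.dirty_bytes=$((4 * 1024 * 1024 * 1024))
sysctl -w vm.dirty_expire_centisecs=6000   # 60 seconds

# Throttle writes to the array to 2 MB/s via the blkio cgroup.
# "8:16" (major:minor of the RAID block device) and the cgroup name
# "nfsd" are placeholders -- substitute your own device and hierarchy.
mkdir -p /sys/fs/cgroup/blkio/nfsd
echo "8:16 $((2 * 1024 * 1024))" \
    > /sys/fs/cgroup/blkio/nfsd/blkio.throttle.write_bps_device
```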

Now generate some write load on a Windows VM. During the test we observe what
happens on the storage server and in the VM. The result is:

- dirty pages are increasing
- writeback is constantly 0
- VM is working well

At some point writeback is kicking in and all of a sudden the VM stalls. During
this time the setup shows

- most of the dirty pages are transferred to writeback pages
- writeback proceeds at the configured limit (2 MB/s)
- VM is not responding 

After writeback has finished, everything goes back to normal. An additional remark:
VMs DO NOT hang if I create heavy writes on the XFS filesystem into non-VM-related files.

We are interested in this kind of setup for several reasons:

1. Keep VMs responsive

2. Allow VMs to generate short spikes of write I/O at a higher rate than the
disk subsystem can sustain.

3. Write this data back to the disk in the background over a longer period
of time. Ensure that a limited writeback rate keeps enough headroom so that
read I/Os are not delayed too much.

Can you explain whether there is some blocking going on in XFS between the
start and end of writeback? If so, what can I do to achieve my
goals and stop the writeback stalls?

Thank you in advance for your help.

Markus

[-- Attachment #2: InterScan_Disclaimer.txt --]
[-- Type: text/plain, Size: 1650 bytes --]

****************************************************************************

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Writeback Stalls
  2012-09-19 18:36 Writeback Stalls Markus Stockhausen
@ 2012-09-19 21:34 ` Dave Chinner
  2012-09-20  4:33   ` Markus Stockhausen
  2012-09-20  7:12 ` Emmanuel Florac
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2012-09-19 21:34 UTC (permalink / raw)
  To: Markus Stockhausen; +Cc: xfs@oss.sgi.com

On Wed, Sep 19, 2012 at 06:36:11PM +0000, Markus Stockhausen wrote:
> Hello,
> 
> I'm not sure if you can help me with the following problem but I must start somewhere.
> 
> We have a small VMware infrastructure: 8 hosts with roughly 50 VMs.
> Their images are hosted on 2 Ubuntu 12.04 NFS storage servers, each with
> a 14-disk RAID 6 array. On top of each array runs a single XFS filesystem with
> roughly 10 TB of disk space.
> 
> From time to time we see stalls in the infrastructure. The machines become unresponsive
> and hang for a few seconds. The controller was the first suspect,
> but after much examination we created a very artificial setup that exposes the real cause
> of the stalls: writeback handling. The parameters are:
> 

So:

> - set NFS to async

Disable the NFS server-client writeback throttling control loop
(i.e. commits block until data is synced to disk). Also, data loss
on NFS server crash.

> - disable controller and disk writeback cache

IO is exceedingly slow.

> - enable cgroup
> - set dirty_background_bytes to some very high value (4GB)
> - set dirty_bytes to some very high value (4GB)
> - set dirty_expire_centisecs to a very high value (60 seconds)

Allow lots of dirty data in memory before writeback starts

> - set blkio.throttle.write_bps_device to a very low value (2 MB/s)

And throttle writeback to 2MB/s.

Gun. Foot. Point. Shoot.

> Now generate some write load on a Windows VM. During the test we observe what
> is happening on the storage and the VM. The result is:
> 
> - dirty pages are increasing
> - writeback is constantly 0
> - VM is working well

The VM is not "working well" with this configuration - it's building
up a large cache of dirty pages that you've limited to draining at a
very slow rate. IOWs, on an NFS server, having a writeback value of
0 is exactly the problem, and disabling the client-server throttling
feedback loop only makes it worse. You want writeback on the server
to start early, not delay it until you run out of memory.
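A rough sketch of what "start early" looks like in sysctl terms; the values here are illustrative placeholders, not a tested recommendation:

```shell
# Illustrative values only: kick background writeback in early so the
# server trickles dirty data out continuously instead of hoarding
# gigabytes and then stalling while it drains.
sysctl -w vm.dirty_background_bytes=$((256 * 1024 * 1024))  # async writeback from 256MB
sysctl -w vm.dirty_bytes=$((1024 * 1024 * 1024))            # hard limit for writers at 1GB
sysctl -w vm.dirty_expire_centisecs=1500                    # expire dirty pages after 15s
```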

> At some point writeback is kicking in and all of a sudden the VM stalls. During
> this time the setup shows
> 
> - most of the dirty pages are transferred to writeback pages
> - writeback proceeds at the configured limit (2 MB/s)
> - VM is not responding

Because you ran out of clean pages and it takes forever to write
4GB of dirty data @ 2MB/s.
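The arithmetic is easy to check:

```shell
# Time to drain 4GB of dirty pages through a 2MB/s throttle.
dirty_mb=$((4 * 1024))          # 4GB expressed in MB
rate_mb_per_s=2
secs=$((dirty_mb / rate_mb_per_s))
echo "${secs} seconds (~$((secs / 60)) minutes)"   # prints: 2048 seconds (~34 minutes)
```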

> After writeback has finished, everything goes back to normal. An additional remark:
> VMs DO NOT hang if I create heavy writes on the XFS filesystem into non-VM-related files.

Probably because they are not in the throttled cgroup that the NFS
daemons are in.

> We are interested in this kind of setup for several reasons:
> 
> 1. Keep VMs responsive
> 
> 2. Allow VMs to generate short spikes of write I/O at a higher rate than the
> disk subsystem can sustain.
> 
> 3. Write this data back to the disk in the background over a longer period
> of time. Ensure that a limited writeback rate keeps enough headroom so that
> read I/Os are not delayed too much.

Fundamentally, you are doing it all wrong. High-throughput, low-latency
NFS servers write dirty data to disk fast, not leave it in
memory until you run out of clean memory, because that causes
everything to block waiting for writeback IO completion to be able to
free memory...

This really isn't an XFS problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Writeback Stalls
  2012-09-19 21:34 ` Dave Chinner
@ 2012-09-20  4:33   ` Markus Stockhausen
  2012-09-20  6:05     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Stockhausen @ 2012-09-20  4:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs@oss.sgi.com

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

> Fundamentally, you are doing it all wrong. High-throughput, low-latency
> NFS servers write dirty data to disk fast, not leave it in
> memory until you run out of clean memory, because that causes
> everything to block waiting for writeback IO completion to be able to
> free memory...

Maybe I did not make it clear enough. The above setup is only
for demonstration purposes, to expose the problem more clearly. In
real life we see stalls that range from 0.5 to 1 second, even
with all caches active, small dirty writeback settings, and unlimited
bandwidth.

In between I found others complaining about the same problem:

http://lwn.net/Articles/486313/
http://oss.sgi.com/archives/xfs/2011-09/msg00189.html

So just one last question: can I safely revert the mentioned
commit d76ee18a8551e33ad7dbd55cac38bc7b094f3abb if I only
write data to a battery-backed hardware RAID controller on a
server that is attached to a UPS?

Thanks in advance.

Markus


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Writeback Stalls
  2012-09-20  4:33   ` Markus Stockhausen
@ 2012-09-20  6:05     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2012-09-20  6:05 UTC (permalink / raw)
  To: Markus Stockhausen; +Cc: xfs@oss.sgi.com

On Thu, Sep 20, 2012 at 04:33:09AM +0000, Markus Stockhausen wrote:
> > Fundamentally, you are doing it all wrong. High-throughput, low-latency
> > NFS servers write dirty data to disk fast, not leave it in
> > memory until you run out of clean memory, because that causes
> > everything to block waiting for writeback IO completion to be able to
> > free memory...
> 
> Maybe I did not make it clear enough.

Plenty clear enough. Maybe I did not make it clear enough:

Nobody has the time to try to diagnose a problem on a configuration
that is obviously broken and pessimal for writeback behaviour. The
first step is to report your actual problem, not an artificial
behaviour you *think* demonstrates the same problem....

> The above setup is only
> for demonstration purposes, to expose the problem more clearly. In
> real life we see stalls that range from 0.5 to 1 second, even
> with all caches active, small dirty writeback settings, and unlimited
> bandwidth.

So describe the application, environment, etc. in which you see this
problem. Start with:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

and we can go from there.

> In between I found others complaining about the same problem:
> 
> http://lwn.net/Articles/486313/
> http://oss.sgi.com/archives/xfs/2011-09/msg00189.html

How do you know they are the same problem?

Indeed, they aren't even writeback problems - they are application
IO latency issues caused by the introduction of stable pages during
writeback.

> So just one last question: can I safely revert the mentioned
> commit d76ee18a8551e33ad7dbd55cac38bc7b094f3abb if I only
> write data to a battery-backed hardware RAID controller on a
> server that is attached to a UPS?

NFS servers don't use mmap, so that patch is not causing your
writeback problems.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Writeback Stalls
  2012-09-19 18:36 Writeback Stalls Markus Stockhausen
  2012-09-19 21:34 ` Dave Chinner
@ 2012-09-20  7:12 ` Emmanuel Florac
  1 sibling, 0 replies; 5+ messages in thread
From: Emmanuel Florac @ 2012-09-20  7:12 UTC (permalink / raw)
  To: Markus Stockhausen; +Cc: xfs@oss.sgi.com

On Wed, 19 Sep 2012 18:36:11 +0000 you wrote:

> We have a small VMware infrastructure: 8 hosts with roughly
> 50 VMs. Their images are hosted on 2 Ubuntu 12.04 NFS storage
> servers, each with a 14-disk RAID 6 array. On top of
> each array runs a single XFS filesystem with roughly 10 TB of disk
> space.

You didn't mention the type of disks and controllers you're using;
it makes a very important difference. Typically, in a VM environment, RAID 6
on 7200 RPM drives behaves quite poorly because of very poor random
write I/O performance.
You should also look at the iostat -mx output on the NFS server.
You'll probably see 100% disk utilization with lots of small read and write
I/Os.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-09-20  7:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-19 18:36 Writeback Stalls Markus Stockhausen
2012-09-19 21:34 ` Dave Chinner
2012-09-20  4:33   ` Markus Stockhausen
2012-09-20  6:05     ` Dave Chinner
2012-09-20  7:12 ` Emmanuel Florac

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox