Writeback Stalls

* Writeback Stalls
@ 2012-09-19 18:36 Markus Stockhausen
  2012-09-19 21:34 ` Dave Chinner
  2012-09-20  7:12 ` Emmanuel Florac
  0 siblings, 2 replies; 5+ messages in thread
From: Markus Stockhausen @ 2012-09-19 18:36 UTC (permalink / raw)
  To: xfs@oss.sgi.com

[-- Attachment #1: Type: text/plain, Size: 2326 bytes --]

Hello,

I'm not sure if you can help me with the following problem but I must start somewhere.

We have an small sized VMWare infrastructure. 8 hosts with round about 50 VMs. 
Their images are hosted on 2 Ubuntu 12.04 NFS storage servers with each of them
having a 14 disk RAID 6 array. On top of the array runs a single XFS filesystem with 
round about 10TB of disk space.

>From time to time we see stalls in the infrastructure. The machines become unresponsive
and hang for a few seconds. The controller was the first item to be under suspect. 
But after a lot examination we created a very artifical setup that shows the real reasons
for the stalls: Writeback handling. The parameters are:

- set NFS to async
- disable controller and disk writeback cache
- enable cgroup
- set dirty_background_bytes to some very high value (4GB)
- set dirty_bytes to some very high value (4GB)
- set dirty_expire_centisecs to to some very high value (60 secs)
- set blkio.throttle.write_bps_device to a very low value (2MB/sek)

Now generate some write load on a Windows VM. During the test we observe what
is happening on the storage and the VM. The result is:

- dirty pages are increasing
- writeback is constantly 0
- VM is working well

At some point writeback is kicking in and all of a sudden the VM stalls. During
this time the setup shows

- most of the dirty pages are transfered to writeback pages
- writeback is done at the above set limit (2MB/sek)
- VM is not responding 

After writeback has finished everything goes back to normal. Additional remark:
VMs DO NOT hang if I create heavy writes on the XFS into non-VM related files.

We are interested in this kind of setup for several reasons:

1. Keep VMs reponsive

2. Allow VMs to generate short spikes of write I/Os at a higher rate than the 
disk subsystem is capable of. 

3. Write this data back to the disk in the background over a longer period
of time. Ensure that a limited writeback rate keeps enough headroom so that
read I/Os are not delayed too much.

Can you explain me if there is some blocking going on in XFS in the time
between start and end of writeback? If yes what can I do to achieve my
goals and stop the writeback stalls. 

For you help I thank you in advance.

Markus

=

[-- Attachment #2: InterScan_Disclaimer.txt --]
[-- Type: text/plain, Size: 1650 bytes --]

****************************************************************************
Diese E-Mail enthÃ¤lt vertrauliche und/oder rechtlich geschÃ¼tzte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtÃ¼mlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.

Ãœber das Internet versandte E-Mails kÃ¶nnen unter fremden Namen erstellt oder
manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine
rechtsverbindliche WillenserklÃ¤rung.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 KÃ¶ln

Vorstand:
Kadir Akin
Dr. Michael HÃ¶hnerbach

Vorsitzender des Aufsichtsrates:
Hans Kristian Langva

Registergericht: Amtsgericht KÃ¶ln
Registernummer: HRB 52 497

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 KÃ¶ln

executive board:
Kadir Akin
Dr. Michael HÃ¶hnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497

****************************************************************************

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread