Message-ID: <520CC8A4.4090405@cs.vu.nl>
Date: Thu, 15 Aug 2013 14:25:08 +0200
From: Kaveh Razavi
References: <1376413436-5424-1-git-send-email-kaveh@cs.vu.nl> <20130814092912.GC14914@stefanha-thinkpad.redhat.com> <520B922B.6030806@cs.vu.nl> <20130815083230.GE22521@stefanha-thinkpad.redhat.com>
In-Reply-To: <20130815083230.GE22521@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH] Introduce cache images for the QCOW2 format
To: Stefan Hajnoczi
Cc: Kevin Wolf, qemu-devel@nongnu.org, Stefan Hajnoczi

On 08/15/2013 10:32 AM, Stefan Hajnoczi wrote:
> I don't buy the argument about the page cache being evicted at any time:
>
> At the scale where caching is important, provisioning a measly 100 MB
> of RAM per guest should not be a challenge.
>
> cgroups can be used to isolate page cache between VMs if you want
> guaranteed caches.
>
> But it could be more interesting not to isolate so that the page cache
> acts host-wide to reduce the overall I/O instead of narrowly focussing
> on caching 100 MB for a specific image even if it is rarely accessed.
>
> The real downside I see is that the page cache is volatile, so you could
> see heavy I/O if multiple hosts reboot at the same time.
>

At the VM hosts, the memory is mostly allocated to VMs. Without
persisted caches, starting another VM from any of the possible backing
VM images may or may not result in network traffic (depending on the
page cache). Regardless of the page cache, the existing cache images
persisted on disk at the hosts can eliminate this traffic, at least on
VM boot.

At the storage site, however, I think it makes sense to dedicate memory
to popular backing images (via tmpfs rather than the page cache). The
data blocks of the popular images used for booting will be accessed by
all VMs starting from these "template" images.

> Streaming offers a rate limiting parameter so you can tune it to the
> network conditions.
>
> Copying the full image doesn't just reduce load on the NFS server, it
> also means guests can continue to run if the NFS server becomes
> unreachable.  That's an important property for reliability.

I am not really sure whether copying the entire image reduces the load
on the NFS server, especially at scale. If copying the entire image at
scale is desired or necessary, peer-to-peer approaches are documented to
perform better. They are mostly implemented at the host file-system
layer, though (search for e.g. VMTorrent). I agree on the reliability
consideration if you deal with an unreliable (remote) file system.

> 1)
> It is persistent.  The backing file chain looks like this:
>
> /nfs/template.qcow2 <- /local/cache.qcow2 <- /local/vm001.qcow2
>
> The cache is a regular qcow2 image file that is persistent.  The discard
> command is used to evict data from the file.
> Copy-on-read accesses are used to populate the cache when the guest
> submits a read request.
>
> 2)
> You can set cache size or other parameters as a qemu-nbd option (this
> doesn't exist but could be implemented):
>
>   $ qemu-img create -f qcow2 -o backing_file=/nfs/template.qcow2 cache.qcow2
>   $ qemu-nbd --options cache-size=100MB,evict=lru cache.qcow2
>
> So it's the qemu-nbd process that performs the cache housekeeping work.
> The cache.qcow2 file itself just persists data and isn't aware of cache
> settings.

OK, this is better, since the user can also define a policy _and_ the
cache can be shared by different VMs at creation time without races.
With an eviction policy 'none' in combination with cache-size, only the
first accessed data blocks get cached, essentially providing the same
functionality as this patch.

Kaveh
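To make the difference between the 'lru' and 'none' eviction policies concrete, here is a minimal sketch in Python. It is only a toy model of the behaviour discussed above, not QEMU code: the class BlockCache, its capacity_blocks and evict parameters, and the read_backing helper are all hypothetical names made up for illustration, and the qemu-nbd options they mimic do not exist. With evict="none", the cache fills with the first blocks read and then passes later misses straight through to the backing image.

```python
from collections import OrderedDict


class BlockCache:
    """Toy model of a size-bounded block cache (hypothetical, not a QEMU API)."""

    def __init__(self, capacity_blocks, evict="lru"):
        if evict not in ("lru", "none"):
            raise ValueError("unknown eviction policy: %r" % (evict,))
        self.capacity = capacity_blocks
        self.evict = evict
        self.blocks = OrderedDict()  # block number -> data, least recent first
        self.backing_reads = 0       # reads that had to go to the backing image

    def read(self, block, read_backing):
        """Return the data for `block`, calling read_backing(block) on a miss."""
        if block in self.blocks:
            if self.evict == "lru":
                # Mark the block as most recently used.
                self.blocks.move_to_end(block)
            return self.blocks[block]

        data = read_backing(block)
        self.backing_reads += 1

        if len(self.blocks) < self.capacity:
            # Copy-on-read: populate the cache while there is room.
            self.blocks[block] = data
        elif self.evict == "lru" and self.blocks:
            # Cache is full: drop the least recently used block.
            self.blocks.popitem(last=False)
            self.blocks[block] = data
        # With evict == "none" and a full cache, the read passes through
        # and the cache contents stay fixed.
        return data


def read_backing(block):
    """Stand-in for a read from the remote template image (assumption)."""
    return b"block-%d" % block


if __name__ == "__main__":
    for policy in ("none", "lru"):
        cache = BlockCache(capacity_blocks=2, evict=policy)
        for b in (0, 1, 2, 0, 2):
            cache.read(b, read_backing)
        print(policy, sorted(cache.blocks), cache.backing_reads)
```

In this model the 'none' policy keeps exactly the first capacity_blocks distinct blocks that were read, which is the sense in which it matches a fixed-size cache image populated at boot.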