From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752735AbbI3PRv (ORCPT ); Wed, 30 Sep 2015 11:17:51 -0400
Received: from mail-io0-f173.google.com ([209.85.223.173]:33607 "EHLO mail-io0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751265AbbI3PRt (ORCPT ); Wed, 30 Sep 2015 11:17:49 -0400
Subject: Re: [PATCH] Patch to integrate RapidDisk and RapidCache RAM Drive / Caching modules into the kernel
To: Petros Koutoupis, Christoph Hellwig
References: <1443374244.8013.7.camel@petros-ultrathin> <20150928064936.GA22280@infradead.org> <20150928162944.GA29562@infradead.org> <56096E90.2020000@petroskoutoupis.com> <560AA116.9030300@gmail.com> <560BF1DF.3000506@petroskoutoupis.com>
Cc: linux-kernel@vger.kernel.org, "devel@rapiddisk.org"
From: Austin S Hemmelgarn
Message-ID: <560BFCF6.9000203@gmail.com>
Date: Wed, 30 Sep 2015 11:17:10 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <560BF1DF.3000506@petroskoutoupis.com>
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms060403070705030404050409"
X-Antivirus: avast! (VPS 150930-0, 2015-09-30), Outbound message
X-Antivirus-Status: Clean
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is a cryptographically signed message in MIME format.

--------------ms060403070705030404050409
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable

On 2015-09-30 10:29, Petros Koutoupis wrote:
> Christoph and Austin,
>
> You both have provided me with some valuable feedback. I will do what I
> can to clean this patch up and in turn apply the same dynamic
> functionality to the already in-kernel module. Also please see my
> replies below.
>
> On 9/29/15 9:32 AM, Austin S Hemmelgarn wrote:
>> On 2015-09-28 12:45, Petros Koutoupis wrote:
>>> Christoph,
>>>
>>> See my replies below....
>>>
>>> On 9/28/15 11:29 AM, Christoph Hellwig wrote:
>>>> Hi Petros,
>>>>
>>>> On Mon, Sep 28, 2015 at 09:12:13AM -0500, Petros Koutoupis wrote:
>>>>> 1. Unlike the already mainline ramdisk driver, RapidDisk is
>>>>> designed to be managed dynamically. That is, instead of
>>>>> configuring a fixed number of volumes and volume sizes as
>>>>> compile/boot time variables, RapidDisk will allow you to add,
>>>>> remove, and resize your RAM drive(s) at runtime. Besides, the
>>>>> built in module is designed to work with smaller sizes in mind
>>>>> while RapidDisk focuses on larger sizes that can reach to the
>>>>> multiple Gigabytes or even Terabytes. Much like the built in
>>>>> module, it will allocate pages as they are needed which allows
>>>>> for over provisioning (not that it is advised) of volume sizes.
>>>> The ramdisk driver allows to select sizes and count at module
>>>> load. I agree that having runtime control would be even better, but
>>>> that's best done by adding a runtime interface to the existing driver
>>>> instead of duplicating it.
>>> I understand the concern and I will definitely scope out this approach,
>>> although at the moment, I am not sure how both approaches will play nice
>>> together. As mentioned above, the current implementation requires the
>>> predefined number of ram drives with the specified size to be configured
>>> at boot time (or compiled into the kernel). The only wiggle room I see
>>> for runtime control is resizing individual volumes.
>> Just because there is not code currently to do dynamic
>> allocation/freeing of ramdisks in the current driver doesn't mean that
>> it isn't possible, it just means that nobody has written code to do it
>> yet.
>> This functionality would be extremely useful (I often use
>> ramdisks on a VM host as a small amount of very fast swap space for
>> the virtual machines). On top of that, the deduplication would be a
>> wonderful feature, although it may already be indirectly implemented
>> through KSM (that is, when KSM is on and configured to scan
>> everything, I'm not sure if it scans memory used by the ramdisks or not).
>>
> To my understanding KSM is only applied to KVM deployments. One way I
> have seen my caching module work is users/vendors have a block device,
> map it to a RapidDisk RAM drive as a RAM based Write-Through caching
> node and in turn export it via a traditional SAN. The idea behind adding
> deduplication to this module is to minimize the RAM drive footprint when
> used as a block level cache.

KSM is usually used in KVM or other userspace VM deployments, but that
is by no means the only use case. I actually use it regularly on most
of my systems, and it does help in some cases (for example, I run a lot
of distributed computing apps, often using multiple instances of the
same app, and those don't always share memory to the degree they
should; KSM helps with this).

The write-through caching may be worth looking into. I think (not
certain about this) that you can force the page cache to do
write-through caching only, but that can only be done globally.

It would probably be better to improve upon the existing pagecache
implementation anyway. Ideally, I would love to see:
1. The ability to tell the page cache to claim some minimum amount of
memory that only it can use.
2. The ability to easily tune cache parameters on a per-device (or even
better, per-filesystem) basis.
3. Conversion to a framework that would allow for easy development and
testing of different caching algorithms (although this is probably never
going to happen).

>>>>> 2. The majority of RapidDisk code focuses on the use of Volatile
>>>>> memory.
>>>>> The support for Non-Volatile memory is a bit newer and there may be
>>>>> some overlap here with the recently integrated pmem code. The only
>>>>> advantage to having this code within RapidDisk is to provide the
>>>>> user with the ability to manage both technologies simultaneously,
>>>>> through a single interface.
>>>> Which really doesn't sound like a good enough reason to duplicate it.
>>> I do not disagree with your comment here. This component does not have
>>> to be patched into the mainline.
>>>
>>>>> 3. The RapidCache component is designed around the Non-Volatile
>>>>> functionality of RapidDisk (hence the block-level Write-Through
>>>>> caching). It is also coded and optimized around the RapidDisk
>>>>> sizes/variables, out-of-box. It is worth noting that I am in the
>>>>> process of expanding this module to add deduplication support. This
>>>>> will leverage RapidDisk's ability to allocate pages only when needed
>>>>> and reduce the cache's memory footprint; making more out of less.
>>>> Still needs some code comparison to our existing two caching solutions.
>>>>
>>>> I'd love to see you go ahead with the dynamic ramdisk configuration as
>>>> this is clearly a very useful feature. A caching solution that is
>>>> optimized for non-volatile memory does sound useful, but we'll still
>>>> need a patch better explaining how it actually is as useful as it might
>>>> sound.
>>> CORRECTION: I meant to say Volatile and NOT Non-Volatile. RapidCache is
>>> designed around Volatile memory. I guess I was a little too excited in my
>>> response and I do apologize for that. I will provide a code comparison
>>> in my next e-mail, after I go through the existing RAM drive code.
>> To a certain extent, I see that as potentially less useful than
>> optimized for non-volatile memory.
>> While the current incarnation of the pagecache in Linux could stand
>> to have some serious performance improvements (just think how fast
>> things would be if we used ARC instead of plain LRU), it does still
>> do its job well for most workloads (although being able to tell the
>> kernel to reserve some portion of memory _just_ for the pagecache
>> would be an interesting and probably very useful feature).
>>
> My only concern with an ARC is CPU utilization. A lot more is required
> to manage two lists.

Actually, most of the CPU time spent in an ARC cache is in the
auto-tuning (the 'adaptive' bit). I've done testing just in userspace,
and SLRU (ARC without the adaptive sizing of the lists) uses only a
little more CPU time than traditional LRU, somewhat less than ARC, and
does a much better job of handling COW-based workloads. COW is a tough
workload for LRU caching (which is why ZFS uses ARC and not traditional
LRU), as a read-modify-write cycle ends up with the read data not being
needed ever again, which in turn means that MRU caching can be better in
many cases for heavy read-write COW workloads.
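
For illustration only, here is a minimal userspace sketch of the SLRU
idea described above: two LRU lists with fixed sizes and no adaptive
tuning. It is not the code used in the testing mentioned, and the class
name, method names, and segment sizes are all assumptions made for this
example.

```python
from collections import OrderedDict


class SLRUCache:
    """Segmented LRU sketch (hypothetical names, fixed segment sizes).

    New keys enter a probationary list; a key that is hit again moves to
    a protected list. Neither list is resized at runtime, which is the
    part that distinguishes this from ARC's adaptive behaviour.
    """

    def __init__(self, probation_size, protected_size):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.probation = OrderedDict()  # oldest entry first
        self.protected = OrderedDict()  # oldest entry first

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)
            self._promote(key, value)  # second access: protect it
            return value
        return None

    def put(self, key, value):
        """Insert or update a key; an update counts as an access."""
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            self.probation.pop(key)
            self._promote(key, value)
        else:
            self._insert_probation(key, value)

    def _promote(self, key, value):
        self.protected[key] = value
        if len(self.protected) > self.protected_size:
            # Demote the least recently used protected entry instead of
            # dropping it outright.
            old_key, old_value = self.protected.popitem(last=False)
            self._insert_probation(old_key, old_value)

    def _insert_probation(self, key, value):
        self.probation[key] = value
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict oldest one-shot key
```

In this sketch a key that has been read twice sits in the protected
list, so a run of blocks that are each touched only once (such as the
read half of a read-modify-write cycle) can only push out other
probationary entries.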
--------------ms060403070705030404050409--