From: Stefan Hajnoczi
Date: Fri, 24 Apr 2015 10:26:21 +0100
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache
Message-ID: <20150424092621.GD13278@stefanha-thinkpad.redhat.com>
To: Alberto Garcia
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, Stefan Hajnoczi

On Thu, Apr 23, 2015 at 01:50:28PM +0200, Alberto Garcia wrote:
> On Thu 23 Apr 2015 12:15:04 PM CEST, Stefan Hajnoczi wrote:
>
> >> For a cache size of 128MB, the PSS is actually ~10MB larger without
> >> the patch, which seems to come from posix_memalign().
> >
> > Do you mean RSS or are you using a tool that reports a "PSS" number
> > that I don't know about?
> >
> > We should understand what is going on instead of moving the code
> > around to hide/delay the problem.
>
> Both RSS and PSS ("proportional set size", also reported by the kernel).
>
> I'm not an expert in memory allocators, but I measured the overhead like
> this:
>
> An L2 cache of 128MB implies a refcount cache of 32MB, 160MB in total.
> With a default cluster size of 64k, that's 2560 cache entries.
>
> So I wrote a test case that allocates 2560 blocks of 64k each using
> posix_memalign and mmap, and here's how their /proc/<pid>/smaps compare:
>
> -Size:          165184 kB
> -Rss:            10244 kB
> -Pss:            10244 kB
> +Size:          161856 kB
> +Rss:                0 kB
> +Pss:                0 kB
>  Shared_Clean:       0 kB
>  Shared_Dirty:       0 kB
>  Private_Clean:      0 kB
> -Private_Dirty:  10244 kB
> -Referenced:     10244 kB
> -Anonymous:      10244 kB
> +Private_Dirty:      0 kB
> +Referenced:         0 kB
> +Anonymous:          0 kB
>  AnonHugePages:      0 kB
>  Swap:               0 kB
>  KernelPageSize:     4 kB
>
> Those are the 10MB I saw. For the record, I also tried with malloc() and
> the results are similar to those of posix_memalign().

The posix_memalign() call wastes memory.  I compared:

  posix_memalign(&memptr, 65536, 2560 * 65536);
  memset(memptr, 0, 2560 * 65536);

with:

  for (i = 0; i < 2560; i++) {
      posix_memalign(&memptr, 65536, 65536);
      memset(memptr, 0, 65536);
  }

Here are the results:

  -Size:          163920 kB
  -Rss:           163860 kB
  -Pss:           163860 kB
  +Size:          337800 kB
  +Rss:           183620 kB
  +Pss:           183620 kB

Note the memset() simulates a fully occupied cache.

The 19 MB RSS difference between the two seems wasteful.  The large
"Size" difference hints that the mmap pattern is very different when
posix_memalign() is called multiple times.

We could avoid the 19 MB overhead by switching to a single allocation.
What's more, dropping the memset() to simulate no cache entry usage
(like your example) gives us a grand total of 20 kB RSS.  There is no
point in delaying allocations if we do a single big upfront allocation.

I'd prefer a patch that replaces the small allocations with a single
big one.  That's a win in both the empty and full cache cases.
Stefan