From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43918)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1gD9gp-0007x0-81
	for qemu-devel@nongnu.org; Thu, 18 Oct 2018 10:54:21 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1gD9gj-00028F-8Y
	for qemu-devel@nongnu.org; Thu, 18 Oct 2018 10:54:18 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47188)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <dgilbert@redhat.com>) id 1gD9gi-00027X-Vy
	for qemu-devel@nongnu.org; Thu, 18 Oct 2018 10:54:13 -0400
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com
	[10.5.11.23])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id D6BB33DBCD
	for <qemu-devel@nongnu.org>; Thu, 18 Oct 2018 14:54:11 +0000 (UTC)
Date: Thu, 18 Oct 2018 15:54:06 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20181018145406.GE2632@work-vm>
References: <87efcqniza.fsf@dusky.pond.sub.org> <20181016133340.GB2427@work-vm>
	<87va5zjort.fsf@dusky.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87va5zjort.fsf@dusky.pond.sub.org>
Subject: Re: [Qemu-devel] When it's okay to treat OOM as fatal?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > * Markus Armbruster (armbru@redhat.com) wrote:
> >> We sometimes use g_new() & friends, which abort() on OOM, and sometimes
> >> g_try_new() & friends, which can fail, and therefore require error
> >> handling.
> >> 
> >> HACKING points out the difference, but is mum on when to use what:
> >> 
> >>     3. Low level memory management
> >> 
> >>     Use of the malloc/free/realloc/calloc/valloc/memalign/posix_memalign
> >>     APIs is not allowed in the QEMU codebase. Instead of these routines,
> >>     use the GLib memory allocation routines g_malloc/g_malloc0/g_new/
> >>     g_new0/g_realloc/g_free or QEMU's qemu_memalign/qemu_blockalign/qemu_vfree
> >>     APIs.
> >> 
> >>     Please note that g_malloc will exit on allocation failure, so there
> >>     is no need to test for failure (as you would have to with malloc).
> >>     Calling g_malloc with a zero size is valid and will return NULL.
> >> 
> >>     Prefer g_new(T, n) instead of g_malloc(sizeof(T) * n) for the following
> >>     reasons:
> >> 
> >>       a. It catches multiplication overflowing size_t;
> >>       b. It returns T * instead of void *, letting compiler catch more type
> >>          errors.
> >> 
> >>     Declarations like T *v = g_malloc(sizeof(*v)) are acceptable, though.
> >> 
> >>     Memory allocated by qemu_memalign or qemu_blockalign must be freed with
> >>     qemu_vfree, since breaking this will cause problems on Win32.
> >> 
> >> Now, in my personal opinion, handling OOM gracefully is worth the
> >> (commonly considerable) trouble when you're coding for an Apple II or
> >> similar.  Anything that pages commonly becomes unusable long before
> >> allocations fail.
> >
> > That's not always my experience; I've seen cases where you suddenly
> > allocate a load more memory and hit OOM fairly quickly on that hot
> > process.  Most of the time on the desktop you're right.
> >
> >> Anything that overcommits will send you a (commonly
> >> lethal) signal instead.  Anything that tries handling OOM gracefully,
> >> and manages to dodge both these bullets somehow, will commonly get it
> >> wrong and crash.
> >
> > If your qemu has mapped its main memory from hugetlbfs or similar pools
> > then we're looking at the other memory allocations; and that's a bit of
> > an interesting difference where those other allocations should be a lot
> > smaller.
> >
> >> But others are entitled to their opinions as much as I am.  I just want
> >> to know what our rules are, preferably in the form of a patch to
> >> HACKING.
> >
> > My rule is to try not to break a happily running VM by some new
> > activity; I don't worry about it during startup.
> >
> > So for example, I don't like it when starting a migration allocates
> > some more memory and kills the VM - the user had a happy stable VM
> > up to that point.  Migration gets the blame at this point.
> 
> I don't doubt reliable OOM handling would be nice.  I do doubt it's
> practical for an application like QEMU.

Well, our use of glib certainly makes it much, much harder.
I just try to make sure that anywhere I'm allocating a non-trivial
amount of memory (especially anything guest or user controlled) uses
the _try_ variants.  That should catch a lot of the larger allocations.
However, it scares me that we've got things that can return big chunks
of JSON for example, and I don't think they're being careful about it.
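
A minimal sketch of that pattern, for illustration only.  It is plain C
so that it stands alone: try_new0() is a hypothetical stand-in for
GLib's g_try_new0(), and SketchState, sketch_start() and sketch_stop()
are made-up names, not QEMU code.  In QEMU itself the allocation would
go through the g_try_* functions rather than calloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, illustration only */

/*
 * Hypothetical stand-in for g_try_new0(): returns NULL on failure
 * instead of aborting, and rejects a size that overflows size_t.
 */
static void *try_new0(size_t elem_size, size_t count)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;             /* elem_size * count would overflow */
    }
    return calloc(count, elem_size);
}

/* Hypothetical state for some activity started on a running VM. */
typedef struct {
    unsigned char *buf;
    size_t n_pages;
} SketchState;

/*
 * Start the activity.  n_pages is guest or user controlled, so the
 * allocation may be large; on failure report an error to the caller
 * and leave the running VM untouched.
 */
static int sketch_start(SketchState *s, size_t n_pages)
{
    unsigned char *buf;

    if (n_pages == 0) {
        return -EINVAL;
    }
    buf = try_new0(SKETCH_PAGE_SIZE, n_pages);
    if (!buf) {
        return -ENOMEM;          /* fail this request, not the VM */
    }
    s->buf = buf;
    s->n_pages = n_pages;
    return 0;
}

/* Release whatever sketch_start() allocated. */
static void sketch_stop(SketchState *s)
{
    free(s->buf);
    s->buf = NULL;
    s->n_pages = 0;
}
```

The caller checks the return value of sketch_start() and, on -ENOMEM,
fails only the new request; small fixed-size allocations elsewhere can
keep using the aborting variants.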

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK