From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35229) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aJ16l-0003QK-0p for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:43:44 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aJ16h-0004Qi-Dg for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:43:42 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54956) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aJ16h-0004Qc-5s for qemu-devel@nongnu.org; Tue, 12 Jan 2016 10:43:39 -0500 Date: Tue, 12 Jan 2016 15:43:34 +0000 From: "Daniel P. Berrange" Message-ID: <20160112154334.GR17626@redhat.com> References: <1450452647-118105-1-git-send-email-imammedo@redhat.com> <1450452647-118105-3-git-send-email-imammedo@redhat.com> <20151218164823.GH7228@redhat.com> <567441AA.1010309@redhat.com> <87wpsbwj9k.fsf@blackfin.pond.sub.org> <5693D271.3090608@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <5693D271.3090608@suse.de> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 2/2] qdev: safely fail device_add if unable to allocate device Reply-To: "Daniel P. Berrange" List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Andreas =?utf-8?Q?F=C3=A4rber?= Cc: qemu-devel@nongnu.org, Igor Mammedov , Markus Armbruster , pbonzini@redhat.com On Mon, Jan 11, 2016 at 05:04:01PM +0100, Andreas F=C3=A4rber wrote: > Am 18.12.2015 um 22:15 schrieb Markus Armbruster: > > Eric Blake writes: > >> On 12/18/2015 09:48 AM, Daniel P. Berrange wrote: > >>> On Fri, Dec 18, 2015 at 04:30:47PM +0100, Igor Mammedov wrote: > >>>> qdev_device_add() currently uses object_new() which > >>>> will abort if there memory allocation for device instance > >>>> fails. While it's fine it startup, it is not desirable > >>>> diring hotplug. > >>>> > >>>> Try to allocate memory for object first and fail safely > >>>> if allocation fails. > >> > >>> This just avoids one small malloc failure. > >>> > >>>> + object_initialize(dev, obj_size, driver); > >>> > >>> This is going to call g_new many more times, so you'll > >>> still hit OOM almost immediately. eg the call to > >>> g_hash_table_new_full() in object_initialize_with_type > >>> will abort on OOM, not to mention anything run in a > >>> instance constructor function registered against the > >>> class. There's no way to avoid this given that we have > >>> chosen to use GLib in QEMU, so I don't really see any > >>> point in replacing the 'object_new' call with g_try_malloc > >=20 > > Seconded. > >=20 > >> We just had a recent thread on it, and I tend to agree that this ser= ies > >> isn't helpful. > >> > >> https://lists.gnu.org/archive/html/qemu-devel/2015-12/threads.html#0= 1238 > >=20 > > Plenty of arguments there why recovering from scattered random > > allocation failures isn't useful, and recovering from all of them isn= 't > > practical. Limit recovery attempts to big allocations, and spend (so= me > > of) the saved cycles on actually testing the recovery code. >=20 > Jumping in late... I had done work into this direction around 2012/2013 > and I believe that changing the hotplug allocation is the right thing t= o > do here. >=20 > http://patchwork.ozlabs.org/patch/269595/ >=20 > Igor's error handling could still use some improvement (at the time we > did not have the out argument) and my suggested solution was a similar > one to error_abort (pre-allocation and special handling), to avoid > allocation of Error objects and strings on OOM. >=20 > Daniel, any allocations other than the core QOM API inside an > instance_init function are plain wrong, has been pointed out to patch > authors and is not an argument for not handling hot-plug errors/design > properly. My huge QOM conversions always had in mind that instance_init > cannot do random allocations and moved them to realize, which may fail, > including for OOM, and should be handled gracefully for hot-plug. For > startup I don't see it as critical - it may be inconvenient not to get = a > nice error message but you don't risk losing the guest's data. >=20 > Now to the claim that this is just a small allocation. If some 20-char > string allocation fails, there's little QEMU can realistically do > recovery-wise. But a PowerPCCPU device for instance comes with huge > multi-level instruction-dispatch tables even in KVM mode. Therefore my > opinion that this user-initiated scenario and thus code location is > exactly the one we need to handle, even if many others remain unhandled= . > It's also why CPU hot-plug needs to take QOM design into account, and > proposals parametrizing core count etc. make size precalculation tricky= . I'm still don't think that trying to recover from OOM on device hotplug is going to be usable/useful in practice. Thinking only about QEMU, and not the rest of the OS, we already have many other threads running in parallel which are potentially allocating memory too. If we're close enough to the limit that a device hotplug triggers OOM, we have to consider that another thread may trigger OOM concurrently. Even if the device hotplug allocation is moderately large, we may succeeed in allocat= ing the large chunk, only to pushed so close to the limit, that any other sma= ll allocation just after then pushes us into OOM. So even coping with this one failure is not going to make the hotplug code robust in general - it is just giving a false sense of reliability IMHO. If people are running QEMU on hosts that are so heavily overcommit on RAM that a device hotplug pushes QEMU into OOM, they're pretty much doomed regardless of whether we try to handle te one allocation failure shown in this patch. I just don't see anything useful coming from trying to handle allocation failure here, given that we've decided to accept abort-on-OOM as the general policy throughout QEMU. Regards, Daniel --=20 |: http://berrange.com -o- http://www.flickr.com/photos/dberrange= / :| |: http://libvirt.org -o- http://virt-manager.or= g :| |: http://autobuild.org -o- http://search.cpan.org/~danberr= / :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vn= c :|