From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39296)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1ZZkTQ-0007Lc-Jh
	for qemu-devel@nongnu.org; Wed, 09 Sep 2015 14:52:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1ZZkTL-0006BV-KJ
	for qemu-devel@nongnu.org; Wed, 09 Sep 2015 14:52:00 -0400
Received: from mx4-phx2.redhat.com ([209.132.183.25]:52070)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from )
	id 1ZZkTL-0006Ay-D5
	for qemu-devel@nongnu.org; Wed, 09 Sep 2015 14:51:55 -0400
Date: Wed, 9 Sep 2015 14:51:53 -0400 (EDT)
From: Jason Dillaman
Message-ID: <980375080.25911577.1441824713332.JavaMail.zimbra@redhat.com>
In-Reply-To: <1000957815.25879188.1441820902018.JavaMail.zimbra@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: andrey@xdel.ru
Cc: qemu-devel@nongnu.org

>> Bumping this...
>>
>> For now, we are rarely suffering with an unlimited cache growth issue
>> which can be observed on all post-1.4 versions of qemu with rbd
>> backend in a writeback mode and certain pattern of a guest operations.
>> The issue is confirmed for virtio and can be re-triggered by issuing
>> excessive amount of write requests without completing returned acks
>> from a emulator` cache timely. Since most applications behave in a
>> right way, the oom issue is very rare (and we developed an ugly
>> workaround for such situations long ago). If anybody is interested in
>> fixing this, I can send a prepared image for a reproduction or
>> instructions to make one, whichever is preferable.
>>
>> Thanks!
>
>A gentle bump: for at least rbd backend with writethrough/writeback
>cache it is possible to achieve unlimited growth with lot of large
>unfinished ops, what can be considered as a DoS. Usually it is
>triggered by poorly written applications in the wild, like proprietary
>KV databases or MSSQL under Windows, but regular applications,
>primarily OSS databases, can trigger the RSS growth for hundreds of
>megabytes just easily. There is probably no straight way to limit
>in-flight request size by re-chunking it, as supposedly malicious
>guest can inflate it up to very high numbers, but it`s fine to crash
>such a guest, saving real-world stuff with simple in-flight op count
>limiter looks like more achievable option.

Any chance you can provide the reproducer VM image via ceph-post-file [1]?
Using the latest Firefly release with QEMU 2.3.1, I was unable to reproduce
unlimited growth while hammering the VM with a randwrite fio job with
iodepth=256, blocksize=4k.

[1] http://ceph.com/docs/master/man/8/ceph-post-file/

-- 
Jason
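
As an aside on the "simple in-flight op count limiter" idea quoted above: a minimal sketch of what such a limiter does, written in Python purely as an illustration. It is not QEMU or librbd code, and `InFlightLimiter`, `max_in_flight`, `fake_write` and `demo` are hypothetical names. The assumption is that blocking the submitter once a fixed number of operations are outstanding is enough to bound the memory pinned by unfinished requests.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightLimiter:
    """Caps how many submitted operations may be outstanding at once."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.current = 0  # operations submitted but not yet completed
        self.peak = 0     # highest value `current` has reached

    def submit(self, executor, fn, *args):
        # Block the submitter until a slot is free, so pending requests
        # (and the buffers they hold) cannot accumulate without bound.
        self._slots.acquire()
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

        def run():
            try:
                return fn(*args)
            finally:
                # Update the counter before releasing the slot so that
                # `current` never exceeds the configured limit.
                with self._lock:
                    self.current -= 1
                self._slots.release()

        return executor.submit(run)


def fake_write(n):
    # Hypothetical stand-in for a slow backend write.
    time.sleep(0.001)
    return n


def demo(total_ops=64, max_in_flight=8):
    limiter = InFlightLimiter(max_in_flight)
    with ThreadPoolExecutor(max_workers=32) as pool:
        futures = [limiter.submit(pool, fake_write, i) for i in range(total_ops)]
        results = [f.result() for f in futures]
    return limiter, results
```

Calling `demo()` submits 64 fake writes through a pool of 32 workers while allowing at most 8 to be outstanding; `limiter.peak` records the highest number seen in flight. A count-based cap like this does not bound the size of each request, which is the limitation the quoted text already points out.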