From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
	Sanjay Rao <srao@redhat.com>,
	Boaz Ben Shabat <bbenshab@redhat.com>,
	Joe Mario <jmario@redhat.com>
Subject: Re: [PATCH] coroutine: cap per-thread local pool size
Date: Tue, 19 Mar 2024 13:55:10 -0400	[thread overview]
Message-ID: <20240319175510.GA1127203@fedora> (raw)
In-Reply-To: <ZfmWhDaG5mN-GCeO@redhat.com>


On Tue, Mar 19, 2024 at 01:43:32PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 18, 2024 at 02:34:29PM -0400, Stefan Hajnoczi wrote:
> > The coroutine pool implementation can hit the Linux vm.max_map_count
> > limit, causing QEMU to abort with "failed to allocate memory for stack"
> > or "failed to set up stack guard page" during coroutine creation.
> > 
> > This happens because per-thread pools can grow to tens of thousands of
> > coroutines. Each coroutine causes 2 virtual memory areas to be created.
> 
> This sounds quite alarming. What usage scenario is justified in
> creating so many coroutines ?

The coroutine pool hides coroutine creation and deletion latency. The pool
starts at a modest size of 64, but each virtio-blk device increases it by
num_queues * queue_size (default 256) / 2.

The issue shows up with large SMP guests (i.e. large num_queues) that have
multiple virtio-blk devices.
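
As a rough illustration of how quickly that adds up (the guest size and
device count below are example assumptions, not a specific reproducer, and
the exact accounting in QEMU may differ slightly):

#include <stdio.h>

int main(void)
{
    unsigned int base        = 64;   /* initial pool size */
    unsigned int num_queues  = 64;   /* e.g. one queue per vCPU, 64-vCPU guest */
    unsigned int queue_size  = 256;  /* default virtio-blk queue size */
    unsigned int num_devices = 4;    /* assumed number of virtio-blk devices */

    unsigned int pool = base + num_devices * (num_queues * queue_size / 2);

    /* Each coroutine stack needs 2 VMAs (stack mapping plus guard page). */
    printf("pool size: %u coroutines, %u VMAs\n", pool, 2 * pool);
    return 0;
}

With those example numbers the pool alone wants ~65k VMAs, which already
exceeds a 65530 vm.max_map_count default.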

> IIUC, coroutine stack size is 1 MB, and so tens of thousands of
> coroutines implies 10's of GB of memory just on stacks alone.
> 
> > Eventually vm.max_map_count is reached and memory-related syscalls fail.
> 
> On my system max_map_count is 1048576, quite a lot higher than
> 10's of 1000's. Hitting that would imply ~500,000 coroutines and
> ~500 GB of stacks !

Fedora recently increased the limit to 1048576. Before that it was around
65k (the kernel default is 65530), and that is still the case on most other
distros.

Regarding why QEMU might have 65k coroutines pooled: the existing coroutine
pool algorithm is per-thread, so if the max pool size is 15k but you have 4
IOThreads, then up to 4 x 15k coroutines can be sitting in pools. This patch
addresses that by setting a small fixed size (256) on the per-thread pools.
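
To make the per-thread math concrete (the thread count and pool size here
are just example figures, not a particular configuration):

#include <stdio.h>

int main(void)
{
    unsigned int per_thread_max = 15000; /* example per-thread pool max today */
    unsigned int num_iothreads  = 4;
    unsigned int local_cap      = 256;   /* fixed per-thread size with the patch */

    /* Today: every thread can pool up to per_thread_max coroutines. */
    printf("before: up to %u pooled coroutines\n",
           per_thread_max * num_iothreads);

    /* With the patch: per-thread pools stay small; the rest is governed by
     * a single global limit (global_pool_hard_max_size in the diff below).
     */
    printf("after:  %u coroutines in per-thread pools\n",
           local_cap * num_iothreads);
    return 0;
}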

> 
> > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > index 5fd2dbaf8b..2790959eaf 100644
> > --- a/util/qemu-coroutine.c
> > +++ b/util/qemu-coroutine.c
> 
> > +static unsigned int get_global_pool_hard_max_size(void)
> > +{
> > +#ifdef __linux__
> > +    g_autofree char *contents = NULL;
> > +    int max_map_count;
> > +
> > +    /*
> > +     * Linux processes can have up to max_map_count virtual memory areas
> > +     * (VMAs). mmap(2), mprotect(2), etc fail with ENOMEM beyond this limit. We
> > +     * must limit the coroutine pool to a safe size to avoid running out of
> > +     * VMAs.
> > +     */
> > +    if (g_file_get_contents("/proc/sys/vm/max_map_count", &contents, NULL,
> > +                            NULL) &&
> > +        qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
> > +        /*
> > +         * This is a conservative upper bound that avoids exceeding
> > +         * max_map_count. Leave half for non-coroutine users like library
> > +         * dependencies, vhost-user, etc. Each coroutine takes up 2 VMAs so
> > +         * halve the amount again.
> > +         */
> > +        return max_map_count / 4;
> 
> That's 256,000 coroutines, which still sounds incredibly large
> to me.

Any ideas for tweaking this heuristic?
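
For reference, here is how the max_map_count / 4 heuristic works out for the
two limits mentioned in this thread (illustrative code, not the patch
itself):

#include <stdio.h>

static void show(unsigned int max_map_count)
{
    unsigned int pool_cap = max_map_count / 4; /* the heuristic in the patch */
    unsigned int vmas     = pool_cap * 2;      /* 2 VMAs per coroutine */

    printf("max_map_count=%u -> cap %u coroutines, %u VMAs, %u VMAs spare\n",
           max_map_count, pool_cap, vmas, max_map_count - vmas);
}

int main(void)
{
    show(65530);    /* typical distro default */
    show(1048576);  /* Fedora's new default */
    return 0;
}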

> 
> > +    }
> > +#endif
> > +
> > +    return UINT_MAX;
> 
> Why UINT_MAX as a default ?  If we can't read procfs, we should
> assume some much smaller sane default IMHO, that corresponds to
> what current linux default max_map_count would be.

This line is not Linux-specific. I don't know if other OSes have an
equivalent to max_map_count.

I agree with defaulting to 64k-ish on Linux.
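
Something along these lines, perhaps (sketch only; the helper name and the
65530 constant are mine for illustration, not part of the patch):

#include <limits.h>
#include <stdio.h>

static unsigned int pool_hard_max_fallback(void)
{
#ifdef __linux__
    /* Assume the historical vm.max_map_count default when procfs cannot be
     * read, divided by 4 as in the patch.
     */
    return 65530 / 4;
#else
    /* No known equivalent limit on other OSes. */
    return UINT_MAX;
#endif
}

int main(void)
{
    printf("fallback pool cap: %u\n", pool_hard_max_fallback());
    return 0;
}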

Stefan

> 
> > +}
> > +
> > +static void __attribute__((constructor)) qemu_coroutine_init(void)
> > +{
> > +    qemu_mutex_init(&global_pool_lock);
> > +    global_pool_hard_max_size = get_global_pool_hard_max_size();
> >  }
> > -- 
> > 2.44.0
> > 
> > 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
