From: Stefan Hajnoczi <stefanha@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Dongsheng Yang <dongsheng.yang@easystack.cn>,
ceph-devel@vger.kernel.org, vromanso@redhat.com,
kwolf@redhat.com, mimehta@redhat.com, acardace@redhat.com
Subject: Re: rbd kernel block driver memory usage
Date: Thu, 26 Jan 2023 09:36:28 -0500
Message-ID: <Y9KP7EX9+Ub/StL/@fedora>
In-Reply-To: <CAOi1vP8+nQMsGPK-SW-FG4C2HAgp76dEHeTEwQ2xxi2oJLH1aA@mail.gmail.com>
On Thu, Jan 26, 2023 at 02:48:27PM +0100, Ilya Dryomov wrote:
> On Wed, Jan 25, 2023 at 5:57 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > Hi,
> > What sort of memory usage is expected under heavy I/O to an rbd block
> > device with O_DIRECT?
> >
> > For example:
> > - Page cache: none (O_DIRECT)
> > - Socket snd/rcv buffers: yes
>
> Hi Stefan,
>
> There is a socket open to each OSD (object storage daemon). A Ceph
> cluster may have tens, hundreds or even thousands of OSDs (although the
> latter is rare -- usually folks end up with several smaller clusters
> instead of a single large cluster). Under heavy random I/O and given
> a big enough RBD image, it's reasonable to assume that most if not all
> OSDs would be involved and therefore their sessions would be active.
>
> A thing to note is that, by default, OSD sessions are shared between
> RBD devices. So as long as all RBD images that are mapped on a node
> belong to the same cluster, the same set of sockets would be used.
>
> Idle OSD sockets get closed after 60 seconds of inactivity.
>
>
> > - Internal rbd buffers?
> >
> > I am trying to understand how similar Linux rbd block devices behave
> > compared to local block device memory consumption (like NVMe PCI).
>
> RBD doesn't do any internal buffering. Data is read from/written to
> the wire directly to/from BIO pages. The only exception to that is the
> "secure" mode -- built-in encryption for Ceph on-the-wire protocol. In
> that case the data is buffered, partly because RBD obviously can't mess
> with plaintext data in the BIO and partly because the Linux kernel
> crypto API isn't flexible enough.
>
> There is some memory overhead associated with each I/O (OSD request
> metadata encoding, mostly). It's surely larger than in the NVMe PCI
> case. I don't have the exact number but it should be less than 4K per
> I/O in almost all cases. This memory is coming out of private SLAB
> caches and could be reclaimable had we set SLAB_RECLAIM_ACCOUNT on
> them.
Thanks, this information is very useful. I was trying to get a sense of
whether to look deeper into the rbd driver in an OOM kill scenario.
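
For a rough sense of scale, the figures quoted above can be turned into a
back-of-the-envelope upper bound. This is only a sketch: the 4 KiB per-I/O
ceiling is Ilya's estimate, while the function name, the number of in-flight
I/Os, the OSD count and the per-socket buffer size are hypothetical inputs
chosen for illustration, not measured values.

```python
KIB = 1024

def rbd_memory_upper_bound(inflight_ios, active_osds, per_socket_buffers,
                           per_io_overhead=4 * KIB):
    """Rough upper bound, in bytes, on rbd-related kernel memory.

    per_io_overhead defaults to the "less than 4K per I/O" estimate from
    the thread. per_socket_buffers (send + receive buffers for one OSD
    session) is an assumption that depends on the host's TCP settings.
    """
    io_metadata = inflight_ios * per_io_overhead   # OSD request metadata
    sockets = active_osds * per_socket_buffers     # one socket per active OSD
    return io_metadata + sockets

# Hypothetical workload: 1024 in-flight I/Os, 100 active OSDs,
# 256 KiB of socket buffers per OSD session.
total = rbd_memory_upper_bound(1024, 100, per_socket_buffers=256 * KIB)
print(f"{total // KIB} KiB")
```

With inputs of this order the total stays in the tens of megabytes, and
because OSD sessions are shared between RBD devices mapped from the same
cluster, the socket term does not grow with the number of mapped images.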
Stefan