All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <mchristi@redhat.com>
To: Martin Raiber <martin@urbackup.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [RFC PATCH] Add proc interface to set PF_MEMALLOC flags
Date: Wed, 11 Sep 2019 11:56:56 -0500	[thread overview]
Message-ID: <5D792758.2060706@redhat.com> (raw)
In-Reply-To: <0102016d1f7af966-334f093b-2a62-4baa-9678-8d90d5fba6d9-000000@eu-west-1.amazonses.com>

On 09/11/2019 03:40 AM, Martin Raiber wrote:
> On 10.09.2019 10:35 Damien Le Moal wrote:
>> Mike,
>>
>> On 2019/09/09 19:26, Mike Christie wrote:
>>> Forgot to cc linux-mm.
>>>
>>> On 09/09/2019 11:28 AM, Mike Christie wrote:
>>>> There are several storage drivers like dm-multipath, iscsi, and nbd that
>>>> have userspace components that can run in the IO path. For example,
>>>> iscsi and nbd's userspace deamons may need to recreate a socket and/or
>>>> send IO on it, and dm-multipath's daemon multipathd may need to send IO
>>>> to figure out the state of paths and re-set them up.
>>>>
>>>> In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
>>>> memalloc_*_save/restore functions to control the allocation behavior,
>>>> but for userspace we would end up hitting a allocation that ended up
>>>> writing data back to the same device we are trying to allocate for.
>>>>
>>>> This patch allows the userspace deamon to set the PF_MEMALLOC* flags
>>>> through procfs. It currently only supports PF_MEMALLOC_NOIO, but
>>>> depending on what other drivers and userspace file systems need, for
>>>> the final version I can add the other flags for that file or do a file
>>>> per flag or just do a memalloc_noio file.
>> Awesome. That probably will be the perfect solution for the problem we hit with
>> tcmu-runner a while back (please see this thread:
>> https://www.spinics.net/lists/linux-fsdevel/msg148912.html).
>>
>> I think we definitely need nofs as well for dealing with cases where the backend
>> storage for the user daemon is a file.
>>
>> I will give this patch a try as soon as possible (I am traveling currently).
>>
>> Best regards.
> 
> I had issues with this as well, and work on this is appreciated! In my
> case it is a loop block device on a fuse file system.
> Setting PF_LESS_THROTTLE was the one that helped the most, though, so
> add an option for that as well? I set this via prctl() for the thread
> calling it (was easiest to add to).
> 
> Sorry, I have no idea about the current rationale, but wouldn't it be
> better to have a way to mask a set of block devices/file systems not to
> write-back to in a thread. So in my case I'd specify that the fuse
> daemon threads cannot write-back to the file system and loop device
> running on top of the fuse file system, while all other block
> devices/file systems can be write-back to (causing less swapping/OOM
> issues).

I'm not sure I understood you.

The storage daemons I mentioned normally kick off N threads per M
devices. The threads handle duties like IO and error handling for those
devices. Those threads would set the flag, so those IO/error-handler
related operations do not end up writing back to them. So it works
similar to how storage drivers work in the kernel where iscsi_tcp has an
xmit thread and that does memalloc_noreclaim_save. Only the threads for
those specific devices being would set the flag.

In your case, it sounds like you have a thread/threads that would
operate on multiple devices and some need the behavior and some do not.
Is that right?



  reply	other threads:[~2019-09-11 16:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-09 16:28 [RFC PATCH] Add proc interface to set PF_MEMALLOC flags Mike Christie
2019-09-09 18:26 ` Mike Christie
2019-09-10  8:35   ` Damien Le Moal
2019-09-11  8:40     ` Martin Raiber
2019-09-11 16:56       ` Mike Christie [this message]
2019-09-11 19:21         ` Martin Raiber
2019-09-12 16:22           ` Mike Christie
2019-09-12 16:27             ` Mike Christie
2019-09-11  8:43     ` Martin Raiber
2019-09-10 22:12   ` Tetsuo Handa
2019-09-10 23:28     ` Kirill A. Shutemov
2019-09-11 15:23     ` Mike Christie
2019-09-10 10:00 ` Kirill A. Shutemov
2019-09-10 12:05   ` Damien Le Moal
2019-09-10 12:41     ` Kirill A. Shutemov
2019-09-10 13:37       ` Damien Le Moal
2019-09-10 16:06   ` Mike Christie
2019-09-11  3:13   ` Hillf Danton
2019-09-11 10:07     ` Tetsuo Handa
2019-09-11 15:44       ` Mike Christie
2019-09-11 13:52     ` Hillf Danton
2019-09-11 14:20       ` Tetsuo Handa
2019-09-11  8:23 ` Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5D792758.2060706@redhat.com \
    --to=mchristi@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin@urbackup.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.