public inbox for linux-bcache@vger.kernel.org
From: Coly Li <colyli@suse.de>
To: Michal Hocko <mhocko@kernel.org>
Cc: axboe@kernel.dk, linux-bcache@vger.kernel.org,
	linux-block@vger.kernel.org, hare@suse.de, mkoutny@suse.com,
	Oleg Nesterov <oleg@redhat.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/2] bcache: ignore pending signals in bcache_device_init()
Date: Tue, 3 Mar 2020 01:47:14 +0800	[thread overview]
Message-ID: <5693a819-e323-3232-2048-413566c2254f@suse.de> (raw)
In-Reply-To: <20200302172840.GQ4380@dhcp22.suse.cz>

On 2020/3/3 1:28 am, Michal Hocko wrote:
> On Tue 03-03-20 01:06:38, Coly Li wrote:
>> On 2020/3/2 9:40 pm, Michal Hocko wrote:
> [...]
>>> I cannot really comment for the systemd part but it is quite unexpected
>>> for it to have signals ignored completely.
>>>
>>
>> I see. But so far I don't have a better solution to this problem.
>> Asking people to apply extra configuration to udev rules is tricky;
>> most common users will be scared off. I need to get it fixed with no
>> extra configuration.
> 
> Well, I believe that having explicit documentation that a large cache
> requires more time to initialize is quite a reasonable way forward. We
> already have similar situations with memory hotplug on extra large
> machines with a small block size.
>  
>>>>> Is there any problem to simply increase the timeout on the system which
>>>>> uses a large bcache?
>>>>>
>>>>
>>>> At this moment, this is a workaround. Christoph Hellwig also suggests
>>>> fixing kthread_run()/kthread_create(). Now I am looking for a method to
>>>> distinguish, in kthread_run()/kthread_create(), whether the parent
>>>> process was killed by the OOM killer or by other processes, but the
>>>> solution is not clear to me yet.
>>>
>>> It is really hard to comment on this because I do not have sufficient
>>> insight, but in general: the oom victim context can be checked by
>>> tsk_is_oom_victim, but kernel threads are not subject to the oom killer
>>> because they do not own any address space. I also suspect that none of
>>> the data you allocate for the cache is accounted to any specific
>>> process.
>>
>> You are right, the cached data is not bound to a process, it is bound to
>> specific backing block devices.
>>
>> In my case, kthread_run()/kthread_create() is called in the context of
>> the registration process (/lib/udev/bcache-register), so it is unnecessary
>> to worry about kthread address space. So maybe I can check
>> tsk_is_oom_victim to judge whether the current process is being killed by
>> the OOM killer, rather than simply calling signal_pending()?
> 
> What exactly are you going to do about this situation? If you want to bail
> out then you should simply do that on any pending fatal signal. Why is
> OOM special?
> 

Forget this point; there is no difference between ignoring the signal
inside or outside kthread_create()/kthread_run(), neither is good.

>>>> When the meta-data size is around 40GB, registering the cache device
>>>> takes around 55 minutes on my machine with the current Linux kernel. I
>>>> have a patch to reduce the time to around 7~8 minutes, but that is still
>>>> too long. I may add a timeout in the bcache udev rule, for example 10
>>>> minutes, but as cache devices get larger and larger, the timeout will
>>>> eventually not be enough.
>>>>
>>>> As I mentioned, this is a workaround to fix the problem for now. Fixing
>>>> kthread_run()/kthread_create() may take me longer. If there are any
>>>> hints on how to do it, please share them with me.
>>>
>>> My main question is why there is any need to touch the kernel code. You
>>> can still update the systemd/udev timeout AFAIK. This would be the
>>> proper workaround from my (admittedly limited) POV.
>>>
>>
>> I see your concern. But the udev timeout is global for all udev rules; I
>> am not sure whether changing it to a very long time is good ... (indeed I
>> tried to set event_timeout=3600 but I can still see the signal received).
> 
> All processes which can finish within the default timeout are not going to
> regress when the forceful termination happens later. I cannot really
> comment on the systemd part because I am not familiar with its internals,
> but the mere existence of the timeout suggests that the workflow should be
> prepared for longer timeouts, because initialization taking a long time
> is something we have to live with. Some things just take too long to
> initialize.
>  

OK, I am convinced. Documentation work is required then.

>> Ignoring the pending signal in the bcache registration code is the only
>> method I currently know of to avoid bcache auto-registration failure at
>> boot time. If there is another way to achieve the same goal, I'd like to
>> try it.
> 
> The problem is that you are effectively violating signal delivery
> semantics because you are ignoring a user signal without userspace ever
> noticing. That to me sounds like a free ticket to all sorts of
> hard-to-debug problems. What if userspace expects the signal to be
> handled? Really, you want to increase the existing timeout to work around
> it. If you can make the initialization process more swift then even
> better.
> 

Copied. Indeed I have more questions here; I will ask in another thread.


>> BTW, in the meantime, I am still looking for the reason why
>> event_timeout=3600 in /etc/udev/udev.conf does not take effect...
> 
> I have a vague recollection that there are multiple timeouts and
> setting only some of them is not sufficient.
> 

Let me try out how to extend the udevd timeout.
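For my own reference, the knobs I know of are below (option names as documented in udev.conf(5) and systemd-udevd.service(8); this is a sketch to verify, since I have not yet confirmed which timeout udevd honors in this case):

```
# /etc/udev/udev.conf -- udevd kills an event worker that exceeds this
event_timeout=3600

# kernel command line -- same knob, takes precedence over udev.conf
udev.event_timeout=3600
```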

Thanks for your discussion and suggestion.

-- 

Coly Li


Thread overview: 34+ messages
2020-03-02  9:34 [PATCH 0/2] bcache patches for Linux v5.6-rc5 Coly Li
2020-03-02  9:34 ` [PATCH 1/2] bcache: ignore pending signals in bcache_device_init() Coly Li
2020-03-02 12:27   ` Michal Hocko
2020-03-02 13:29     ` Coly Li
2020-03-02 13:40       ` Michal Hocko
2020-03-02 17:06         ` Coly Li
2020-03-02 17:28           ` Michal Hocko
2020-03-02 17:47             ` Coly Li [this message]
2020-03-03  1:22               ` Guoqing Jiang
2020-03-03  1:30                 ` Coly Li
2020-03-03  6:58           ` Сорокин Артем Сергеевич
2020-04-13  8:17             ` Coly Li
2020-03-02 13:49     ` Oleg Nesterov
2020-03-02 17:16       ` Coly Li
2020-03-02 17:19         ` Jens Axboe
2020-03-02 17:32           ` Coly Li
2020-03-02 20:33             ` Jens Axboe
2020-03-03  1:08               ` Coly Li
2020-03-03  7:22             ` Hannes Reinecke
2020-03-03  8:05       ` Michal Hocko
2020-03-03 12:19         ` Oleg Nesterov
2020-03-03 16:03           ` Michal Hocko
2020-03-04 11:36             ` Oleg Nesterov
2020-03-04 11:53               ` Oleg Nesterov
2020-03-04 18:42                 ` Jens Axboe
2020-03-04 11:57               ` Michal Hocko
2020-03-04 12:13                 ` Oleg Nesterov
2020-03-04 12:22                   ` Michal Hocko
2020-03-04 12:33                     ` Oleg Nesterov
2020-03-04 12:41                       ` Michal Hocko
2020-03-04 13:02                         ` Oleg Nesterov
2020-03-04 13:21                           ` Michal Hocko
2020-03-02 15:01     ` Jens Axboe
2020-03-02  9:34 ` [PATCH 2/2] bcache: fix code comments for ignore pending signals Coly Li
