Re: Unexpected switches to in-band

From: Philippe Gerum <rpm@xenomai.org>
To: Giulio Moro <giulio@bela.io>
Cc: "Łukasz Majewski" <lukma@nabladev.com>,
	Xenomai <xenomai@lists.linux.dev>
Subject: Re: Unexpected switches to in-band
Date: Sat, 11 Oct 2025 19:43:55 +0200
Message-ID: <87y0phqqp0.fsf@xenomai.org>
In-Reply-To: <87ikgls9kh.fsf@xenomai.org> (Philippe Gerum's message of "Sat, 11 Oct 2025 18:10:54 +0200")

Philippe Gerum <rpm@xenomai.org> writes:

> Philippe Gerum <rpm@xenomai.org> writes:
>
>> Giulio Moro <giulio@bela.io> writes:
>>
>>>> It seems like latmus is, at some point, trying to access a memory page
>>>> that was evicted from cache...
>>>
>>>> In my case I do use two simple test programs to allocate C++ <vector> -
>>>
>>> Thanks, that put me on a path to reliably reproduce this.
>>>
>>> Swap is disabled. I then set up the system so that the oom-killer has it easy: it kills the allocating process. This is much faster than going through the list of programs and picking the one with the worst score.
>>>
>>> echo 2 | sudo tee /proc/sys/vm/overcommit_memory
>>> echo 0 | sudo tee /proc/sys/vm/overcommit_ratio
>>> echo 1 | sudo tee /proc/sys/vm/oom_kill_allocating_task
>>>
>>> I then have a C++ program allocating 50MiB, and I run 4 or more instances of it, one per core:
>>>  while sleep 0.1; do ./alloc& ./alloc& ./alloc& ./alloc; done
>>>
>>> Furthermore, I have four instances of dd running in the background:
>>>
>>> dd if=/dev/zero of=/dev/null
>>>
>>> With that, I can trigger latmus's in-band switch pretty reliably within seconds (e.g.: latmus -m -K -p 360). If I run our application instead of latmus, it seems to be even faster and more reliable at triggering an in-band switch (once I set T_WOSS | T_WOLI | T_WOSX | T_HMSIG for the thread), and sigdebug_marked() confirms it is marked as sigdebug. While running it inside gdb I can inspect the backtrace upon receiving the signal, and it seems to be happening in seemingly harmless places. Most of the time it happens at some depth inside evl_usleep(), sometimes it happens inside libc's sinf(), sometimes somewhere else in our rt thread. I'd guess I just see it happen at random places, so the fact that it happens more often in evl_usleep() is just because the thread spends 85% of the time in it. Note that our application never uses raw_copy_from_user(): the only call into the kernel from the real-time thread is via evl_usleep().
>>>
>>> It may be of interest that if I disable (T_WOSS | T_WOLI | T_WOSX | T_HMSIG) in our application, so that it is free to keep running when receiving an ISW, I can see the number of ISWs grow quickly in the first few seconds of execution to something like 20 and then remain constant. Similarly, latmus with -K seems to accumulate several (5 to 10) ISWs at the beginning and then proceeds without any further ISW for several minutes. They do eventually occur again, but much more sparingly than in the first few seconds.
>>>
>>> For completeness, here's the C++ program I use for testing. I attempt to allocate memory in smaller chunks and get as close as I can to filling up system memory across the four processes before the oom-killer kills one of them.
>>>
>>> #include <vector>
>>>
>>> int main()
>>> {
>>> 	std::vector<std::vector<char>> all;
>>> 	for(unsigned int n = 0; n < 5; ++n)
>>> 	{
>>> 		all.emplace_back();
>>> 		all.back().resize(10 * 1024 * 1024);
>>> 	}
>>> 	return 0;
>>> }
>>
>> Ok, so it looks like both of you observe the same issue, which seems to
>> be arch-independent. Checking the code which takes care of preventing
>> COW for oob-enabled processes (a behavior which may cause unwanted minor
>> faults in a tricky way), I stumbled upon a really bad bug. Could you
>> check whether the patch below helps?
>>
>> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
>> index 73de18353e79..c6b1efcbd833 100644
>> --- a/include/linux/sched/coredump.h
>> +++ b/include/linux/sched/coredump.h
>> @@ -91,7 +91,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>>  
>>  #define MMF_VM_MERGE_ANY	30
>>  #define MMF_VM_MERGE_ANY_MASK	(1 << MMF_VM_MERGE_ANY)
>> -#define MMF_DOVETAILED		31	/* mm belongs to a dovetailed process */
>> +#define MMF_DOVETAILED		18	/* mm belongs to a dovetailed process */
>>  
>>  #define MMF_TOPDOWN		31	/* mm searches top down by default */
>>  #define MMF_TOPDOWN_MASK	(1 << MMF_TOPDOWN)
>
> Note: this issue only affects kernels from v6.10 onward.

Actually, among the kernel releases we currently maintain, this issue
only affects v6.12; v6.16 is fine.

-- 
Philippe.
