From: Anders Blomdell <anders.blomdell@domain.hid>
To: Jan Kiszka <jan.kiszka@domain.hid>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai-core] Potential problem with rt_eepro100
Date: Thu, 04 Nov 2010 15:53:22 +0100
Message-ID: <4CD2C8E2.9090608@domain.hid>
In-Reply-To: <4CD2C50F.1090604@domain.hid>
Jan Kiszka wrote:
> On 04.11.2010 14:18, Anders Blomdell wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> On 04.11.2010 10:26, Jan Kiszka wrote:
>>>>> On 04.11.2010 10:16, Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Take a step back and look at the root cause for this issue again. Unlocked
>>>>>>>
>>>>>>> if need-resched
>>>>>>> __xnpod_schedule
>>>>>>>
>>>>>>> is inherently racy and will always be (not only for the remote
>>>>>>> reschedule case BTW).
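To make the race concrete, here is a minimal single-threaded C11 model of the unlocked "if need-resched, call __xnpod_schedule" pattern. All names (`model_sched`, `MODEL_RESCHED`, `model_schedule`, `model_maybe_schedule`) are hypothetical stand-ins, not the Xenomai API, and the spinlock merely plays the role of nklock. The sketch only shows that the unlocked test is a hint: the locked path must re-test (and consume) the bit before acting on it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the XNRESCHED bit; not Xenomai's definition. */
#define MODEL_RESCHED 0x1u

struct model_sched {
	atomic_uint status;    /* scheduler status bits */
	atomic_int lock;       /* stand-in for nklock: 0 = free, 1 = held */
	unsigned int switches; /* reschedules actually performed */
};

static void model_lock(struct model_sched *s)
{
	while (atomic_exchange_explicit(&s->lock, 1, memory_order_acquire))
		; /* spin until the lock is free */
}

static void model_unlock(struct model_sched *s)
{
	atomic_store_explicit(&s->lock, 0, memory_order_release);
}

/* Locked slow path (cf. __xnpod_schedule): re-test and clear the bit under
 * the lock, since it may have been consumed after the unlocked test. */
static void model_schedule(struct model_sched *s)
{
	model_lock(s);
	unsigned int old = atomic_fetch_and(&s->status, ~MODEL_RESCHED);
	if (old & MODEL_RESCHED)
		s->switches++; /* a real scheduler would switch threads here */
	model_unlock(s);
}

/* Unlocked fast path (cf. "if need-resched __xnpod_schedule"): the test is
 * only a hint and may be stale by the time the lock is taken. */
static void model_maybe_schedule(struct model_sched *s)
{
	if (atomic_load(&s->status) & MODEL_RESCHED)
		model_schedule(s);
}

/* Single-threaded walk-through; the interleaving is hard-coded. */
static void model_demo(void)
{
	struct model_sched s = { 0 };

	atomic_fetch_or(&s.status, MODEL_RESCHED);
	model_maybe_schedule(&s);
	assert(s.switches == 1);

	/* Case 1: a caller saw the bit, but it was consumed before the lock
	 * was taken; the locked path must then do nothing. */
	model_schedule(&s);
	assert(s.switches == 1);
}
```

Because the slow path uses an atomic fetch-and-clear under the lock, a stale positive from the fast path costs one lock round-trip and nothing more.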
>>>>>> Ok, let us examine what may happen with this code if we only set the
>>>>>> XNRESCHED bit on the local CPU. First, bits other than XNRESCHED do not
>>>>>> matter, because they cannot change under our feet. So, we have two
>>>>>> cases for this race:
>>>>>> 1- we see the XNRESCHED bit, but it has been cleared once nklock is
>>>>>> locked in __xnpod_schedule.
>>>>>> 2- we do not see the XNRESCHED bit, but it gets set right after we test it.
>>>>>>
>>>>>> 1 is not a problem.
>>>>> Yes, as long as we remove the debug check from the scheduler code (or
>>>>> fix it somehow). The scheduling code already catches this race.
>>>>>
>>>>>> 2 is not a problem, because anything which sets the XNRESCHED bit (in
>>>>>> fact, it can only be an interrupt) will cause xnpod_schedule to be
>>>>>> called right after that.
>>>>>>
>>>>>> So no, no race here provided that we only set the XNRESCHED bit on the
>>>>>> local cpu.
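Case 2 is harmless only because every setter is followed by a scheduler call. The sketch below models that invariant on a single CPU; `model_cpu`, `model_irq_wakeup` and `model_case2` are hypothetical names, and the "interrupt" is just a function call placed at the racy point, so this is an illustration of the argument under that assumption rather than of real interrupt handling.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RESCHED 0x1u /* hypothetical stand-in for XNRESCHED */

struct model_cpu {
	unsigned int status;   /* only ever written by this CPU */
	unsigned int switches; /* reschedules actually performed */
};

/* Stand-in for the locked scheduler path: consume the bit if present. */
static void model_schedule(struct model_cpu *c)
{
	if (c->status & MODEL_RESCHED) {
		c->status &= ~MODEL_RESCHED;
		c->switches++;
	}
}

/* Stand-in for an interrupt that readies a thread on this CPU: the bit is
 * set locally and the scheduler runs on the way out, so no setter can
 * leave the bit set without a following schedule call. */
static void model_irq_wakeup(struct model_cpu *c)
{
	c->status |= MODEL_RESCHED;
	model_schedule(c); /* cf. rescheduling on interrupt exit */
}

/* Case 2: task-level code tests the bit and sees it clear, then the
 * interrupt sets it right afterwards. Returns what the test saw. */
static bool model_case2(struct model_cpu *c)
{
	bool seen = c->status & MODEL_RESCHED; /* unlocked test: clear */
	model_irq_wakeup(c);                   /* the interrupt hits here */
	if (seen)
		model_schedule(c);
	return seen;
}

static void model_demo(void)
{
	struct model_cpu c = { 0 };

	assert(!model_case2(&c)); /* the unlocked test missed the bit... */
	assert(c.switches == 1);  /* ...but the interrupt path rescheduled */
	assert(c.status == 0);
}
```

The argument breaks as soon as a remote CPU sets the bit without the target running its scheduler afterwards, which is why the local-only restriction matters.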
>>>>>>
>>>>>>> So we either have to accept this and remove the
>>>>>>> debugging check from the scheduler or push the check back to
>>>>>>> __xnpod_schedule where it once came from. When this is cleaned up, we
>>>>>>> can look into the remote resched protocol again.
>>>>>> The problem with the debug check is that it checks whether the scheduler
>>>>>> state is modified without the XNRESCHED bit being set. And this is the
>>>>>> problem, because yes, in that case, we have a race: the scheduler state
>>>>>> may be modified before the XNRESCHED bit is set by an IPI.
>>>>>>
>>>>>> If we want to fix the debug check, we have to have a special bit, one in
>>>>>> the sched->status flags, only for the purpose of debugging. Or remove the
>>>>>> debug check.
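The false alarm described above can be reproduced in a simplified model: a remote CPU changes the target's run queue immediately, while XNRESCHED on the target is only set later by the IPI handler. The check below is a hypothetical reduction of the debug check as described in this paragraph ("state modified without the XNRESCHED bit set"), not the actual nucleus code; `runq_gen`, `seen_gen` and all `model_*` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RESCHED 0x1u /* hypothetical stand-in for XNRESCHED */

struct model_cpu {
	unsigned int status;   /* resched bit, set only on this CPU */
	unsigned int runq_gen; /* bumped whenever the run queue changes */
	unsigned int seen_gen; /* generation the scheduler last looked at */
	bool ipi_pending;      /* rescheduling IPI sent but not yet handled */
};

/* A remote CPU readies a thread here: the run queue changes at once,
 * while the resched bit is only set later by the IPI handler. */
static void model_remote_wakeup(struct model_cpu *target)
{
	target->runq_gen++;
	target->ipi_pending = true;
}

static void model_ipi_handler(struct model_cpu *c)
{
	c->ipi_pending = false;
	c->status |= MODEL_RESCHED;
}

/* Simplified debug check: complain if the run queue changed but the
 * resched bit is not set. Returns false when it would fire. */
static bool model_debug_check(struct model_cpu *c)
{
	bool changed = c->runq_gen != c->seen_gen;
	bool ok = !changed || (c->status & MODEL_RESCHED);

	c->seen_gen = c->runq_gen;
	c->status &= ~MODEL_RESCHED;
	return ok;
}

static void model_demo(void)
{
	struct model_cpu c = { 0 };

	/* The target enters its scheduler between the remote state change
	 * and the IPI: no request was really lost, yet the check fires. */
	model_remote_wakeup(&c);
	assert(!model_debug_check(&c));
	model_ipi_handler(&c);
	assert(c.status & MODEL_RESCHED);
}
```

In this model, a dedicated debug bit set together with the state change, rather than by the IPI, would close the window that produces the false alarm.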
>>>>> Exactly my point. Is there any benefit in keeping the debug check? The
>>>>> code to make it work may end up as "complex" as the logic it verifies,
>>>>> at least that's my current feeling.
>>>>>
>>>> This would be the radical approach of removing the check (and cleaning
>>>> up some bits). If it's acceptable, I would split it up properly.
>>> This debug check saved our asses when debugging SMP issues, and I
>>> suspect it may help debugging skin issues. So, I think we should try and
>>> keep it.
>>>
>>>
>>> At first sight, you are breaking more things here than you are cleaning up.
>> Still, it holds the SMP record for my test program: it still runs with
>> ftrace on after 2 hours, where it previously failed after at most 23 minutes.
>
> My version was indeed still buggy, I'm reworking it ATM.
Is there any reason why the two changes below would fail? (I need to get
things working real soon now.)
>
>> If I get the gist of Jan's changes, they are (using the IPI to transfer
>> one bit of information: your CPU needs to reschedule):
>>
>> xnsched_set_resched:
>> - setbits((__sched__)->status, XNRESCHED);
>>
>> xnpod_schedule_handler:
>> + xnsched_set_resched(sched);
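As read from the two-line diff above, a remote request no longer writes the target's status; the IPI handler sets the bit on its own CPU instead. The sketch models that reading with two simulated CPUs. `model_set_resched` and `model_schedule_handler` are hypothetical stand-ins for xnsched_set_resched and xnpod_schedule_handler, and the model deliberately omits whatever bookkeeping the real code does to actually send the IPI.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2
#define MODEL_RESCHED 0x1u /* hypothetical stand-in for XNRESCHED */

struct model_sched {
	int cpu;
	unsigned int status; /* written only by the owning CPU */
	bool ipi_pending;    /* stand-in for an in-flight rescheduling IPI */
};

static struct model_sched model_scheds[MODEL_NR_CPUS];

/* After the change (as modelled): a local request sets the bit directly,
 * a remote one only raises the IPI and leaves the target's status alone. */
static void model_set_resched(int this_cpu, struct model_sched *sched)
{
	if (sched->cpu == this_cpu)
		sched->status |= MODEL_RESCHED;
	else
		sched->ipi_pending = true;
}

/* Stand-in for xnpod_schedule_handler: runs on the target CPU and sets
 * the bit locally, i.e. the "+ xnsched_set_resched(sched)" line. */
static void model_schedule_handler(int this_cpu)
{
	struct model_sched *sched = &model_scheds[this_cpu];

	assert(sched->cpu == this_cpu);
	sched->ipi_pending = false;
	model_set_resched(this_cpu, sched);
}

static void model_init(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		model_scheds[i] = (struct model_sched){ .cpu = i };
}
```

With this split, the status word of each CPU has a single writer, which is the precondition for the race analysis earlier in the thread.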
>>
>> If you (we?) decide to keep the debug checks, under what circumstances
>> would the current check trigger (in layman's language, that I'll be able
>> to understand)?
>
> That's actually what /me is wondering as well. I do not see yet how you
> can reliably detect a missed reschedule (that was the purpose of the
> debug check), given the race between signaling a resched and
> processing the resched hints.
The only thing I can think of is atomic set/clear on an independent
variable.
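One way to read "atomic set/clear on an independent variable": keep a debug-only flag outside sched->status, set it in the same critical section that modifies the scheduler state (assumed here to be under nklock), and have the owning CPU fetch-and-clear it when it picks the next thread. This is a hypothetical sketch of that idea; `dbg_state_changed`, `model_note_state_change` and `model_debug_check` are invented names, not existing Xenomai symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_sched {
	unsigned int status;           /* XNRESCHED-like bits, written locally */
	atomic_bool dbg_state_changed; /* hypothetical debug-only variable */
};

/* Called by whichever CPU modifies this scheduler's state, in the same
 * critical section as the modification (assumed to be under nklock). */
static void model_note_state_change(struct model_sched *s)
{
	atomic_store(&s->dbg_state_changed, true);
}

/* Debug check run by the owning CPU when it picks the next thread: fetch
 * and clear the flag atomically. If a different thread was picked but no
 * state change was recorded, a resched request went missing. */
static bool model_debug_check(struct model_sched *s, bool picked_other)
{
	bool noted = atomic_exchange(&s->dbg_state_changed, false);

	return !picked_other || noted;
}

static void model_demo(void)
{
	struct model_sched s = { 0 };

	model_note_state_change(&s);
	assert(model_debug_check(&s, true));  /* change recorded: fine */
	assert(!model_debug_check(&s, true)); /* nothing recorded: flagged */
	assert(model_debug_check(&s, false)); /* no switch: nothing to check */
}
```

Since the flag is written together with the state change rather than by the IPI handler, the window between the remote modification and the IPI no longer produces a false alarm, while XNRESCHED itself stays local-only.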
/Anders