Embedded Linux development

Embedded Linux development
 help / color / mirror / Atom feed

* Re: Why is the deferred initcall patch not mainline?
From: Rob Landley @ 2014-10-23 18:37 UTC (permalink / raw)
  To: Bird, Tim, Nicolas Pitre
  Cc: Grant Likely, Borislav Petkov, Geert Uytterhoeven,
	linux-embedded@vger.kernel.org, Dirk Behme, Alexandre Belloni,
	challinan@gmail.com
In-Reply-To: <F5184659D418E34EA12B1903EE5EF5FD01BCD433E688@seldmbx02.corpusers.net>

On 10/23/14 12:21, Bird, Tim wrote:
> On Wednesday, October 22, 2014 8:49 AM, Nicolas Pitre [nico@fluxnic.net] wrote:
>> On Wed, 22 Oct 2014, Rob Landley wrote:
>> Otherwise the standard hotplug notification mechanism is already
>> available.
> 
> I'm not sure why this attention to reading the status.  The salient feature
> here is that the initializations are deferred until user space tells the kernel
> to proceed.  It's the initiation of the trigger from user-space that matters.
> The whole purpose of this feature is to defer some driver initializations until
> the product can get into a state where it is already ready to perform it's primary
> function.  Only user space knows when that is.
> 
> There seems to be a desire to have an automatic mechanism for triggering
> the deferred initializations.  I'm OK with this, as long as there's some reasonable
> use case for it.  There are lots of possible trigger mechanisms, including just
> a simple timer, but I think it's important that the primary use case of 
> 'trigger-when-user-space-says-to' is still supported.

The patches were reference but not (re-?)posted. People were talking
about waiting for the "real root filesystem" to show up, which strike me
as the wrong approach. Glad to hear the patch series is taking a better one.

> This code is really intended for a very specialized kernel configuration, where all
> the modules are statically linked, and indeed module loading itself is turned off. 
> I think that's a minority of Linux deployments out there.

Yeah, but not as rare as you're implying. That's how I build most of my
systems, for example.

Modules mean you need bits of the kernel to live in the root filesystem
image (and to match it exactly due to stable-api-nonsense.txt), which
complicates both build and upgrade. Unloading modules has never really
been properly supported, so there's no actual size or complexity
advantage to modules: you need it once and the resource is consumed
until next reboot. And of course there's security fun (spraying it down
with cryptography makes it "awkward" more than "safe", and doesn't
change that you now have a multimode kernel that sometimes does one
thing and sometimes does another).

Not Going There with modules is a valid response for embedded systems if
I want to know what I'm deploying.

> This configuration
> implies some other attributes, like configuration for very small size and/or very
> fast boot, where KALLSYMS may not be present, and other kernel features may
> not be available as well.

A new feature can have requirements. Not every existing deployment can
take advantage of any given new feature anyway. (Your _biggest_ blocker
will be that they're using a ${VENDOR:-broadcom} BSP that's stuck on
2.6.32 in 2014 and upgrading to a kernel version less than 5 years old
will never happen as long as you source hardware from vendors that fork
software rather than getting support upstream.)

> Indeed, in the smallest systems /proc or /sys may not
> be there, so an alternative (maybe a sysctl or even a new syscall) might be
> appropriate. 

A) Those don't interest me. As far as I'm concerned, they're not Linux.

B) If you propose a new syscall for this, it will never be merged. The
mechanism they implemented for this sort of thing is sysfs and hotplug.

> Quite frankly, the hacky way this is often done is to make stuff like this a
> one-time side effect of a rarely called syscall (like sync).  Please note I'm not
> recommending this for mainline, just pointing out there are interesting ways
> that embedded developers just make the existing code work for their weird
> cases.
> 
> Maybe there are some use cases for doing deferred initializations, particularly
> automatically, for systems that do have modules turned on (i.e. for modules
> that are, in that case, still statically linked to the kernel for whatever reason).
> I would welcome some discussion of these, to select an appropriate trigger
> mechanism for those cases.  But we should not let the primary purpose of this
> feature get lost in that discussion.

I thought it was common to defer at least some device probing until the
/dev node got opened. Which is a chicken and egg problem with regards to
the dev node showing up so you _can_ open them, which screwed up devfs
to the point of unworkability, and the answer to that was sysfs. So
having sysfs trigger deferred init from userspace makes perfect sense,
doing it that way means history is on your side and the kernel guys are
more likely to approve because it smells like what they've already done.

>   -- Tim

Rob

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-10-23 19:05 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: Bird, Tim, Rob Landley, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <20141023181336.GC10477@piout.net>

On Thu, 23 Oct 2014, Alexandre Belloni wrote:

> On 23/10/2014 at 13:56:44 -0400, Nicolas Pitre wrote :
> > On Thu, 23 Oct 2014, Bird, Tim wrote:
> > 
> > > I'm not sure why this attention to reading the status.  The salient feature
> > > here is that the initializations are deferred until user space tells the kernel
> > > to proceed.  It's the initiation of the trigger from user-space that matters.
> > > The whole purpose of this feature is to defer some driver initializations until
> > > the product can get into a state where it is already ready to perform it's primary
> > > function.  Only user space knows when that is.
> > 
> > This is still a rather restrictive view of the problem IMHO.
> > 
> > Let's step back a bit. Your concern is that some initcalls are taking 
> > too long and preventing user space from executing early, right?  I'm 
> > suggesting that they no longer prevent user space from executing 
> > earlier.  Why would you then still want an explicit trigger from user 
> > space?
> > 
> > > There seems to be a desire to have an automatic mechanism for triggering
> > > the deferred initializations.  I'm OK with this, as long as there's some reasonable
> > > use case for it.  There are lots of possible trigger mechanisms, including just
> > > a simple timer, but I think it's important that the primary use case of 
> > > 'trigger-when-user-space-says-to' is still supported.
> > 
> > Why a trigger?  I'm suggesting no trigger at all is needed.
> > 
> > Let all initcalls start initializing whenever they can.  Simply that 
> > they shouldn't prevent user space from running early.
> > 
> > Because initcalls are running in parallel, then they must be using 
> > separate kernel threads.  It may be possible to adjust their priority so 
> > if one of them is actually using a lot of CPU cycles then it will run 
> > only when all the other threads (including user space) are no longer 
> > running.
> > 
> 
> You probably can't do that without introducing race conditions. A number
> of userspace libraries and script are actually expecting init and probe
> to be synchronous.

They already have to cope with the fact that most things can be 
available through not-yet-loaded modules, or may never be there at all. 
If not then they should be fixed.

And if you do rely on such a feature for your small embedded 
system then you won't have that many libs and scripts to fix.

> I will refer to the async probe discussion and the
> following thread:
> 
> http://thread.gmane.org/gmane.linux.kernel/1781529

I still don't think that is a good idea at all.  This async probe 
concept requires a trigger from user space and that opens many cans of 
worms as user space now has to be aware of specific kernel driver 
modules, their ordering dependencies, etc.

My point is simply not to defer any initialization at all.  This way you 
don't have to select which module or initcall to send a trigger for 
later on.

Once again, what is the actual problem you want to solve?  If it is 
about making sure user space can execute ASAP then _that_ should be the 
topic, not figuring out how to implement a particular solution.

> Anyway, your userspace will have to have a way to know what has been
> initialized.

Hotplug notifications via dbus.

> On my side, I was also using that mechanism to delay the network stack 
> init but I still want to know when my dhcp client can start for 
> example.

Ditto.  And not only do you want to know when the network stack is 
initialized, but you also need to wait for a link to be established 
before DHCP can work.


Nicolas

^ permalink raw reply

* RE: Why is the deferred initcall patch not mainline?
From: Bird, Tim @ 2014-10-23 20:10 UTC (permalink / raw)
  To: Nicolas Pitre, Alexandre Belloni
  Cc: Rob Landley, Grant Likely, Borislav Petkov, Geert Uytterhoeven,
	linux-embedded@vger.kernel.org, Dirk Behme, challinan@gmail.com
In-Reply-To: <alpine.LFD.2.11.1410231420080.6969@knanqh.ubzr>

On Thursday, October 23, 2014 12:05 PM, Nicolas Pitre wrote:
>
> On Thu, 23 Oct 2014, Alexandre Belloni wrote:
>
> > On 23/10/2014 at 13:56:44 -0400, Nicolas Pitre wrote :
> > > On Thu, 23 Oct 2014, Bird, Tim wrote:
> > >
> > > > I'm not sure why this attention to reading the status.  The salient feature
> > > > here is that the initializations are deferred until user space tells the kernel
> > > > to proceed.  It's the initiation of the trigger from user-space that matters.
> > > > The whole purpose of this feature is to defer some driver initializations until
> > > > the product can get into a state where it is already ready to perform it's primary
> > > > function.  Only user space knows when that is.
> > >
> > > This is still a rather restrictive view of the problem IMHO.
> > >
> > > Let's step back a bit. Your concern is that some initcalls are taking
> > > too long and preventing user space from executing early, right?
Well,  not exactly.

That is not the exact problem we're trying to solve, although it is close.
The problem is not that users-space doesn't start early enough, per se,
it's that there are a set of drivers statically linked to the kernel that are
not needed until after (possibly well after) user space starts.
Any cycles whatsoever being spent on those drivers (either in their
initialization routines, or in processing them or scheduling them)
impairs the primary function of the device.  On a very old presentation
I gave on this, the use case I gave was getting a picture of a baby's smile.
USB drivers are NOT needed for this, but they *are* needed for full
product operation.

In some cases, the system may want to defer initialization of some drivers
until explicit action through the user interface.  So the trigger may not be
called until well after boot is "completed".

> > > I'm suggesting that they no longer prevent user space from executing
> > > earlier.  Why would you then still want an explicit trigger from user
> > > space?
Because only the user space knows when it is now OK to initialize those
drivers, and begin using CPU cycles on them.

> > >
> > > > There seems to be a desire to have an automatic mechanism for triggering
> > > > the deferred initializations.  I'm OK with this, as long as there's some reasonable
> > > > use case for it.  There are lots of possible trigger mechanisms, including just
> > > > a simple timer, but I think it's important that the primary use case of
> > > > 'trigger-when-user-space-says-to' is still supported.
> > >
> > > Why a trigger?  I'm suggesting no trigger at all is needed.
> > >
> > > Let all initcalls start initializing whenever they can.  Simply that
> > > they shouldn't prevent user space from running early.
> > >
> > > Because initcalls are running in parallel, then they must be using
> > > separate kernel threads.  It may be possible to adjust their priority so
> > > if one of them is actually using a lot of CPU cycles then it will run
> > > only when all the other threads (including user space) are no longer
> > > running.
> > >
> >
> > You probably can't do that without introducing race conditions. A number
> > of userspace libraries and script are actually expecting init and probe
> > to be synchronous.
>
> They already have to cope with the fact that most things can be
> available through not-yet-loaded modules, or may never be there at all.
> If not then they should be fixed.
>
> And if you do rely on such a feature for your small embedded
> system then you won't have that many libs and scripts to fix.
>
> > I will refer to the async probe discussion and the
> > following thread:
> >
> > http://thread.gmane.org/gmane.linux.kernel/1781529
>
> I still don't think that is a good idea at all.  This async probe
> concept requires a trigger from user space and that opens many cans of
> worms as user space now has to be aware of specific kernel driver
> modules, their ordering dependencies, etc.
>
> My point is simply not to defer any initialization at all.  This way you
> don't have to select which module or initcall to send a trigger for
> later on.

If you are going to avoid having a sub-set of modules consume
CPU cycles in early boot, you're going to have to identify them somehow.
How do you propose to enumerate the modules to defer (or
de-prioritize, as the case may be)?

Note that this solution should work on UP systems, were there is
essentially a zero-sum game on using CPU cycles at boot.

>
> Once again, what is the actual problem you want to solve?  If it is
> about making sure user space can execute ASAP then _that_ should be the
> topic, not figuring out how to implement a particular solution.

See above.  The actual problem is that we want some sub-set of statically
linked drivers to not consume any cycles during a period of time defined
by user space.  This is rather trivial to accomplish with modules, and the
proposed implementation tries to provide similar functionality for a statically
linked kernel.  I'm open to discussing solutions other than the particular
implementation proposed, just not ones that don't actually solve that problem.

> > Anyway, your userspace will have to have a way to know what has been
> > initialized.
>
> Hotplug notifications via dbus.

In the original code, it was return from a write system call.  No dbus required.

I hope I'm clarifying the desired functionality here, and not just appearing
obstinate and unwilling to discuss alternate solutions.
 -- Tim



^ permalink raw reply

* RE: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-10-23 20:50 UTC (permalink / raw)
  To: Bird, Tim
  Cc: Alexandre Belloni, Rob Landley, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <F5184659D418E34EA12B1903EE5EF5FD01BCD433E68C@seldmbx02.corpusers.net>

On Thu, 23 Oct 2014, Bird, Tim wrote:

> On Thursday, October 23, 2014 12:05 PM, Nicolas Pitre wrote:
> >
> > On Thu, 23 Oct 2014, Alexandre Belloni wrote:
> >
> > > On 23/10/2014 at 13:56:44 -0400, Nicolas Pitre wrote :
> > > > On Thu, 23 Oct 2014, Bird, Tim wrote:
> > > >
> > > > > I'm not sure why this attention to reading the status.  The salient feature
> > > > > here is that the initializations are deferred until user space tells the kernel
> > > > > to proceed.  It's the initiation of the trigger from user-space that matters.
> > > > > The whole purpose of this feature is to defer some driver initializations until
> > > > > the product can get into a state where it is already ready to perform it's primary
> > > > > function.  Only user space knows when that is.
> > > >
> > > > This is still a rather restrictive view of the problem IMHO.
> > > >
> > > > Let's step back a bit. Your concern is that some initcalls are taking
> > > > too long and preventing user space from executing early, right?
> Well,  not exactly.
> 
> That is not the exact problem we're trying to solve, although it is close.
> The problem is not that users-space doesn't start early enough, per se,
> it's that there are a set of drivers statically linked to the kernel that are
> not needed until after (possibly well after) user space starts.
> Any cycles whatsoever being spent on those drivers (either in their
> initialization routines, or in processing them or scheduling them)
> impairs the primary function of the device.  On a very old presentation
> I gave on this, the use case I gave was getting a picture of a baby's smile.
> USB drivers are NOT needed for this, but they *are* needed for full
> product operation.

As I suggested earlier, those cycles spent on those drivers may be 
deferred to a moment when the CPU has nothing else to do anyway by 
giving a lower priority to the threads handling them.

> In some cases, the system may want to defer initialization of some drivers
> until explicit action through the user interface.  So the trigger may not be
> called until well after boot is "completed".

In that case the "trigger" for initializing those drivers should be the 
first time they're accessed from user space.  That could be the very 
first time libusb or similar tries to enumerate available USB devices 
for example.  No special interface needed.

> > > > I'm suggesting that they no longer prevent user space from executing
> > > > earlier.  Why would you then still want an explicit trigger from user
> > > > space?
> Because only the user space knows when it is now OK to initialize those
> drivers, and begin using CPU cycles on them.

So what?  That is still not a good answer.

User space shouldn't have to care as long as it has all the CPU cycles 
it wants in priority.  But as soon as user space relinquishes the CPU 
then there is no reason why driver initialization couldn't take over 
until user space is made runnable again.

[...]
> > My point is simply not to defer any initialization at all.  This way you
> > don't have to select which module or initcall to send a trigger for
> > later on.
> 
> If you are going to avoid having a sub-set of modules consume
> CPU cycles in early boot, you're going to have to identify them somehow.
> How do you propose to enumerate the modules to defer (or
> de-prioritize, as the case may be)?

Anything that is not involved with making the root fs available.

> Note that this solution should work on UP systems, were there is
> essentially a zero-sum game on using CPU cycles at boot.

The scheduler knows how to prioritize things on UP as well.  The top 
priority thread will always go to sleep at some point allowing other 
threads to run. But I'm sure you know all that.

> > Once again, what is the actual problem you want to solve?  If it is
> > about making sure user space can execute ASAP then _that_ should be the
> > topic, not figuring out how to implement a particular solution.
> 
> See above.  The actual problem is that we want some sub-set of statically
> linked drivers to not consume any cycles during a period of time defined
> by user space. 

Once again you're defining a solution (i.e. not consume any cycles ...) 
rather than the problem motivating this particular solution. That's not 
how you're going to have something merged upstream.

And I'm not saying your solution is completely bad either if you're 
looking for the simplest way and willing to keep it to yourself.  What 
I'm saying is that there are other possible solutions that could solve 
your initial problem _and_ be acceptable to mainline... but they're 
unlikely to look like what you have now.

Nicolas

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Rob Landley @ 2014-10-23 22:01 UTC (permalink / raw)
  To: Nicolas Pitre, Alexandre Belloni
  Cc: Bird, Tim, Grant Likely, Borislav Petkov, Geert Uytterhoeven,
	linux-embedded@vger.kernel.org, Dirk Behme, challinan@gmail.com
In-Reply-To: <alpine.LFD.2.11.1410231420080.6969@knanqh.ubzr>

On 10/23/14 14:05, Nicolas Pitre wrote:
> On Thu, 23 Oct 2014, Alexandre Belloni wrote:
> 
>> On 23/10/2014 at 13:56:44 -0400, Nicolas Pitre wrote :
>>> On Thu, 23 Oct 2014, Bird, Tim wrote:
>>> Why a trigger?  I'm suggesting no trigger at all is needed.
>>>
>>> Let all initcalls start initializing whenever they can.  Simply that 
>>> they shouldn't prevent user space from running early.
>>>
>>> Because initcalls are running in parallel, then they must be using 
>>> separate kernel threads.  It may be possible to adjust their priority so 
>>> if one of them is actually using a lot of CPU cycles then it will run 
>>> only when all the other threads (including user space) are no longer 
>>> running.
>>>
>>
>> You probably can't do that without introducing race conditions. A number
>> of userspace libraries and script are actually expecting init and probe
>> to be synchronous.
> 
> They already have to cope with the fact that most things can be 
> available through not-yet-loaded modules, or may never be there at all. 
> If not then they should be fixed.
> 
> And if you do rely on such a feature for your small embedded 
> system then you won't have that many libs and scripts to fix.

There are userspace libraries distinguishing between init and probe?
I.E. treating them as two separate things already?

So how were they accessing them as two separate things before this patch
set?

>> I will refer to the async probe discussion and the
>> following thread:
>>
>> http://thread.gmane.org/gmane.linux.kernel/1781529
> 
> I still don't think that is a good idea at all.  This async probe 
> concept requires a trigger from user space and that opens many cans of 
> worms as user space now has to be aware of specific kernel driver 
> modules, their ordering dependencies, etc.
> 
> My point is simply not to defer any initialization at all.  This way you 
> don't have to select which module or initcall to send a trigger for 
> later on.

Why would this be hard?

for i in $(find /sys/module -name initstate)
do
  [ "$(cat $i)" != live ] && echo "kick" > $i
done

And I'm confused that you're concerned about init order so your solution
is to do nothing, thereby preserving the existing init order which could
not _possibly_ be exposed verbatim to userspace...

> Once again, what is the actual problem you want to solve?  If it is 
> about making sure user space can execute ASAP then _that_ should be the 
> topic, not figuring out how to implement a particular solution.
> 
>> Anyway, your userspace will have to have a way to know what has been
>> initialized.
> 
> Hotplug notifications via dbus.

Wait, we need a _third_ mechanism for hotplug notifications now? (The
/proc/sys/kernel/hotplug helper, netlink, and you want another one?)

>> On my side, I was also using that mechanism to delay the network stack 
>> init but I still want to know when my dhcp client can start for 
>> example.
> 
> Ditto.  And not only do you want to know when the network stack is 
> initialized, but you also need to wait for a link to be established 
> before DHCP can work.

Um, doesn't the existing hotplug mechanism _already_ give us
notification that eth0 and similar showed up? (Pretty sure I hit that
while poking at mdev, although it was a while ago...)

Increasingly confused,

Rob

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Rob Landley @ 2014-10-23 22:37 UTC (permalink / raw)
  To: Nicolas Pitre, Bird, Tim
  Cc: Alexandre Belloni, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <alpine.LFD.2.11.1410231616570.6969@knanqh.ubzr>

On 10/23/14 15:50, Nicolas Pitre wrote:
> On Thu, 23 Oct 2014, Bird, Tim wrote:
> 
>> On Thursday, October 23, 2014 12:05 PM, Nicolas Pitre wrote:
>>>
>>> On Thu, 23 Oct 2014, Alexandre Belloni wrote:
>>>
>>>> On 23/10/2014 at 13:56:44 -0400, Nicolas Pitre wrote :
>>>>> On Thu, 23 Oct 2014, Bird, Tim wrote:
>>>>>
>>>>>> I'm not sure why this attention to reading the status.  The salient feature
>>>>>> here is that the initializations are deferred until user space tells the kernel
>>>>>> to proceed.  It's the initiation of the trigger from user-space that matters.
>>>>>> The whole purpose of this feature is to defer some driver initializations until
>>>>>> the product can get into a state where it is already ready to perform it's primary
>>>>>> function.  Only user space knows when that is.
>>>>>
>>>>> This is still a rather restrictive view of the problem IMHO.
>>>>>
>>>>> Let's step back a bit. Your concern is that some initcalls are taking
>>>>> too long and preventing user space from executing early, right?
>> Well,  not exactly.
>>
>> That is not the exact problem we're trying to solve, although it is close.
>> The problem is not that users-space doesn't start early enough, per se,
>> it's that there are a set of drivers statically linked to the kernel that are
>> not needed until after (possibly well after) user space starts.
>> Any cycles whatsoever being spent on those drivers (either in their
>> initialization routines, or in processing them or scheduling them)
>> impairs the primary function of the device.  On a very old presentation
>> I gave on this, the use case I gave was getting a picture of a baby's smile.
>> USB drivers are NOT needed for this, but they *are* needed for full
>> product operation.
> 
> As I suggested earlier, those cycles spent on those drivers may be 
> deferred to a moment when the CPU has nothing else to do anyway by 
> giving a lower priority to the threads handling them.

Unless you're using realtime priorities your kernel will spend about 5%
of its time servicing the lowest priority threads no matter what you do,
to avoid priority inversion lockups of the kind that cost us a mars
probe back in the 90's.

http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html

Doing hardware probing at low priorities can cause really _fun_ latency
spikes in the system as something grabs a lock and then sleeps. (And
doing this at the realtime scheduling where it won't do that translates
those latency spikes into the aforementioned hard lockup, so not
actually a solution per se.)

Trying to fix this in the general case is the priority inheritance
problem, and last I heard was really hard. Maybe it's been fixed in the
past few years and I hadn't noticed. (The rise of SMP made it a less
pressing issue, but system bringup is its own little world.)

The reliable fix to priority inversion is to let low priority jobs still
get a decent crack at the CPU so clogs clear themselves naturally. And
this means that scheduling it down as far as it goes does _not_ simply
make low priority jobs go away.

>> In some cases, the system may want to defer initialization of some drivers
>> until explicit action through the user interface.  So the trigger may not be
>> called until well after boot is "completed".
> 
> In that case the "trigger" for initializing those drivers should be the 
> first time they're accessed from user space.

Which gets us back to one of the big reasons <strike>systemd</strike>
devfsd failed years ago: you have to probe the hardware in order to know
which /dev nodes to create, so you can't have accessing the /dev node
probe the hardware. (There's no /dev node for a usb controller...)

> That could be the very
> first time libusb or similar tries to enumerate available USB devices 
> for example.  No special interface needed.

So now you're requiring libusb enumerating usb devices, when before this
you could just reach out and open /dev/ttyUSB0 and it would be there.

This is an embedded solution?

>>>>> I'm suggesting that they no longer prevent user space from executing
>>>>> earlier.  Why would you then still want an explicit trigger from user
>>>>> space?
>> Because only the user space knows when it is now OK to initialize those
>> drivers, and begin using CPU cycles on them.
> 
> So what?  That is still not a good answer.

Why?

I believe Tim's proposal was to take a category of existing device
probing, one already done on a background thread, and wait to start it
until userspace says "go". That's about as nonintrusive a change as you get.

You're talking about requiring weird arbitrary things to have side effects.

> User space shouldn't have to care as long as it has all the CPU cycles 
> it wants in priority.

That's not how scheduling works. The realtime people have been trying to
make scheduling work that wasy for _years_ and it's still a flaming pain
to use their stuff without hard lockups and weird inexplicable dropouts.

> But as soon as user space relinquishes the CPU 
> then there is no reason why driver initialization couldn't take over 
> until user space is made runnable again.

There is an entire academic literature on this. Google "priority inversion".

> [...]
>>> My point is simply not to defer any initialization at all.  This way you
>>> don't have to select which module or initcall to send a trigger for
>>> later on.
>>
>> If you are going to avoid having a sub-set of modules consume
>> CPU cycles in early boot, you're going to have to identify them somehow.
>> How do you propose to enumerate the modules to defer (or
>> de-prioritize, as the case may be)?
> 
> Anything that is not involved with making the root fs available.

If you're running in initramfs we haven't necessarily done _any_ driver
probing yet. That's what initramfs is for. You can put device firmware
in there so static drivers can make hotplug firmware loading requests to
userspce during their device programming. (It's one of those usermode
helper callback things.)

>> Note that this solution should work on UP systems, were there is
>> essentially a zero-sum game on using CPU cycles at boot.
> 
> The scheduler knows how to prioritize things on UP as well.  The top 
> priority thread will always go to sleep at some point allowing other 
> threads to run. But I'm sure you know all that.

The top priority threads will get preempted.

(Did you follow any of the work Con Kolivas and company were doing a few
years ago?)

Rob

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-10-24  0:31 UTC (permalink / raw)
  To: Rob Landley
  Cc: Alexandre Belloni, Bird, Tim, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <54497AB8.10609@landley.net>

On Thu, 23 Oct 2014, Rob Landley wrote:

> On 10/23/14 14:05, Nicolas Pitre wrote:
> > On Thu, 23 Oct 2014, Alexandre Belloni wrote:
> > 
> >> On 23/10/2014 at 13:56:44 -0400, Nicolas Pitre wrote :
> >>> On Thu, 23 Oct 2014, Bird, Tim wrote:
> >>> Why a trigger?  I'm suggesting no trigger at all is needed.
> >>>
> >>> Let all initcalls start initializing whenever they can.  Simply that 
> >>> they shouldn't prevent user space from running early.
> >>>
> >>> Because initcalls are running in parallel, then they must be using 
> >>> separate kernel threads.  It may be possible to adjust their priority so 
> >>> if one of them is actually using a lot of CPU cycles then it will run 
> >>> only when all the other threads (including user space) are no longer 
> >>> running.
> >>>
> >>
> >> You probably can't do that without introducing race conditions. A number
> >> of userspace libraries and script are actually expecting init and probe
> >> to be synchronous.
> > 
> > They already have to cope with the fact that most things can be 
> > available through not-yet-loaded modules, or may never be there at all. 
> > If not then they should be fixed.
> > 
> > And if you do rely on such a feature for your small embedded 
> > system then you won't have that many libs and scripts to fix.
> 
> There are userspace libraries distinguishing between init and probe?
> I.E. treating them as two separate things already?

Why not?

> So how were they accessing them as two separate things before this patch
> set?

Before engaging a conversation with a device, you verify if it exists 
first?

> >> I will refer to the async probe discussion and the
> >> following thread:
> >>
> >> http://thread.gmane.org/gmane.linux.kernel/1781529
> > 
> > I still don't think that is a good idea at all.  This async probe 
> > concept requires a trigger from user space and that opens many cans of 
> > worms as user space now has to be aware of specific kernel driver 
> > modules, their ordering dependencies, etc.
> > 
> > My point is simply not to defer any initialization at all.  This way you 
> > don't have to select which module or initcall to send a trigger for 
> > later on.
> 
> Why would this be hard?
> 
> for i in $(find /sys/module -name initstate)
> do
>   [ "$(cat $i)" != live ] && echo "kick" > $i
> done

You should have a look at /sys/bus/*/*probe then.  Maybe it does what 
you need already.

> And I'm confused that you're concerned about init order so your solution
> is to do nothing, thereby preserving the existing init order which could
> not _possibly_ be exposed verbatim to userspace...

The kernel already has the deferred probe mechanism to cope with the 
init ordering which, as experience has shown, may only be dealt with at 
run time.  All attempts to create that ordering statically in the past 
have failed.  So what do you want exposed verbatim to user space again?

> > Once again, what is the actual problem you want to solve?  If it is 
> > about making sure user space can execute ASAP then _that_ should be the 
> > topic, not figuring out how to implement a particular solution.
> > 
> >> Anyway, your userspace will have to have a way to know what has been
> >> initialized.
> > 
> > Hotplug notifications via dbus.
> 
> Wait, we need a _third_ mechanism for hotplug notifications now? (The
> /proc/sys/kernel/hotplug helper, netlink, and you want another one?)

No, I actually meant hotplug and netlink.  My bad.

> >> On my side, I was also using that mechanism to delay the network stack 
> >> init but I still want to know when my dhcp client can start for 
> >> example.
> > 
> > Ditto.  And not only do you want to know when the network stack is 
> > initialized, but you also need to wait for a link to be established 
> > before DHCP can work.
> 
> Um, doesn't the existing hotplug mechanism _already_ give us
> notification that eth0 and similar showed up? (Pretty sure I hit that
> while poking at mdev, although it was a while ago...)

Indeed it does. So no new user space notification mechanisms are needed 
which is my point.


Nicolas

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-10-24  0:36 UTC (permalink / raw)
  To: Rob Landley
  Cc: Bird, Tim, Alexandre Belloni, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <54498344.8080507@landley.net>

On Thu, 23 Oct 2014, Rob Landley wrote:

> Doing hardware probing at low priorities can cause really _fun_ latency
> spikes in the system as something grabs a lock and then sleeps. (And
> doing this at the realtime scheduling where it won't do that translates
> those latency spikes into the aforementioned hard lockup, so not
> actually a solution per se.)
> 
> Trying to fix this in the general case is the priority inheritance
> problem, and last I heard was really hard. Maybe it's been fixed in the
> past few years and I hadn't noticed. (The rise of SMP made it a less
> pressing issue, but system bringup is its own little world.)
> 
I know you're a smart *ss.  But:

1) All this is not about fixing the RT scheduler for the general case.

2) System bring-up being its own world may have special scheduling 
   treatment that doesn't necessarily have to be RT.

3) You, too, conveniently avoided to define the initial problem so far.
   That makes for rather sterile conversations about alternative 
   solutions that could score higher on the mainline acceptance scale.

> >> In some cases, the system may want to defer initialization of some drivers
> >> until explicit action through the user interface.  So the trigger may not be
> >> called until well after boot is "completed".
> > 
> > In that case the "trigger" for initializing those drivers should be the 
> > first time they're accessed from user space.
> 
> Which gets us back to one of the big reasons <strike>systemd</strike>
> devfsd failed years ago: you have to probe the hardware in order to know
> which /dev nodes to create, so you can't have accessing the /dev node
> probe the hardware. (There's no /dev node for a usb controller...)

There is /sys/bus/usb/devices that could be accessed in order to trigger 
the initial setup and probe.  It is most likely that libusb does that, 
but this could be made to work with a simple 'cat' or 'touch' invocation 
as well.

> > That could be the very first time libusb or similar tries to 
> > enumerate available USB devices for example.  No special interface 
> > needed.
> 
> So now you're requiring libusb enumerating usb devices, when before this
> you could just reach out and open /dev/ttyUSB0 and it would be there.

You can't just "reach out" with the deferred initcall scheme either, do 
you?

> This is an embedded solution?
> 
> >>>>> I'm suggesting that they no longer prevent user space from executing
> >>>>> earlier.  Why would you then still want an explicit trigger from user
> >>>>> space?
> >> Because only the user space knows when it is now OK to initialize those
> >> drivers, and begin using CPU cycles on them.
> > 
> > So what?  That is still not a good answer.
> 
> Why?
> 
> I believe Tim's proposal was to take a category of existing device
> probing, one already done on a background thread, and wait to start it
> until userspace says "go". That's about as nonintrusive a change as you get.

You might still be able to do better.

If you really want to be non intrusive, you could e.g. make those 
background threads into SIGSTOP and let user space SIGCONT them as it 
sees fit.  No new special interfaces needed.

> You're talking about requiring weird arbitrary things to have side effects.

Like if stalling arbitrary initcalls wouldn't have side effects?

What I'm suggesting is to let the system do its thing the most efficient 
way while giving a strong bias to running user space first.  How 
arbitrarily weird can that be?

> If you're running in initramfs we haven't necessarily done _any_ driver
> probing yet. That's what initramfs is for. You can put device firmware
> in there so static drivers can make hotplug firmware loading requests to
> userspce during their device programming. (It's one of those usermode
> helper callback things.)

True if you need firmware, or if you want to actually load modules to 
get to the root fs device.  Otherwise all built-in driver init functions 
have been called and waited for at that point.

> >> Note that this solution should work on UP systems, were there is
> >> essentially a zero-sum game on using CPU cycles at boot.
> > 
> > The scheduler knows how to prioritize things on UP as well.  The top 
> > priority thread will always go to sleep at some point allowing other 
> > threads to run. But I'm sure you know all that.
> 
> The top priority threads will get preempted.
> 
> (Did you follow any of the work Con Kolivas and company were doing a few
> years ago?)

Yeah... and I also notice it is still maintained, still out of mainline.

As you know already, you can do anything you want on your own.  That's 
granted by the GPL.

Nicolas

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Rob Landley @ 2014-10-24 19:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Bird, Tim, Alexandre Belloni, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <alpine.LFD.2.11.1410231847400.6969@knanqh.ubzr>

On 10/23/14 19:36, Nicolas Pitre wrote:
> On Thu, 23 Oct 2014, Rob Landley wrote:
> 3) You, too, conveniently avoided to define the initial problem so far.
>    That makes for rather sterile conversations about alternative 
>    solutions that could score higher on the mainline acceptance scale.

With modules, you can already defer large portions of kernel-side system
bringup until userspace is ready for them. With static linking, you can't.

This patch series sounds like it lets static drivers hold off their
initialization until userspace sends them an insmod-equivalent event
through sysfs, possibly with associated arguments since the module
codepath already implements that so exposing it through the new
mechanism in the static linking case would be trivial.

Seems conceptually fairly straightforward to me, but I'm just guessing
since nobody's yet linked to the patches during this thread (that I've
noticed).

>>>> In some cases, the system may want to defer initialization of some drivers
>>>> until explicit action through the user interface.  So the trigger may not be
>>>> called until well after boot is "completed".
>>>
>>> In that case the "trigger" for initializing those drivers should be the 
>>> first time they're accessed from user space.
>>
>> Which gets us back to one of the big reasons <strike>systemd</strike>
>> devfsd failed years ago: you have to probe the hardware in order to know
>> which /dev nodes to create, so you can't have accessing the /dev node
>> probe the hardware. (There's no /dev node for a usb controller...)
> 
> There is /sys/bus/usb/devices that could be accessed in order to trigger 
> the initial setup and probe.  It is most likely that libusb does that, 
> but this could be made to work with a simple 'cat' or 'touch' invocation 
> as well.

Please let me know which "devices" to trigger to launch an encrypted
ramdisk driver that has nontrivial setup because it needs to generate
keys (and collect enough entropy to do so). Or how about a driver that
programs a set of gpio pins to a specific behavior, obviously that's
triggered by examining the hardware.

A module can produce multiple /dev nodes from one piece of hardware, a
piece of hardware can produce no dev nodes (speaking of usb, the actual
bus-level driver), dev nodes may not have any associated hardware but
still require setup (/dev/urandom if you care about the quality of the
entropy pool)...

This is why devfs didn't work. You're trying to do this at the wrong
level. If you want to defer a module's init, doing so at the _module_
level is the only coherent way to do it.

>>> That could be the very first time libusb or similar tries to 
>>> enumerate available USB devices for example.  No special interface 
>>> needed.
>>
>> So now you're requiring libusb enumerating usb devices, when before this
>> you could just reach out and open /dev/ttyUSB0 and it would be there.
> 
> You can't just "reach out" with the deferred initcall scheme either, do 
> you?

You can already can do this with modules. Just don't insmod until you're
ready.

Right now the implementation ties together "the code is in kernel" with
"the code starts running", so you can't both statically link the module
and control when it starts doing stuff. That really _seems_ like it's
just an implementation detail: decoupling them so the code is in kernel
but doesn't call its init function until userspace tells it to does not
sound like a huge conceptual stretch.

Is there an actual reason to invent a whole new unrelated thing instead?

>> This is an embedded solution?
>>
>>>>>>> I'm suggesting that they no longer prevent user space from executing
>>>>>>> earlier.  Why would you then still want an explicit trigger from user
>>>>>>> space?
>>>> Because only the user space knows when it is now OK to initialize those
>>>> drivers, and begin using CPU cycles on them.
>>>
>>> So what?  That is still not a good answer.
>>
>> Why?
>>
>> I believe Tim's proposal was to take a category of existing device
>> probing, one already done on a background thread, and wait to start it
>> until userspace says "go". That's about as nonintrusive a change as you get.
> 
> You might still be able to do better.

We have a mechanism available in one context. Would you rather make that
mechanism available in another context, or design a whole new mechanism
from scratch?

> If you really want to be non intrusive, you could e.g. make those 
> background threads into SIGSTOP and let user space SIGCONT them as it 
> sees fit.  No new special interfaces needed.

We have an existing module namespace, and existing mechanisms that use
it to control this sort of thing. Are you suggesting a lookup mechanism
that says "here's the threat that would be initializing this module if
we hadn't started the thread SIGSTOP"? (With each one in its own thread
so you have the same level of granularity the existing mechanism provides?)

>> You're talking about requiring weird arbitrary things to have side effects.
> 
> Like if stalling arbitrary initcalls wouldn't have side effects?

You're arguing that modules, as the exist today, couldn't possibly work.

Modules exist today. Somehow the system survives having them loaded long
after boot time. (The system even lets you load a whole separate kernel
through kexec and then _not_ immediately reboot into it, for disaster
recovery purposes and crash dumps and such.)

> What I'm suggesting is to let the system do its thing the most efficient 
> way while giving a strong bias to running user space first.  How 
> arbitrarily weird can that be?

I'm suggesting "we have all this module infrastructure, it's not
currently hooked up to work with static linking which is a thing people
legitimately want to do, it doesn't sound like much work to _make_ it
work there, so why would you invent a whole new thing?"

Honestly, the biggest change discussed so far is adding a fourth letter
to menuconfig's "tristate" entries to say "static linking, but wait to
start it running until userspace pokes something under /sys/modules".

The kernel guys previously leveraged all this infrastructure, even in
the static linked case, to make suspend work. There's precedent for this
sort of thing.

>> If you're running in initramfs we haven't necessarily done _any_ driver
>> probing yet. That's what initramfs is for. You can put device firmware
>> in there so static drivers can make hotplug firmware loading requests to
>> userspce during their device programming. (It's one of those usermode
>> helper callback things.)
> 
> True if you need firmware, or if you want to actually load modules to 
> get to the root fs device.  Otherwise all built-in driver init functions 
> have been called and waited for at that point.

The difference between "built-in driver" and "loaded module" is kind of
arbitrary in this context. I could write a driver with a stub init
function that just adds a sysfs file, and then have the exact same
startup code called when you cat a file in sysfs. (The kernel devs would
freak if you did that on a per-module basis, modifying numerous existing
modules to do that would be frowned on, and without the kconfig changes
you'd be hardwiring configuration choices into the source code which is
just wrong. But it's not actually a big change in terms of amount of
code to make it _work_, unless I'm missing something really obvious.)

>>>> Note that this solution should work on UP systems, were there is
>>>> essentially a zero-sum game on using CPU cycles at boot.
>>>
>>> The scheduler knows how to prioritize things on UP as well.  The top 
>>> priority thread will always go to sleep at some point allowing other 
>>> threads to run. But I'm sure you know all that.
>>
>> The top priority threads will get preempted.
>>
>> (Did you follow any of the work Con Kolivas and company were doing a few
>> years ago?)
> 
> Yeah... and I also notice it is still maintained, still out of mainline.

The fact the dysfunctional kernel development process burned out yet
another developer doesn't say much about the problem that developer was
trying to solve:

http://apcmag.com/why_i_quit_kernel_developer_con_kolivas.htm

Here's the story of squashfs taking seven years to get into mainline,
including its author taking a year off from work (living off savings) to
finally get it into Linus's tree _after_ it was already in every major
distro:

https://lwn.net/Articles/563578/

That says way more about the kernel development process being
dysfunctional than it does about squashfs. (Or Con's scheduling work.)

Squashfs is our idea of a _success_ story. Val Henson (now Aurora)
quitting to cofound the Ada Initiative (and thus union mounts falling by
the wayside) was not because of the union mounts problem space or
implementation quality. Alan Cox didn't stop maintaining his tree,
switch his blog to welsh, and bugger off for a year to get an MBA
because there was something wrong with his tree, it was a result of his
interactions with Linus. (I was tangentially involved behind the scenes
on that due to that "patch penguin" thing, got some non-public info
that's now old news.)

I myself spent over five years getting the perl removal patches accepted
into mainline, and the gap between my first "why can't initramfs use
tmpfs so going cat /dev/zero > /bigfile isn't guaranteed to bring down
the system?" and me actually getting the patches merged was close to a
decade. In neither case did the gap have anything to do with the actual
merits of any code or design idea in question.

The kernel clique circling the wagons is actually fairly old news:

http://www.zdnet.com/graying-linux-developers-look-for-new-blood-7000020026/

Such social reasons are part of why "make existing module mechanism
available in new context" seems (to me) more likely to work than
"reinvent devfs". But ymmv...

> As you know already, you can do anything you want on your own.  That's 
> granted by the GPL.

I'm pretty sure I could have done anything I wanted on my own with
System 6 unix in the 1970's (modulo being 7 years old), since the BSD
guys _did_ and their stuff is still around (and is powering obscure
things like the iPhone). And I learned C in 1989 to apply "mod" files to
the WWIV bulletin board system (an open source development community
that didn't even have the "patch" program).

But by all means, credit the GPL for the existence of open source.
Apache would never have become the dominant webserver without the GPL,
nobody would use or develop openssh or dropbear under non-GPL licenses,
our userspace successes like firefox and chrome are clearly because of
the GPL, x.org could fork away from xfree86 because of the GPL...

Did you notice that there's no such thing as "the GPL" anymore? Linux
and Samba implement two ends of the same protocol, each one is GPL, and
they can't share code. Poor QEMU wants to suck GPL processor definitions
out of binutils/gdb to emulate processors and GPL driver code out of
Linux to emulate devices, and there _is_ no license that allows it to
combine code from both sources. (Making qemu "GPLv2 or later" means it
couldn't accept code from _either_ source.)

> Nicolas

I'm going to recuse myself from the rest of this thread because I'm
clearly getting annoyed with us talking past each other. Somebody's got
an actual patch (which they still haven't linked to). I'll shut up and
let them show you the code.

Rob

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Geert Uytterhoeven @ 2014-10-24 20:28 UTC (permalink / raw)
  To: Rob Landley
  Cc: Nicolas Pitre, Bird, Tim, Alexandre Belloni, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <544AAA9F.8060808@landley.net>

On Fri, Oct 24, 2014 at 9:38 PM, Rob Landley <rob@landley.net> wrote:
> I'm going to recuse myself from the rest of this thread because I'm
> clearly getting annoyed with us talking past each other. Somebody's got
> an actual patch (which they still haven't linked to). I'll shut up and
> let them show you the code.

Several patches are linked from
http://elinux.org/Deferred_Initcalls

Latest version is
http://elinux.org/images/5/51/0001-Port-deferred-initcalls-to-3.10.patch

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-10-24 21:05 UTC (permalink / raw)
  To: Rob Landley
  Cc: Bird, Tim, Alexandre Belloni, Grant Likely, Borislav Petkov,
	Geert Uytterhoeven, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <544AAA9F.8060808@landley.net>

On Fri, 24 Oct 2014, Rob Landley wrote:

> On 10/23/14 19:36, Nicolas Pitre wrote:
> 
> > As you know already, you can do anything you want on your own.  That's 
> > granted by the GPL.
> 
> I'm pretty sure I could have done anything I wanted on my own with
> System 6 unix in the 1970's (modulo being 7 years old), since the BSD
> guys _did_ and their stuff is still around (and is powering obscure
> things like the iPhone).

Incidentally there is this thing called Linux powering similarly obscur 
curiosities such as Android, and outnumbering iPhones in terms of units 
shipped.

So what's your point again?

> And I learned C in 1989 to apply "mod" files to the WWIV bulletin 
> board system (an open source development community that didn't even 
> have the "patch" program).

You needed to pay a license to get the WWIV source code.  At least that 
was the case when I was a sysop in 1993.

> But by all means, credit the GPL for the existence of open source.

Oh my!  Obviously that's exactly what I did, right?

And now you want me to take what you say seriously?

The impression I get from your diatribe is that you might be living in 
the past.  I don't dispute the fact that You had issues with the Linux 
community before, but one has to admit that a _lot_ of people don't. And 
I'm lucky enough to be one of them, and in that context I was trying to 
help.

> Did you notice that there's no such thing as "the GPL" anymore? Linux
> and Samba implement two ends of the same protocol, each one is GPL, and
> they can't share code. Poor QEMU wants to suck GPL processor definitions
> out of binutils/gdb to emulate processors and GPL driver code out of
> Linux to emulate devices, and there _is_ no license that allows it to
> combine code from both sources. (Making qemu "GPLv2 or later" means it
> couldn't accept code from _either_ source.)

And now we're far far away from $subject that started this thread.  
This is going nowhere.

> I'm going to recuse myself from the rest of this thread because I'm
> clearly getting annoyed with us talking past each other. Somebody's got
> an actual patch (which they still haven't linked to). I'll shut up and
> let them show you the code.

On that I agree with you.

Nicolas

^ permalink raw reply

* Create/Package kernel headers for self-compiled kernel
From: Benedikt Kleinmeier @ 2014-10-27  9:27 UTC (permalink / raw)
  To: 'linux-embedded@vger.kernel.org'

Hi,

I run a host/target development environment. On a Debian host, I patch the Linux kernel 3.2.60 with the RT patch and compile it. Afterwards, I move config, System.map, initrd and vmlinuz files to a Debian target.

Now, I would like to compile a kernel module on the target, but the header files are missing.

Linux' Makefile contains the target "headers_install" but this only installs the headers for user space programs under "/usr/include/linux". I need the files referenced by "/lib/modules/<version>/build". Some of the referenced files are also created during Linux' "make" process, e.g. .config, include and Makefile. For pre-compiled kernels in Debian, the headers are combined to a package called "linux-headers-<version>". See here https://packages.debian.org/wheezy/amd64/linux-headers-3.2.0-4-amd64/filelist.

Is there a distribution-independent way to create or package these files?

Thank you very much.

-- 
Benedikt Kleinmeier
Fraunhofer-Institut für Eingebettete Systeme und Kommunikationstechnik ESK

Hansastraße 32 | 80686 München 
Telefon, Fax: +49 89 547088-0 | +49 89 547088-220
E-Mail: benedikt.kleinmeier@esk.fraunhofer.de
Internet:
http://www.esk.fraunhofer.de 
http://www.twitter.com/FraunhoferESK
http://www.facebook.com/FraunhoferESK 

^ permalink raw reply

* AW: Create/Package kernel headers for self-compiled kernel
From: Benedikt Kleinmeier @ 2014-10-27  9:53 UTC (permalink / raw)
  To: 'linux-embedded@vger.kernel.org'; +Cc: 'Nicholas Mc Guire'
In-Reply-To: <20141027093934.GA9894@opentech.at>

Hi Nicholas,

> apt-get install kernel-package (toosl for building kernel .deb)
> 
> CONCURRENCY_LEVEL=6 make-kpkg --initrd  kernel-image kernel-headers

Thanks for the hint with Debian's "kernel-package". I will give it a try.

Nonetheless, I would prefer a distribution-independent way of packaging the header files because today I have to use Debian but tomorrow I will have to use another distribution.

> the other question is why do you not build the modules on the host ? no
> reason to build that on the target box.

You are right. It's just for a quick test. In the long run, I build all stuff on the host environment.

Thanks a lot,
Benedikt

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-10-27 20:29 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rob Landley, Bird, Tim, Alexandre Belloni, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <CAMuHMdVhfNAVdB3iHcyfXfjk1ES-bX9Xm5ySKND7qsHTFNL72Q@mail.gmail.com>

On Fri, 24 Oct 2014, Geert Uytterhoeven wrote:

> Several patches are linked from
> http://elinux.org/Deferred_Initcalls
> 
> Latest version is
> http://elinux.org/images/5/51/0001-Port-deferred-initcalls-to-3.10.patch

In the hope of providing some constructive and concrete feedback to this 
thread, here's what I have to say about the patch linked above ( I 
looked only at the latest version):

- Commented out code is not acceptable for mainline. But everyone knows 
  that already.

- Returning a null byte through the /proc file is dubious.

- The /proc interface is probably not the best. I'd go with an entry in 
  /sys/kernel instead.

- If the deferred_initcall section is empty, this could return 1 upfront 
  and do the free_initmem() earlier as it used to.

- It was mentioned somewhere that the config system could use a 4th 
  state in addition to n, m and y.  That would be required before this 
  goes upstream simply to express all the dependencies between modules.  
  Right now if a core module is configured with m, then all the 
  submodules that depend on it inherit the modular-only restriction.  
  The same would need to be enforced for deferred initcalls.

- Currently all deferred initcalls are lumped together in a single 
  section with no regards to the original initcall level. This is likely 
  to cause trouble if two initcalls are called in a different order than 
  intended. Nothing prevents that from happening right now.

This patch is still not generic enough for mainline inclusion IMHO.  It 
currently falls in the "you better know what you're doing" category and 
that is possibly good enough for its actual users.  Trying to make this 
more generic is going to require some more work.  And this would have to 
come with serious arguments explaining why simply using modules in the 
first place is not acceptable.


Nicolas

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Alexandre Belloni @ 2014-10-27 21:37 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Geert Uytterhoeven, Rob Landley, Bird, Tim, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <alpine.LFD.2.11.1410271546010.18007@knanqh.ubzr>

On 27/10/2014 at 16:29:10 -0400, Nicolas Pitre wrote :
> On Fri, 24 Oct 2014, Geert Uytterhoeven wrote:
> 
> > Several patches are linked from
> > http://elinux.org/Deferred_Initcalls
> > 
> > Latest version is
> > http://elinux.org/images/5/51/0001-Port-deferred-initcalls-to-3.10.patch
> 
> In the hope of providing some constructive and concrete feedback to this 
> thread, here's what I have to say about the patch linked above ( I 
> looked only at the latest version):
> 
> - Commented out code is not acceptable for mainline. But everyone knows 
>   that already.
> 
> - Returning a null byte through the /proc file is dubious.
> 
> - The /proc interface is probably not the best. I'd go with an entry in 
>   /sys/kernel instead.
> 
> - If the deferred_initcall section is empty, this could return 1 upfront 
>   and do the free_initmem() earlier as it used to.
> 
> - It was mentioned somewhere that the config system could use a 4th 
>   state in addition to n, m and y.  That would be required before this 
>   goes upstream simply to express all the dependencies between modules.  
>   Right now if a core module is configured with m, then all the 
>   submodules that depend on it inherit the modular-only restriction.  
>   The same would need to be enforced for deferred initcalls.
> 
> - Currently all deferred initcalls are lumped together in a single 
>   section with no regards to the original initcall level. This is likely 
>   to cause trouble if two initcalls are called in a different order than 
>   intended. Nothing prevents that from happening right now.
> 
> This patch is still not generic enough for mainline inclusion IMHO.  It 
> currently falls in the "you better know what you're doing" category and 
> that is possibly good enough for its actual users.  Trying to make this 
> more generic is going to require some more work.  And this would have to 
> come with serious arguments explaining why simply using modules in the 
> first place is not acceptable.
> 

That one is easy, you simply can't compile the network stack as a
module and it is huge.

I completely agree with all your arguments and I'm not sure it is worth
making it foolproof.

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Tim Bird @ 2014-10-29 23:49 UTC (permalink / raw)
  To: Nicolas Pitre, Geert Uytterhoeven
  Cc: Rob Landley, Alexandre Belloni, Grant Likely, Borislav Petkov,
	linux-embedded@vger.kernel.org, Dirk Behme, challinan@gmail.com,
	Grant Likely
In-Reply-To: <alpine.LFD.2.11.1410271546010.18007@knanqh.ubzr>

On 10/27/2014 01:29 PM, Nicolas Pitre wrote:
> On Fri, 24 Oct 2014, Geert Uytterhoeven wrote:
> 
>> Several patches are linked from
>> http://elinux.org/Deferred_Initcalls
>>
>> Latest version is
>> http://elinux.org/images/5/51/0001-Port-deferred-initcalls-to-3.10.patch
> 
> In the hope of providing some constructive and concrete feedback to this 
> thread, here's what I have to say about the patch linked above ( I 
> looked only at the latest version):
> 
> - Commented out code is not acceptable for mainline. But everyone knows 
>   that already.
> 
> - Returning a null byte through the /proc file is dubious.
> 
> - The /proc interface is probably not the best. I'd go with an entry in 
>   /sys/kernel instead.
> 
> - If the deferred_initcall section is empty, this could return 1 upfront 
>   and do the free_initmem() earlier as it used to.
> 
> - It was mentioned somewhere that the config system could use a 4th 
>   state in addition to n, m and y.  That would be required before this 
>   goes upstream simply to express all the dependencies between modules.  
>   Right now if a core module is configured with m, then all the 
>   submodules that depend on it inherit the modular-only restriction.  
>   The same would need to be enforced for deferred initcalls.
> 
> - Currently all deferred initcalls are lumped together in a single 
>   section with no regards to the original initcall level. This is likely 
>   to cause trouble if two initcalls are called in a different order than 
>   intended. Nothing prevents that from happening right now.
> 
> This patch is still not generic enough for mainline inclusion IMHO.  It 
> currently falls in the "you better know what you're doing" category and 
> that is possibly good enough for its actual users.  Trying to make this 
> more generic is going to require some more work.  And this would have to 
> come with serious arguments explaining why simply using modules in the 
> first place is not acceptable.

Sorry to take so long to reply.  This feedback is very welcome,
and I appreciate the time taken to review the patch.  I
apologize in advance for the rather long response...

I have been thinking about the points you made previously,
and have given the problem space some more thought.  I agree
that as it stands this is a very niche solution, and it would
be good to think about the broader picture and how things
might be designed differently to make the "feature" usable
more easily and to a broader group.

Taking a step back, the overall goal is to allow user space
to do stuff while the kernel is still initializing statically
linked drivers, so the device's primary function can be ready
as soon as possible (and not wait for secondarily-needed
functionality to initialize). For things that are able to be
made into a module (and for situations where the kernel module
loading is turned on), this feature should not be needed in
its current form.  In that case, user space already has control
over module load ordering and timing.

The way the feature is expressed in the current code is that a
set of drivers are marked for deferred initialization (I'll refer
to this as issue 0).  Then, at boot: 1) most drivers are initialized
normally, 2) user space is started, and then 3) user space indicates
to the kernel that the deferred drivers should be initialized.

This is very coarse, allowing only two categories of drivers: (ignoring
other boot phases for the moment) - regular drivers and deferred drivers.
It also requires source code changes to mark the drivers to be deferred.
Finally, it requires an explicit notification from user-space to complete
the process.  All of these attributes are undesirable.

There may also be an opportunity here to work out more granular driver
load ordering, which would benefit other systems (especially those that
are hitting the EPROBE_DEFER issue).

As it stands now, the ordering of the initcalls within a particular level
is pretty much arbitrary (determined by link order, usually without oversight
by the developer).  Just FYI, here are some numbers culled from a recent
kernel:

initcall macro		number of instances in kernel source
--------------		------------------------------------
early_init		446
core_init		614
postcore_init		150
arch_init		751
subsys_init		573
fs_init			1372
device_init		1211
late_init		440

I'm going to rattle off a few ideas - I'm not sure which ones might
stick,  I just want to bounce these around and see what people think.
Note that I didn't think of most of these, but I'm just repeating ones
that have been stated, and adding a few thoughts of my own.

First, if the ordering of initialization is not the default
provided by the kernel, it needs to be recorded somewhere.  A developer
needs to express it (or a tool needs to figure it out), but if it is
going to be reflected in the final kernel behaviour (or image), the
kernel needs it at boot time (if not compile time).  The current
initcall system hardcodes a "level" for each driver initialization
routine in the source code itself, by putting it in the macro
name for each init routine.  There can
only be one such order expressed in the code itself.

For developers who wish to express another order (or priority), a
new mechanism will need to be used.  If possible, I strongly prefer
putting this into the KCONFIG system, as that is where other details
about kernel configuration are stored, and there are pre-existing tools
for dealing with the format.  I am hesitant to create a special language
or config format for this (unless it is much simpler than adding something
to Kconfig).  As Nicolas pointed out, Kconfig already has information
about dependencies in terms of not allowing a driver to be a module
if a dependent module is statically linked. Having the tool warn for
violations of that ordering would be valuable.

Possibly, we could use a fourth driver state ('D' for deferred), but
this still only allows very coarse ordering granularity.
How about if we added a numeric value for each driver, and had the macro
somehow use that number in ordering or deferring the driver initialization?
Say we supported order groups 0-9, with order 8 and 9 being deferred?

So we could add something like:
CONFIG_USB_EHCI_HCD_INITORDER=9

Here are some questions...
Do all driver initialization routines have a corresponding config
variable? Also, do we really want to manually add all these CONFIG
items?  Is there a way to allow expressing a config item like this,
automatically, without having to create each one in a Kconfig file?
Is the set of routines that we might want to defer small enough that
we could get by with just defining only a specific set of these
(rather than for all possible drivers and initcalls)?  
Can we get by with just listing exceptions to default ordering, or
is something more comprehensive required?

Another possibility is a binary post-processor, which reorders
the initcall tables in the kernel, after the compile has finished.
So, rather than relying on the compiler, there would be a separate
tool to modify the kernel binary to have the desired init ordering.
The initcall macro could be extended to provide input to this tool,
and the tool could read a separate configuration file indicating
the routines that should be reordered in the boot sequence.

Another idea would be to make the starting of user-space it's own
initialization routine, which was not necessarily started as the last thing
after all other statically linked driver initializations.  Then, it
could begin operation before other drivers were initialized. It's init
order could be controlled using the same mechanism as other initcalls.

Right now, user space starts as if it were a late_initcall, with an
INITORDER=9, but if this were configurable, that might solve a lot
of the problem.  A developer could push the order of user-space start
earlier into the initialization sequence, if they needed to.

If stricter ordering was required, such as making sure user-space
got cycles before other drivers, then the threads managing such
initializations would need to be prioritized.  Maybe user space
could elevate it's scheduling priority, or a configuration item
could indicate a high starting scheduling priority, so that user
space would be guaranteed to run before other (lower-priority)
init routines. This would allow lower-priority initializations
to proceed in piecemeal fashion (using up cycles whenever the
high-priority user-space was not busy).  The "trigger"
for allowing low-priority initializations to proceed could then be
something like the user-space thread lowering its scheduling priority
back to "normal".  This would use already-existing syscalls, and
would not require a /sys or /proc trigger mechanism.

I'm not sure if the problem drivers (USB and networking) are
interruptible during their init routines (especially on UP machines).
This would need to be tested, to see if they can start in the
background and not cause a big delay to the higher priority task.

Grant Likely suggested deferring the ordering decision in a
way that allowed it to be expressed at runtime rather than at
compile-time. That, I think, would require a more substantial
rework of the initcall system, probably requiring to make it
text-driven.  It does have the possibility of solving some
other driver init ordering problems that are now being
addressed with EPROBE_DEFER.  My guess is that making the initcall
system text-driven would increase the size of it to a degree
that it would make more sense just to turn on the loadable
module system.  But I'm open to ideas how this might be done
efficiently.  I don't see how this could be done in a binary
fashion, as I'm pretty sure Grant would intend for this
ordering information to live outside a particular binary
instance of the kernel (similar to device tree).

I think a lot of this is what Nicolas was getting at last week,
and I didn't understand the ideas he was putting forth. Since
this is a niche case, it may not be worth rewriting the
initcall system to handle it.  But I'm interested in whether
people think this is worth working on or not.  This patch *has*
been useful (and used), so there's clearly an unfulfilled need.
And maybe this discussion can result in a solution that is more
general and amenable to mainlining.

Thanks for listening.
 -- Tim

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Geert Uytterhoeven @ 2014-10-30  8:51 UTC (permalink / raw)
  To: Tim Bird
  Cc: Nicolas Pitre, Rob Landley, Alexandre Belloni, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <54517D1A.4030206@sonymobile.com>

On Thu, Oct 30, 2014 at 12:49 AM, Tim Bird <tim.bird@sonymobile.com> wrote:
> The way the feature is expressed in the current code is that a
> set of drivers are marked for deferred initialization (I'll refer
> to this as issue 0).  Then, at boot: 1) most drivers are initialized
> normally, 2) user space is started, and then 3) user space indicates
> to the kernel that the deferred drivers should be initialized.

One (IMHO important) point in the current implementation is that the call
to free_initmem() is also delayed until after initialization of the
deferred drivers.

This is different from modular drivers, which are loaded after free_initmem().

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-11-02  2:37 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Tim Bird, Rob Landley, Alexandre Belloni, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <CAMuHMdWFMa2ZYCY53XdfDeyhPtYOmbEBvV1bXG5NuOZ3YAS2sg@mail.gmail.com>

On Thu, 30 Oct 2014, Geert Uytterhoeven wrote:

> On Thu, Oct 30, 2014 at 12:49 AM, Tim Bird <tim.bird@sonymobile.com> wrote:
> > The way the feature is expressed in the current code is that a
> > set of drivers are marked for deferred initialization (I'll refer
> > to this as issue 0).  Then, at boot: 1) most drivers are initialized
> > normally, 2) user space is started, and then 3) user space indicates
> > to the kernel that the deferred drivers should be initialized.
> 
> One (IMHO important) point in the current implementation is that the call
> to free_initmem() is also delayed until after initialization of the
> deferred drivers.
> 
> This is different from modular drivers, which are loaded after free_initmem().

This is because modules have their __initmem sections freed right after 
each module is initialized.

The deferred initcalls could also have a separate initmem section which 
freeing is also deferred.  But I don't think it makes such a big 
difference in the end.


Nicolas

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Nicolas Pitre @ 2014-11-02  3:46 UTC (permalink / raw)
  To: Tim Bird
  Cc: Geert Uytterhoeven, Rob Landley, Alexandre Belloni, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com, Grant Likely
In-Reply-To: <54517D1A.4030206@sonymobile.com>

On Wed, 29 Oct 2014, Tim Bird wrote:

> I have been thinking about the points you made previously,
> and have given the problem space some more thought.  I agree
> that as it stands this is a very niche solution, and it would
> be good to think about the broader picture and how things
> might be designed differently to make the "feature" usable
> more easily and to a broader group.
> 
> Taking a step back, the overall goal is to allow user space
> to do stuff while the kernel is still initializing statically
> linked drivers, so the device's primary function can be ready
> as soon as possible (and not wait for secondarily-needed
> functionality to initialize). For things that are able to be
> made into a module (and for situations where the kernel module
> loading is turned on), this feature should not be needed in
> its current form.  In that case, user space already has control
> over module load ordering and timing.
> 
> The way the feature is expressed in the current code is that a
> set of drivers are marked for deferred initialization (I'll refer
> to this as issue 0).  Then, at boot: 1) most drivers are initialized
> normally, 2) user space is started, and then 3) user space indicates
> to the kernel that the deferred drivers should be initialized.
> 
> This is very coarse, allowing only two categories of drivers: (ignoring
> other boot phases for the moment) - regular drivers and deferred drivers.
> It also requires source code changes to mark the drivers to be deferred.
> Finally, it requires an explicit notification from user-space to complete
> the process.  All of these attributes are undesirable.
> 
> There may also be an opportunity here to work out more granular driver
> load ordering, which would benefit other systems (especially those that
> are hitting the EPROBE_DEFER issue).
> 
> As it stands now, the ordering of the initcalls within a particular level
> is pretty much arbitrary (determined by link order, usually without oversight
> by the developer).  Just FYI, here are some numbers culled from a recent
> kernel:
> 
> initcall macro		number of instances in kernel source
> --------------		------------------------------------
> early_init		446
> core_init		614
> postcore_init		150
> arch_init		751
> subsys_init		573
> fs_init		1372
> device_init		1211
> late_init		440

Did you count module_init instances which are folded into the 
device_init leven when built-in?

> I'm going to rattle off a few ideas - I'm not sure which ones might
> stick,  I just want to bounce these around and see what people think.
> Note that I didn't think of most of these, but I'm just repeating ones
> that have been stated, and adding a few thoughts of my own.
> 
> First, if the ordering of initialization is not the default
> provided by the kernel, it needs to be recorded somewhere.  A developer
> needs to express it (or a tool needs to figure it out), but if it is
> going to be reflected in the final kernel behaviour (or image), the
> kernel needs it at boot time (if not compile time).  The current
> initcall system hardcodes a "level" for each driver initialization
> routine in the source code itself, by putting it in the macro
> name for each init routine.  There can
> only be one such order expressed in the code itself.
> 
> For developers who wish to express another order (or priority), a
> new mechanism will need to be used.  If possible, I strongly prefer
> putting this into the KCONFIG system, as that is where other details
> about kernel configuration are stored, and there are pre-existing tools
> for dealing with the format.  I am hesitant to create a special language
> or config format for this (unless it is much simpler than adding something
> to Kconfig).  As Nicolas pointed out, Kconfig already has information
> about dependencies in terms of not allowing a driver to be a module
> if a dependent module is statically linked. Having the tool warn for
> violations of that ordering would be valuable.

I think you're confusing two issues: ordering and dependency.  The 
dependency affects some of the ordering, but only a small portion of it.  
Within an initcall level the ordering is a result of the link order and 
therefore rather arbitrary.

IMHO the current initcall level system is simply too simple for the 
current kernel complexity.  The number of levels, and especially their 
names, are also completely arbitrary.  It probably made sense back when 
initcalls were introduced, but it is just too inflexible now.

Initcalls should instead be turned into targets and prerequisites, just 
like dependencies in a makefile.  This way, the ultimate target "execute 
/sbin/init in userspace" could indicate its prerequisite as "mount root 
fs".  Then "mount root fs" could have "USB storage" as a prerequisite 
depending on the boot args. From "USB storage" you could have two 
prerequisites: "USB stack" and "USB device enumeration".  And so down to 
the very first initcalls with no prerequisites.  Oh and I forgot to list 
"open console device" as another prerequisite for "execute /sbin/init". 
And "boot args" would have dependencies of its own, like "parse DT" 
maybe. Etc.

This way, no arbitrary initcall levels would be needed.  And we wouldn't 
have to play games when choosing the right initcall level when creating 
a new subsystem that has to sit in between existing levels.

That also makes a very clear minimum execution dependency tree for the 
work to be done in order to make user space usable as soon as possible. 
And parallel initcall execution would also be unambiguous, just like 
simultaneous jobs using 'make'.

Anything else like manual ordering or prioritizing of drivers when 
configuring the kernel will lead to madness.  The ultimate best ordering 
can and must be unique. Some of that ordering automation is already done 
by the module tools based on symbol dependencies when you run modprobe 
so only the required modules are loaded, and in the right order.

> Here are some questions...
> Do all driver initialization routines have a corresponding config
> variable?

No.  Many built-in initcalls are not configurable.

> Another possibility is a binary post-processor, which reorders
> the initcall tables in the kernel, after the compile has finished.
> So, rather than relying on the compiler, there would be a separate
> tool to modify the kernel binary to have the desired init ordering.
> The initcall macro could be extended to provide input to this tool,
> and the tool could read a separate configuration file indicating
> the routines that should be reordered in the boot sequence.

Thing is: initcall ordering should not be represented with a list.  It 
is actually a tree.  And the tree can be dynamic depending on the 
dependencies things like kernel command line arguments may create.

> Another idea would be to make the starting of user-space it's own
> initialization routine, which was not necessarily started as the last thing
> after all other statically linked driver initializations.  Then, it
> could begin operation before other drivers were initialized. It's init
> order could be controlled using the same mechanism as other initcalls.

Absolutely.

> Right now, user space starts as if it were a late_initcall, with an
> INITORDER=9, but if this were configurable, that might solve a lot
> of the problem.  A developer could push the order of user-space start
> earlier into the initialization sequence, if they needed to.

But it is figuring out all the dependencies, and _only_ those 
dependencies which is the actual problem.

> Grant Likely suggested deferring the ordering decision in a
> way that allowed it to be expressed at runtime rather than at
> compile-time.

We probably came to the same conclusion then.

> That, I think, would require a more substantial
> rework of the initcall system, probably requiring to make it
> text-driven. 

Nah...  The initcall specifier macro would simply have to accept a list 
of prerequisites.  Then a runtime equivalent would have to be created.

> It does have the possibility of solving some
> other driver init ordering problems that are now being
> addressed with EPROBE_DEFER.  My guess is that making the initcall
> system text-driven would increase the size of it to a degree
> that it would make more sense just to turn on the loadable
> module system.

Not everything can be turned into a module though.  And the in-kernel 
initcalls still could benefit from explicit dependencies to get rid of 
the multi-level thing we have now.

And this can be implemented gradually.  This could start with only a few 
meta initcalls called early_init,
core_init, postcore_init, etc. and they would initially depend on each 
other, simply executing the existing initcalls in a backward compatible 
way.  Then the legacy initcalls could be moved over one by one to the 
new system.

I'll try to find some time to play with this idea and see if it can go 
somewhere in practice.

Nicolas

^ permalink raw reply

* Re: Why is the deferred initcall patch not mainline?
From: Geert Uytterhoeven @ 2014-11-02  9:01 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tim Bird, Rob Landley, Alexandre Belloni, Grant Likely,
	Borislav Petkov, linux-embedded@vger.kernel.org, Dirk Behme,
	challinan@gmail.com
In-Reply-To: <alpine.LFD.2.11.1411012233090.11690@knanqh.ubzr>

On Sun, Nov 2, 2014 at 3:37 AM, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Thu, 30 Oct 2014, Geert Uytterhoeven wrote:
>> On Thu, Oct 30, 2014 at 12:49 AM, Tim Bird <tim.bird@sonymobile.com> wrote:
>> > The way the feature is expressed in the current code is that a
>> > set of drivers are marked for deferred initialization (I'll refer
>> > to this as issue 0).  Then, at boot: 1) most drivers are initialized
>> > normally, 2) user space is started, and then 3) user space indicates
>> > to the kernel that the deferred drivers should be initialized.
>>
>> One (IMHO important) point in the current implementation is that the call
>> to free_initmem() is also delayed until after initialization of the
>> deferred drivers.
>>
>> This is different from modular drivers, which are loaded after free_initmem().
>
> This is because modules have their __initmem sections freed right after
> each module is initialized.

I know.

But it means _all_ init sections are kept until userspace kicks the deferred
initcalls, and they have completed.

> The deferred initcalls could also have a separate initmem section which
> freeing is also deferred.  But I don't think it makes such a big
> difference in the end.

Yes, it can be handled.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [CFP] FOSDEM 2015 Embedded Devroom
From: Geert Uytterhoeven @ 2014-11-18  9:40 UTC (permalink / raw)


Devroom date: Saturday January 31st and February 1st 2015 in Brussels,
Belgium

CFP deadline: Monday December 1st 2014
Final notification two weeks later, December 15th 2014

CFP Introduction
----------------------
Embedded software is transforming the world, and FOSS embedded software is
leading the way. From automotive to the Internet of Things, small devices,
embedded systems, industrial process control and automatons are beginning
their inevitable rise to the singularity. Join in! Or be assimilated. ;)

NB: This year FOSDEM plans to record all presentations, no exceptions.
Please only propose a talk that you're really able and willing to share.

Topics Sought
------------------
The embedded devroom seeks topics related to automotive, mobile,
autonomous, and generally small systems. Related areas are of course of
interest as well and our definition of "embedded" is elastic. Are you
controlling and launching rockets, made your own set-top box, built some
heating control for your house, hacked your mobile phone or just built a
small gadget using FOSS software then you might have exactly what we're
looking for. This year the automotive devroom has been merged into the
embedded devroom so automotive projects are for sure on topic here too.

CFP Schedule And Submission Details
----------------------------------------------------
Please submit proposals no later than the first of December.

Please use the following URL to submit your talk to FOSDEM 2015:

https://penta.fosdem.org/submission/FOSDEM15

and follow the following rules:

    * Select as the Track "embedded devroom".

    * Include a title.  (Note that "Subtitle" entry doesn't appear on
all conference documents, so make sure "Title" can stand on its own without
"Subtitle" present.) Also try to make it catchy and descriptive of what you
will talk about. For example "Launching rockets with an Arduino" and not
"Programming a microcontroller for aerodynamic control"

   * Include an Abstract of about 500 characters and a full description of
any length you wish, but in both fields, please be concise, but clear and
descriptive.

   * Indicate whether you seek a 25 or 45 minute slot.

   * Use the "Links" sub-area to your past work in the field you'd like to
share.

   * Affirmative confirmation that you agree to CC-By-SA-4.0 or Cc-By-4.0
licensing of your talk, in the "Submission Notes" field. Add a statement
such as this:  "Should my presentation be scheduled for FOSDEM 2015, I
hereby agree to license all recordings, slides and any other materials
presented under the Creative Commons Attribution Share-Alike 4.0
International license.  Sincerely, s/YOUR_NAME/"

   * Also in the notes field, note your affirmative confirmation of
availability to speak on Saturday 31 January or 1 February 2015 in
Brussels. Also mention if you plan to do a demo and if you have special
requirements.

   * Make sure you give a valid email so we can contact you easily in case
we have more questions or want to make sure you received the confirmation.
Should you have any questions we can send more information as needed.

Thank you very much and we look forward to reviewing your proposals!

-- Philippe and Jeremiah along with the other members of
the FOSDEM embedded devroom team.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH] selftest: size: Add size test for Linux kernel
From: Tim Bird @ 2014-11-19 21:14 UTC (permalink / raw)
  To: Shuah Khan, linux-api
  Cc: linux-kernel@vger.kernel.org, linux-embedded@vger.kernel.org


This test shows the amount of memory used by the system.
Note that this is dependent on the user-space that is loaded
when this program runs.  Optimally, this program would be
run as the init program itself.

The program is optimized for size itself, to avoid conflating
its own execution with that of the system software.
The code is compiled statically, with no stdlibs. On my x86_64 system,
this results in a statically linked binary of less than 5K.

Signed-off-by: Tim Bird <tim.bird@sonymobile.com>
---
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/size/Makefile   |  21 +++++++
 tools/testing/selftests/size/get_size.c | 105 ++++++++++++++++++++++++++++++++
 3 files changed, 127 insertions(+)
 create mode 100644 tools/testing/selftests/size/Makefile
 create mode 100644 tools/testing/selftests/size/get_size.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 45f145c..fa91aef 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += user
 TARGETS += sysctl
 TARGETS += firmware
 TARGETS += ftrace
+TARGETS += size
 
 TARGETS_HOTPLUG = cpu-hotplug
 TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/size/Makefile b/tools/testing/selftests/size/Makefile
new file mode 100644
index 0000000..51e5fbd
--- /dev/null
+++ b/tools/testing/selftests/size/Makefile
@@ -0,0 +1,21 @@
+#ifndef CC
+	CC = $(CROSS_COMPILE)gcc
+#endif
+
+#ifndef STRIP
+	STRIP = $(CROSS_COMPILE)strip
+#endif
+
+all: get_size
+
+get_size: get_size.c
+	$(CC) --static -ffreestanding -nostartfiles \
+		-Wl,--entry=main get_size.c -o get_size \
+		`cc -print-libgcc-file-name`
+	$(STRIP) -s get_size
+
+run_tests: all
+	./get_size
+
+clean:
+	$(RM) get_size
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
new file mode 100644
index 0000000..f8ffc80
--- /dev/null
+++ b/tools/testing/selftests/size/get_size.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright 2014 Sony
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftest for runtime system size
+ *
+ * Prints the amount of RAM that the currently running system is using.
+ *
+ * This program tries to be as small as possible itself, to
+ * avoid perturbing the system memory utilization with its
+ * own execution.  It also attempts to have as few dependencies
+ * on kernel features as possible.
+ *
+ * It should be statically linked, with startup libs avoided.
+ * It uses no library calls, and only the following 3 syscalls:
+ *   sysinfo(), write(), and _exit()
+ *
+ * For output, it avoids printf (which in some C libraries
+ * has large external dependencies) by implementing its own
+ * strlen(), number output and print() routines.
+ */
+
+#include <sys/sysinfo.h>
+#include <unistd.h>
+
+#define STDOUT_FILENO 1
+
+my_strlen(const char *s)
+{
+	int len = 0;
+
+	while (*s++)
+		len++;
+	return len;
+}
+
+/*
+ * num_to_str - put digits from num into *s, left to right
+ *   do this by dividing the number by powers of 10
+ *   the tricky part is to omit leading zeros
+ *   don't print zeros until we've started printing any numbers at all
+ */
+static void num_to_str(unsigned long num, char *s)
+{
+	unsigned long long temp, div;
+	int started;
+
+	temp = num;
+	div = 1000000000000000000LL;
+	started = 0;
+	while (div) {
+		if (temp/div || started) {
+			*s++ = (unsigned char)(temp/div + '0');
+			started = 1;
+		}
+		temp -= (temp/div)*div;
+		div /= 10;
+	}
+	*s = 0;
+}
+
+print_num(unsigned long num)
+{
+	char num_buf[30];
+
+	num_to_str(num, num_buf);
+	write(STDOUT_FILENO, num_buf, my_strlen(num_buf));
+}
+
+print(char *s)
+{
+	write(STDOUT_FILENO, s, my_strlen(s));
+}
+
+void main(int argc, char **argv)
+{
+	int ccode;
+	unsigned long used;
+	struct sysinfo info;
+	unsigned long long temp;
+
+	print("Testing system size.\n");
+	print("1..1\n");
+
+	ccode = sysinfo(&info);
+	if (ccode < 0) {
+		print("not ok 1 get size runtime size\n");
+		print("# could not get sysinfo\n");
+		_exit(ccode);
+	}
+
+	/* ignore cache complexities for now */
+	temp = info.totalram - info.freeram - info.bufferram;
+	temp = temp * info.mem_unit;
+	temp = temp / 1024;
+
+	used = temp;
+
+	print("ok 1 get runtime size # size = ");
+	print_num(used);
+	print(" K\n");
+
+	_exit(0);
+}
-- 
1.8.2.2

^ permalink raw reply related

* [PATCH v2] selftest: size: Add size test for Linux kernel
From: Tim Bird @ 2014-11-20  0:13 UTC (permalink / raw)
  To: Shuah Khan, linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-embedded-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <546D084D.20002-/MT0OVThwyLZJqsBc5GL+g@public.gmane.org>


This test shows the amount of memory used by the system.
Note that this is dependent on the user-space that is loaded
when this program runs.  Optimally, this program would be
run as the init program itself.

The program is optimized for size itself, to avoid conflating
its own execution with that of the system software.
The code is compiled statically, with no stdlibs. On my x86_64 system,
this results in a statically linked binary of less than 5K.

Changes from v1:
  - use more correct Copyright string in get_size.c

Signed-off-by: Tim Bird <tim.bird-/MT0OVThwyLZJqsBc5GL+g@public.gmane.org>
---
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/size/Makefile   |  21 +++++++
 tools/testing/selftests/size/get_size.c | 105 ++++++++++++++++++++++++++++++++
 3 files changed, 127 insertions(+)
 create mode 100644 tools/testing/selftests/size/Makefile
 create mode 100644 tools/testing/selftests/size/get_size.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 45f145c..fa91aef 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += user
 TARGETS += sysctl
 TARGETS += firmware
 TARGETS += ftrace
+TARGETS += size
 
 TARGETS_HOTPLUG = cpu-hotplug
 TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/size/Makefile b/tools/testing/selftests/size/Makefile
new file mode 100644
index 0000000..51e5fbd
--- /dev/null
+++ b/tools/testing/selftests/size/Makefile
@@ -0,0 +1,21 @@
+#ifndef CC
+	CC = $(CROSS_COMPILE)gcc
+#endif
+
+#ifndef STRIP
+	STRIP = $(CROSS_COMPILE)strip
+#endif
+
+all: get_size
+
+get_size: get_size.c
+	$(CC) --static -ffreestanding -nostartfiles \
+		-Wl,--entry=main get_size.c -o get_size \
+		`cc -print-libgcc-file-name`
+	$(STRIP) -s get_size
+
+run_tests: all
+	./get_size
+
+clean:
+	$(RM) get_size
diff --git a/tools/testing/selftests/size/get_size.c b/tools/testing/selftests/size/get_size.c
new file mode 100644
index 0000000..f8ffc80
--- /dev/null
+++ b/tools/testing/selftests/size/get_size.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright 2014 Sony Mobile Communications Inc.
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftest for runtime system size
+ *
+ * Prints the amount of RAM that the currently running system is using.
+ *
+ * This program tries to be as small as possible itself, to
+ * avoid perturbing the system memory utilization with its
+ * own execution.  It also attempts to have as few dependencies
+ * on kernel features as possible.
+ *
+ * It should be statically linked, with startup libs avoided.
+ * It uses no library calls, and only the following 3 syscalls:
+ *   sysinfo(), write(), and _exit()
+ *
+ * For output, it avoids printf (which in some C libraries
+ * has large external dependencies) by implementing its own
+ * strlen(), number output and print() routines.
+ */
+
+#include <sys/sysinfo.h>
+#include <unistd.h>
+
+#define STDOUT_FILENO 1
+
+my_strlen(const char *s)
+{
+	int len = 0;
+
+	while (*s++)
+		len++;
+	return len;
+}
+
+/*
+ * num_to_str - put digits from num into *s, left to right
+ *   do this by dividing the number by powers of 10
+ *   the tricky part is to omit leading zeros
+ *   don't print zeros until we've started printing any numbers at all
+ */
+static void num_to_str(unsigned long num, char *s)
+{
+	unsigned long long temp, div;
+	int started;
+
+	temp = num;
+	div = 1000000000000000000LL;
+	started = 0;
+	while (div) {
+		if (temp/div || started) {
+			*s++ = (unsigned char)(temp/div + '0');
+			started = 1;
+		}
+		temp -= (temp/div)*div;
+		div /= 10;
+	}
+	*s = 0;
+}
+
+print_num(unsigned long num)
+{
+	char num_buf[30];
+
+	num_to_str(num, num_buf);
+	write(STDOUT_FILENO, num_buf, my_strlen(num_buf));
+}
+
+print(char *s)
+{
+	write(STDOUT_FILENO, s, my_strlen(s));
+}
+
+void main(int argc, char **argv)
+{
+	int ccode;
+	unsigned long used;
+	struct sysinfo info;
+	unsigned long long temp;
+
+	print("Testing system size.\n");
+	print("1..1\n");
+
+	ccode = sysinfo(&info);
+	if (ccode < 0) {
+		print("not ok 1 get size runtime size\n");
+		print("# could not get sysinfo\n");
+		_exit(ccode);
+	}
+
+	/* ignore cache complexities for now */
+	temp = info.totalram - info.freeram - info.bufferram;
+	temp = temp * info.mem_unit;
+	temp = temp / 1024;
+
+	used = temp;
+
+	print("ok 1 get runtime size # size = ");
+	print_num(used);
+	print(" K\n");
+
+	_exit(0);
+}
-- 
1.8.2.2

^ permalink raw reply related

* [PATCH] kselftest: Move the docs to the Documentation dir
From: Tim Bird @ 2014-11-20  0:16 UTC (permalink / raw)
  To: Shuah Khan, linux-api
  Cc: linux-kernel@vger.kernel.org, linux-embedded@vger.kernel.org


Also, adjust the formatting a bit, and expand the section about using
TARGETS= on the make command line.

Signed-off-by: Tim Bird <tim.bird@sonymobile.com>
---
 Documentation/kselftest.txt        | 69 ++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/README.txt | 61 ---------------------------------
 2 files changed, 69 insertions(+), 61 deletions(-)
 create mode 100644 Documentation/kselftest.txt
 delete mode 100644 tools/testing/selftests/README.txt

diff --git a/Documentation/kselftest.txt b/Documentation/kselftest.txt
new file mode 100644
index 0000000..a87d840
--- /dev/null
+++ b/Documentation/kselftest.txt
@@ -0,0 +1,69 @@
+Linux Kernel Selftests
+
+The kernel contains a set of "self tests" under the tools/testing/selftests/
+directory. These are intended to be small unit tests to exercise individual
+code paths in the kernel.
+
+On some systems, hot-plug tests could hang forever waiting for cpu and
+memory to be ready to be offlined. A special hot-plug target is created
+to run full range of hot-plug tests. In default mode, hot-plug tests run
+in safe mode with a limited scope. In limited mode, cpu-hotplug test is
+run on a single cpu as opposed to all hotplug capable cpus, and memory
+hotplug test is run on 2% of hotplug capable memory instead of 10%.
+
+Running the selftests (hotplug tests are run in limited mode)
+=============================================================
+
+To build the tests:
+  $ make -C tools/testing/selftests
+
+
+To run the tests:
+  $ make -C tools/testing/selftests run_tests
+
+To build and run the tests with a single command, use:
+  $ make kselftest
+
+- note that some tests will require root privileges.
+
+
+Running a subset of selftests
+========================================
+You can use the "TARGETS" variable on the make command line to specify
+single test to run, or a list of tests to run.
+
+To run only tests targeted for a single subsystem:
+  $  make -C tools/testing/selftests TARGETS=ptrace run_tests
+
+You can specify multiple tests to build and run:
+  $  make TARGETS="size timers" kselftest
+
+See the top-level tools/testing/selftests/Makefile for the list of all
+possible targets.
+
+
+Running the full range hotplug selftests
+========================================
+
+To build the hotplug tests:
+  $ make -C tools/testing/selftests hotplug
+
+To run the hotplug tests:
+  $ make -C tools/testing/selftests run_hotplug
+
+- note that some tests will require root privileges.
+
+
+Contributing new tests
+======================
+
+In general, the rules for for selftests are
+
+ * Do as much as you can if you're not root;
+
+ * Don't take too long;
+
+ * Don't break the build on any architecture, and
+
+ * Don't cause the top-level "make run_tests" to fail if your feature is
+   unconfigured.
diff --git a/tools/testing/selftests/README.txt b/tools/testing/selftests/README.txt
deleted file mode 100644
index 2660d5f..0000000
--- a/tools/testing/selftests/README.txt
+++ /dev/null
@@ -1,61 +0,0 @@
-Linux Kernel Selftests
-
-The kernel contains a set of "self tests" under the tools/testing/selftests/
-directory. These are intended to be small unit tests to exercise individual
-code paths in the kernel.
-
-On some systems, hot-plug tests could hang forever waiting for cpu and
-memory to be ready to be offlined. A special hot-plug target is created
-to run full range of hot-plug tests. In default mode, hot-plug tests run
-in safe mode with a limited scope. In limited mode, cpu-hotplug test is
-run on a single cpu as opposed to all hotplug capable cpus, and memory
-hotplug test is run on 2% of hotplug capable memory instead of 10%.
-
-Running the selftests (hotplug tests are run in limited mode)
-=============================================================
-
-To build the tests:
-
-  $ make -C tools/testing/selftests
-
-
-To run the tests:
-
-  $ make -C tools/testing/selftests run_tests
-
-- note that some tests will require root privileges.
-
-To run only tests targeted for a single subsystem: (including
-hotplug targets in limited mode)
-
-  $  make -C tools/testing/selftests TARGETS=cpu-hotplug run_tests
-
-See the top-level tools/testing/selftests/Makefile for the list of all possible
-targets.
-
-Running the full range hotplug selftests
-========================================
-
-To build the tests:
-
-  $ make -C tools/testing/selftests hotplug
-
-To run the tests:
-
-  $ make -C tools/testing/selftests run_hotplug
-
-- note that some tests will require root privileges.
-
-Contributing new tests
-======================
-
-In general, the rules for for selftests are
-
- * Do as much as you can if you're not root;
-
- * Don't take too long;
-
- * Don't break the build on any architecture, and
-
- * Don't cause the top-level "make run_tests" to fail if your feature is
-   unconfigured.
-- 
1.8.2.2

^ permalink raw reply related

* Re: [PATCH] kselftest: Move the docs to the Documentation dir
From: Shuah Khan @ 2014-11-20 13:56 UTC (permalink / raw)
  To: Tim Bird, linux-api, corbet
  Cc: linux-kernel@vger.kernel.org, linux-embedded@vger.kernel.org,
	Shuah Khan, linux-doc
In-Reply-To: <546D32D0.9090206@sonymobile.com>

On 11/19/2014 05:16 PM, Tim Bird wrote:
> 
> Also, adjust the formatting a bit, and expand the section about using
> TARGETS= on the make command line.
> 
> Signed-off-by: Tim Bird <tim.bird@sonymobile.com>
> ---
>  Documentation/kselftest.txt        | 69 ++++++++++++++++++++++++++++++++++++++
>  tools/testing/selftests/README.txt | 61 ---------------------------------
>  2 files changed, 69 insertions(+), 61 deletions(-)
>  create mode 100644 Documentation/kselftest.txt
>  delete mode 100644 tools/testing/selftests/README.txt
> 
> diff --git a/Documentation/kselftest.txt b/Documentation/kselftest.txt
> new file mode 100644
> index 0000000..a87d840
> --- /dev/null
> +++ b/Documentation/kselftest.txt
>

Tim,

Thanks for doing this. Looks good to me. I think you missed
Documentation maintainer. Adding linux-doc and Jon Corbet to
the thread with my ack to take this through Documentation tree.

Acked-by: shuahkh@osg.samsung.com

thanks,
-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox