public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Max Krasnyansky <maxk@qualcomm.com>
To: rusty@rustcorp.com.au
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Module loading/unloading and "The Stop Machine"
Date: Thu, 07 Feb 2008 18:38:04 -0800	[thread overview]
Message-ID: <47ABC08C.8010101@qualcomm.com> (raw)

Hi Rusty,

I was hopping you could answer a couple of questions about module loading/unloading
and the stop machine.
There was a recent discussion on LKML about CPU isolation patches I'm working on.
One of the patches makes stop machine ignore the isolated CPUs. People of course had
questions about that. So I started looking into more details and got this silly, crazy 
idea that maybe we do not need the stop machine any more :)

As far as I can tell the stop machine is basically a safety net in case some locking
and recounting mechanisms aren't bullet proof. In other words if a subsystem can actually
handle registration/unregistration in a robust way, module loader/unloader does not 
necessarily have to halt entire machine in order to load/unload a module that belongs
to that subsystem. I may of course be completely wrong on that.
 
The problem with the stop machine is that it's a very very big gun :). In a sense that 
it totally kills all the latencies and stuff since the entire machine gets halted while
module is being (un)loaded. Which is a major issue for any realtime apps. Specifically 
for CPU isolation the issue is that high-priority rt user-space thread prevents stop 
machine threads from running and entire box just hangs waiting for it. 
I'm kind of surprised that folks who use monster boxes with over 100 CPUs have not 
complained. It's must be a huge hit for those machines to halt the entire thing. 

It seems that over the last few years most subsystems got much better at locking and 
refcounting. And I'm hopping that we can avoid halting the entire machine these days.
For CPU isolation in particular the solution is simple. We can just ignore isolated CPUs. 
What I'm trying to figure out is how safe it is and whether we can avoid full halt 
altogether.

So. Here is what I tried today on my Core2 Duo laptop
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -204,11 +204,14 @@ int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
>  
>         /* No CPUs can come up or down during this. */
>         lock_cpu_hotplug();
> +/*
>         p = __stop_machine_run(fn, data, cpu);
>         if (!IS_ERR(p))
>                 ret = kthread_stop(p);
>         else
>                 ret = PTR_ERR(p);
> +*/
> +       ret = fn(data);
>         unlock_cpu_hotplug();
>  
>         return ret;

ie Completely disabled stop machine. It just loads/unloads modules without full halt.
I then ran three scripts:

while true; do
        /sbin/modprobe -r uhci_hcd
        /sbin/modprobe uhci_hcd
        sleep 10
done

while true; do
        /sbin/modprobe -r tg3
        /sbin/modprobe tg3
        sleep 2
done

while true; do
        /usr/sbin/tcpdump -i eth0
done

The machine has a bunch of USB devices connected to it. The two most interesting 
are a Bluetooth dongle and a USB mouse. By loading/unloading UHCI driver we're touching
Sysfs, USB stack, Bluetooth stack, HID layer, Input layer. The X is running and is using 
that USB mouse. The Bluetooth services are running too.
By loading/unloading TG3 driver we're touching sysfs, network stack (a bunch of layers).
The machine is running NetworkManager and tcpdumping on the eth0 which is registered 
by TG3.
This is a pretty good stress test in general let alone the disabled stop machine. 

I left all that running for the whole day while doing normal day to day things. 
Compiling a bunch of things, emails, office apps, etc. That's where I'm writing this
email from :). It's still running all that :) 

So the question is do we still need stop machine ? I must be missing something obvious.
But things seem to be working pretty well without it. I certainly feel much better about 
at least ignoring isolated CPUs during stop machine execution. Which btw I've doing
for a couple of years now on a wide range of the machines where people are inserting 
modules left and right. 

What do you think ?

Thanx
Max

             reply	other threads:[~2008-02-08  2:38 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-08  2:38 Max Krasnyansky [this message]
2008-02-08 23:11 ` Module loading/unloading and "The Stop Machine" Max Krasnyanskiy
2008-02-14  4:02 ` Tejun Heo
2008-02-22  1:24   ` Max Krasnyanskiy
2008-02-22  1:38     ` Tejun Heo
2008-02-22  1:47       ` Max Krasnyanskiy
2008-02-22  1:59         ` Tejun Heo
2008-02-22  2:16           ` Max Krasnyanskiy
2008-02-22 11:53         ` Andi Kleen
2008-02-22 22:41           ` Max Krasnyanskiy
2008-03-04  1:21           ` Rusty Russell
2008-02-14  5:02 ` Jike Song
2008-02-14  5:38   ` Tejun Heo
2008-03-04  3:51   ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47ABC08C.8010101@qualcomm.com \
    --to=maxk@qualcomm.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox