* Re: [PATCH 03/16] net: Basic network namespace infrastructure.
From: Eric W. Biederman @ 2007-09-10 15:53 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: Linux Containers, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller
In-Reply-To: <46E543A0.7010104-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes:
> Eric W. Biederman wrote:
>
> [snip]
>
>> --- /dev/null
>> +++ b/include/net/net_namespace.h
>> @@ -0,0 +1,68 @@
>> +/*
>> + * Operations on the network namespace
>> + */
>> +#ifndef __NET_NET_NAMESPACE_H
>> +#define __NET_NET_NAMESPACE_H
>> +
>> +#include <asm/atomic.h>
>> +#include <linux/workqueue.h>
>> +#include <linux/list.h>
>> +
>> +struct net {
>
> Isn't this name is too generic? Why not net_namespace?
My fingers went on strike. struct netns probably wouldn't be bad.
The very long and spelled out version seemed painful. I don't really
care except that not changing it is easier for me. I'm happy to do
whatever is considered most maintainable.
>> + /* Walk through the list backwards calling the exit functions
>> + * for the pernet modules whose init functions did not fail.
>> + */
>> + for (ptr = ptr->prev; ptr != &pernet_list; ptr = ptr->prev) {
>
> Good reason for adding list_for_each_continue_reverse :)
Sounds reasonable.
>> +static int register_pernet_operations(struct list_head *list,
>> + struct pernet_operations *ops)
>> +{
>> + struct net *net, *undo_net;
>> + int error;
>> +
>> + error = 0;
>> + list_add_tail(&ops->list, list);
>> + for_each_net(net) {
>> + if (ops->init) {
>
> Maybe it's better to do it vice-versa?
> if (ops->init)
> for_each_net(net)
> ops->init(net);
> ...
My gut feel says it is more readable with the test on the inside.
Although something like
if (ops->init)
goto out;
might be more readable.
>> +int register_pernet_device(struct pernet_operations *ops)
>> +{
>> + int error;
>> + mutex_lock(&net_mutex);
>> + error = register_pernet_operations(&pernet_list, ops);
>> + if (!error && (first_device == &pernet_list))
>
> Very minor: why do you give the name "device" to some pernet_operations?
Subsystems need to be initialized before and cleaned up after network
devices. We don't have much in the way that needs this distinction,
but we have just enough that it is useful to make this distinction.
Looking at my complete patchset all I have in this category is the
loopback device, and it is important on the teardown side of things
that the loopback device be unregistered before I clean up the
protocols or else I get weird leaks.
Reflecting on it I'm not quite certain about the setup side of things.
I'm on the fence if I need to completely dynamically allocate the
loopback device or if I need to embed it struct net.
There also may be a call for some other special network devices
to have one off instances in each network namespace. With the
new netlink creation API it isn't certain we need that idiom anymore
but before that point I was certain we would have other network
devices besides the loopback that would care.
Eric
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Arjan van de Ven @ 2007-09-10 17:02 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Linus Torvalds, Nick Piggin, Satyam Sharma, Herbert Xu,
Paul Mackerras, Christoph Lameter, Chris Snook, Ilpo Jarvinen,
Paul E. McKenney, Stefan Richter, Linux Kernel Mailing List,
linux-arch, Netdev, Andrew Morton, ak, heiko.carstens,
David Miller, schwidefsky, wensong, horms, wjiang, cfriesen,
zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <200709101538.25132.vda.linux@googlemail.com>
On Mon, 10 Sep 2007 15:38:23 +0100
Denys Vlasenko <vda.linux@googlemail.com> wrote:
> On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
> > On Mon, 10 Sep 2007 11:56:29 +0100
> > Denys Vlasenko <vda.linux@googlemail.com> wrote:
> >
> > >
> > > Well, if you insist on having it again:
> > >
> > > Waiting for atomic value to be zero:
> > >
> > > while (atomic_read(&x))
> > > continue;
> > >
> >
> > and this I would say is buggy code all the way.
> >
> > Not from a pure C level semantics, but from a "busy waiting is
> > buggy" semantics level and a "I'm inventing my own locking"
> > semantics level.
>
> After inspecting arch/*, I cannot agree with you.
the arch/ people obviously are allowed to do their own locking stuff...
BECAUSE THEY HAVE TO IMPLEMENT THAT!
the arch maintainers know EXACTLY how their hw behaves (well, we hope)
so they tend to be the exception to many rules in the kernel....
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Denys Vlasenko @ 2007-09-10 16:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Kyle Moffett, Arjan van de Ven, Nick Piggin, Satyam Sharma,
Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
heiko.carstens, David Miller, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <alpine.LFD.0.999.0709100807260.16478@woody.linux-foundation.org>
On Monday 10 September 2007 16:09, Linus Torvalds wrote:
> On Mon, 10 Sep 2007, Denys Vlasenko wrote:
> > static inline int
> > qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
> > {
> > int return_status = QLA_SUCCESS;
> > unsigned long loop_timeout ;
> > scsi_qla_host_t *pha = to_qla_parent(ha);
> >
> > /* wait for 5 min at the max for loop to be ready */
> > loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
> >
> > while ((!atomic_read(&pha->loop_down_timer) &&
> > atomic_read(&pha->loop_state) == LOOP_DOWN) ||
> > atomic_read(&pha->loop_state) != LOOP_READY) {
> > if (atomic_read(&pha->loop_state) == LOOP_DEAD) {
> ...
> > Is above correct or buggy? Correct, because msleep is a barrier.
> > Is it obvious? No.
>
> It's *buggy*. But it has nothing to do with any msleep() in the loop, or
> anything else.
>
> And more importantly, it would be equally buggy even *with* a "volatile"
> atomic_read().
I am not saying that this code is okay, this isn't the point.
(The code is in fact awful for several more reasons).
My point is that people are confused as to what atomic_read()
exactly means, and this is bad. Same for cpu_relax().
First one says "read", and second one doesn't say "barrier".
This is real code from current kernel which demonstrates this:
"I don't know that cpu_relax() is a barrier already":
drivers/kvm/kvm_main.c
while (atomic_read(&completed) != needed) {
cpu_relax();
barrier();
}
"I think that atomic_read() is a read from memory and therefore
I don't need a barrier":
arch/x86_64/kernel/crash.c
msecs = 1000; /* Wait at most a second for the other cpus to stop */
while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
mdelay(1);
msecs--;
}
Since neither camp seems to give up, I am proposing renaming
them to something less confusing, and make everybody happy.
cpu_relax_barrier()
atomic_value(&x)
atomic_fetch(&x)
I'm not native English speaker, do these sound better?
--
vda
^ permalink raw reply
* Re: [PATCH] sb1250-mac.c: De-typedef, de-volatile, de-etc...
From: Ralf Baechle @ 2007-09-10 16:58 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Andrew Morton, Jeff Garzik, netdev, linux-mips, linux-kernel
In-Reply-To: <Pine.LNX.4.64N.0709101310030.25038@blysk.ds.pg.gda.pl>
On Mon, Sep 10, 2007 at 01:20:38PM +0100, Maciej W. Rozycki wrote:
> Remove typedefs, volatiles and convert kmalloc()/memset() pairs to
> kcalloc(). Also reformat the surrounding clutter.
>
> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
> ---
> Per your request, Andrew, a while ago. It builds, runs, passes
> checkpatch.pl and sparse. No semantic changes.
One step closer to sanity for this driver. So it's got my ACK.
Ralf
^ permalink raw reply
* Re: auto recycling of TIME_WAIT connections
From: Rick Jones @ 2007-09-10 17:23 UTC (permalink / raw)
To: Pádraig Brady; +Cc: netdev
In-Reply-To: <46E51BDC.10907@draigBrady.com>
Pádraig Brady wrote:
> Rick Jones wrote:
>>This was an issue over a decade ago with SPECweb96 benchmarking. The
>>initial solution was to make the explicit bind() calls and not rely on
>>the anonymous/ephemeral port space. After that, one starts adding
>>additional IP's into the mix (at least where possible). And if that
>>fails, one has to go back to the beginning and ask oneself exactly why a
>>client is trying to churn through so many connections per second in the
>>first place.
>
>
> right. This is for benchmarking mainly.
> Sane applications would use persistent connections,
> or a different form of IPC.
All the more reason to go the "add more client IP's" path then. It
gives you more connections per second, and gives you a much broader
umber of "client" IP's hitting the server which will be more realistic.
That is one thing I like very much about polygraph (based on what I've
read) - it's use of _lots_ of client IPs to better simulate reality. I
think other web-oriented benchmarks should start to include that as well
for there are stacks which do indeed make "decisions" based on whether
or not a destination is perceived to be "local" or not.
rick jones
^ permalink raw reply
* [PATCH 1/3] rfkill: Remove IRDA
From: Ivo van Doorn @ 2007-09-10 17:54 UTC (permalink / raw)
To: davem; +Cc: Dmitry Torokhov, netdev, Inaky Perez-Gonzalez
As Dmitry pointed out earlier, rfkill-input.c
doesn't support irda because there are no users
and we shouldn't add unrequired KEY_ defines.
However, RFKILL_TYPE_IRDA was defined in the
rfkill.h header file and would confuse people
about whether it is implemented or not.
This patch removes IRDA support completely,
so it can be added whenever a driver wants the
feature.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
---
include/linux/rfkill.h | 8 +++-----
net/rfkill/Kconfig | 2 +-
net/rfkill/rfkill.c | 5 +----
3 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h
index a8a6ea8..c4546e1 100644
--- a/include/linux/rfkill.h
+++ b/include/linux/rfkill.h
@@ -31,13 +31,11 @@
* enum rfkill_type - type of rfkill switch.
* RFKILL_TYPE_WLAN: switch is no a Wireless network devices.
* RFKILL_TYPE_BlUETOOTH: switch is on a bluetooth device.
- * RFKILL_TYPE_IRDA: switch is on an infrared devices.
*/
enum rfkill_type {
- RFKILL_TYPE_WLAN = 0,
- RFKILL_TYPE_BLUETOOTH = 1,
- RFKILL_TYPE_IRDA = 2,
- RFKILL_TYPE_MAX = 3,
+ RFKILL_TYPE_WLAN ,
+ RFKILL_TYPE_BLUETOOTH,
+ RFKILL_TYPE_MAX,
};
enum rfkill_state {
diff --git a/net/rfkill/Kconfig b/net/rfkill/Kconfig
index 8b31759..d28a6d9 100644
--- a/net/rfkill/Kconfig
+++ b/net/rfkill/Kconfig
@@ -5,7 +5,7 @@ menuconfig RFKILL
tristate "RF switch subsystem support"
help
Say Y here if you want to have control over RF switches
- found on many WiFi, Bluetooth and IRDA cards.
+ found on many WiFi and Bluetooth cards.
To compile this driver as a module, choose M here: the
module will be called rfkill.
diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c
index db3395b..50e0102 100644
--- a/net/rfkill/rfkill.c
+++ b/net/rfkill/rfkill.c
@@ -106,9 +106,6 @@ static ssize_t rfkill_type_show(struct device *dev,
case RFKILL_TYPE_BLUETOOTH:
type = "bluetooth";
break;
- case RFKILL_TYPE_IRDA:
- type = "irda";
- break;
default:
BUG();
}
@@ -281,7 +278,7 @@ static void rfkill_remove_switch(struct rfkill *rfkill)
/**
* rfkill_allocate - allocate memory for rfkill structure.
* @parent: device that has rf switch on it
- * @type: type of the switch (wlan, bluetooth, irda)
+ * @type: type of the switch (RFKILL_TYPE_*)
*
* This function should be called by the network driver when it needs
* rfkill structure. Once the structure is allocated the driver shoud
--
1.5.3
^ permalink raw reply related
* [PATCH 2/3] rfkill: Add support for ultrawideband
From: Ivo van Doorn @ 2007-09-10 17:55 UTC (permalink / raw)
To: davem; +Cc: Dmitry Torokhov, netdev, Inaky Perez-Gonzalez
In-Reply-To: <200709101954.50852.IvDoorn@gmail.com>
This patch will add support for UWB keys to rfkill,
support for this has been requested by Inaky.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
---
include/linux/input.h | 1 +
include/linux/rfkill.h | 2 ++
net/rfkill/rfkill-input.c | 9 +++++++++
net/rfkill/rfkill.c | 3 +++
4 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/include/linux/input.h b/include/linux/input.h
index cf2b561..8e5828d 100644
--- a/include/linux/input.h
+++ b/include/linux/input.h
@@ -360,6 +360,7 @@ struct input_absinfo {
#define KEY_BLUETOOTH 237
#define KEY_WLAN 238
+#define KEY_UWB 239
#define KEY_UNKNOWN 240
diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h
index c4546e1..f9a50da 100644
--- a/include/linux/rfkill.h
+++ b/include/linux/rfkill.h
@@ -31,10 +31,12 @@
* enum rfkill_type - type of rfkill switch.
* RFKILL_TYPE_WLAN: switch is no a Wireless network devices.
* RFKILL_TYPE_BlUETOOTH: switch is on a bluetooth device.
+ * RFKILL_TYPE_UWB: switch is on a Ultra wideband device.
*/
enum rfkill_type {
RFKILL_TYPE_WLAN ,
RFKILL_TYPE_BLUETOOTH,
+ RFKILL_TYPE_UWB,
RFKILL_TYPE_MAX,
};
diff --git a/net/rfkill/rfkill-input.c b/net/rfkill/rfkill-input.c
index 9f746be..8e4516a 100644
--- a/net/rfkill/rfkill-input.c
+++ b/net/rfkill/rfkill-input.c
@@ -81,6 +81,7 @@ static void rfkill_schedule_toggle(struct rfkill_task *task)
static DEFINE_RFKILL_TASK(rfkill_wlan, RFKILL_TYPE_WLAN);
static DEFINE_RFKILL_TASK(rfkill_bt, RFKILL_TYPE_BLUETOOTH);
+static DEFINE_RFKILL_TASK(rfkill_uwb, RFKILL_TYPE_UWB);
static void rfkill_event(struct input_handle *handle, unsigned int type,
unsigned int code, int down)
@@ -93,6 +94,9 @@ static void rfkill_event(struct input_handle *handle, unsigned int type,
case KEY_BLUETOOTH:
rfkill_schedule_toggle(&rfkill_bt);
break;
+ case KEY_UWB:
+ rfkill_schedule_toggle(&rfkill_uwb);
+ break;
default:
break;
}
@@ -148,6 +152,11 @@ static const struct input_device_id rfkill_ids[] = {
.evbit = { BIT(EV_KEY) },
.keybit = { [LONG(KEY_BLUETOOTH)] = BIT(KEY_BLUETOOTH) },
},
+ {
+ .flags = INPUT_DEVICE_ID_MATCH_EVBIT | INPUT_DEVICE_ID_MATCH_KEYBIT,
+ .evbit = { BIT(EV_KEY) },
+ .keybit = { [LONG(KEY_UWB)] = BIT(KEY_UWB) },
+ },
{ }
};
diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c
index 50e0102..03ed7fd 100644
--- a/net/rfkill/rfkill.c
+++ b/net/rfkill/rfkill.c
@@ -106,6 +106,9 @@ static ssize_t rfkill_type_show(struct device *dev,
case RFKILL_TYPE_BLUETOOTH:
type = "bluetooth";
break;
+ case RFKILL_TYPE_UWB:
+ type = "ultrawideband";
+ break;
default:
BUG();
}
--
1.5.3
^ permalink raw reply related
* [PATCH 3/3] rfkill: Add rfkill documentation
From: Ivo van Doorn @ 2007-09-10 17:56 UTC (permalink / raw)
To: davem; +Cc: Dmitry Torokhov, netdev, Inaky Perez-Gonzalez
In-Reply-To: <200709101954.50852.IvDoorn@gmail.com>
Add a documentation file which contains
a short description about rfkill with some
notes about drivers and the userspace interface.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
---
Documentation/rfkill.txt | 88 ++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 88 insertions(+), 0 deletions(-)
create mode 100644 Documentation/rfkill.txt
diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
new file mode 100644
index 0000000..93c76fc
--- /dev/null
+++ b/Documentation/rfkill.txt
@@ -0,0 +1,88 @@
+rfkill - RF switch subsystem support
+====================================
+
+1 Implementation details
+2 Driver support
+3 Userspace support
+
+===============================================================================
+1: Implementation details
+
+The rfkill switch subsystem offers support for keys often found on laptops
+to enable wireless devices like WiFi and Bluetooth.
+
+This is done by providing the user 3 possibilities:
+ - The rfkill system handles all events, userspace is not aware of events.
+ - The rfkill system handles all events, userspace is informed about the event.
+ - The rfkill system does not handle events, userspace handles all events.
+
+The buttons to enable and disable the wireless radios are important in
+situations where the user is for example using his laptop on a location where
+wireless radios _must_ be disabled (e.g airplanes).
+Because of this requirement, userspace support for the keys should not be
+made mandatory. Because userspace might want to perform some additional smarter
+tasks when the key is pressed, rfkill still provides userspace the possibility
+to take over the task to handle the key events.
+
+The system inside the kernel has been split into 2 seperate sections:
+ 1 - RFKILL
+ 2 - RFKILL_INPUT
+
+The first option enables rfkill support and will make sure userspace will
+be notified of any events through the input device. It also creates several
+sysfs entries which can be used by userspace. See section "Userspace support".
+
+The second option provides a rfkill input handler. This handler will
+listen to all rfkill key events and will toggle the radio accordingly,
+with this option enabled userspace could either do nothing or simply
+perform monitoring tasks.
+
+====================================
+2: Driver support
+
+Drivers who wish to build in rfkill subsystem support should
+make sure their driver depends of the Kconfig option RFKILL, it should
+_not_ depend on RFKILL_INPUT.
+
+Unless key events trigger a interrupt to which the driver listens, polling
+will be required to determine the key state changes. For this the input
+layer providers the input-polldev handler.
+
+A driver should implement a few steps to correctly make use of the
+rfkill subsystem. First for non-polling drivers:
+
+ - rfkill_allocate()
+ - input_allocate_device()
+ - rfkill_register()
+ - input_register_device()
+
+For polling drivers:
+
+ - rfkill_allocate()
+ - input_allocate_polled_device()
+ - rfkill_register()
+ - input_register_polled_device()
+
+When a key event has been detected, the correct event should be
+send over the input device which has been registered by the driver.
+
+====================================
+3: Userspace support
+
+For each key a input device will be created which will send out the correct
+key event when the rfkill key has been pressed.
+
+The following sysfs entries will be created:
+
+ name: Name assigned by driver to this key (interface or driver name).
+ type: Name of the key type ("wlan", "bluetooth", etc).
+ state: Current state of the key. 1: On, 0: Off.
+ claim: 1: Userspace handles events, 0: Kernel handles events
+
+Both the "state" and "claim" entries are also writable. For the "state" entry
+this means that when 1 or 0 is written all radios will be toggled accordingly.
+For the "claim" entry writing 1 to it will mean that the kernel will no longer
+handle key events even though RFKILL_INPUT input was enabled. When "claim" has
+been set to 0, userspace should make sure it will listen for the input events
+or check the sysfs "state" entry regularly to correctly perform the required
+tasks when the rkfill key is pressed.
--
1.5.3
^ permalink raw reply related
* why does tcp_v[46]_conn_request not inc MIB stats
From: Rick Jones @ 2007-09-10 18:42 UTC (permalink / raw)
To: Linux Network Development list
I've been digging around to see about inducing /proc/net/tcp to show
some "interesting" things for listen sockets (eg backlog depth, its max,
and dropped connection requests). While there I've noticed that both
tcp_v[46]_syn_recv_sock and tcp_v[46]conn_request both check that the
listen queue is full, but only tcp_v[46]_syn_recv_sock increments some
mib stats for dropped connection requests.
Is that deliberate, or is that a hole in the stats?
rick jones
^ permalink raw reply
* Re: [PATCH resend] Fix a lock problem in generic phy code
From: Hans-Jürgen Koch @ 2007-09-10 18:45 UTC (permalink / raw)
To: Herbert Xu; +Cc: linux-kernel, netdev, jeff, afleming
In-Reply-To: <E1IUf6h-0006qu-00@gondolin.me.apana.org.au>
Am Montag 10 September 2007 schrieb Herbert Xu:
> Hans-J??rgen Koch <hjk@linutronix.de> wrote:
> > The following patch fixes it. Tested on an AT91SAM9263-EK board, kernel
> > 2.6.23-rc4 and -rc3-mm1.
>
> Could you please audit all instances of physdev->lock and add
> _bh where necessary? I can see that at least phys_stop also
> needs the _bh.
I think the patch does all that's necessary. At least, there're no error
messages in the logs anymore. I didn't check if there's an error on
unload, though.
>
> We should also consider whether it makes sense to move the
> timer into a work queue.
Somebody who's more involved in that PHY stuff should probably do that.
I'm on holidays, sitting in a hotel room, and will probably not have the
time to have a deeper look into that.
Thanks,
Hans
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-09-10 18:59 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Paul Mackerras, heiko.carstens, horms, Stefan Richter,
Satyam Sharma, Linux Kernel Mailing List, David Miller,
Paul E. McKenney, Ilpo Järvinen, ak, cfriesen, rpjday,
Netdev, jesper.juhl, linux-arch, Andrew Morton, zlynx,
schwidefsky, Chris Snook, Herbert Xu, Linus Torvalds, wensong,
wjiang
In-Reply-To: <c298f0e3f9e49c9dd2d326115d156084@kernel.crashing.org>
On Fri, 17 Aug 2007, Segher Boessenkool wrote:
> "volatile" has nothing to do with reordering. atomic_dec() writes
> to memory, so it _does_ have "volatile semantics", implicitly, as
> long as the compiler cannot optimise the atomic variable away
> completely -- any store counts as a side effect.
Stores can be reordered. Only x86 has (mostly) implicit write ordering. So
no atomic_dec has no volatile semantics and may be reordered on a variety
of processors. Writes to memory may not follow code order on several
processors.
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-09-10 18:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: Denys Vlasenko, Kyle Moffett, Arjan van de Ven, Nick Piggin,
Satyam Sharma, Herbert Xu, Paul Mackerras, Chris Snook,
Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
heiko.carstens, David Miller, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <alpine.LFD.0.999.0709100807260.16478@woody.linux-foundation.org>
On Mon, 10 Sep 2007, Linus Torvalds wrote:
> The fact is, "volatile" *only* makes things worse. It generates worse
> code, and never fixes any real bugs. This is a *fact*.
Yes, lets just drop the volatiles now! We need a patch that gets rid of
them.... Volunteers?
^ permalink raw reply
* Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
From: Serge E. Hallyn @ 2007-09-10 19:07 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Linux Containers, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller
In-Reply-To: <m1tzq4u92n.fsf_-_-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>
> The simplest thing to implement is moving network devices between
> namespaces. However with the same attribute IFLA_NET_NS_PID we can
> easily implement creating devices in the destination network
> namespace as well. However that is a little bit trickier so this
> patch sticks to what is simple and easy.
>
> A pid is used to identify a process that happens to be a member
> of the network namespace we want to move the network device to.
>
> Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
> include/linux/if_link.h | 1 +
> net/core/rtnetlink.c | 35 +++++++++++++++++++++++++++++++++++
> 2 files changed, 36 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 422084d..84c3492 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -78,6 +78,7 @@ enum
> IFLA_LINKMODE,
> IFLA_LINKINFO,
> #define IFLA_LINKINFO IFLA_LINKINFO
> + IFLA_NET_NS_PID,
> __IFLA_MAX
> };
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 44f91bb..1b9c32d 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -35,6 +35,7 @@
> #include <linux/security.h>
> #include <linux/mutex.h>
> #include <linux/if_addr.h>
> +#include <linux/nsproxy.h>
>
> #include <asm/uaccess.h>
> #include <asm/system.h>
> @@ -727,6 +728,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> [IFLA_WEIGHT] = { .type = NLA_U32 },
> [IFLA_OPERSTATE] = { .type = NLA_U8 },
> [IFLA_LINKMODE] = { .type = NLA_U8 },
> + [IFLA_NET_NS_PID] = { .type = NLA_U32 },
> };
>
> static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -734,12 +736,45 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> [IFLA_INFO_DATA] = { .type = NLA_NESTED },
> };
>
> +static struct net *get_net_ns_by_pid(pid_t pid)
> +{
> + struct task_struct *tsk;
> + struct net *net;
> +
> + /* Lookup the network namespace */
> + net = ERR_PTR(-ESRCH);
> + rcu_read_lock();
> + tsk = find_task_by_pid(pid);
> + if (tsk) {
> + task_lock(tsk);
> + if (tsk->nsproxy)
> + net = get_net(tsk->nsproxy->net_ns);
> + task_unlock(tsk);
Thinking... Ok, I'm not sure this is 100% safe in the target tree, but
the long-term correct way probably isn't yet implemented in the net-
tree. Eventually you will want to:
net_ns = NULL;
rcu_read_lock();
tsk = find_task_by_pid(); /* or _pidns equiv? */
nsproxy = task_nsproxy(tsk);
if (nsproxy)
net_ns = get_net(nsproxy->net_ns);
rcu_read_unlock;
What you have here is probably unsafe if tsk is the last task pointing
to it's nsproxy and it does an unshare, bc unshare isn't protected by
task_lock, and you're not rcu_dereferencing tsk->nsproxy (which
task_nsproxy does). At one point we floated a patch to reuse the same
nsproxy in that case which would prevent you having to worry about it,
but that isn't being done in -mm now so i doubt it's in -net.
> + }
> + rcu_read_unlock();
> + return net;
> +}
> +
> static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
> struct nlattr **tb, char *ifname, int modified)
> {
> int send_addr_notify = 0;
> int err;
>
> + if (tb[IFLA_NET_NS_PID]) {
> + struct net *net;
> + net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID]));
> + if (IS_ERR(net)) {
> + err = PTR_ERR(net);
> + goto errout;
> + }
> + err = dev_change_net_namespace(dev, net, ifname);
> + put_net(net);
> + if (err)
> + goto errout;
> + modified = 1;
> + }
> +
> if (tb[IFLA_MAP]) {
> struct rtnl_link_ifmap *u_map;
> struct ifmap k_map;
> --
> 1.5.3.rc6.17.g1911
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply
* Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
From: Eric W. Biederman @ 2007-09-10 19:30 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: David Miller, Linux Containers, netdev
In-Reply-To: <20070910190708.GA6872@sergelap.austin.ibm.com>
"Serge E. Hallyn" <serue@us.ibm.com> writes:
>>
>> +static struct net *get_net_ns_by_pid(pid_t pid)
>> +{
>> + struct task_struct *tsk;
>> + struct net *net;
>> +
>> + /* Lookup the network namespace */
>> + net = ERR_PTR(-ESRCH);
>> + rcu_read_lock();
>> + tsk = find_task_by_pid(pid);
>> + if (tsk) {
>> + task_lock(tsk);
>> + if (tsk->nsproxy)
>> + net = get_net(tsk->nsproxy->net_ns);
>> + task_unlock(tsk);
>
> Thinking... Ok, I'm not sure this is 100% safe in the target tree, but
> the long-term correct way probably isn't yet implemented in the net-
> tree. Eventually you will want to:
>
> net_ns = NULL;
> rcu_read_lock();
> tsk = find_task_by_pid(); /* or _pidns equiv? */
> nsproxy = task_nsproxy(tsk);
> if (nsproxy)
> net_ns = get_net(nsproxy->net_ns);
> rcu_read_unlock;
>
> What you have here is probably unsafe if tsk is the last task pointing
> to it's nsproxy and it does an unshare, bc unshare isn't protected by
> task_lock, and you're not rcu_dereferencing tsk->nsproxy (which
> task_nsproxy does). At one point we floated a patch to reuse the same
> nsproxy in that case which would prevent you having to worry about it,
> but that isn't being done in -mm now so i doubt it's in -net.
That change isn't merged upstream yet, so it isn't in David's
net-2.6.24 tree. Currently task->nsproxy is protected but
task_lock(current). So the code is fine.
I am aware that removing the task_lock(current) for the setting
of current->nsproxy is currently in the works, and I have planned
to revisit this later when all of these pieces come together.
For now the code is fine.
If need be we can drop this patch to remove the potential merge
conflict. But I figured it was useful for this part of the user space
interface to be available for review.
Eric
^ permalink raw reply
* [RFC PATCH 2/2] SCTP: Convert bind_addr_list locking to RCU
From: Vlad Yasevich @ 2007-09-10 19:46 UTC (permalink / raw)
To: netdev; +Cc: lksctp-developers, Vlad Yasevich
In-Reply-To: <11894535901969-git-send-email-vladislav.yasevich@hp.com>
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU. This includes the
sctp_bind_addrs structure and it's list of bound addresses.
This list is currently protected by an external rw_lock and that
looks like an overkill. There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other. These are also relatively rare, so we
should be good with RCU.
The readers are varied and they are easily converted to RCU.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
include/net/sctp/structs.h | 3 -
net/sctp/associola.c | 14 +------
net/sctp/bind_addr.c | 59 ++++++++++++++++++----------
net/sctp/endpointola.c | 26 ++++---------
net/sctp/ipv6.c | 12 ++---
net/sctp/protocol.c | 25 +++++-------
net/sctp/sm_make_chunk.c | 17 +++-----
net/sctp/socket.c | 92 ++++++++++++++------------------------------
8 files changed, 97 insertions(+), 151 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2591c49..1d46f7d 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1222,9 +1222,6 @@ struct sctp_ep_common {
* bind_addr.address_list is our set of local IP addresses.
*/
struct sctp_bind_addr bind_addr;
-
- /* Protection during address list comparisons. */
- rwlock_t addr_lock;
};
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 2ad1caf..9bad8ba 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -99,7 +99,6 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
/* Initialize the bind addr area. */
sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
- rwlock_init(&asoc->base.addr_lock);
asoc->state = SCTP_STATE_CLOSED;
@@ -937,8 +936,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
{
struct sctp_transport *transport;
- sctp_read_lock(&asoc->base.addr_lock);
-
if ((htons(asoc->base.bind_addr.port) == laddr->v4.sin_port) &&
(htons(asoc->peer.port) == paddr->v4.sin_port)) {
transport = sctp_assoc_lookup_paddr(asoc, paddr);
@@ -952,7 +949,6 @@ struct sctp_transport *sctp_assoc_is_match(struct sctp_association *asoc,
transport = NULL;
out:
- sctp_read_unlock(&asoc->base.addr_lock);
return transport;
}
@@ -1376,19 +1372,13 @@ int sctp_assoc_set_bind_addr_from_cookie(struct sctp_association *asoc,
int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
const union sctp_addr *laddr)
{
- int found;
+ int found = 0;
- sctp_read_lock(&asoc->base.addr_lock);
if ((asoc->base.bind_addr.port == ntohs(laddr->v4.sin_port)) &&
sctp_bind_addr_match(&asoc->base.bind_addr, laddr,
- sctp_sk(asoc->base.sk))) {
+ sctp_sk(asoc->base.sk)))
found = 1;
- goto out;
- }
- found = 0;
-out:
- sctp_read_unlock(&asoc->base.addr_lock);
return found;
}
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 7fc369f..9c7db1f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -167,7 +167,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
INIT_LIST_HEAD(&addr->list);
INIT_RCU_HEAD(&addr->rcu);
- list_add_tail(&addr->list, &bp->address_list);
+
+ rcu_read_lock();
+ list_add_tail_rcu(&addr->list, &bp->address_list);
+ rcu_read_unlock();
SCTP_DBG_OBJCNT_INC(addr);
return 0;
@@ -178,20 +181,23 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
*/
int sctp_del_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *del_addr)
{
- struct list_head *pos, *temp;
- struct sctp_sockaddr_entry *addr;
+ struct sctp_sockaddr_entry *addr, *temp;
- list_for_each_safe(pos, temp, &bp->address_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock_bh();
+ list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
if (sctp_cmp_addr_exact(&addr->a, del_addr)) {
/* Found the exact match. */
- list_del(pos);
- kfree(addr);
- SCTP_DBG_OBJCNT_DEC(addr);
-
- return 0;
+ addr->valid = 0;
+ list_del_rcu(&addr->list);
+ break;
}
}
+ rcu_read_unlock_bh();
+
+ if (addr && !addr->valid) {
+ call_rcu_bh(&addr->rcu, sctp_local_addr_free);
+ SCTP_DBG_OBJCNT_DEC(addr);
+ }
return -EINVAL;
}
@@ -302,15 +308,20 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
struct sctp_sock *opt)
{
struct sctp_sockaddr_entry *laddr;
- struct list_head *pos;
-
- list_for_each(pos, &bp->address_list) {
- laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
- if (opt->pf->cmp_addr(&laddr->a, addr, opt))
- return 1;
+ int match = 0;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+ if (!laddr->valid)
+ continue;
+ if (opt->pf->cmp_addr(&laddr->a, addr, opt)) {
+ match = 1;
+ break;
+ }
}
+ rcu_read_unlock();
- return 0;
+ return match;
}
/* Find the first address in the bind address list that is not present in
@@ -325,27 +336,31 @@ union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr *bp,
union sctp_addr *addr;
void *addr_buf;
struct sctp_af *af;
- struct list_head *pos;
int i;
- list_for_each(pos, &bp->address_list) {
- laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+ if (!laddr->valid)
+ continue;
addr_buf = (union sctp_addr *)addrs;
for (i = 0; i < addrcnt; i++) {
addr = (union sctp_addr *)addr_buf;
af = sctp_get_af_specific(addr->v4.sin_family);
if (!af)
- return NULL;
+ break;
if (opt->pf->cmp_addr(&laddr->a, addr, opt))
break;
addr_buf += af->sockaddr_len;
}
- if (i == addrcnt)
+ if (i == addrcnt) {
+ rcu_read_unlock();
return &laddr->a;
+ }
}
+ rcu_read_unlock();
return NULL;
}
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 1404a9e..fa10af5 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -92,7 +92,6 @@ static struct sctp_endpoint *sctp_endpoint_init(struct sctp_endpoint *ep,
/* Initialize the bind addr area */
sctp_bind_addr_init(&ep->base.bind_addr, 0);
- rwlock_init(&ep->base.addr_lock);
/* Remember who we are attached to. */
ep->base.sk = sk;
@@ -225,21 +224,14 @@ void sctp_endpoint_put(struct sctp_endpoint *ep)
struct sctp_endpoint *sctp_endpoint_is_match(struct sctp_endpoint *ep,
const union sctp_addr *laddr)
{
- struct sctp_endpoint *retval;
+ struct sctp_endpoint *retval = NULL;
- sctp_read_lock(&ep->base.addr_lock);
if (htons(ep->base.bind_addr.port) == laddr->v4.sin_port) {
if (sctp_bind_addr_match(&ep->base.bind_addr, laddr,
- sctp_sk(ep->base.sk))) {
+ sctp_sk(ep->base.sk)))
retval = ep;
- goto out;
- }
}
- retval = NULL;
-
-out:
- sctp_read_unlock(&ep->base.addr_lock);
return retval;
}
@@ -261,9 +253,7 @@ static struct sctp_association *__sctp_endpoint_lookup_assoc(
list_for_each(pos, &ep->asocs) {
asoc = list_entry(pos, struct sctp_association, asocs);
if (rport == asoc->peer.port) {
- sctp_read_lock(&asoc->base.addr_lock);
*transport = sctp_assoc_lookup_paddr(asoc, paddr);
- sctp_read_unlock(&asoc->base.addr_lock);
if (*transport)
return asoc;
@@ -295,20 +285,20 @@ struct sctp_association *sctp_endpoint_lookup_assoc(
int sctp_endpoint_is_peeled_off(struct sctp_endpoint *ep,
const union sctp_addr *paddr)
{
- struct list_head *pos;
struct sctp_sockaddr_entry *addr;
struct sctp_bind_addr *bp;
- sctp_read_lock(&ep->base.addr_lock);
bp = &ep->base.bind_addr;
- list_for_each(pos, &bp->address_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &bp->address_list, list) {
+ if (!addr->valid)
+ continue;
if (sctp_has_association(&addr->a, paddr)) {
- sctp_read_unlock(&ep->base.addr_lock);
+ rcu_read_unlock();
return 1;
}
}
- sctp_read_unlock(&ep->base.addr_lock);
+ rcu_read_unlock();
return 0;
}
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index fc2e4e2..4f6dc55 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -302,9 +302,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
union sctp_addr *saddr)
{
struct sctp_bind_addr *bp;
- rwlock_t *addr_lock;
struct sctp_sockaddr_entry *laddr;
- struct list_head *pos;
sctp_scope_t scope;
union sctp_addr *baddr = NULL;
__u8 matchlen = 0;
@@ -324,14 +322,14 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
scope = sctp_scope(daddr);
bp = &asoc->base.bind_addr;
- addr_lock = &asoc->base.addr_lock;
/* Go through the bind address list and find the best source address
* that matches the scope of the destination address.
*/
- sctp_read_lock(addr_lock);
- list_for_each(pos, &bp->address_list) {
- laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+ if (!laddr->valid)
+ continue;
if ((laddr->use_as_src) &&
(laddr->a.sa.sa_family == AF_INET6) &&
(scope <= sctp_scope(&laddr->a))) {
@@ -353,7 +351,7 @@ static void sctp_v6_get_saddr(struct sctp_association *asoc,
__FUNCTION__, asoc, NIP6(daddr->v6.sin6_addr));
}
- sctp_read_unlock(addr_lock);
+ rcu_read_unlock();
}
/* Make a copy of all potential local addresses. */
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index ac52f9e..a1030ed 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -222,7 +222,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
(copy_flags & SCTP_ADDR6_ALLOWED) &&
(copy_flags & SCTP_ADDR6_PEERSUPP)))) {
error = sctp_add_bind_addr(bp, &addr->a, 1,
- GFP_ATOMIC);
+ GFP_ATOMIC);
if (error)
goto end_copy;
}
@@ -426,9 +426,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
struct rtable *rt;
struct flowi fl;
struct sctp_bind_addr *bp;
- rwlock_t *addr_lock;
struct sctp_sockaddr_entry *laddr;
- struct list_head *pos;
struct dst_entry *dst = NULL;
union sctp_addr dst_saddr;
@@ -457,23 +455,20 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
goto out;
bp = &asoc->base.bind_addr;
- addr_lock = &asoc->base.addr_lock;
if (dst) {
/* Walk through the bind address list and look for a bind
* address that matches the source address of the returned dst.
*/
- sctp_read_lock(addr_lock);
- list_for_each(pos, &bp->address_list) {
- laddr = list_entry(pos, struct sctp_sockaddr_entry,
- list);
- if (!laddr->use_as_src)
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+ if (!laddr->valid || !laddr->use_as_src)
continue;
sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
goto out_unlock;
}
- sctp_read_unlock(addr_lock);
+ rcu_read_unlock();
/* None of the bound addresses match the source address of the
* dst. So release it.
@@ -485,10 +480,10 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
/* Walk through the bind address list and try to get a dst that
* matches a bind address as the source address.
*/
- sctp_read_lock(addr_lock);
- list_for_each(pos, &bp->address_list) {
- laddr = list_entry(pos, struct sctp_sockaddr_entry, list);
-
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+ if (!laddr->valid)
+ continue;
if ((laddr->use_as_src) &&
(AF_INET == laddr->a.sa.sa_family)) {
fl.fl4_src = laddr->a.v4.sin_addr.s_addr;
@@ -500,7 +495,7 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
}
out_unlock:
- sctp_read_unlock(addr_lock);
+ rcu_read_unlock();
out:
if (dst)
SCTP_DEBUG_PRINTK("rt_dst:%u.%u.%u.%u, rt_src:%u.%u.%u.%u\n",
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 79856c9..caaa29f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1531,7 +1531,7 @@ no_hmac:
/* Also, add the destination address. */
if (list_empty(&retval->base.bind_addr.address_list)) {
sctp_add_bind_addr(&retval->base.bind_addr, &chunk->dest, 1,
- GFP_ATOMIC);
+ GFP_ATOMIC);
}
retval->next_tsn = retval->c.initial_tsn;
@@ -2613,22 +2613,17 @@ static int sctp_asconf_param_success(struct sctp_association *asoc,
switch (asconf_param->param_hdr.type) {
case SCTP_PARAM_ADD_IP:
- sctp_local_bh_disable();
- sctp_write_lock(&asoc->base.addr_lock);
- list_for_each(pos, &bp->address_list) {
- saddr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock_bh();
+ list_for_each_entry_rcu(saddr, &bp->address_list, list) {
+ if (!saddr->valid)
+ continue;
if (sctp_cmp_addr_exact(&saddr->a, &addr))
saddr->use_as_src = 1;
}
- sctp_write_unlock(&asoc->base.addr_lock);
- sctp_local_bh_enable();
+ rcu_read_unlock_bh();
break;
case SCTP_PARAM_DEL_IP:
- sctp_local_bh_disable();
- sctp_write_lock(&asoc->base.addr_lock);
retval = sctp_del_bind_addr(bp, &addr);
- sctp_write_unlock(&asoc->base.addr_lock);
- sctp_local_bh_enable();
list_for_each(pos, &asoc->peer.transport_addr_list) {
transport = list_entry(pos, struct sctp_transport,
transports);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index a3acf78..35cc30c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -367,14 +367,10 @@ SCTP_STATIC int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
if (!bp->port)
bp->port = inet_sk(sk)->num;
- /* Add the address to the bind address list. */
- sctp_local_bh_disable();
- sctp_write_lock(&ep->base.addr_lock);
-
- /* Use GFP_ATOMIC since BHs are disabled. */
+ /* Add the address to the bind address list.
+ * Use GFP_ATOMIC since BHs will be disabled.
+ */
ret = sctp_add_bind_addr(bp, addr, 1, GFP_ATOMIC);
- sctp_write_unlock(&ep->base.addr_lock);
- sctp_local_bh_enable();
/* Copy back into socket for getsockname() use. */
if (!ret) {
@@ -497,7 +493,6 @@ static int sctp_send_asconf_add_ip(struct sock *sk,
void *addr_buf;
struct sctp_af *af;
struct list_head *pos;
- struct list_head *p;
int i;
int retval = 0;
@@ -544,14 +539,15 @@ static int sctp_send_asconf_add_ip(struct sock *sk,
if (i < addrcnt)
continue;
- /* Use the first address in bind addr list of association as
- * Address Parameter of ASCONF CHUNK.
+ /* Use the first valid address in bind addr list of
+ * association as Address Parameter of ASCONF CHUNK.
*/
- sctp_read_lock(&asoc->base.addr_lock);
bp = &asoc->base.bind_addr;
- p = bp->address_list.next;
- laddr = list_entry(p, struct sctp_sockaddr_entry, list);
- sctp_read_unlock(&asoc->base.addr_lock);
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list)
+ if (laddr->valid)
+ break;
+ rcu_read_unlock();
chunk = sctp_make_asconf_update_ip(asoc, &laddr->a, addrs,
addrcnt, SCTP_PARAM_ADD_IP);
@@ -567,8 +563,6 @@ static int sctp_send_asconf_add_ip(struct sock *sk,
/* Add the new addresses to the bind address list with
* use_as_src set to 0.
*/
- sctp_local_bh_disable();
- sctp_write_lock(&asoc->base.addr_lock);
addr_buf = addrs;
for (i = 0; i < addrcnt; i++) {
addr = (union sctp_addr *)addr_buf;
@@ -578,8 +572,6 @@ static int sctp_send_asconf_add_ip(struct sock *sk,
GFP_ATOMIC);
addr_buf += af->sockaddr_len;
}
- sctp_write_unlock(&asoc->base.addr_lock);
- sctp_local_bh_enable();
}
out:
@@ -651,14 +643,8 @@ static int sctp_bindx_rem(struct sock *sk, struct sockaddr *addrs, int addrcnt)
* socket routing and failover schemes. Refer to comments in
* sctp_do_bind(). -daisy
*/
- sctp_local_bh_disable();
- sctp_write_lock(&ep->base.addr_lock);
-
retval = sctp_del_bind_addr(bp, sa_addr);
- sctp_write_unlock(&ep->base.addr_lock);
- sctp_local_bh_enable();
-
addr_buf += af->sockaddr_len;
err_bindx_rem:
if (retval < 0) {
@@ -748,11 +734,9 @@ static int sctp_send_asconf_del_ip(struct sock *sk,
* make sure that we do not delete all the addresses in the
* association.
*/
- sctp_read_lock(&asoc->base.addr_lock);
bp = &asoc->base.bind_addr;
laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
addrcnt, sp);
- sctp_read_unlock(&asoc->base.addr_lock);
if (!laddr)
continue;
@@ -766,23 +750,18 @@ static int sctp_send_asconf_del_ip(struct sock *sk,
/* Reset use_as_src flag for the addresses in the bind address
* list that are to be deleted.
*/
- sctp_local_bh_disable();
- sctp_write_lock(&asoc->base.addr_lock);
addr_buf = addrs;
for (i = 0; i < addrcnt; i++) {
laddr = (union sctp_addr *)addr_buf;
af = sctp_get_af_specific(laddr->v4.sin_family);
- list_for_each(pos1, &bp->address_list) {
- saddr = list_entry(pos1,
- struct sctp_sockaddr_entry,
- list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(saddr, &bp->address_list, list) {
if (sctp_cmp_addr_exact(&saddr->a, laddr))
saddr->use_as_src = 0;
}
+ rcu_read_unlock();
addr_buf += af->sockaddr_len;
}
- sctp_write_unlock(&asoc->base.addr_lock);
- sctp_local_bh_enable();
/* Update the route and saddr entries for all the transports
* as some of the addresses in the bind address list are
@@ -4057,11 +4036,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
int __user *optlen)
{
sctp_assoc_t id;
- struct list_head *pos;
struct sctp_bind_addr *bp;
struct sctp_association *asoc;
struct sctp_sockaddr_entry *addr;
- rwlock_t *addr_lock;
int cnt = 0;
if (len < sizeof(sctp_assoc_t))
@@ -4078,17 +4055,13 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
*/
if (0 == id) {
bp = &sctp_sk(sk)->ep->base.bind_addr;
- addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
} else {
asoc = sctp_id2assoc(sk, id);
if (!asoc)
return -EINVAL;
bp = &asoc->base.bind_addr;
- addr_lock = &asoc->base.addr_lock;
}
- sctp_read_lock(addr_lock);
-
/* If the endpoint is bound to 0.0.0.0 or ::0, count the valid
* addresses from the global local address list.
*/
@@ -4115,12 +4088,15 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
goto done;
}
- list_for_each(pos, &bp->address_list) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &bp->address_list, list) {
+ if (!addr->valid)
+ continue;
cnt ++;
}
+ rcu_read_unlock();
done:
- sctp_read_unlock(addr_lock);
return cnt;
}
@@ -4204,7 +4180,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
{
struct sctp_bind_addr *bp;
struct sctp_association *asoc;
- struct list_head *pos;
int cnt = 0;
struct sctp_getaddrs_old getaddrs;
struct sctp_sockaddr_entry *addr;
@@ -4212,7 +4187,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
union sctp_addr temp;
struct sctp_sock *sp = sctp_sk(sk);
int addrlen;
- rwlock_t *addr_lock;
int err = 0;
void *addrs;
void *buf;
@@ -4234,13 +4208,11 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
*/
if (0 == getaddrs.assoc_id) {
bp = &sctp_sk(sk)->ep->base.bind_addr;
- addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
} else {
asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
if (!asoc)
return -EINVAL;
bp = &asoc->base.bind_addr;
- addr_lock = &asoc->base.addr_lock;
}
to = getaddrs.addrs;
@@ -4254,8 +4226,6 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
if (!addrs)
return -ENOMEM;
- sctp_read_lock(addr_lock);
-
/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
* addresses from the global local address list.
*/
@@ -4271,8 +4241,10 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
}
buf = addrs;
- list_for_each(pos, &bp->address_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &bp->address_list, list) {
+ if (!addr->valid)
+ continue;
memcpy(&temp, &addr->a, sizeof(temp));
sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
@@ -4282,10 +4254,9 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len,
cnt ++;
if (cnt >= getaddrs.addr_num) break;
}
+ rcu_read_unlock();
copy_getaddrs:
- sctp_read_unlock(addr_lock);
-
/* copy the entire address list into the user provided space */
if (copy_to_user(to, addrs, bytes_copied)) {
err = -EFAULT;
@@ -4307,7 +4278,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
{
struct sctp_bind_addr *bp;
struct sctp_association *asoc;
- struct list_head *pos;
int cnt = 0;
struct sctp_getaddrs getaddrs;
struct sctp_sockaddr_entry *addr;
@@ -4315,7 +4285,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
union sctp_addr temp;
struct sctp_sock *sp = sctp_sk(sk);
int addrlen;
- rwlock_t *addr_lock;
int err = 0;
size_t space_left;
int bytes_copied = 0;
@@ -4336,13 +4305,11 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
*/
if (0 == getaddrs.assoc_id) {
bp = &sctp_sk(sk)->ep->base.bind_addr;
- addr_lock = &sctp_sk(sk)->ep->base.addr_lock;
} else {
asoc = sctp_id2assoc(sk, getaddrs.assoc_id);
if (!asoc)
return -EINVAL;
bp = &asoc->base.bind_addr;
- addr_lock = &asoc->base.addr_lock;
}
to = optval + offsetof(struct sctp_getaddrs,addrs);
@@ -4352,8 +4319,6 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
if (!addrs)
return -ENOMEM;
- sctp_read_lock(addr_lock);
-
/* If the endpoint is bound to 0.0.0.0 or ::0, get the valid
* addresses from the global local address list.
*/
@@ -4372,8 +4337,10 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
}
buf = addrs;
- list_for_each(pos, &bp->address_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &bp->address_list, list) {
+ if (!addr->valid)
+ continue;
memcpy(&temp, &addr->a, sizeof(temp));
sctp_get_pf_specific(sk->sk_family)->addr_v4map(sp, &temp);
addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
@@ -4387,10 +4354,9 @@ static int sctp_getsockopt_local_addrs(struct sock *sk, int len,
cnt ++;
space_left -= addrlen;
}
+ rcu_read_unlock();
copy_getaddrs:
- sctp_read_unlock(addr_lock);
-
if (copy_to_user(to, addrs, bytes_copied)) {
err = -EFAULT;
goto out;
@@ -4405,7 +4371,7 @@ copy_getaddrs:
goto out;
error_lock:
- sctp_read_unlock(addr_lock);
+ rcu_read_unlock();
out:
kfree(addrs);
--
1.5.2.4
^ permalink raw reply related
* [RFC PATH 0/2] Add RCU locking to SCTP address management
From: Vlad Yasevich @ 2007-09-10 19:46 UTC (permalink / raw)
To: netdev; +Cc: lksctp-developers
Hi All
This is my first attempt to add RCU synchronization to pieces of SCTP
and I want to make sure I do this right.
The RCU docs a somewhat outdated, and the calling conventions differ
between subsystems, so I am using what I've been able to find.
A bit of a background...
The whole problem started with a panic that led me to NULL deref while
walking a global list of addresses that SCTP maitains. That list
is modified curing the NETDEV_UP and NETDEV_DOWN notifier processing
and all readers are run in user context. It looks like the list was
modified while a user app was walking it and we crashed.
Easy enough to fix by adding locking. However, some notifiers
(ipv6 addrconf) run in atomic context and it says that they can't sleep.
(Q1. Why are ipv6 notifiers atomic, while IPv4 notifires are blocking?)
So, I decided to add RCU locking around that list. These events are
already synchronized, so no additional locking on the writer side are need,
so should be make for a nice RCU conversion.
While doing this conversion, I saw that the same structures are used to
maintain a list of bound addresses. This seemed like another opportunity
to use RCU. In this case, the readers are wide spread, but there are
only 2 wirters: BH processing of specific chunks, and bind() calls.
Again the writers are already synchronized on the socket lock, so they
will keep out of each others way. Readers are widespread, but rcu takes
care of that nicely.
So, can folks please take a look and make sure I didn't mess up the
rcu conventions.
Thanks a lot
-vlad
^ permalink raw reply
* [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
From: Vlad Yasevich @ 2007-09-10 19:46 UTC (permalink / raw)
To: netdev; +Cc: lksctp-developers, Vlad Yasevich
In-Reply-To: <11894535901969-git-send-email-vladislav.yasevich@hp.com>
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers. As a result,
the readers can access an entry that has been freed and
crash the sytem.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
---
include/net/sctp/sctp.h | 1 +
include/net/sctp/structs.h | 2 +
net/sctp/bind_addr.c | 2 +
net/sctp/ipv6.c | 33 ++++++++++++++++++++--------
net/sctp/protocol.c | 50 ++++++++++++++++++++++++++++++-------------
net/sctp/socket.c | 38 ++++++++++++++++++++++-----------
6 files changed, 88 insertions(+), 38 deletions(-)
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d529045..c9cc00c 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -123,6 +123,7 @@
* sctp/protocol.c
*/
extern struct sock *sctp_get_ctl_sock(void);
+extern void sctp_local_addr_free(struct rcu_head *head);
extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
sctp_scope_t, gfp_t gfp,
int flags);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index c0d5848..2591c49 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
/* This is a structure for holding either an IPv6 or an IPv4 address. */
struct sctp_sockaddr_entry {
struct list_head list;
+ struct rcu_head rcu;
union sctp_addr a;
__u8 use_as_src;
+ __u8 valid;
};
typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index fdb287a..7fc369f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
addr->a.v4.sin_port = htons(bp->port);
addr->use_as_src = use_as_src;
+ addr->valid = 1;
INIT_LIST_HEAD(&addr->list);
+ INIT_RCU_HEAD(&addr->rcu);
list_add_tail(&addr->list, &bp->address_list);
SCTP_DBG_OBJCNT_INC(addr);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index f8aa23d..fc2e4e2 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -77,13 +77,18 @@
#include <asm/uaccess.h>
-/* Event handler for inet6 address addition/deletion events. */
+/* Event handler for inet6 address addition/deletion events.
+ * This even is part of the atomic notifier call chain
+ * and thus happens atomically and can NOT sleep. As a result
+ * we can't and really don't need to add any locks to guard the
+ * RCU.
+ */
static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
void *ptr)
{
struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
- struct sctp_sockaddr_entry *addr;
- struct list_head *pos, *temp;
+ struct sctp_sockaddr_entry *addr = NULL;
+ struct sctp_sockaddr_entry *temp;
switch (ev) {
case NETDEV_UP:
@@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
sizeof(struct in6_addr));
addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
- list_add_tail(&addr->list, &sctp_local_addr_list);
+ addr->valid = 1;
+ rcu_read_lock();
+ list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+ rcu_read_unlock();
}
break;
case NETDEV_DOWN:
- list_for_each_safe(pos, temp, &sctp_local_addr_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
- if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
- list_del(pos);
- kfree(addr);
+ rcu_read_lock();
+ list_for_each_entry_safe(addr, temp,
+ &sctp_local_addr_list, list) {
+ if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
+ &ifa->addr)) {
+ addr->valid = 0;
+ list_del_rcu(&addr->list);
break;
}
}
-
+ rcu_read_unlock();
+ if (addr && !addr->valid)
+ call_rcu(&addr->rcu, sctp_local_addr_free);
break;
}
@@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
addr->a.v6.sin6_addr = ifp->addr;
addr->a.v6.sin6_scope_id = dev->ifindex;
INIT_LIST_HEAD(&addr->list);
+ INIT_RCU_HEAD(&addr->rcu);
list_add_tail(&addr->list, addrlist);
}
}
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index e98579b..ac52f9e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -153,6 +153,7 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
addr->a.v4.sin_family = AF_INET;
addr->a.v4.sin_port = 0;
addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
+ INIT_RCU_HEAD(&addr->rcu);
list_add_tail(&addr->list, addrlist);
}
}
@@ -192,16 +193,24 @@ static void sctp_free_local_addr_list(void)
}
}
+void sctp_local_addr_free(struct rcu_head *head)
+{
+ struct sctp_sockaddr_entry *e = container_of(head,
+ struct sctp_sockaddr_entry, rcu);
+ kfree(e);
+}
+
/* Copy the local addresses which are valid for 'scope' into 'bp'. */
int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
gfp_t gfp, int copy_flags)
{
struct sctp_sockaddr_entry *addr;
int error = 0;
- struct list_head *pos, *temp;
- list_for_each_safe(pos, temp, &sctp_local_addr_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+ if (!addr->valid)
+ continue;
if (sctp_in_scope(&addr->a, scope)) {
/* Now that the address is in scope, check to see if
* the address type is really supported by the local
@@ -221,6 +230,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
}
end_copy:
+ rcu_read_unlock();
return error;
}
@@ -600,13 +610,17 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
}
-/* Event handler for inet address addition/deletion events. */
+/* Event handler for inet address addition/deletion events.
+ * This is part of the blocking notifier call chain that is
+ * guarted by a mutex. As a result we don't need to add
+ * any additional guards for the RCU
+ */
static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
void *ptr)
{
struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
- struct sctp_sockaddr_entry *addr;
- struct list_head *pos, *temp;
+ struct sctp_sockaddr_entry *addr = NULL;
+ struct sctp_sockaddr_entry *temp;
switch (ev) {
case NETDEV_UP:
@@ -615,19 +629,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
addr->a.v4.sin_family = AF_INET;
addr->a.v4.sin_port = 0;
addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
- list_add_tail(&addr->list, &sctp_local_addr_list);
+ addr->valid = 1;
+ rcu_read_lock();
+ list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+ rcu_read_unlock();
}
break;
case NETDEV_DOWN:
- list_for_each_safe(pos, temp, &sctp_local_addr_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_safe(addr, temp,
+ &sctp_local_addr_list, list) {
if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
- list_del(pos);
- kfree(addr);
+ addr->valid = 0;
+ list_del_rcu(&addr->list);
break;
}
}
-
+ rcu_read_unlock();
+ if (addr && !addr->valid)
+ call_rcu(&addr->rcu, sctp_local_addr_free);
break;
}
@@ -1227,6 +1247,9 @@ SCTP_STATIC __exit void sctp_exit(void)
sctp_v6_del_protocol();
inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
+ /* Unregister notifier for inet address additions/deletions. */
+ unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
+
/* Free the local address list. */
sctp_free_local_addr_list();
@@ -1240,9 +1263,6 @@ SCTP_STATIC __exit void sctp_exit(void)
inet_unregister_protosw(&sctp_stream_protosw);
inet_unregister_protosw(&sctp_seqpacket_protosw);
- /* Unregister notifier for inet address additions/deletions. */
- unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
-
sctp_sysctl_unregister();
list_del(&sctp_ipv4_specific.list);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3335460..a3acf78 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
int __user *optlen)
{
sctp_assoc_t id;
+ struct list_head *pos;
struct sctp_bind_addr *bp;
struct sctp_association *asoc;
- struct list_head *pos, *temp;
struct sctp_sockaddr_entry *addr;
rwlock_t *addr_lock;
int cnt = 0;
@@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
addr = list_entry(bp->address_list.next,
struct sctp_sockaddr_entry, list);
if (sctp_is_any(&addr->a)) {
- list_for_each_safe(pos, temp, &sctp_local_addr_list) {
- addr = list_entry(pos,
- struct sctp_sockaddr_entry,
- list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr,
+ &sctp_local_addr_list, list) {
+ if (!addr->valid)
+ continue;
+
if ((PF_INET == sk->sk_family) &&
(AF_INET6 == addr->a.sa.sa_family))
continue;
+
cnt++;
}
+ rcu_read_unlock();
} else {
cnt = 1;
}
@@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
int max_addrs, void *to,
int *bytes_copied)
{
- struct list_head *pos, *next;
struct sctp_sockaddr_entry *addr;
union sctp_addr temp;
int cnt = 0;
int addrlen;
- list_for_each_safe(pos, next, &sctp_local_addr_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+ if (!addr->valid)
+ continue;
+
if ((PF_INET == sk->sk_family) &&
(AF_INET6 == addr->a.sa.sa_family))
continue;
@@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
cnt ++;
if (cnt >= max_addrs) break;
}
+ rcu_read_unlock();
return cnt;
}
@@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
size_t space_left, int *bytes_copied)
{
- struct list_head *pos, *next;
struct sctp_sockaddr_entry *addr;
union sctp_addr temp;
int cnt = 0;
int addrlen;
- list_for_each_safe(pos, next, &sctp_local_addr_list) {
- addr = list_entry(pos, struct sctp_sockaddr_entry, list);
+ rcu_read_lock();
+ list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
+ if (!addr->valid)
+ continue;
+
if ((PF_INET == sk->sk_family) &&
(AF_INET6 == addr->a.sa.sa_family))
continue;
@@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
&temp);
addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
- if (space_left < addrlen)
- return -ENOMEM;
+ if (space_left < addrlen) {
+ cnt = -ENOMEM;
+ break;
+ }
memcpy(to, &temp, addrlen);
to += addrlen;
@@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
space_left -= addrlen;
*bytes_copied += addrlen;
}
+ rcu_read_unlock();
return cnt;
}
--
1.5.2.4
^ permalink raw reply related
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Kyle Moffett @ 2007-09-10 19:59 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Linus Torvalds, Arjan van de Ven, Nick Piggin, Satyam Sharma,
Herbert Xu, Paul Mackerras, Christoph Lameter, Chris Snook,
Ilpo Jarvinen, Paul E. McKenney, Stefan Richter,
Linux Kernel Mailing List, linux-arch, Netdev, Andrew Morton, ak,
heiko.carstens, David Miller, schwidefsky, wensong, horms, wjiang,
cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <200709101746.34513.vda.linux@googlemail.com>
On Sep 10, 2007, at 12:46:33, Denys Vlasenko wrote:
> My point is that people are confused as to what atomic_read()
> exactly means, and this is bad. Same for cpu_relax(). First one
> says "read", and second one doesn't say "barrier".
Q&A:
Q: When is it OK to use atomic_read()?
A: You are asking the question, so never.
Q: But I need to check the value of the atomic at this point in time...
A: Your code is buggy if it needs to do that on an atomic_t for
anything other than debugging or optimization. Use either
atomic_*_return() or a lock and some normal integers.
Q: "So why can't the atomic_read DTRT magically?"
A: Because "the right thing" depends on the situation and is usually
best done with something other than atomic_t.
If somebody can post some non-buggy code which is correctly using
atomic_read() *and* depends on the compiler generating extra
nonsensical loads due to "volatile" then the issue *might* be
reconsidered. This also includes samples of code which uses
atomic_read() and needs memory barriers (so that we can fix the buggy
code, not so we can change atomic_read()). So far the only code
samples anybody has posted are buggy regardless of whether or not the
value and/or accessors are flagged "volatile" or not. And hey, maybe
the volatile ops *should* be implemented in inline ASM for future-
proof-ness, but that's a separate issue.
Cheers,
Kyle Moffett
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-09-10 20:54 UTC (permalink / raw)
To: Christoph Lameter
Cc: Segher Boessenkool, Paul Mackerras, heiko.carstens, horms,
Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
David Miller, Ilpo Järvinen, ak, cfriesen, rpjday, Netdev,
jesper.juhl, linux-arch, Andrew Morton, zlynx, schwidefsky,
Chris Snook, Herbert Xu, Linus Torvalds, wensong, wjiang
In-Reply-To: <Pine.LNX.4.64.0709101157060.24491@schroedinger.engr.sgi.com>
On Mon, Sep 10, 2007 at 11:59:29AM -0700, Christoph Lameter wrote:
> On Fri, 17 Aug 2007, Segher Boessenkool wrote:
>
> > "volatile" has nothing to do with reordering. atomic_dec() writes
> > to memory, so it _does_ have "volatile semantics", implicitly, as
> > long as the compiler cannot optimise the atomic variable away
> > completely -- any store counts as a side effect.
>
> Stores can be reordered. Only x86 has (mostly) implicit write ordering. So
> no atomic_dec has no volatile semantics and may be reordered on a variety
> of processors. Writes to memory may not follow code order on several
> processors.
The one exception to this being the case where process-level code is
communicating to an interrupt handler running on that same CPU -- on
all CPUs that I am aware of, a given CPU always sees its own writes
in order.
Thanx, Paul
^ permalink raw reply
* Re: ne driver crashes when unloaded in 2.6.22.6
From: Dan Williams @ 2007-09-10 21:31 UTC (permalink / raw)
To: Chris Rankin; +Cc: netdev
In-Reply-To: <140033.19941.qm@web52911.mail.re2.yahoo.com>
On Sun, 2007-09-09 at 23:12 +0100, Chris Rankin wrote:
> --- Dan Williams <dcbw@redhat.com> wrote:
> > Offhand question, does your ne2000 card support carrier detection?
>
> Err... there is a /sys/class/net/eth0/carrier entry (I think - not in front of that machine right
> now). IIRC it said "1" when I read it.
Does it read '0' when you unplug the cable?
Dan
> Cheers,
> Chris
>
>
>
> ___________________________________________________________
> Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
> now.
> http://uk.answers.yahoo.com/
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-09-10 21:36 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Segher Boessenkool, Paul Mackerras, heiko.carstens, horms,
Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
David Miller, Ilpo Järvinen, ak, cfriesen, rpjday, Netdev,
jesper.juhl, linux-arch, Andrew Morton, zlynx, schwidefsky,
Chris Snook, Herbert Xu, Linus Torvalds, wensong, wjiang
In-Reply-To: <20070910205434.GF11801@linux.vnet.ibm.com>
On Mon, 10 Sep 2007, Paul E. McKenney wrote:
> The one exception to this being the case where process-level code is
> communicating to an interrupt handler running on that same CPU -- on
> all CPUs that I am aware of, a given CPU always sees its own writes
> in order.
Yes but that is due to the code path effectively continuing in the
interrupt handler. The cpu makes sure that op codes being executed always
see memory in a consistent way. The basic ordering problem with out of
order writes is therefore coming from other processors concurrently
executing code and holding variables in registers that are modified
elsewhere. The only solution that I know of are one or the other form of
barrier.
^ permalink raw reply
* Re: [RFC PATCH 1/2] SCTP: Add RCU synchronization around sctp_localaddr_list
From: Paul E. McKenney @ 2007-09-10 21:47 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: netdev, lksctp-developers
In-Reply-To: <11894535913932-git-send-email-vladislav.yasevich@hp.com>
On Mon, Sep 10, 2007 at 03:46:29PM -0400, Vlad Yasevich wrote:
> sctp_localaddr_list is modified dynamically via NETDEV_UP
> and NETDEV_DOWN events, but there is not synchronization
> between writer (even handler) and readers. As a result,
> the readers can access an entry that has been freed and
> crash the sytem.
Good start, but few questions interspersed below...
Thanx, Paul
> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> ---
> include/net/sctp/sctp.h | 1 +
> include/net/sctp/structs.h | 2 +
> net/sctp/bind_addr.c | 2 +
> net/sctp/ipv6.c | 33 ++++++++++++++++++++--------
> net/sctp/protocol.c | 50 ++++++++++++++++++++++++++++++-------------
> net/sctp/socket.c | 38 ++++++++++++++++++++++-----------
> 6 files changed, 88 insertions(+), 38 deletions(-)
>
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index d529045..c9cc00c 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -123,6 +123,7 @@
> * sctp/protocol.c
> */
> extern struct sock *sctp_get_ctl_sock(void);
> +extern void sctp_local_addr_free(struct rcu_head *head);
> extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
> sctp_scope_t, gfp_t gfp,
> int flags);
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index c0d5848..2591c49 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -737,8 +737,10 @@ const union sctp_addr *sctp_source(const struct sctp_chunk *chunk);
> /* This is a structure for holding either an IPv6 or an IPv4 address. */
> struct sctp_sockaddr_entry {
> struct list_head list;
> + struct rcu_head rcu;
> union sctp_addr a;
> __u8 use_as_src;
> + __u8 valid;
> };
>
> typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index fdb287a..7fc369f 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -163,8 +163,10 @@ int sctp_add_bind_addr(struct sctp_bind_addr *bp, union sctp_addr *new,
> addr->a.v4.sin_port = htons(bp->port);
>
> addr->use_as_src = use_as_src;
> + addr->valid = 1;
>
> INIT_LIST_HEAD(&addr->list);
> + INIT_RCU_HEAD(&addr->rcu);
> list_add_tail(&addr->list, &bp->address_list);
> SCTP_DBG_OBJCNT_INC(addr);
>
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index f8aa23d..fc2e4e2 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -77,13 +77,18 @@
>
> #include <asm/uaccess.h>
>
> -/* Event handler for inet6 address addition/deletion events. */
> +/* Event handler for inet6 address addition/deletion events.
> + * This even is part of the atomic notifier call chain
> + * and thus happens atomically and can NOT sleep. As a result
> + * we can't and really don't need to add any locks to guard the
> + * RCU.
> + */
> static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
> void *ptr)
> {
> struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
> - struct sctp_sockaddr_entry *addr;
> - struct list_head *pos, *temp;
> + struct sctp_sockaddr_entry *addr = NULL;
> + struct sctp_sockaddr_entry *temp;
>
> switch (ev) {
> case NETDEV_UP:
> @@ -94,19 +99,26 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
> memcpy(&addr->a.v6.sin6_addr, &ifa->addr,
> sizeof(struct in6_addr));
> addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex;
> - list_add_tail(&addr->list, &sctp_local_addr_list);
> + addr->valid = 1;
> + rcu_read_lock();
> + list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> + rcu_read_unlock();
If we are under the protection of the update-side mutex, the rcu_read_lock()
and rcu_read_unlock() are (harmlessly) redundant. If we are not under
the protection of some mutex, what prevents concurrent list_add_tail_rcu()
calls from messing up the sctp_sockaddr_entry list?
> }
> break;
> case NETDEV_DOWN:
> - list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> - if (ipv6_addr_equal(&addr->a.v6.sin6_addr, &ifa->addr)) {
> - list_del(pos);
> - kfree(addr);
> + rcu_read_lock();
> + list_for_each_entry_safe(addr, temp,
> + &sctp_local_addr_list, list) {
> + if (ipv6_addr_equal(&addr->a.v6.sin6_addr,
> + &ifa->addr)) {
> + addr->valid = 0;
> + list_del_rcu(&addr->list);
> break;
> }
> }
> -
> + rcu_read_unlock();
> + if (addr && !addr->valid)
> + call_rcu(&addr->rcu, sctp_local_addr_free);
Are we under the protection of the update-side lock here? If not,
what prevents two different tasks from executing this in parallel,
potentially tangling both the list that the sctp_sockaddr_entry list and
the internal RCU lists? (It is forbidden to call_rcu() a given element
twice concurrently.)
If we are in fact under the protection of the update-side lock, the
rcu_read_lock() and rcu_read_unlock() pairs are redundant (though this
is harmless, aside from the (small) potential for confusion).
> break;
> }
>
> @@ -368,6 +380,7 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
> addr->a.v6.sin6_addr = ifp->addr;
> addr->a.v6.sin6_scope_id = dev->ifindex;
> INIT_LIST_HEAD(&addr->list);
> + INIT_RCU_HEAD(&addr->rcu);
> list_add_tail(&addr->list, addrlist);
> }
> }
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index e98579b..ac52f9e 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -153,6 +153,7 @@ static void sctp_v4_copy_addrlist(struct list_head *addrlist,
> addr->a.v4.sin_family = AF_INET;
> addr->a.v4.sin_port = 0;
> addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
> + INIT_RCU_HEAD(&addr->rcu);
> list_add_tail(&addr->list, addrlist);
> }
> }
> @@ -192,16 +193,24 @@ static void sctp_free_local_addr_list(void)
> }
> }
>
> +void sctp_local_addr_free(struct rcu_head *head)
> +{
> + struct sctp_sockaddr_entry *e = container_of(head,
> + struct sctp_sockaddr_entry, rcu);
> + kfree(e);
> +}
> +
> /* Copy the local addresses which are valid for 'scope' into 'bp'. */
> int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
> gfp_t gfp, int copy_flags)
> {
> struct sctp_sockaddr_entry *addr;
> int error = 0;
> - struct list_head *pos, *temp;
>
> - list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> + rcu_read_lock();
> + list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> + if (!addr->valid)
> + continue;
What happens if the update-side code removes the element from the list
and marks it !->valid right here?
If this turns out to be harmless, why not just dispense with the ->valid
flag entirely?
> if (sctp_in_scope(&addr->a, scope)) {
> /* Now that the address is in scope, check to see if
> * the address type is really supported by the local
> @@ -221,6 +230,7 @@ int sctp_copy_local_addr_list(struct sctp_bind_addr *bp, sctp_scope_t scope,
> }
>
> end_copy:
> + rcu_read_unlock();
> return error;
> }
>
> @@ -600,13 +610,17 @@ static void sctp_v4_seq_dump_addr(struct seq_file *seq, union sctp_addr *addr)
> seq_printf(seq, "%d.%d.%d.%d ", NIPQUAD(addr->v4.sin_addr));
> }
>
> -/* Event handler for inet address addition/deletion events. */
> +/* Event handler for inet address addition/deletion events.
> + * This is part of the blocking notifier call chain that is
> + * guarted by a mutex. As a result we don't need to add
> + * any additional guards for the RCU
> + */
> static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
> void *ptr)
> {
> struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
> - struct sctp_sockaddr_entry *addr;
> - struct list_head *pos, *temp;
> + struct sctp_sockaddr_entry *addr = NULL;
> + struct sctp_sockaddr_entry *temp;
>
> switch (ev) {
> case NETDEV_UP:
> @@ -615,19 +629,25 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
> addr->a.v4.sin_family = AF_INET;
> addr->a.v4.sin_port = 0;
> addr->a.v4.sin_addr.s_addr = ifa->ifa_local;
> - list_add_tail(&addr->list, &sctp_local_addr_list);
> + addr->valid = 1;
> + rcu_read_lock();
> + list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
> + rcu_read_unlock();
Based on the additions to the header comment, I am assuming that we
hold an update-side mutex. This means that the rcu_read_lock() and
rcu_read_unlock() are (harmlessly) redundant.
> }
> break;
> case NETDEV_DOWN:
> - list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> + rcu_read_lock();
> + list_for_each_entry_safe(addr, temp,
> + &sctp_local_addr_list, list) {
> if (addr->a.v4.sin_addr.s_addr == ifa->ifa_local) {
> - list_del(pos);
> - kfree(addr);
> + addr->valid = 0;
> + list_del_rcu(&addr->list);
> break;
> }
> }
> -
> + rcu_read_unlock();
Ditto.
> + if (addr && !addr->valid)
> + call_rcu(&addr->rcu, sctp_local_addr_free);
This one is OK, since we hold the update-side mutex.
> break;
> }
>
> @@ -1227,6 +1247,9 @@ SCTP_STATIC __exit void sctp_exit(void)
> sctp_v6_del_protocol();
> inet_del_protocol(&sctp_protocol, IPPROTO_SCTP);
>
> + /* Unregister notifier for inet address additions/deletions. */
> + unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
> +
> /* Free the local address list. */
> sctp_free_local_addr_list();
>
> @@ -1240,9 +1263,6 @@ SCTP_STATIC __exit void sctp_exit(void)
> inet_unregister_protosw(&sctp_stream_protosw);
> inet_unregister_protosw(&sctp_seqpacket_protosw);
>
> - /* Unregister notifier for inet address additions/deletions. */
> - unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
> -
> sctp_sysctl_unregister();
> list_del(&sctp_ipv4_specific.list);
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 3335460..a3acf78 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4057,9 +4057,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
> int __user *optlen)
> {
> sctp_assoc_t id;
> + struct list_head *pos;
> struct sctp_bind_addr *bp;
> struct sctp_association *asoc;
> - struct list_head *pos, *temp;
> struct sctp_sockaddr_entry *addr;
> rwlock_t *addr_lock;
> int cnt = 0;
> @@ -4096,15 +4096,19 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len,
> addr = list_entry(bp->address_list.next,
> struct sctp_sockaddr_entry, list);
> if (sctp_is_any(&addr->a)) {
> - list_for_each_safe(pos, temp, &sctp_local_addr_list) {
> - addr = list_entry(pos,
> - struct sctp_sockaddr_entry,
> - list);
> + rcu_read_lock();
> + list_for_each_entry_rcu(addr,
> + &sctp_local_addr_list, list) {
> + if (!addr->valid)
> + continue;
> +
Again, what happens if the element is deleted just at this point?
If harmless, might be good to get rid of ->valid.
> if ((PF_INET == sk->sk_family) &&
> (AF_INET6 == addr->a.sa.sa_family))
> continue;
> +
> cnt++;
> }
> + rcu_read_unlock();
We are just counting these things, right? If on the other hand we are
keeping a reference outside of rcu_read_lock() protection, then there
needs to be some explicit mechanism preventing the corresponding entry
from being freed.
> } else {
> cnt = 1;
> }
> @@ -4127,14 +4131,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
> int max_addrs, void *to,
> int *bytes_copied)
> {
> - struct list_head *pos, *next;
> struct sctp_sockaddr_entry *addr;
> union sctp_addr temp;
> int cnt = 0;
> int addrlen;
>
> - list_for_each_safe(pos, next, &sctp_local_addr_list) {
> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> + rcu_read_lock();
> + list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> + if (!addr->valid)
> + continue;
> +
Same question as before -- what happens if the element is deleted by some
other CPU (thus clearing ->valid) just at this point?
> if ((PF_INET == sk->sk_family) &&
> (AF_INET6 == addr->a.sa.sa_family))
> continue;
> @@ -4149,6 +4155,7 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
> cnt ++;
> if (cnt >= max_addrs) break;
> }
> + rcu_read_unlock();
>
> return cnt;
> }
> @@ -4156,14 +4163,16 @@ static int sctp_copy_laddrs_old(struct sock *sk, __u16 port,
> static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
> size_t space_left, int *bytes_copied)
> {
> - struct list_head *pos, *next;
> struct sctp_sockaddr_entry *addr;
> union sctp_addr temp;
> int cnt = 0;
> int addrlen;
>
> - list_for_each_safe(pos, next, &sctp_local_addr_list) {
> - addr = list_entry(pos, struct sctp_sockaddr_entry, list);
> + rcu_read_lock();
> + list_for_each_entry_rcu(addr, &sctp_local_addr_list, list) {
> + if (!addr->valid)
> + continue;
> +
And the same question here as well...
> if ((PF_INET == sk->sk_family) &&
> (AF_INET6 == addr->a.sa.sa_family))
> continue;
> @@ -4171,8 +4180,10 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
> sctp_get_pf_specific(sk->sk_family)->addr_v4map(sctp_sk(sk),
> &temp);
> addrlen = sctp_get_af_specific(temp.sa.sa_family)->sockaddr_len;
> - if (space_left < addrlen)
> - return -ENOMEM;
> + if (space_left < addrlen) {
> + cnt = -ENOMEM;
> + break;
> + }
> memcpy(to, &temp, addrlen);
>
> to += addrlen;
> @@ -4180,6 +4191,7 @@ static int sctp_copy_laddrs(struct sock *sk, __u16 port, void *to,
> space_left -= addrlen;
> *bytes_copied += addrlen;
> }
> + rcu_read_unlock();
>
> return cnt;
> }
> --
> 1.5.2.4
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-09-10 21:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: Segher Boessenkool, Paul Mackerras, heiko.carstens, horms,
Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
David Miller, Ilpo Järvinen, ak, cfriesen, rpjday, Netdev,
jesper.juhl, linux-arch, Andrew Morton, zlynx, schwidefsky,
Chris Snook, Herbert Xu, Linus Torvalds, wensong, wjiang
In-Reply-To: <Pine.LNX.4.64.0709101433090.26043@schroedinger.engr.sgi.com>
On Mon, Sep 10, 2007 at 02:36:26PM -0700, Christoph Lameter wrote:
> On Mon, 10 Sep 2007, Paul E. McKenney wrote:
>
> > The one exception to this being the case where process-level code is
> > communicating to an interrupt handler running on that same CPU -- on
> > all CPUs that I am aware of, a given CPU always sees its own writes
> > in order.
>
> Yes but that is due to the code path effectively continuing in the
> interrupt handler. The cpu makes sure that op codes being executed always
> see memory in a consistent way. The basic ordering problem with out of
> order writes is therefore coming from other processors concurrently
> executing code and holding variables in registers that are modified
> elsewhere. The only solution that I know of are one or the other form of
> barrier.
So we are agreed then -- volatile accesses may be of some assistance when
interacting with interrupt handlers running on the same CPU (presumably
when using per-CPU variables), but are generally useless when sharing
variables among CPUs. Correct?
Thanx, Paul
^ permalink raw reply
* Re: why does tcp_v[46]_conn_request not inc MIB stats
From: Sridhar Samudrala @ 2007-09-10 21:54 UTC (permalink / raw)
To: Rick Jones; +Cc: Linux Network Development list
In-Reply-To: <46E5900A.3010009@hp.com>
On Mon, 2007-09-10 at 11:42 -0700, Rick Jones wrote:
> I've been digging around to see about inducing /proc/net/tcp to show
> some "interesting" things for listen sockets (eg backlog depth, its max,
> and dropped connection requests).
backlog depth(acceptq length) for a listening socket should be available
with the newer kernels. The following patch exports this value via the rx_queue
field in /proc/net/tcp.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47da8ee681d04e68ca1b1812c10e28162150d453
> While there I've noticed that both
> tcp_v[46]_syn_recv_sock and tcp_v[46]conn_request both check that the
> listen queue is full, but only tcp_v[46]_syn_recv_sock increments some
> mib stats for dropped connection requests.
>
> Is that deliberate, or is that a hole in the stats?
looks like it is a hole in the stats. I think we should increment
LISTENOVERFLOWS or LISTENDROPS in tcp_v[46]_conn_request too if the
SYN is dropped.
Thanks
Sridhar
^ permalink raw reply
* Re: [PATCH] ipconfig.c: De-clutter IP configuration report
From: Jan Engelhardt @ 2007-09-10 22:04 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: netdev, David S. Miller, linux-kernel
In-Reply-To: <Pine.LNX.4.64N.0709101249230.25038@blysk.ds.pg.gda.pl>
On Sep 10 2007 13:09, Maciej W. Rozycki wrote:
> The new code builds fine; no semantic changes.
>
> Please apply,
>
> Maciej
>
>patch-mips-2.6.23-rc5-20070904-ipconfig-printk-2
>diff -up --recursive --new-file linux-mips-2.6.23-rc5-20070904.macro/net/ipv4/ipconfig.c linux-mips-2.6.23-rc5-20070904/net/ipv4/ipconfig.c
>--- linux-mips-2.6.23-rc5-20070904.macro/net/ipv4/ipconfig.c 2007-09-04 04:56:22.000000000 +0000
>+++ linux-mips-2.6.23-rc5-20070904/net/ipv4/ipconfig.c 2007-09-10 11:53:19.000000000 +0000
>@@ -1364,17 +1364,17 @@ static int __init ip_auto_config(void)
> /*
> * Clue in the operator.
> */
>- printk("IP-Config: Complete:");
>- printk("\n device=%s", ic_dev->name);
>- printk(", addr=%u.%u.%u.%u", NIPQUAD(ic_myaddr));
>- printk(", mask=%u.%u.%u.%u", NIPQUAD(ic_netmask));
>- printk(", gw=%u.%u.%u.%u", NIPQUAD(ic_gateway));
>- printk(",\n host=%s, domain=%s, nis-domain=%s",
>- utsname()->nodename, ic_domain, utsname()->domainname);
>- printk(",\n bootserver=%u.%u.%u.%u", NIPQUAD(ic_servaddr));
>- printk(", rootserver=%u.%u.%u.%u", NIPQUAD(root_server_addr));
>- printk(", rootpath=%s", root_server_path);
>- printk("\n");
>+ pr_info("IP-Config: Complete:\n");
>+ pr_info(" device=%s, addr=%u.%u.%u.%u, "
>+ "mask=%u.%u.%u.%u, gw=%u.%u.%u.%u,\n",
>+ ic_dev->name, NIPQUAD(ic_myaddr),
>+ NIPQUAD(ic_netmask), NIPQUAD(ic_gateway));
>+ pr_info(" host=%s, domain=%s, nis-domain=%s,\n",
>+ utsname()->nodename, ic_domain, utsname()->domainname);
>+ pr_info(" bootserver=%u.%u.%u.%u, "
>+ "rootserver=%u.%u.%u.%u, rootpath=%s\n",
>+ NIPQUAD(ic_servaddr),
>+ NIPQUAD(root_server_addr), root_server_path);
> #endif /* !SILENT */
>
> return 0;
It should really be done in userspace. And ripped from the kernel.
Jan
--
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox