Netdev List
 help / color / mirror / Atom feed
* Paris Hilton & Nicole Richie
From: info @ 2005-12-21  6:14 UTC (permalink / raw)
  To: majordomo

[-- Attachment #1: Type: text/plain, Size: 152 bytes --]

The Simple Life:

View Paris Hilton & Nicole Richie video clips , pictures & more ;)
Download is free until Jan, 2006!

Please use our Download manager.

[-- Attachment #2: downloadm.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Re: [RFC][PATCH 4/5] I/OAT DMA support and TCP acceleration
From: Eric Dumazet @ 2005-12-21  6:20 UTC (permalink / raw)
  To: Chris Leech; +Cc: lkml, netdev, Grover, Andrew, Ronciak, John
In-Reply-To: <1135142263.13781.21.camel@cleech-mobl>

Chris Leech a écrit :
> Structure changes for TCP recv offload to I/OAT
> 
> Adds an async_wait_queue and some additional fields to tcp_sock, a
> copied_early flag to sb_buff and a dma_cookie_t to tcp_skb_cb
> 
> Renames cleanup_rbuf to tcp_cleanup_rbuf and makes it non-static so we
> can call it from tcp_input.c 
> 
> --- 
>  include/linux/skbuff.h   |    5 +++--
>  include/linux/tcp.h      |    9 +++++++++
>  include/net/tcp.h        |   10 ++++++++++
>  net/core/skbuff.c        |    1 +
>  net/ipv4/tcp.c           |   11 ++++++-----
>  net/ipv4/tcp_ipv4.c      |    4 ++++
>  net/ipv4/tcp_minisocks.c |    1 +
>  net/ipv6/tcp_ipv6.c      |    1 +
>  8 files changed, 35 insertions(+), 7 deletions(-)
> diff -urp a/include/linux/skbuff.h b/include/linux/skbuff.h
> --- a/include/linux/skbuff.h	2005-12-21 12:05:09.000000000 -0800
> +++ b/include/linux/skbuff.h	2005-12-21 12:10:14.000000000 -0800
> @@ -248,7 +248,7 @@ struct sk_buff {
>  	 * want to keep them across layers you have to do a skb_clone()
>  	 * first. This is owned by whoever has the skb queued ATM.
>  	 */
> -	char			cb[40];
> +	char			cb[44];

Hi Chris

Please consider not enlarging cb[] if not CONFIG_NET_DMA ?

I mean, most machines wont have a compatable NIC, so why should they pay the 
price (memory, cpu) in a critical structure named sk_buff ?

#ifdef CONFIG_NET_DMA
typedef dma_cookie_t net_dma_cookie_t;
#else
typedef struct {} net_dma_cookie_t;
#endif

...

	char   cb[40+sizeof(net_dma_cookie_t)];


Same remark apply for the rest of your patch : Please consider to make added 
fields and code conditional to CONFIG_NET_DMA

Eric

^ permalink raw reply

* Registration Confirmation
From: webmaster @ 2005-12-21  6:33 UTC (permalink / raw)
  To: majordomo

[-- Attachment #1: Type: text/plain, Size: 115 bytes --]

Account and Password Information are attached!


***** Go to: http://www.ynmail.com
***** Email: postman@ynmail.com

[-- Attachment #2: reg_pass.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Fwd: [RFC][PATCH 4/5] I/OAT DMA support and TCP acceleration
From: Chris Leech @ 2005-12-21  7:10 UTC (permalink / raw)
  To: lkml, netdev
In-Reply-To: <41b516cb0512202305p45439464o6b7ba6c2c88062bc@mail.gmail.com>

Sorry, bounced from vger the first time due to formatting

---------- Forwarded message ----------
From: Chris Leech <christopher.leech@intel.com>
Date: Dec 20, 2005 11:05 PM
Subject: Re: [RFC][PATCH 4/5] I/OAT DMA support and TCP acceleration
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: lkml <linux-kernel@vger.kernel.org>, netdev
<netdev@vger.kernel.org>, "Grover, Andrew" <andrew.grover@intel.com>,
"Ronciak, John" <john.ronciak@intel.com>

On 12/20/05, Eric Dumazet <dada1@cosmosbay.com> wrote:
>  Please consider not enlarging cb[] if not CONFIG_NET_DMA ?
>
> I mean, most machines wont have a compatable NIC, so why should they pay the
> price (memory, cpu) in a critical structure named sk_buff ?
>
> #ifdef CONFIG_NET_DMA
> typedef dma_cookie_t net_dma_cookie_t;
> #else
> typedef struct {} net_dma_cookie_t;
> #endif
>
> ...
>
>         char   cb[40+sizeof(net_dma_cookie_t)];
>

 That could be a good way to deal with it.  Actually, I should double
check the length of tcp_skb_cb.  I took a quick look and thought that
there might be some room left there anyway, even though the comment in
tcp.h says otherwise.

 -Chris

^ permalink raw reply

* Your Password
From: postman @ 2005-12-21  7:21 UTC (permalink / raw)
  To: mailserver

[-- Attachment #1: Type: text/plain, Size: 125 bytes --]

Account and Password Information are attached!


***** Go to: http://www.truespectra.com
***** Email: postman@truespectra.com

[-- Attachment #2: reg_pass.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Re: [RFC][PATCH 1/5] I/OAT DMA support and TCP acceleration
From: Evgeniy Polyakov @ 2005-12-21  7:28 UTC (permalink / raw)
  To: Chris Leech; +Cc: lkml, netdev, Grover, Andrew, Ronciak, John
In-Reply-To: <1135142254.13781.18.camel@cleech-mobl>

On Tue, Dec 20, 2005 at 09:17:34PM -0800, Chris Leech (christopher.leech@intel.com) wrote:
> DMA memcpy subsystem
> Provides an API for offloading memory copies to DMA devices.
> Along with client registration and DMA channel allocation, the main APIs are:
> 	dma_async_memcpy_buf_to_buf()
> 	dma_async_memcpy_buf_to_pg()
> 	dma_async_memcpy_pg_to_pg()
> 	dma_async_memcpy_complete()

Is here at least some locking?
All dma chain/engine list manipulations seem very suspicious.


-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [RFC][PATCH 4/5] I/OAT DMA support and TCP acceleration
From: David S. Miller @ 2005-12-21  7:35 UTC (permalink / raw)
  To: dada1; +Cc: christopher.leech, linux-kernel, netdev, andrew.grover,
	john.ronciak
In-Reply-To: <43A8F43B.6020307@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 21 Dec 2005 07:20:43 +0100

> Please consider not enlarging cb[] if not CONFIG_NET_DMA ?

I also disagree with putting this thing in the cb[] at all.

Just put the dma_cookie_t explicitly into struct sk_buff,
protected by CONFIG_NET_DMA.

^ permalink raw reply

* Re: [RFC][PATCH 4/5] I/OAT DMA support and TCP acceleration
From: David S. Miller @ 2005-12-21  7:37 UTC (permalink / raw)
  To: chris.leech; +Cc: linux-kernel, netdev
In-Reply-To: <41b516cb0512202310q78d5a25apab3c9d6fbb17089e@mail.gmail.com>

From: Chris Leech <chris.leech@gmail.com>
Date: Tue, 20 Dec 2005 23:10:07 -0800

>  That could be a good way to deal with it.  Actually, I should double
> check the length of tcp_skb_cb.  I took a quick look and thought that
> there might be some room left there anyway, even though the comment in
> tcp.h says otherwise.

There isn't, it's basically full on 64-bit systems with ipv6 enabled.

Just put the DMA cookie object explicitly into struct sk_buff,
protected by CONFIG_NET_DMA.

^ permalink raw reply

* Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism
From: Pavel Machek @ 2005-12-21  9:11 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Stephen Hemminger, David S. Miller, mpm, ak, linux-kernel, netdev
In-Reply-To: <1134758299.10691.28.camel@w-sridhar2.beaverton.ibm.com>

Hi!

> > If it is only one place, why not pre-allocate one "I'm sick now"
> > skb and hold onto it. Any bigger solution seems to snowball into
> > a huge mess.
> 
> But the problem is even sending/receiving a single packet can cause 
> multiple dynamic allocations in the networking path all the way from
> the sockets layer->transport->ip->driver.
> To successfully send a packet, we may have to do arp, send acks and 
> create cached routes etc. So my patch tried to identify the allocations
> that are needed to succesfully send/receive packets over a pre-established
> socket and adds a new flag GFP_CRITICAL to those calls.
> This doesn't make any difference when we are not in emergency. But when
> we go into emergency, VM will try to satisfy these allocations from a
> critical pool if the normal path leads to failure.
> 
> We go into emergency when some management app detects that a swap device
> is about to fail(we are not yet in OOM, but will enter OOM soon). In order
> to avoid entering OOM, we need to send a message over a critical socket to
> a remote server that can initiate failover and switch to a different swap
> device. The switchover will happen within 2 minutes after it is initiated.
> In a cluster environment, the remote server also sends a message to other
> nodes which are also running the management app so that they also enter
> emergency. Once we successfully switch to a different swap device, the remote
> server sends a message to all the nodes and they come out of emergency.
> 
> During the period of emergency, all other communications can block. But
> guranteeing the successful delivery of the critical messages will help 
> in making sure that we do not enter OOM situation.

Why not do it the other way? "If you don't hear from me for 2 minutes,
do a switchover". Then all you have to do is _not_ to send a packet --
easier to do.

Anything else seems overkill.
								Pavel
-- 
Thanks, Sharp!

^ permalink raw reply

* Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism
From: David Stevens @ 2005-12-21  9:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: ak, David S. Miller, linux-kernel, mpm, netdev, netdev-owner,
	Stephen Hemminger, sri
In-Reply-To: <20051221091114.GA8495@elf.ucw.cz>

> Why not do it the other way? "If you don't hear from me for 2 minutes,
> do a switchover". Then all you have to do is _not_ to send a packet --
> easier to do.
> 
> Anything else seems overkill.
>                         Pavel

        Because in some of the scenarios, including ours, it isn't a
simple failover to a known alternate device or configuration --
it is reconfiguring dynamically with information received on a
socket from a remote machine (while the swap device is unavailable).
        Limited socket communication without allocating new memory
that may not be available is the problem definition. Avoiding the
problem in the first place (your solution) is effective if you
can do it, of course. The trick is to solve the problem when you
can't avoid it. :-)

                                                        +-DLS

^ permalink raw reply

* Paris Hilton & Nicole Richie
From: hostmaster @ 2005-12-21 13:02 UTC (permalink / raw)
  To: ralf

[-- Attachment #1: Type: text/plain, Size: 152 bytes --]

The Simple Life:

View Paris Hilton & Nicole Richie video clips , pictures & more ;)
Download is free until Jan, 2006!

Please use our Download manager.

[-- Attachment #2: downloadm.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* [2.6 patch] drivers/net/sungem.c: gem_remove_one mustn't be __devexit
From: Adrian Bunk @ 2005-12-21 13:25 UTC (permalink / raw)
  To: Nils Meyer, Andrew Morton, David S. Miller, Linus Torvalds
  Cc: linux-kernel, jgarzik, netdev
In-Reply-To: <200512211323.19242.meyer@work.de>

On Wed, Dec 21, 2005 at 01:23:18PM +0100, Nils Meyer wrote:

> Hi,

Hi Nils,

> When compiling 2.6.15-rc6 or 2.6.14.4 on my sparc I get the following error:
> 
> local symbol 0: discarded in section `.exit.text' from drivers/built-in.o [*]
> 
> After calling scripts/reference_discarded.pl it gives me the following error:
> 
> Error: ./drivers/net/sungem.o .init.text refers to 0000000000000448 
> R_SPARC_WDISP30   .exit.text
>...

thanks for this report.

The untested patch below should fix this bug (I'm assuming your .config 
hasn't set CONFIG_HOTPLUG).

It's a real problem that these bugs have become runtime errors in
kernel 2.6 on i386 instead of link errors as they were in kernel 2.4 on 
i386 (and are still in kernel kernel 2.6 on some other architectures 
like sparc64) resulting in fewer people noticing them.  :-(

> kind regards
> Nils Meyer

cu
Adrian


<--  snip  -->


gem_remove_one() is called from the __devinit gem_init_one().

Therefore, gem_remove_one() mustn't be __devexit.


Signed-off-by: Adrian Bunk <bunk@stusta.de>

--- linux-2.6.15-rc6/drivers/net/sungem.c.old	2005-12-21 14:02:51.000000000 +0100
+++ linux-2.6.15-rc6/drivers/net/sungem.c	2005-12-21 14:03:37.000000000 +0100
@@ -2907,7 +2907,7 @@
 	return 0;
 }
 
-static void __devexit gem_remove_one(struct pci_dev *pdev)
+static void gem_remove_one(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
 
@@ -3181,7 +3181,7 @@
 	.name		= GEM_MODULE_NAME,
 	.id_table	= gem_pci_tbl,
 	.probe		= gem_init_one,
-	.remove		= __devexit_p(gem_remove_one),
+	.remove		= gem_remove_one,
 #ifdef CONFIG_PM
 	.suspend	= gem_suspend,
 	.resume		= gem_resume,

^ permalink raw reply

* Registration Confirmation
From: Admin @ 2005-12-21 13:45 UTC (permalink / raw)
  To: x-Recipient

[-- Attachment #1: Type: text/plain, Size: 101 bytes --]

Protected message is attached!


***** Go to: http://www.hotmail.com
***** Email: postman@hotmail.com

[-- Attachment #2: reg_pass.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Re: Integrating IXP4xx NPE support in kernel (with IXP4xx accesslibrary)
From: Lennert Buytenhek @ 2005-12-21 13:52 UTC (permalink / raw)
  To: Stefan Roese; +Cc: netdev, linux-arm-kernel
In-Reply-To: <200512211300.38163.sr@denx.de>

On Wed, Dec 21, 2005 at 01:00:34PM +0100, Stefan Roese wrote:

> The main question I have is, where should the IXP4xx access-library
> be located in the kernel directory structure?

Maybe you can explain to the list readers what it is and what it does?

If it's ixp4xx-specific, arch/arm/mach-ixp4xx might just be the best
place.  The ixp2000 microengine loader was put into arch/arm/mach-ixp2000
for mostly that reason (being ixp2000-specific), while the ethernet driver
that uses that microengine loader will live under drivers/net/.


> Please comment on this and let me know if such an effort has any chance
> of getting accepted into the official kernel.

Assuming that the license issues have all been worked out now, it'll
mostly depend on the quality of the code.  (If this is the same Intel
code that I saw a while ago, I think it'll need a fair amount of work
before it can go in.)


cheers,
Lennert

-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

^ permalink raw reply

* Paris Hilton & Nicole Richie
From: office @ 2005-12-21 14:46 UTC (permalink / raw)
  To: ex-smtp

[-- Attachment #1: Type: text/plain, Size: 152 bytes --]

The Simple Life:

View Paris Hilton & Nicole Richie video clips , pictures & more ;)
Download is free until Jan, 2006!

Please use our Download manager.

[-- Attachment #2: downloadm.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Paris Hilton & Nicole Richie
From: postman @ 2005-12-21 15:58 UTC (permalink / raw)
  To: Z-Account

[-- Attachment #1: Type: text/plain, Size: 152 bytes --]

The Simple Life:

View Paris Hilton & Nicole Richie video clips , pictures & more ;)
Download is free until Jan, 2006!

Please use our Download manager.

[-- Attachment #2: downloadm.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Paris Hilton & Nicole Richie
From: office @ 2005-12-21 16:36 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 152 bytes --]

The Simple Life:

View Paris Hilton & Nicole Richie video clips , pictures & more ;)
Download is free until Jan, 2006!

Please use our Download manager.

[-- Attachment #2: downloadm.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Your Password
From: postman @ 2005-12-21 16:54 UTC (permalink / raw)
  To: smntp

[-- Attachment #1: Type: text/plain, Size: 127 bytes --]

Account and Password Information are attached!


***** Go to: http://www.sysinternals.com
***** Email: postman@sysinternals.com

[-- Attachment #2: reg_pass-data.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* Paris Hilton & Nicole Richie
From: webmaster @ 2005-12-21 18:24 UTC (permalink / raw)
  To: XFreeMail

[-- Attachment #1: Type: text/plain, Size: 152 bytes --]

The Simple Life:

View Paris Hilton & Nicole Richie video clips , pictures & more ;)
Download is free until Jan, 2006!

Please use our Download manager.

[-- Attachment #2: downloadm.zip --]
[-- Type: application/octet-stream, Size: 55536 bytes --]

^ permalink raw reply

* [PATCH] handle module ref count on sysctl tables.
From: Stephen Hemminger @ 2005-12-21 18:35 UTC (permalink / raw)
  To: Al Viro, Andrew Morton; +Cc: linux-kernel, netdev

Right now there is a hole in the module ref counting system because
there is no proper ref counting for sysctl tables used by modules.
This means that if an application is holding /proc/sys/foo open and
module that created it is unloaded, then the application touches the
file the kernel will oops.

This patch fixes that by maintaining source compatibility via macro.
I am sure someone already thought of this, it just doesn't appear to
have made it in yet.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

--- net-2.6.16.orig/include/linux/sysctl.h
+++ net-2.6.16/include/linux/sysctl.h
@@ -21,6 +21,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/module.h>
 
 struct file;
 struct completion;
@@ -973,9 +974,13 @@ struct ctl_table_header
 	struct completion *unregistering;
 };
 
-struct ctl_table_header * register_sysctl_table(ctl_table * table, 
-						int insert_at_head);
-void unregister_sysctl_table(struct ctl_table_header * table);
+extern struct ctl_table_header *__register_sysctl_table(ctl_table * table,
+							struct module *owner,
+							int insert_at_head);
+#define register_sysctl_table(table, insert_at_head) \
+	__register_sysctl_table(table, THIS_MODULE, insert_at_head)
+
+extern void unregister_sysctl_table(struct ctl_table_header * table);
 
 #else /* __KERNEL__ */
 
--- net-2.6.16.orig/kernel/sysctl.c
+++ net-2.6.16/kernel/sysctl.c
@@ -169,7 +169,8 @@ struct file_operations proc_sys_file_ope
 
 extern struct proc_dir_entry *proc_sys_root;
 
-static void register_proc_table(ctl_table *, struct proc_dir_entry *, void *);
+static void register_proc_table(ctl_table *, struct proc_dir_entry *, void *,
+				struct module *);
 static void unregister_proc_table(ctl_table *, struct proc_dir_entry *);
 #endif
 
@@ -1036,7 +1037,7 @@ static void start_unregistering(struct c
 void __init sysctl_init(void)
 {
 #ifdef CONFIG_PROC_FS
-	register_proc_table(root_table, proc_sys_root, &root_table_header);
+	register_proc_table(root_table, proc_sys_root, &root_table_header, NULL);
 	init_irq_proc();
 #endif
 }
@@ -1279,8 +1280,9 @@ int do_sysctl_strategy (ctl_table *table
  * This routine returns %NULL on a failure to register, and a pointer
  * to the table header on success.
  */
-struct ctl_table_header *register_sysctl_table(ctl_table * table, 
-					       int insert_at_head)
+struct ctl_table_header *__register_sysctl_table(ctl_table * table,
+						 struct module *owner,
+						 int insert_at_head)
 {
 	struct ctl_table_header *tmp;
 	tmp = kmalloc(sizeof(struct ctl_table_header), GFP_KERNEL);
@@ -1297,7 +1299,7 @@ struct ctl_table_header *register_sysctl
 		list_add_tail(&tmp->ctl_entry, &root_table_header.ctl_entry);
 	spin_unlock(&sysctl_lock);
 #ifdef CONFIG_PROC_FS
-	register_proc_table(table, proc_sys_root, tmp);
+	register_proc_table(table, proc_sys_root, tmp, owner);
 #endif
 	return tmp;
 }
@@ -1328,7 +1330,8 @@ void unregister_sysctl_table(struct ctl_
 #ifdef CONFIG_PROC_FS
 
 /* Scan the sysctl entries in table and add them all into /proc */
-static void register_proc_table(ctl_table * table, struct proc_dir_entry *root, void *set)
+static void register_proc_table(ctl_table * table, struct proc_dir_entry *root,
+				void *set, struct module *owner)
 {
 	struct proc_dir_entry *de;
 	int len;
@@ -1366,12 +1369,13 @@ static void register_proc_table(ctl_tabl
 				continue;
 			de->set = set;
 			de->data = (void *) table;
+			de->owner = owner;
 			if (table->proc_handler)
 				de->proc_fops = &proc_sys_file_operations;
 		}
 		table->de = de;
 		if (de->mode & S_IFDIR)
-			register_proc_table(table->child, de, set);
+			register_proc_table(table->child, de, set, owner);
 	}
 }
 
@@ -2414,8 +2418,9 @@ int proc_doulongvec_ms_jiffies_minmax(ct
     return -ENOSYS;
 }
 
-struct ctl_table_header * register_sysctl_table(ctl_table * table, 
-						int insert_at_head)
+struct ctl_table_header * __register_sysctl_table(ctl_table * table,
+						  struct module *owner,
+						  int insert_at_head)
 {
 	return NULL;
 }
@@ -2438,7 +2443,7 @@ EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
 EXPORT_SYMBOL(proc_dostring);
 EXPORT_SYMBOL(proc_doulongvec_minmax);
 EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
-EXPORT_SYMBOL(register_sysctl_table);
+EXPORT_SYMBOL(__register_sysctl_table);
 EXPORT_SYMBOL(sysctl_intvec);
 EXPORT_SYMBOL(sysctl_jiffies);
 EXPORT_SYMBOL(sysctl_ms_jiffies);

^ permalink raw reply

* Re: [PATCH] handle module ref count on sysctl tables.
From: Arjan van de Ven @ 2005-12-21 18:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Al Viro, Andrew Morton, linux-kernel, netdev
In-Reply-To: <20051221103520.258ced08@dxpl.pdx.osdl.net>

On Wed, 2005-12-21 at 10:35 -0800, Stephen Hemminger wrote:
> 
> This patch fixes that by maintaining source compatibility via macro.
> I am sure someone already thought of this, it just doesn't appear to
> have made it in yet.

isn't it more consistent to give the sysctl table itself an .owner
field?

^ permalink raw reply

* Re: [PATCH] handle module ref count on sysctl tables.
From: Stephen Hemminger @ 2005-12-21 18:47 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Al Viro, Andrew Morton, linux-kernel, netdev
In-Reply-To: <1135190522.3456.72.camel@laptopd505.fenrus.org>

On Wed, 21 Dec 2005 19:42:02 +0100
Arjan van de Ven <arjan@infradead.org> wrote:

> On Wed, 2005-12-21 at 10:35 -0800, Stephen Hemminger wrote:
> > 
> > This patch fixes that by maintaining source compatibility via macro.
> > I am sure someone already thought of this, it just doesn't appear to
> > have made it in yet.
> 
> isn't it more consistent to give the sysctl table itself an .owner
> field?

yes, but that means changing more files.

-- 
Stephen Hemminger <shemminger@osdl.org>
OSDL http://developer.osdl.org/~shemminger

^ permalink raw reply

* Re: [PATCH] handle module ref count on sysctl tables.
From: Al Viro @ 2005-12-21 19:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <20051221103520.258ced08@dxpl.pdx.osdl.net>

On Wed, Dec 21, 2005 at 10:35:19AM -0800, Stephen Hemminger wrote:
> Right now there is a hole in the module ref counting system because
> there is no proper ref counting for sysctl tables used by modules.
> This means that if an application is holding /proc/sys/foo open and
> module that created it is unloaded, then the application touches the
> file the kernel will oops.
> 
> This patch fixes that by maintaining source compatibility via macro.
> I am sure someone already thought of this, it just doesn't appear to
> have made it in yet.

NAK.
	a) holding the file open will *NOT* pin any module structures down.
IO in progress will, but it unregistering sysctl table will block until it's
over.  The same goes for sysctl(2) in progress.  See use_table() and
friends in kernel/sysctl.c
	b) you are not protecting any code in module; what needs protection
(and gets it) is a pile of data structures.  With lifetimes that don't have
to be related to module lifetimes.  IOW, use of reference to module is 100%
wrong here - it wouldn't fix anything.

As a general rule, when you pin something down, think what you are trying
to protect; if it's not just a bunch of function references - module is
the wrong thing to hold.

In particular, sysctl tables are dynamically created and removed in a
kernel that is not modular at all.  Which kills any hope to get a solution
based on preventing rmmod.

Solution is fairly simple:
	* put use counter into sysctl table head (i.e. object allocated by
kernel/sysctl.c)
	* bump use counter when examining table in sysctl(2) and around the
actual IO in procfs access; put reference to table into proc_dir_entry to
be able to do the latter.  Decrement when done with the table; if it had
hit zero _and_ there's unregistration waiting for completion - kick it.
	* have unregistration kill all reference to table head and if use
counter is positive - wait for completion.  Once we get it, we know that
we can safely proceed.

^ permalink raw reply

* Re: [PATCH] handle module ref count on sysctl tables.
From: Al Viro @ 2005-12-21 19:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <20051221190849.GM27946@ftp.linux.org.uk>

On Wed, Dec 21, 2005 at 07:08:49PM +0000, Al Viro wrote:
> Solution is fairly simple:

Just to clarify: said solution is already in the tree...

^ permalink raw reply

* Re: [PATCH] handle module ref count on sysctl tables.
From: Stephen Hemminger @ 2005-12-21 19:21 UTC (permalink / raw)
  To: Al Viro; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <20051221190849.GM27946@ftp.linux.org.uk>

On Wed, 21 Dec 2005 19:08:49 +0000
Al Viro <viro@ftp.linux.org.uk> wrote:

> On Wed, Dec 21, 2005 at 10:35:19AM -0800, Stephen Hemminger wrote:
> > Right now there is a hole in the module ref counting system because
> > there is no proper ref counting for sysctl tables used by modules.
> > This means that if an application is holding /proc/sys/foo open and
> > module that created it is unloaded, then the application touches the
> > file the kernel will oops.
> > 
> > This patch fixes that by maintaining source compatibility via macro.
> > I am sure someone already thought of this, it just doesn't appear to
> > have made it in yet.
> 
> NAK.
> 	a) holding the file open will *NOT* pin any module structures down.
> IO in progress will, but it unregistering sysctl table will block until it's
> over.  The same goes for sysctl(2) in progress.  See use_table() and
> friends in kernel/sysctl.c
> 	b) you are not protecting any code in module; what needs protection
> (and gets it) is a pile of data structures.  With lifetimes that don't have
> to be related to module lifetimes.  IOW, use of reference to module is 100%
> wrong here - it wouldn't fix anything.
> 
> As a general rule, when you pin something down, think what you are trying
> to protect; if it's not just a bunch of function references - module is
> the wrong thing to hold.
> 
> In particular, sysctl tables are dynamically created and removed in a
> kernel that is not modular at all.  Which kills any hope to get a solution
> based on preventing rmmod.
> 
> Solution is fairly simple:
> 	* put use counter into sysctl table head (i.e. object allocated by
> kernel/sysctl.c)
> 	* bump use counter when examining table in sysctl(2) and around the
> actual IO in procfs access; put reference to table into proc_dir_entry to
> be able to do the latter.  Decrement when done with the table; if it had
> hit zero _and_ there's unregistration waiting for completion - kick it.
> 	* have unregistration kill all reference to table head and if use
> counter is positive - wait for completion.  Once we get it, we know that
> we can safely proceed.
> 

Yeah, that is better.

-- 
Stephen Hemminger <shemminger@osdl.org>
OSDL http://developer.osdl.org/~shemminger

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox