Netdev List
* nfs: [PATCH 27/31] enable swap on NFS
From: Suresh Jayaraman @ 2009-10-01 14:10 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: Peter Zijlstra <a.p.zijlstra@chello.nl> 

Implement all the new swapfile a_ops for NFS. This will set the NFS socket to
SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset
SOCK_MEMALLOC before engaging the protocol ->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related objects
and the early (re)setting of SOCK_MEMALLOC should allow us to receive the
packets required for the TCP connection buildup.

(swapping continues over a server reset during heavy network traffic)
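The connect workers in this patch save `current->flags`, set PF_MEMALLOC only when the transport is a swapper, and restore just that one bit on exit with `tsk_restore_flags()`. The userspace sketch below mimics that pattern. `struct fake_task`, the flag values and `fake_restore_flags()` are hypothetical stand-ins; the restore rule (masked bits revert to their saved value, all other bits are left as they are now) is our reading of what `tsk_restore_flags()` does, not a copy of it.

```c
#include <assert.h>

/* Hypothetical flag values; the real PF_* constants live in <linux/sched.h>. */
#define FAKE_PF_MEMALLOC 0x0800UL
#define FAKE_PF_OTHER    0x0001UL

struct fake_task {
	unsigned long flags;
};

/* Restore only the bits in @mask to their value in @orig; leave others alone. */
static void fake_restore_flags(struct fake_task *t, unsigned long orig,
			       unsigned long mask)
{
	t->flags &= ~mask;
	t->flags |= orig & mask;
}

/* Shape of the connect worker: elevate, do the work, restore on exit. */
static unsigned long fake_connect_worker(struct fake_task *t, int swapper,
					 unsigned long *flags_during_work)
{
	unsigned long pflags = t->flags;

	if (swapper)
		t->flags |= FAKE_PF_MEMALLOC;

	/* ... socket (re)creation would happen here ... */
	*flags_during_work = t->flags;

	/* Simulate unrelated code setting another bit while we ran. */
	t->flags |= FAKE_PF_OTHER;

	fake_restore_flags(t, pflags, FAKE_PF_MEMALLOC);
	return t->flags;
}
```

Restoring only the masked bit (rather than assigning `pflags` back wholesale) matters when the caller already ran with PF_MEMALLOC set: the worker must not clear a bit it did not set, and must not discard other bits changed in the meantime.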

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/nfs/Kconfig              |   10 ++++++
 fs/nfs/file.c               |   18 +++++++++++
 fs/nfs/write.c              |   22 +++++++++++++
 include/linux/nfs_fs.h      |    2 +
 include/linux/sunrpc/xprt.h |    5 ++-
 net/sunrpc/Kconfig          |    5 +++
 net/sunrpc/sched.c          |    9 ++++-
 net/sunrpc/xprtsock.c       |   70 ++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 138 insertions(+), 3 deletions(-)

Index: mmotm/fs/nfs/file.c
===================================================================
--- mmotm.orig/fs/nfs/file.c
+++ mmotm/fs/nfs/file.c
@@ -468,6 +468,18 @@ static int nfs_launder_page(struct page
 	return nfs_wb_page(inode, page);
 }
 
+#ifdef CONFIG_NFS_SWAP
+static int nfs_swapon(struct file *file)
+{
+	return xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 1);
+}
+
+static int nfs_swapoff(struct file *file)
+{
+	return xs_swapper(NFS_CLIENT(file->f_mapping->host)->cl_xprt, 0);
+}
+#endif
+
 const struct address_space_operations nfs_file_aops = {
 	.readpage = nfs_readpage,
 	.readpages = nfs_readpages,
@@ -480,6 +492,12 @@ const struct address_space_operations nf
 	.releasepage = nfs_release_page,
 	.direct_IO = nfs_direct_IO,
 	.launder_page = nfs_launder_page,
+#ifdef CONFIG_NFS_SWAP
+	.swapon = nfs_swapon,
+	.swapoff = nfs_swapoff,
+	.swap_out = nfs_swap_out,
+	.swap_in = nfs_readpage,
+#endif
 };
 
 /*
Index: mmotm/fs/nfs/write.c
===================================================================
--- mmotm.orig/fs/nfs/write.c
+++ mmotm/fs/nfs/write.c
@@ -344,6 +344,28 @@ int nfs_writepage(struct page *page, str
 	return ret;
 }
 
+static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
+		unsigned int offset, unsigned int count);
+
+int nfs_swap_out(struct file *file, struct page *page,
+		 struct writeback_control *wbc)
+{
+	struct nfs_open_context *ctx = nfs_file_open_context(file);
+	int status;
+
+	status = nfs_writepage_setup(ctx, page, 0, nfs_page_length(page));
+	if (status < 0) {
+		nfs_set_pageerror(page);
+		goto out;
+	}
+
+	status = nfs_writepage_locked(page, wbc);
+
+out:
+	unlock_page(page);
+	return status;
+}
+
 static int nfs_writepages_callback(struct page *page, struct writeback_control *wbc, void *data)
 {
 	int ret;
Index: mmotm/include/linux/nfs_fs.h
===================================================================
--- mmotm.orig/include/linux/nfs_fs.h
+++ mmotm/include/linux/nfs_fs.h
@@ -473,6 +473,8 @@ extern int  nfs_writepages(struct addres
 extern int  nfs_flush_incompatible(struct file *file, struct page *page);
 extern int  nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
 extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
+extern int  nfs_swap_out(struct file *file, struct page *page,
+			 struct writeback_control *wbc);
 
 /*
  * Try to write back everything synchronously (but check the
Index: mmotm/include/linux/sunrpc/xprt.h
===================================================================
--- mmotm.orig/include/linux/sunrpc/xprt.h
+++ mmotm/include/linux/sunrpc/xprt.h
@@ -153,7 +153,9 @@ struct rpc_xprt {
 	unsigned int		max_reqs;	/* total slots */
 	unsigned long		state;		/* transport state */
 	unsigned char		shutdown   : 1,	/* being shut down */
-				resvport   : 1; /* use a reserved port */
+				resvport   : 1, /* use a reserved port */
+				swapper    : 1; /* we're swapping over this
+						   transport */
 	unsigned int		bind_index;	/* bind function index */
 
 	/*
@@ -285,6 +287,7 @@ void			xprt_release_rqst_cong(struct rpc
 void			xprt_disconnect_done(struct rpc_xprt *xprt);
 void			xprt_force_disconnect(struct rpc_xprt *xprt);
 void			xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
+int			xs_swapper(struct rpc_xprt *xprt, int enable);
 
 /*
  * Reserved bit positions in xprt->state
Index: mmotm/net/sunrpc/sched.c
===================================================================
--- mmotm.orig/net/sunrpc/sched.c
+++ mmotm/net/sunrpc/sched.c
@@ -735,7 +735,10 @@ struct rpc_buffer {
 void *rpc_malloc(struct rpc_task *task, size_t size)
 {
 	struct rpc_buffer *buf;
-	gfp_t gfp = RPC_IS_SWAPPER(task) ? GFP_ATOMIC : GFP_NOWAIT;
+	gfp_t gfp = GFP_NOWAIT;
+
+	if (RPC_IS_SWAPPER(task))
+		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
@@ -806,6 +809,8 @@ static void rpc_init_task(struct rpc_tas
 		kref_get(&task->tk_client->cl_kref);
 		if (task->tk_client->cl_softrtry)
 			task->tk_flags |= RPC_TASK_SOFT;
+		if (task->tk_client->cl_xprt->swapper)
+			task->tk_flags |= RPC_TASK_SWAPPER;
 	}
 
 	if (task->tk_ops->rpc_call_prepare != NULL)
@@ -831,7 +836,7 @@ static void rpc_init_task(struct rpc_tas
 static struct rpc_task *
 rpc_alloc_task(void)
 {
-	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS);
+	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOIO);
 }
 
 /*
Index: mmotm/net/sunrpc/xprtsock.c
===================================================================
--- mmotm.orig/net/sunrpc/xprtsock.c
+++ mmotm/net/sunrpc/xprtsock.c
@@ -1719,6 +1719,57 @@ static inline void xs_reclassify_socket6
 }
 #endif
 
+#ifdef CONFIG_SUNRPC_SWAP
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+
+	if (xprt->swapper)
+		sk_set_memalloc(transport->inet);
+}
+
+#define RPC_BUF_RESERVE_PAGES \
+	kmalloc_estimate_objs(sizeof(struct rpc_rqst), GFP_KERNEL, RPC_MAX_SLOT_TABLE)
+#define RPC_RESERVE_PAGES	(RPC_BUF_RESERVE_PAGES + TX_RESERVE_PAGES)
+
+/**
+ * xs_swapper - Tag this transport as being used for swap.
+ * @xprt: transport to tag
+ * @enable: enable/disable
+ *
+ */
+int xs_swapper(struct rpc_xprt *xprt, int enable)
+{
+	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
+			xprt);
+	int err = 0;
+
+	if (enable) {
+		/*
+		 * keep one extra sock reference so the reserve won't dip
+		 * when the socket gets reconnected.
+		 */
+		err = sk_adjust_memalloc(1, RPC_RESERVE_PAGES);
+		if (!err) {
+			xprt->swapper = 1;
+			xs_set_memalloc(xprt);
+		}
+	} else if (xprt->swapper) {
+		xprt->swapper = 0;
+		sk_clear_memalloc(transport->inet);
+		sk_adjust_memalloc(-1, -RPC_RESERVE_PAGES);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(xs_swapper);
+#else
+static void xs_set_memalloc(struct rpc_xprt *xprt)
+{
+}
+#endif
+
 static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1743,6 +1794,8 @@ static void xs_udp_finish_connecting(str
 		transport->sock = sock;
 		transport->inet = sk;
 
+		xs_set_memalloc(xprt);
+
 		write_unlock_bh(&sk->sk_callback_lock);
 	}
 	xs_udp_do_set_buffer_size(xprt);
@@ -1760,11 +1813,15 @@ static void xs_udp_connect_worker4(struc
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int err, status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 
@@ -1788,6 +1845,7 @@ static void xs_udp_connect_worker4(struc
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /**
@@ -1802,11 +1860,15 @@ static void xs_udp_connect_worker6(struc
 		container_of(work, struct sock_xprt, connect_worker.work);
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int err, status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	/* Start by resetting any existing state */
 	xs_reset_transport(transport);
 
@@ -1830,6 +1892,7 @@ static void xs_udp_connect_worker6(struc
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 /*
@@ -1904,6 +1967,8 @@ static int xs_tcp_finish_connecting(stru
 	if (!xprt_bound(xprt))
 		return -ENOTCONN;
 
+	xs_set_memalloc(xprt);
+
 	/* Tell the socket layer to start connecting... */
 	xprt->stat.connect_count++;
 	xprt->stat.connect_start = jiffies;
@@ -1924,11 +1989,15 @@ static void xs_tcp_setup_socket(struct r
 			struct sock_xprt *))
 {
 	struct socket *sock = transport->sock;
+	unsigned long pflags = current->flags;
 	int status = -EIO;
 
 	if (xprt->shutdown)
 		goto out;
 
+	if (xprt->swapper)
+		current->flags |= PF_MEMALLOC;
+
 	if (!sock) {
 		clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
 		sock = create_sock(xprt, transport);
@@ -1981,6 +2050,7 @@ out_eagain:
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
+	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 }
 
 static struct socket *xs_create_tcp_sock4(struct rpc_xprt *xprt,
Index: mmotm/fs/nfs/Kconfig
===================================================================
--- mmotm.orig/fs/nfs/Kconfig
+++ mmotm/fs/nfs/Kconfig
@@ -74,6 +74,16 @@ config NFS_V4
 
 	  If unsure, say N.
 
+config NFS_SWAP
+	bool "Provide swap over NFS support"
+	default n
+	depends on NFS_FS
+	select SUNRPC_SWAP
+	help
+	  This option enables swapon to work on files located on NFS mounts.
+
+	  For more details, see Documentation/network-swap.txt
+
 config NFS_V4_1
 	bool "NFS client support for NFSv4.1 (DEVELOPER ONLY)"
 	depends on NFS_V4 && EXPERIMENTAL
Index: mmotm/net/sunrpc/Kconfig
===================================================================
--- mmotm.orig/net/sunrpc/Kconfig
+++ mmotm/net/sunrpc/Kconfig
@@ -17,6 +17,11 @@ config SUNRPC_XPRT_RDMA
 
 	  If unsure, say N.
 
+config SUNRPC_SWAP
+	def_bool n
+	depends on SUNRPC
+	select NETVM
+
 config RPCSEC_GSS_KRB5
 	tristate "Secure RPC: Kerberos V mechanism (EXPERIMENTAL)"
 	depends on SUNRPC && EXPERIMENTAL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 28/31] nfs: fix various memory recursions possible with swap over NFS.
From: Suresh Jayaraman @ 2009-10-01 14:10 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: Peter Zijlstra <a.p.zijlstra@chello.nl> 

GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
just not of any filesystem data.

The problem is that previously GFP_NOFS was correct because it avoids
recursion into the NFS code; now it is not, because IO (swap) can also
lead to this recursion.
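The argument relies on the three masks nesting: GFP_NOIO only allows waiting, GFP_NOFS additionally allows starting IO, and GFP_KERNEL additionally allows calls into filesystem code. The sketch below encodes that ordering with locally defined bits. The numeric values are illustrative rather than the kernel's, and `may_reenter_nfs()` is a hypothetical helper expressing why NOFS stops being safe once swap-out goes through NFS.

```c
#include <assert.h>

/* Illustrative bit values; the real ones are in <linux/gfp.h>. */
#define X_GFP_WAIT 0x1u /* may sleep */
#define X_GFP_IO   0x2u /* may start physical IO (including swap-out) */
#define X_GFP_FS   0x4u /* may call into filesystem code */

#define X_GFP_NOIO   (X_GFP_WAIT)
#define X_GFP_NOFS   (X_GFP_WAIT | X_GFP_IO)
#define X_GFP_KERNEL (X_GFP_WAIT | X_GFP_IO | X_GFP_FS)

/* True if mask a permits everything mask b permits (and maybe more). */
static int gfp_at_least(unsigned int a, unsigned int b)
{
	return (a & b) == b;
}

/*
 * Hypothetical helper: can reclaim under @gfp re-enter NFS code?
 * Without swap over NFS only the FS bit leads there; once swap pages
 * are written through NFS, the IO bit (swap-out) does as well.
 */
static int may_reenter_nfs(unsigned int gfp, int swap_over_nfs)
{
	if (gfp & X_GFP_FS)
		return 1;
	return swap_over_nfs && (gfp & X_GFP_IO);
}
```

With swap over NFS enabled, only the NOIO mask keeps reclaim from looping back into the code that is trying to allocate, which is why the allocations below move from GFP_NOFS (and GFP_KERNEL) to GFP_NOIO.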

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/nfs/pagelist.c |    2 +-
 fs/nfs/write.c    |    7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

Index: mmotm/fs/nfs/write.c
===================================================================
--- mmotm.orig/fs/nfs/write.c
+++ mmotm/fs/nfs/write.c
@@ -48,7 +48,7 @@ static mempool_t *nfs_commit_mempool;
 
 struct nfs_write_data *nfs_commitdata_alloc(void)
 {
-	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
+	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -67,7 +67,7 @@ void nfs_commit_free(struct nfs_write_da
 
 struct nfs_write_data *nfs_writedata_alloc(unsigned int pagecount)
 {
-	struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS);
+	struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOIO);
 
 	if (p) {
 		memset(p, 0, sizeof(*p));
@@ -77,7 +77,8 @@ struct nfs_write_data *nfs_writedata_all
 		if (pagecount <= ARRAY_SIZE(p->page_array))
 			p->pagevec = p->page_array;
 		else {
-			p->pagevec = kcalloc(pagecount, sizeof(struct page *), GFP_NOFS);
+			p->pagevec = kcalloc(pagecount, sizeof(struct page *),
+					GFP_NOIO);
 			if (!p->pagevec) {
 				mempool_free(p, nfs_wdata_mempool);
 				p = NULL;
Index: mmotm/fs/nfs/pagelist.c
===================================================================
--- mmotm.orig/fs/nfs/pagelist.c
+++ mmotm/fs/nfs/pagelist.c
@@ -27,7 +27,7 @@ static inline struct nfs_page *
 nfs_page_alloc(void)
 {
 	struct nfs_page	*p;
-	p = kmem_cache_alloc(nfs_page_cachep, GFP_KERNEL);
+	p = kmem_cache_alloc(nfs_page_cachep, GFP_NOIO);
 	if (p) {
 		memset(p, 0, sizeof(*p));
 		INIT_LIST_HEAD(&p->wb_list);



* [PATCH 29/31] Cope with racy nature of sync_page in swap_sync_page
From: Suresh Jayaraman @ 2009-10-01 14:10 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: NeilBrown <neilb@suse.de>

sync_page is called without the PageLock held.  This means that,
for example, PageSwapCache can be cleared at any time.
We need to be careful not to put much trust in any part of the page.

So allow page_swap_info to return NULL if the page is no longer
in a SwapCache, and handle the NULL gracefully in swap_sync_page.

No other callers need to handle the NULL, as they all hold PageLock,
so PageSwapCache cannot be cleared by surprise.  Add a WARN_ON to
document this fact and help find out if I am wrong.
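Because sync_page runs without the page lock, the swap-cache state can change under the caller, so the lookup has to be allowed to fail and the caller has to treat failure as "nothing to do". A userspace sketch of that contract follows. `struct fake_page`, `fake_swap_info[]` and the entry-to-type encoding are hypothetical simplifications of `page_swap_info()` and `swap_sync_page()`, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_MAX_SWAPFILES 4

struct fake_swap_info {
	int synced; /* counts sync requests that reached this swap device */
};

struct fake_page {
	int in_swap_cache;     /* may be cleared concurrently when unlocked */
	unsigned long private; /* swap entry; 0 means "none" */
};

static struct fake_swap_info fake_swap_info[FAKE_MAX_SWAPFILES];

/* Hypothetical encoding: the swap type is the entry modulo the table size. */
static unsigned int fake_swp_type(unsigned long entry)
{
	return (unsigned int)(entry % FAKE_MAX_SWAPFILES);
}

/* Return NULL instead of crashing when the page has left the swap cache. */
static struct fake_swap_info *fake_page_swap_info(const struct fake_page *page)
{
	if (!page->in_swap_cache || page->private == 0)
		return NULL;
	return &fake_swap_info[fake_swp_type(page->private)];
}

/* A caller that does not hold the page lock must tolerate NULL. */
static void fake_swap_sync_page(const struct fake_page *page)
{
	struct fake_swap_info *sis = fake_page_swap_info(page);

	if (!sis)
		return; /* page raced out of the swap cache: nothing to sync */
	sis->synced++;
}
```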

Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 mm/page_io.c  |    2 ++
 mm/swapfile.c |    8 +++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

Index: mmotm/mm/page_io.c
===================================================================
--- mmotm.orig/mm/page_io.c
+++ mmotm/mm/page_io.c
@@ -137,6 +137,8 @@ void swap_sync_page(struct page *page)
 {
 	struct swap_info_struct *sis = page_swap_info(page);
 
+	if (!sis)
+		return;
 	if (sis->flags & SWP_FILE) {
 		struct address_space *mapping = sis->swap_file->f_mapping;
 
Index: mmotm/mm/swapfile.c
===================================================================
--- mmotm.orig/mm/swapfile.c
+++ mmotm/mm/swapfile.c
@@ -2185,7 +2185,13 @@ get_swap_info_struct(unsigned type)
 struct swap_info_struct *page_swap_info(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
-	BUG_ON(!PageSwapCache(page));
+	if (!PageSwapCache(page) || !swap.val) {
+		/* This should only happen from sync_page.
+		 * In other cases the page should be locked and
+		 * should be in a SwapCache
+		 */
+		return NULL;
+	}
 	return &swap_info[swp_type(swap)];
 }
 



* [PATCH 30/31] Fix use of uninitialized variable in cache_grow()
From: Suresh Jayaraman @ 2009-10-01 14:10 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: Miklos Szeredi <mszeredi@suse.cz>

This fixes a bug in reserve-slub.patch.

If cache_grow() was called with objp != NULL, the local variable 'reserve'
was left uninitialized. This resulted in ac->reserve being set to a
garbage value. As a result, in some circumstances huge amounts of slab
pages were allocated (because slab_force_alloc() returned true), which
caused atomic page allocation failures and slowed down the system.
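The fix uses -1 as a sentinel meaning "no reserve decision was made on this path" and only applies the value when one was actually computed. A stripped-down sketch of that control flow follows; `struct fake_cache`, `grow()` and `compute_reserve()` are hypothetical and model only the branch on `objp`, not the real slab allocator.

```c
#include <assert.h>
#include <stddef.h>

struct fake_cache {
	int reserve;   /* last applied reserve state */
	int set_calls; /* how many times it was updated */
};

static void fake_set_reserve(struct fake_cache *c, int reserve)
{
	c->reserve = reserve;
	c->set_calls++;
}

/* Hypothetical stand-in for the page allocation that decides the reserve. */
static int compute_reserve(int low_memory)
{
	return low_memory ? 1 : 0;
}

/* Mirrors the patched flow: reserve is only computed when objp is NULL. */
static void grow(struct fake_cache *c, void *objp, int low_memory)
{
	int reserve = -1; /* sentinel: "not computed on this path" */

	if (!objp)
		reserve = compute_reserve(low_memory);

	if (reserve != -1)
		fake_set_reserve(c, reserve);
}
```

Without the initializer, the `objp != NULL` path would pass an indeterminate value to `fake_set_reserve()`, which is the bug described above.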

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 mm/slab.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: mmotm/mm/slab.c
===================================================================
--- mmotm.orig/mm/slab.c
+++ mmotm/mm/slab.c
@@ -2760,7 +2760,7 @@ static int cache_grow(struct kmem_cache
 	size_t offset;
 	gfp_t local_flags;
 	struct kmem_list3 *l3;
-	int reserve;
+	int reserve = -1;
 
 	/*
 	 * Be lazy and only check for valid flags here,  keeping it out of the
@@ -2816,7 +2816,8 @@ static int cache_grow(struct kmem_cache
 	if (local_flags & __GFP_WAIT)
 		local_irq_disable();
 	check_irq_off();
-	slab_set_reserve(cachep, reserve);
+	if (reserve != -1)
+		slab_set_reserve(cachep, reserve);
 	spin_lock(&l3->list_lock);
 
 	/* Make slab active. */



* [PATCH 31/31] swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
From: Suresh Jayaraman @ 2009-10-01 14:11 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

While testing the Swap over NFS patchset, I noticed an oops that was triggered
during swapon. Investigating further, the NULL pointer dereference is due to the
SSD device check/optimization in the swapon code, which assumes s_bdev is not
NULL.

inode->i_sb->s_bdev could be NULL in a few cases. For example, one such case is
a loopback NFS mount; there could be others as well. Fix this by ensuring s_bdev
is not NULL before we try to dereference it.
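The patch treats the block-device-only optimizations (SSD detection and discard) as optional: when there is no backing block device they are simply skipped. A userspace sketch of that guard follows; `struct fake_bdev`, `struct fake_swapdev` and the flag names are hypothetical mirrors of `p->bdev`, `SWP_SOLIDSTATE` and `SWP_DISCARDABLE`.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_SWP_SOLIDSTATE  0x1u
#define FAKE_SWP_DISCARDABLE 0x2u

struct fake_bdev {
	int nonrot;      /* non-rotational (SSD) queue */
	int can_discard; /* a discard request would succeed */
};

struct fake_swapdev {
	struct fake_bdev *bdev; /* NULL for file-backed swap on e.g. NFS */
	unsigned int flags;
};

/* Apply block-layer-only optimizations, skipping them when there is no bdev. */
static void fake_swapon_setup(struct fake_swapdev *p)
{
	if (!p->bdev)
		return; /* no request queue to query, nothing to discard */

	if (p->bdev->nonrot)
		p->flags |= FAKE_SWP_SOLIDSTATE;
	if (p->bdev->can_discard)
		p->flags |= FAKE_SWP_DISCARDABLE;
}
```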

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 mm/swapfile.c |   26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

Index: mmotm/mm/swapfile.c
===================================================================
--- mmotm.orig/mm/swapfile.c
+++ mmotm/mm/swapfile.c
@@ -160,10 +160,12 @@ static int discard_swap(struct swap_info
 				continue;
 		}
 
-		err = blkdev_issue_discard(si->bdev, start_block,
+		if (si->bdev) {
+			err = blkdev_issue_discard(si->bdev, start_block,
 						nr_blocks, GFP_KERNEL);
-		if (err)
-			break;
+			if (err)
+				break;
+		}
 
 		cond_resched();
 	}
@@ -199,9 +201,11 @@ static void discard_swap_cluster(struct
 
 			start_block <<= PAGE_SHIFT - 9;
 			nr_blocks <<= PAGE_SHIFT - 9;
-			if (blkdev_issue_discard(si->bdev, start_block,
+			if (si->bdev) {
+				if (blkdev_issue_discard(si->bdev, start_block,
 							nr_blocks, GFP_NOIO))
-				break;
+					break;
+			}
 		}
 
 		lh = se->list.next;
@@ -1991,12 +1995,14 @@ SYSCALL_DEFINE2(swapon, const char __use
 		goto bad_swap;
 	}
 
-	if (blk_queue_nonrot(bdev_get_queue(p->bdev))) {
-		p->flags |= SWP_SOLIDSTATE;
-		p->cluster_next = 1 + (random32() % p->highest_bit);
+	if (p->bdev) {
+		if (blk_queue_nonrot(bdev_get_queue(p->bdev))) {
+			p->flags |= SWP_SOLIDSTATE;
+			p->cluster_next = 1 + (random32() % p->highest_bit);
+		}
+		if (discard_swap(p) == 0)
+			p->flags |= SWP_DISCARDABLE;
 	}
-	if (discard_swap(p) == 0)
-		p->flags |= SWP_DISCARDABLE;
 
 	mutex_lock(&swapon_mutex);
 	spin_lock(&swap_lock);



* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: Kalle Valo @ 2009-10-01 14:18 UTC
  To: John W. Linville; +Cc: Luis R. Rodriguez, linux-wireless, netdev
In-Reply-To: <20091001011340.GA3123@tuxdriver.com>

"John W. Linville" <linville@tuxdriver.com> writes:

> On Fri, Sep 25, 2009 at 09:53:35AM -0700, Luis R. Rodriguez wrote:
>
>> So for Wake-on-Wireless I ran into the same, ethtool just did not
>> offer the same wake up events needed for wireless. I could have
>> technically used ethtool and expanded it to support wireless but it
>> just seemed dirty.
>> 
>> I agree that using ethtool seems overkill compared to the patches
>> you posted.
>
> I think you either overestimate the amount of trouble for implementing
> (minimal) ethtool support or you underestimate the amount of
> functionality available through that interface.

I'm not worried about the implementation complexity, and as your
patches show it was easy. My concern is the overall design for
wireless devices. Instead of using nl80211 for everything, we would use
nl80211/iw for some features and ethtool for others. That's just
confusing and I don't like it. I would prefer that nl80211 provides
everything; it makes things so much easier.

> That, or you just don't like using something named "eth"tool for
> wireless -- but hey, let's be honest about the frames we
> send/receive to/from the kernel... :-)

I don't have a problem with the name :) But ethernet is still so much
different from 802.11 that there isn't that much to share and we in
wireless will need different features.

One example is the hw version: ethtool only provides a u32 to userspace
and moves the burden of translating the hw id to the user. For us a string
is a much better choice because when debugging we often (or always?)
need to know the chip version.

But this is not something I will start fighting about. If you still
think that ethtool is the way to go, I'm perfectly fine with it.

> The ethtool interface provides functionality for viewing and modifying
> eeprom contents, dumping registers, trigger self-tests, basic driver
> info, getting and setting message reporting levels, external card
> identification (hey, _could_ be useful!), and some other bits like
> checksum offload that might(?) be useful in the future.  I understand
> regarding the WoW vs. WoL issue but probably the answer is just to
> add a new method for WoW...?

I took a look at the ethtool help output from Debian unstable and I think
this is the set of features we can use in wireless:

        ethtool -i|--driver DEVNAME     Show driver information
        ethtool -d|--register-dump DEVNAME      Do a register dump
                [ raw on|off ]
                [ file FILENAME ]
        ethtool -e|--eeprom-dump DEVNAME        Do a EEPROM dump
                [ raw on|off ]
                [ offset N ]
                [ length N ]
        ethtool -E|--change-eeprom DEVNAME      Change bytes in device EEPROM
                [ magic N ]
                [ offset N ]
                [ value N ]
        ethtool -p|--identify DEVNAME   Show visible port identification (e.g. blinking)
               [ TIME-IN-SECONDS ]
        ethtool -t|--test DEVNAME       Execute adapter self test
               [ online | offline ]

But here are the features which I doubt we will ever use:

        ethtool -s|--change DEVNAME     Change generic options
                [ speed %%d ]
                [ duplex half|full ]
                [ port tp|aui|bnc|mii|fibre ]
                [ autoneg on|off ]
                [ advertise %%x ]
                [ phyad %%d ]
                [ xcvr internal|external ]
                [ wol p|u|m|b|a|g|s|d... ]
                [ sopass %%x:%%x:%%x:%%x:%%x:%%x ]
                [ msglvl %%d ] 
        ethtool -a|--show-pause DEVNAME Show pause options
        ethtool -A|--pause DEVNAME      Set pause options
                [ autoneg on|off ]
                [ rx on|off ]
                [ tx on|off ]
        ethtool -c|--show-coalesce DEVNAME      Show coalesce options
        ethtool -C|--coalesce DEVNAME   Set coalesce options
                [adaptive-rx on|off]
                [adaptive-tx on|off]
                [rx-usecs N]
                [rx-frames N]
                [rx-usecs-irq N]
                [rx-frames-irq N]
                [tx-usecs N]
                [tx-frames N]
                [tx-usecs-irq N]
                [tx-frames-irq N]
                [stats-block-usecs N]
                [pkt-rate-low N]
                [rx-usecs-low N]
                [rx-frames-low N]
                [tx-usecs-low N]
                [tx-frames-low N]
                [pkt-rate-high N]
                [rx-usecs-high N]
                [rx-frames-high N]
                [tx-usecs-high N]
                [tx-frames-high N]
                [sample-interval N]
        ethtool -g|--show-ring DEVNAME  Query RX/TX ring parameters
        ethtool -G|--set-ring DEVNAME   Set RX/TX ring parameters
                [ rx N ]
                [ rx-mini N ]
                [ rx-jumbo N ]
                [ tx N ]
        ethtool -k|--show-offload DEVNAME       Get protocol offload information
        ethtool -K|--offload DEVNAME    Set protocol offload
                [ rx on|off ]
                [ tx on|off ]
                [ sg on|off ]
                [ tso on|off ]
                [ ufo on|off ]
                [ gso on|off ]
                [ gro on|off ]
                [ lro on|off ]
        ethtool -r|--negotiate DEVNAME  Restart N-WAY negotation
        ethtool -n|--show-nfc DEVNAME   Show Rx network flow classificationoptions
                [ rx-flow-hash tcp4|udp4|ah4|sctp4|tcp6|udp6|ah6|sctp6 ]
        ethtool -N|--config-nfc DEVNAME Configure Rx network flow classification options
                [ rx-flow-hash tcp4|udp4|ah4|sctp4|tcp6|udp6|ah6|sctp6 m|v|t|s|d|f|n|r... ]

-- 
Kalle Valo


* Re: [PATCH] net: fix NOHZ: local_softirq_pending 08
From: Kalle Valo @ 2009-10-01 14:24 UTC
  To: Michael Buesch
  Cc: David Miller, oliver-fJ+pQTUTwRTk1uMJSBkQmQ,
	johannes-cdvu00un1VgdHxzADdlk8Q, linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <200910011604.42916.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>

Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org> writes:

> On Thursday 01 October 2009 01:33:33 David Miller wrote:
>
>> I'm not applying this until all of these details are sorted out 
>
> John, please apply my fix to wireless-testing to get rid of the
> regression. You can revert it later, if there's a better fix
> available.

I agree, please take Michael's patch. It's trivial to change the mac80211
part whenever better support is available.

But I don't think this is a regression, because I also see the bug with
2.6.28; most probably it has been in mac80211 forever. But it's still
a bug which needs to be fixed.

-- 
Kalle Valo
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 3/3] at76c50x-usb: set firmware and hardware version in wiphy
From: Kalle Valo @ 2009-10-01 14:27 UTC
  To: Ben Hutchings; +Cc: John W. Linville, linux-wireless, netdev, Luis R. Rodriguez
In-Reply-To: <1254360747.23350.2.camel@localhost>

Ben Hutchings <bhutchings@solarflare.com> writes:

> On Wed, 2009-09-30 at 21:19 -0400, John W. Linville wrote:
> [...]
>> +	len = sizeof(wiphy->fw_version);
>> +	snprintf(wiphy->fw_version, len, "%d.%d.%d-%d",
>> +		 priv->fw_version.major, priv->fw_version.minor,
>> +		 priv->fw_version.patch, priv->fw_version.build);
>> +	/* null terminate the strings in case they were truncated */
>> +	wiphy->fw_version[len - 1] = '\0';
> [...]
>
> This last statement is unnecessary; snprintf() always null-terminates
> (unless the length is zero).

Yes, the extra null termination is unnecessary. This was my mistake in
the first patchset I sent.
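Ben's point follows from the C standard: `snprintf()` called with a non-zero size always writes a terminating NUL, truncating the output if necessary, and returns the length the full string would have had; the kernel's `snprintf()` follows the same contract here. A small userspace check of that behaviour, where `format_fw_version()` and the 8-byte buffer are illustrative and the format string mirrors the `%d.%d.%d-%d` from the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a firmware version into a fixed-size buffer, as the driver does. */
static int format_fw_version(char *buf, size_t len, int major, int minor,
			     int patch, int build)
{
	/* No manual buf[len - 1] = '\0' needed: snprintf terminates when len > 0. */
	return snprintf(buf, len, "%d.%d.%d-%d", major, minor, patch, build);
}
```

A return value greater than or equal to the buffer size is the portable way to detect that the string was truncated.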

-- 
Kalle Valo


* [PATCH] net/ppp: fix comments - ppp_{sync,asynctty}_receive() may sleep
From: Tilman Schmidt @ 2009-10-01 14:28 UTC
  To: Alan Cox, Alan Cox, Paul Mackerras, linux-ppp
  Cc: David Miller, netdev, Jarek Poplawski, linux-kernel

The receive_buf methods of the N_PPP and N_SYNC_PPP line disciplines,
ppp_asynctty_receive() and ppp_sync_receive(), call tty_unthrottle()
which may sleep. Fix the comments claiming otherwise.

Impact: documentation
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
---
 drivers/net/ppp_async.c   |    5 +----
 drivers/net/ppp_synctty.c |    5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ppp_async.c b/drivers/net/ppp_async.c
index 6de8399..30b1b33 100644
--- a/drivers/net/ppp_async.c
+++ b/drivers/net/ppp_async.c
@@ -337,10 +337,7 @@ ppp_asynctty_poll(struct tty_struct *tty, struct file *file, poll_table *wait)
 	return 0;
 }
 
-/*
- * This can now be called from hard interrupt level as well
- * as soft interrupt level or mainline.
- */
+/* May sleep, don't call from interrupt level or with interrupts disabled */
 static void
 ppp_asynctty_receive(struct tty_struct *tty, const unsigned char *buf,
 		  char *cflags, int count)
diff --git a/drivers/net/ppp_synctty.c b/drivers/net/ppp_synctty.c
index d2fa2db..c908b08 100644
--- a/drivers/net/ppp_synctty.c
+++ b/drivers/net/ppp_synctty.c
@@ -378,10 +378,7 @@ ppp_sync_poll(struct tty_struct *tty, struct file *file, poll_table *wait)
 	return 0;
 }
 
-/*
- * This can now be called from hard interrupt level as well
- * as soft interrupt level or mainline.
- */
+/* May sleep, don't call from interrupt level or with interrupts disabled */
 static void
 ppp_sync_receive(struct tty_struct *tty, const unsigned char *buf,
 		  char *cflags, int count)
-- 
1.6.2.1.214.ge986c


* Re: [PATCHv2] IPv4 TCP fails to send window scale option when window scale is zero
From: Eric Dumazet @ 2009-10-01 14:30 UTC
  To: Gilad Ben-Yossef; +Cc: Netdev, Ori Finkalman, Ilpo Järvinen
In-Reply-To: <4AC478E9.5050605@codefidence.com>

Gilad Ben-Yossef wrote:
> From: Ori Finkelman <ori@comsleep.com>
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5200aab..fcd278a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock
> *tp)
> #define OPTION_SACK_ADVERTISE  (1 << 0)
> #define OPTION_TS              (1 << 1)
> #define OPTION_MD5             (1 << 2)
> +#define OPTION_WSCALE          (1 << 3)

I manually applied your patch and tested it.

So far so good, it works well.

But you'll need to find the correct way to submit a patch so that your mailer doesn't
mangle the content.

File Documentation/email-clients.txt contains useful tips.
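As the subject says, the bug is that a window scale of zero was treated as "no option". A shift count of 0 is still meaningful, since window scaling is only enabled when both sides send the option in their SYNs, so presence has to be tracked separately from the value; that is what the new OPTION_WSCALE bit is for. A userspace sketch of the option encoding, using the RFC 1323 layout (kind 3, length 3, shift count) preceded by a NOP for 4-byte alignment. `struct fake_opts` and `write_wscale()` are hypothetical; only the OPTION_WSCALE value is taken from the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPTION_WSCALE  (1 << 3) /* value from the quoted tcp_output.c diff */
#define TCPOPT_NOP     1
#define TCPOPT_WINDOW  3
#define TCPOLEN_WINDOW 3

struct fake_opts {
	unsigned int options; /* bitmask of OPTION_* flags */
	uint8_t ws;           /* window scale shift count, 0..14 */
};

/* Write the window scale option into @buf; returns the bytes written. */
static size_t write_wscale(const struct fake_opts *opts, uint8_t *buf)
{
	size_t n = 0;

	/* Test the presence flag, not opts->ws: a shift of 0 must still be sent. */
	if (opts->options & OPTION_WSCALE) {
		buf[n++] = TCPOPT_NOP;     /* pad to a 4-byte boundary */
		buf[n++] = TCPOPT_WINDOW;  /* kind */
		buf[n++] = TCPOLEN_WINDOW; /* length */
		buf[n++] = opts->ws;       /* shift count */
	}
	return n;
}
```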



* RE: [PATCH 2.6.31-rc9] drivers/net: ks8851_mll ethernet network driver
From: Choi, David @ 2009-10-01 14:51 UTC
  To: David Miller; +Cc: greg, netdev, Li, Charles, Choi, jgarzik, shemminger
In-Reply-To: <20090930.200535.217323985.davem@davemloft.net>

Hello all,

I really appreciate all of you, especially for your guidance, patience
and professionalism.


Regards,
David J. Choi


-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Wednesday, September 30, 2009 8:06 PM
To: Choi, David
Cc: greg@kroah.com; netdev@vger.kernel.org; Li, Charles; Choi@kroah.com;
jgarzik@redhat.com; shemminger@vyatta.com
Subject: Re: [PATCH 2.6.31-rc9] drivers/net: ks8851_mll ethernet network
driver 

From: "Choi, David" <David.Choi@Micrel.Com>
Date: Fri, 25 Sep 2009 17:42:12 -0700

> Hello David Miller,
> 
> First of all, thank you for your feedback.  Here is my new patch.
> 
> From: David J. Choi <david.choi@micrel.com>
> 
> This is the first registration of ks8851 network driver with 
> MLL(address/data multiplexed) interface.
> 
> Signed-off-by : David J. Choi <david.choi@micrel.com>

Applied, thanks.


* Re: [RFCv4 PATCH 2/2] net: Allow protocols to provide an unlocked_recvmsg socket method
From: Arnaldo Carvalho de Melo @ 2009-10-01 15:03 UTC (permalink / raw)
  To: Nir Tzachar; +Cc: Linux Networking Development Mailing List, Ziv Ayalon
In-Reply-To: <9b2db90b0910010249h182bf5d4sc7fdaea9e1345720@mail.gmail.com>

On Thu, Oct 01, 2009 at 11:49:39AM +0200, Nir Tzachar wrote:
> Hi Arnaldo
> 
> I have repeated the tests using net-next on top of Linus' git tree (I
> hope I got it right..) and the patches you sent me. Things did not get
> better, and in most cases were even worse; the recvmmsg parts
> distinctly showed better throughput, but the latency has more than
> doubled.

Interesting... With unlocked_recvmsg, recvmmsg now holds the socket lock
across the whole batch of UDP receives, which is not the case when
unlocked_recvmsg is not used, so I think one needs to set the timeout
parameter carefully. Perhaps we'll need to do something like tcp_recvmsg
does, that is, call release_sock + lock_sock to process the backlog in
the middle of a recvmmsg call.
 
> The simplest test of using a batch size of 1 results with recvmmsg's
> latency over 1000 micro, while regular recvmsg is around 450 micro.
> (note that to use 1 packet there is a small bug in the reg_recv which
> needs to be fixed. Namely, change ret = -1 to ret = 0). On the
> previous system config -- part 0001 of the patch, on top of 2.6.31 --
> the latency of a single packet batch is 370 micro.
> 
> So, there seems to be a regression with the kernel tree I am using, or
> with part 0002 of the patch. I'll try running the net-next with only
> part 1 of the patch and report.

Yeah, trying with only part 0001 should get you back to the previous
results, but try using it in nonblocking mode and tweaking the timeout
parameter in recvmmsg.
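A minimal userspace sketch of the non-blocking batched receive suggested above might look like the following. It assumes the recvmmsg() semantics that were eventually merged and exposed through glibc. MSG_DONTWAIT makes the call return as soon as the receive queue is drained instead of waiting for a full batch. BATCH, DGRAM_MAX and recv_batch() are hypothetical names chosen for illustration, not part of the patches under test. The recvmmsg(2) man page notes that the timeout is only checked after each datagram is received, so it cannot bound the wait for the first one.

```c
#define _GNU_SOURCE /* glibc exposes recvmmsg() and struct mmsghdr under _GNU_SOURCE */
#include <errno.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH     8    /* hypothetical batch size */
#define DGRAM_MAX 1500 /* hypothetical per-datagram buffer size */

/*
 * Receive up to BATCH datagrams in one call. MSG_DONTWAIT makes recvmmsg()
 * return as soon as the receive queue is empty instead of blocking until
 * the batch is full; timeout (may be NULL) is passed straight through.
 * Returns the number of datagrams received, 0 if none were queued, or -errno.
 */
static int recv_batch(int fd, char bufs[][DGRAM_MAX], unsigned int lens[],
		      struct timespec *timeout)
{
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = DGRAM_MAX;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, timeout);
	if (n < 0)
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -errno;
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}
```

A latency test could call recv_batch() from a poll() loop, so that the batch size only bounds how much is drained per wakeup rather than how long the caller waits.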

- Arnaldo


* [PATCH] Use sk_mark for routing lookup in more places
From: Atis Elsts @ 2009-10-01 15:14 UTC (permalink / raw)
  To: Laszlo Attila Toth; +Cc: David S. Miller, netdev

This patch against v2.6.31 makes route lookups use sk_mark in a few
more places. This has two benefits.
First, the SO_MARK option now takes effect on UDP sockets too.
Second, ip_queue_xmit() and inet_sk_rebuild_header() could previously do the
routing lookup incorrectly when TCP sockets with SO_MARK were used.

Signed-off-by: Atis Elsts <atis@mikrotik.com>
---
 net/ipv4/af_inet.c   |    1 +
 net/ipv4/ip_output.c |    1 +
 net/ipv4/udp.c       |    1 +
 3 files changed, 3 insertions(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 566ea6c..7917963 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1103,6 +1103,7 @@ int inet_sk_rebuild_header(struct sock *sk)
 {
        struct flowi fl = {
                .oif = sk->sk_bound_dev_if,
+               .mark = sk->sk_mark,
                .nl_u = {
                        .ip4_u = {
                                .daddr  = daddr,
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7ffcd96..e088a97 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -335,6 +335,7 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)

                {
                        struct flowi fl = { .oif = sk->sk_bound_dev_if,
+                                           .mark = sk->sk_mark,
                                            .nl_u = { .ip4_u =
                                                      { .daddr = daddr,
                                                        .saddr = inet->saddr,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 80e3812..f90cdcc 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -688,6 +688,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,

        if (rt == NULL) {
                struct flowi fl = { .oif = ipc.oif,
+                                   .mark = sk->sk_mark,
                                    .nl_u = { .ip4_u =
                                              { .daddr = faddr,
                                                .saddr = saddr,

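For context, here is a minimal userspace sketch of how a socket gets the mark that this patch feeds into the route lookup. SO_MARK is set with setsockopt() and normally requires CAP_NET_ADMIN. The helper name set_socket_mark() and the mark value are illustrative assumptions. The fallback value 36 for SO_MARK is the asm-generic one; it is only used when the system headers do not already define the option.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_MARK
#define SO_MARK 36 /* asm-generic value; assumption for headers lacking it */
#endif

/*
 * Tag all traffic sent on fd with a firewall mark so that policy routing
 * rules (and, with this patch, UDP route lookups) can match on it.
 * Returns 0 on success or -errno (typically -EPERM without privileges).
 */
static int set_socket_mark(int fd, unsigned int mark)
{
	if (setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)) < 0)
		return -errno;
	return 0;
}
```

With a policy rule such as `ip rule add fwmark 0x10 table 100`, UDP datagrams sent on a socket marked 0x10 would then be routed via table 100. According to the description above, this did not happen for UDP before this patch.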

* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: John W. Linville @ 2009-10-01 15:18 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Luis R. Rodriguez, linux-wireless, netdev
In-Reply-To: <87fxa3qjt2.fsf@purkki.valot.fi>

On Thu, Oct 01, 2009 at 05:18:33PM +0300, Kalle Valo wrote:
> "John W. Linville" <linville@tuxdriver.com> writes:
> 
> > On Fri, Sep 25, 2009 at 09:53:35AM -0700, Luis R. Rodriguez wrote:
> >
> >> So for Wake-on-Wireless I ran into the same, ethtool just did not
> >> offer the same wake up events needed for wireless. I could have
> >> technically used ethtool and expanded it to support wireless but it
> >> just seemed dirty.
> >> 
> >> I agree that using ethtool seems overkill compared to the patches
> >> you posted.
> >
> > I think you either overestimate the amount of trouble for implementing
> > (minimal) ethtool support or you underestimate the amount of
> > functionality available through that interface.
> 
> I'm not worried about the implementation complexity, and as your
> patches show it was easy. My concern is the overall design for
> wireless devices. Instead of using nl80211 for everything, with some
> features we would use nl80211/iw and with some ethtool. That's just
> confusing and I don't like that. I would prefer that nl80211 provides
> everything, it makes things so much easier.

Well, if the hw/fw version numbers were the only thing then I'd
probably say it's not a big deal.  But having ethtool support is nice
in that it makes a familiar tool work for us.  Among other things,
this probably helps with some distro scripts that don't work quite
right without it.  Plus, there is lots of debugging stuff that could
be turned-on without having to write new tools.

I suppose I understand the 'one API' idea, but why duplicate
functionality?  Anyway, adding a couple of ioctl calls isn't a
big deal.  And don't forget, we are still network drivers too...

> > That, or you just don't like using something named "eth"tool for
> > wireless -- but hey, let's be honest about the frames we
> > send/receive to/from the kernel... :-)
> 
> I don't have a problem with the name :) But ethernet is still so much
> different from 802.11 that there isn't that much to share and we in
> wireless will need different features.
> 
> One example is the hw version: ethtool only provides a u32 to userspace
> and moves the burden of translating the hw id to the user. For us a string
> is a much better choice because when debugging we often (or
> always?) need to know the chip version.

Look at the way most drivers set the version (using each byte as a
field).  If you want prettier output, adding a parser to the userland
ethtool is fairly trivial.  It looks something like the patch below...

> But this is not something I will start fighting about. If you still
> think that ethtool is the way to go, I'm perfectly fine with it.
> 
> > The ethtool interface provides functionality for viewing and modifying
> > eeprom contents, dumping registers, trigger self-tests, basic driver
> > info, getting and setting message reporting levels, external card
> > identification (hey, _could_ be useful!), and some other bits like
> > checksum offload that might(?) be useful in the future.  I understand
> > regarding the WoW vs. WoL issue but probably the answer is just to
> > add a new method for WoW...?
> 
> I took a look at ethtool help output from debian unstable and I think
> this is the set of features we can use in wireless:
> 
>         ethtool -i|--driver DEVNAME     Show driver information
>         ethtool -d|--register-dump DEVNAME      Do a register dump
>                 [ raw on|off ]
>                 [ file FILENAME ]
>         ethtool -e|--eeprom-dump DEVNAME        Do a EEPROM dump
>                 [ raw on|off ]
>                 [ offset N ]
>                 [ length N ]
>         ethtool -E|--change-eeprom DEVNAME      Change bytes in device EEPROM
>                 [ magic N ]
>                 [ offset N ]
>                 [ value N ]
>         ethtool -p|--identify DEVNAME   Show visible port identification (e.g. blinking)
>                [ TIME-IN-SECONDS ]
>         ethtool -t|--test DEVNAME       Execute adapter self test
>                [ online | offline ]
 
I agree with the above.

> But here are the features which I doubt we will ever use:
> 
>         ethtool -s|--change DEVNAME     Change generic options
>                 [ speed %%d ]
>                 [ duplex half|full ]
>                 [ port tp|aui|bnc|mii|fibre ]
>                 [ autoneg on|off ]
>                 [ advertise %%x ]
>                 [ phyad %%d ]
>                 [ xcvr internal|external ]
>                 [ wol p|u|m|b|a|g|s|d... ]
>                 [ sopass %%x:%%x:%%x:%%x:%%x:%%x ]
>                 [ msglvl %%d ] 
>         ethtool -a|--show-pause DEVNAME Show pause options
>         ethtool -A|--pause DEVNAME      Set pause options
>                 [ autoneg on|off ]
>                 [ rx on|off ]
>                 [ tx on|off ]

I agree that the above are ethernet-specific.

>         ethtool -c|--show-coalesce DEVNAME      Show coalesce options
>         ethtool -C|--coalesce DEVNAME   Set coalesce options
>                 [adaptive-rx on|off]
>                 [adaptive-tx on|off]
>                 [rx-usecs N]
>                 [rx-frames N]
>                 [rx-usecs-irq N]
>                 [rx-frames-irq N]
>                 [tx-usecs N]
>                 [tx-frames N]
>                 [tx-usecs-irq N]
>                 [tx-frames-irq N]
>                 [stats-block-usecs N]
>                 [pkt-rate-low N]
>                 [rx-usecs-low N]
>                 [rx-frames-low N]
>                 [tx-usecs-low N]
>                 [tx-frames-low N]
>                 [pkt-rate-high N]
>                 [rx-usecs-high N]
>                 [rx-frames-high N]
>                 [tx-usecs-high N]
>                 [tx-frames-high N]
>                 [sample-interval N]

These _could_ be useful if wireless becomes more
performance-oriented...

>         ethtool -g|--show-ring DEVNAME  Query RX/TX ring parameters
>         ethtool -G|--set-ring DEVNAME   Set RX/TX ring parameters
>                 [ rx N ]
>                 [ rx-mini N ]
>                 [ rx-jumbo N ]
>                 [ tx N ]

Wireless devices have ring buffers, no?

>         ethtool -k|--show-offload DEVNAME       Get protocol offload information
>         ethtool -K|--offload DEVNAME    Set protocol offload
>                 [ rx on|off ]
>                 [ tx on|off ]
>                 [ sg on|off ]
>                 [ tso on|off ]
>                 [ ufo on|off ]
>                 [ gso on|off ]
>                 [ gro on|off ]
>                 [ lro on|off ]

Again, if wireless devices become performance-oriented...

>         ethtool -r|--negotiate DEVNAME  Restart N-WAY negotation

Ethernet-specific... though it might be overloaded for wireless to trigger
reassociation...?

>         ethtool -n|--show-nfc DEVNAME   Show Rx network flow
>                 classificationoptions
>                 [ rx-flow-hash
>                 tcp4|udp4|ah4|sctp4|tcp6|udp6|ah6|sctp6 ]
>         ethtool -N|--config-nfc DEVNAME Configure Rx network flow
>                 classification options
>                 [ rx-flow-hash tcp4|udp4|ah4|sctp4|tcp6|udp6|ah6|sctp6
>                 m|v|t|s|d|f|n|r... ]

Long-shot, but no reason it couldn't be used in wireless... :-)

Anyway, it doesn't really matter if we don't use the whole API -- many
older ethernet devices don't support all these features.  The point
is that the API exists and has some overlap with our needs.  It is a
driver-oriented API, with nitty-gritty stuff that need not clutter a
configuration API like cfg80211.  There is even the potential of us
adding our own extensions (e.g. WoW) that are also device-oriented.

Anyway, between the link detection and making distro scripts work
plus enabling a familiar tool for basic driver info I think this is
a win.  So much the better if some drivers move to ethtool for register
dumping, setting message verbosity, querying/changing eeprom values,
etc, etc...

John

P.S.  The aforementioned patch for userland ethtool... (theoretical,
not even compiled...)

From aa92d32ac1cca57bdd3439013b0c7777bdf1217c Mon Sep 17 00:00:00 2001
From: John W. Linville <linville@tuxdriver.com>
Date: Thu, 1 Oct 2009 11:01:32 -0400
Subject: [PATCH] add support for at76c50x-usb driver.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
 Makefile.am    |    2 +-
 at76c50x-usb.c |   32 ++++++++++++++++++++++++++++++++
 ethtool.c      |    1 +
 3 files changed, 34 insertions(+), 1 deletions(-)
 create mode 100644 at76c50x-usb.c

diff --git a/Makefile.am b/Makefile.am
index eac65fe..a384949 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -8,7 +8,7 @@ ethtool_SOURCES = ethtool.c ethtool-copy.h ethtool-util.h	\
 		  amd8111e.c de2104x.c e100.c e1000.c igb.c	\
 		  fec_8xx.c ibm_emac.c ixgb.c ixgbe.c natsemi.c	\
 		  pcnet32.c realtek.c tg3.c marvell.c vioc.c	\
-		  smsc911x.c
+		  smsc911x.c at76c50x-usb.c
 
 dist-hook:
 	cp $(top_srcdir)/ethtool.spec $(distdir)
diff --git a/at76c50x-usb.c b/at76c50x-usb.c
new file mode 100644
index 0000000..295d1cb
--- /dev/null
+++ b/at76c50x-usb.c
@@ -0,0 +1,32 @@
+#include <stdio.h>
+#include "ethtool-util.h"
+
+static const char *hw_versions[] = {
+        "503_ISL3861",
+        "503_ISL3863",
+        "        503",
+        "    503_ACC",
+        "        505",
+        "   505_2958",
+        "       505A",
+        "     505AMX",
+};
+
+int
+at76c50x_usb_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
+{
+	u8 version = (u8)(regs->version >> 24);
+	u8 rev_id = (u8)(regs->version);
+	const char *ver_string;
+
+	if (version != 0)
+		return -1;
+
+	ver_string = rev_id < sizeof(hw_versions) / sizeof(hw_versions[0]) ? hw_versions[rev_id] : "unknown";
+	fprintf(stdout,
+		"Hardware Version                    %s\n",
+		ver_string);
+
+	return 0;
+}
+
diff --git a/ethtool.c b/ethtool.c
index 0110682..7608750 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1189,6 +1189,7 @@ static struct {
 	{ "sky2", sky2_dump_regs },
         { "vioc", vioc_dump_regs },
         { "smsc911x", smsc911x_dump_regs },
+        { "at76c50x-usb", at76c50x_usb_dump_regs },
 };
 
 static int dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
-- 
1.6.2.5
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: Ben Hutchings @ 2009-10-01 15:33 UTC (permalink / raw)
  To: John W. Linville; +Cc: Kalle Valo, Luis R. Rodriguez, linux-wireless, netdev
In-Reply-To: <20091001151820.GA2895@tuxdriver.com>

On Thu, 2009-10-01 at 11:18 -0400, John W. Linville wrote:
[...]
> > But here are the features which I doubt we will ever use:
> > 
> >         ethtool -s|--change DEVNAME     Change generic options
> >                 [ speed %%d ]
> >                 [ duplex half|full ]
> >                 [ port tp|aui|bnc|mii|fibre ]
> >                 [ autoneg on|off ]
> >                 [ advertise %%x ]
> >                 [ phyad %%d ]
> >                 [ xcvr internal|external ]
> >                 [ wol p|u|m|b|a|g|s|d... ]
> >                 [ sopass %%x:%%x:%%x:%%x:%%x:%%x ]
> >                 [ msglvl %%d ] 
> >         ethtool -a|--show-pause DEVNAME Show pause options
> >         ethtool -A|--pause DEVNAME      Set pause options
> >                 [ autoneg on|off ]
> >                 [ rx on|off ]
> >                 [ tx on|off ]
> 
> I agree that the above are ethernet-specific.
[...]

Message level isn't Ethernet-specific, and WoL arguably isn't either.  It's
a shame that these original ethtool settings are still bundled together...

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: r8169c: Support for Realtek 8168DP chip?
From: David Dillow @ 2009-10-01 15:49 UTC (permalink / raw)
  To: Rainer Koenig; +Cc: netdev
In-Reply-To: <1254404310.24972.4.camel@obelisk.thedillows.org>

On Thu, 2009-10-01 at 09:38 -0400, David Dillow wrote:
> On Thu, 2009-10-01 at 13:39 +0200, Rainer Koenig wrote:
> > The reason why is easy to decode when looking at the source: The
> > TxConfig register returns 2b800000 and there is no MAC_VERSION in the
> > list of valid versions. That means no PHY initialization code is
> > executed and that's it: no working device. :-(
> 
> Francois Romieu posted a patch yesterday (today, his time) to the thread
> "r8169 chips on some Intel D945GSEJT boards fail to work after PXE boot"
> 
> It appears to add MAC support for your card; you should be able to find it
> at any of your favorite mail archives, Google, or better yet,
> http://patchwork.ozlabs.org/project/netdev/list/
> 
> Hmm, patchwork doesn't seem to have picked it up, yet.

Actually, it is there, you just need to change the filters to show
patches in state "RFC".


* Re: [PATCH 1/2] net/netfilter/ipvs: Move #define KMSG_COMPONENT to Makefile
From: Joe Perches @ 2009-10-01 15:55 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Patrick McHardy, David S. Miller, Simon Horman, Julian Anastasov,
	Netfilter Developer Mailing List, netdev,
	Linux Kernel Mailing List, lvs-devel
In-Reply-To: <alpine.LSU.2.00.0910011016480.24025@obet.zrqbmnf.qr>

On Thu, 2009-10-01 at 10:27 +0200, Jan Engelhardt wrote:
> On Thursday 2009-10-01 02:50, Joe Perches wrote:
> >I imagine an eventual goal of standardizing the default
> >pr_fmt define in kernel.h to
> >	#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >so that all pr_<level> calls get this unless otherwise
> >specified.
> 
> I like that approach. Saves me adding that line to .c
> files repeatedly.

There aren't so many existing pr_<level> calls
that this couldn't be considered.

Files with pr_<level> without pr_fmt:

$ grep -rPl --include=*.[ch] \
	"\bpr_(info|warning|err|alert|notice|crit)\b" * |
  xargs grep -Lw "pr_fmt" | wc -l
569

Uses of pr_<level> without pr_fmt:

$ grep -rPl --include=*.[ch] \
	"\bpr_(info|warning|err|alert|notice|crit)\b" * |
  xargs grep -Lw "pr_fmt" |
  xargs grep -P "\bpr_(info|warning|err|alert|notice|crit)\b" |
  wc -l
2885

If you look at the pr_<levels>, it's nearly
a mechanical thing to strip the ones with
some sort of prefix and add a #define pr_fmt
to replace them.  Almost all of the ones without
prefixes might benefit by using a standardized
#define pr_fmt(etc...) in kernel.h, so the
actual count of changes isn't that high.
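Here is a userspace sketch of the mechanism under discussion. pr_fmt() is expanded inside the pr_<level> macro, so a single default definition prefixes every message through string-literal concatenation at compile time. KBUILD_MODNAME is normally supplied by kbuild on the compiler command line. The "ip_vs" value and the pr_info_buf() macro are hypothetical stand-ins that format into a buffer instead of writing to the kernel log.

```c
#include <stdio.h>

/* Normally passed by kbuild on the command line; hypothetical value. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "ip_vs"
#endif

/* The proposed default: a file may still #define its own pr_fmt first. */
#ifndef pr_fmt
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#endif

/*
 * Hypothetical userspace analogue of pr_info(): the kernel macro expands
 * to printk(KERN_INFO pr_fmt(fmt), ...), so the prefix is pasted onto the
 * format string by the preprocessor. Here we format into a buffer instead.
 */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

Because the default is wrapped in #ifndef, a file that wants a different prefix can still #define pr_fmt before the header is included, which is what the mechanical conversion described above would rely on.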

> >Or perhaps better, to get rid of pr_fmt(fmt) altogether and
> >have printk emit the filename/modulename, function and/or
> >code offset by using something like %pS after the level.
> I object to that. You would be spamming the dmesg ring buffer
> with all that info

Of course printks could not change, there are way too
many of those to consider doing that globally.

But the printks emitted by pr_<level> might change.
Maybe by setting a bit in the string "<level>" or by
some other mechanism.

> filename: you would have to keep filename strings in the kernel.
> Surely I do not find that thrilling when there are ~18000
> non-arch .[ch] files whose pathnames amount to 542K.
> Same goes similar for functions.
> 
> modulename: obj-y files would only get "<built-in>" or something
> for KBUILD_MODNAME. Printing that to dmesg is not too useful.

The removal of KBUILD_MODNAME could only be done
for builds with CONFIG_KALLSYMS or
CONFIG_DYNAMIC_DEBUG.

It might also be possible to use something like
CONFIG_DYNAMIC_DEBUG to control which modules get
MODNAME, __func__, __LINE__ or offset emitted
by the pr_<level> via some boot/module/sysconf
or FTRACE like parameters.

cheers, Joe



* Re: kernel doc / docbook pdfdocs question
From: Randy Dunlap @ 2009-10-01 15:57 UTC (permalink / raw)
  To: Doug Maxey; +Cc: Stephen Hemminger, netdev
In-Reply-To: <27289.1254375747@jerryjeff.riw.enoyolf.org>

On Thu, 01 Oct 2009 00:42:27 -0500 Doug Maxey wrote:

> 
> On Wed, 30 Sep 2009 17:30:02 PDT, Stephen Hemminger wrote:
> >On Wed, 30 Sep 2009 14:59:36 -0500
> >Doug Maxey <dwm@enoyolf.org> wrote:
> >
> >> 
> >> Randy,
> >> 
> >> This may be slightly off topic for this list, but it does involve an
> >> (as yet un-released) network driver. :)
> >> 
> >> Do you have any insight that could guide me toward a fix for an issue
> >> seen with some header file constructs when trying to generate a pdf
> >> docbook?
> >> 
> >
> >Why clutter docbook output (which is supposed to be about general kernel
> >API's) with output for data structures in one driver.
> 
> It would be a general mechanism, and it would be to document an API.
> There are other subsystems that use DECLARE_BITMAP() (e.g., scsi).
> Just none at the moment that attempt to describe such a member,
> possibly because there isn't a way to document it.  Dunno.  Build it
> and they will come.  There is one party that is interested anyway.
> 
> Finally did find where this was getting warned about / tossed, in
> kernel-doc itself. =)

Hi,

Sorry for the delayed reply.  I was away yesterday.

What did you find in kernel-doc?  Something like the
"cannot understand prototype" message or something else?

Features/support in kernel-doc is mostly added on an as-needed basis.
Now that you have provided a sample, I can try to add support for it,
but it's not exactly a high priority for me... or you can add support
for it to kernel-doc and send a patch for it.  :)
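For reference, here is a userspace sketch of roughly what the construct in question looks like: a kernel-doc comment describing a DECLARE_BITMAP() member. The macros are simplified copies of the kernel helpers. struct foo_dev and FOO_NR_FLAGS are hypothetical, since Doug's actual header is not shown in the thread. The comment follows the usual kernel-doc "@member:" convention; whether the kernel-doc script of the time accepted such a member is exactly the open question here.

```c
#include <limits.h>

/* Simplified userspace copies of the kernel helpers. */
#define BITS_PER_LONG      (sizeof(long) * CHAR_BIT)
#define BITS_TO_LONGS(nr)  (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

#define FOO_NR_FLAGS 65 /* hypothetical number of state bits */

/**
 * struct foo_dev - hypothetical driver state
 * @id: device instance number
 * @flags: state bits, FOO_NR_FLAGS wide, declared with DECLARE_BITMAP()
 */
struct foo_dev {
	int id;
	DECLARE_BITMAP(flags, FOO_NR_FLAGS);
};
```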


---
~Randy


* Re: [PATCH] skge: use unique IRQ name
From: Stephen Hemminger @ 2009-10-01 16:06 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: netdev
In-Reply-To: <20091001122720.3822bdd3@leela>

On Thu, 1 Oct 2009 12:27:20 +0200
Michal Schmidt <mschmidt@redhat.com> wrote:

> Most network drivers request their IRQ when the interface is activated.
> skge does it in ->probe() instead, because it can work with two-port
> cards where the two net_devices use the same IRQ. This works fine most
> of the time, except in some situations when the interface gets renamed.
> Consider this example:
> 
> 1. modprobe skge
>    The card is detected as eth0 and requests IRQ 17. Directory
>    /proc/irq/17/eth0 is created.
> 2. There is a udev rule which says this interface should be called
>    eth1, so udev renames eth0 -> eth1.
> 3. modprobe 8139too
>    The Realtek card is detected as eth0. It will be using IRQ 17 too.
> 4. ip link set eth0 up
>    Now 8139too requests IRQ 17.
> 
> The result is:
> WARNING: at fs/proc/generic.c:590 proc_register ...
> proc_dir_entry '17/eth0' already registered
> ...
> And "ls /proc/irq/17" shows two subdirectories, both called eth0.
> 
> Fix it by using a unique name for skge's IRQ, based on the PCI address.
> The naming from the example then looks like this:
> $ grep skge /proc/interrupts
>  17:        169   IO-APIC-fasteoi   skge@0000:00:0a.0, eth0
> 
> The irqbalance daemon will have to be taught to recognize "skge@" as an
> Ethernet interrupt. This will be a one-liner addition in classify.c. I
> will send a patch to irqbalance if this change is accepted.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> 
> Index: kernel/drivers/net/skge.c
> ===================================================================
> --- kernel.orig/drivers/net/skge.c
> +++ kernel/drivers/net/skge.c
> @@ -3895,6 +3895,7 @@ static int __devinit skge_probe(struct p
>  	struct net_device *dev, *dev1;
>  	struct skge_hw *hw;
>  	int err, using_dac = 0;
> +	size_t irq_name_len;
>  
>  	err = pci_enable_device(pdev);
>  	if (err) {
> @@ -3935,11 +3936,13 @@ static int __devinit skge_probe(struct p
>  #endif
>  
>  	err = -ENOMEM;
> -	hw = kzalloc(sizeof(*hw), GFP_KERNEL);
> +	irq_name_len = strlen(DRV_NAME) + strlen(dev_name(&pdev->dev)) + 2;
> +	hw = kzalloc(sizeof(*hw) + irq_name_len, GFP_KERNEL);
>  	if (!hw) {
>  		dev_err(&pdev->dev, "cannot allocate hardware struct\n");
>  		goto err_out_free_regions;
>  	}
> +	sprintf(hw->irq_name, DRV_NAME "@%s", dev_name(&pdev->dev));

I like this with one small change. Please use:
         skge@pci:0000:00:02.0
This makes the driver follow the same format as existing DRM graphics drivers.
Michal, could you follow up with additional patches for:
   1. sky2 driver has same issue
   2. irqbalance has a list of special drivers that needs to be updated




* Re: [PATCHv2] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01 16:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, Ori Finkalman, Ilpo Järvinen
In-Reply-To: <4AC4BD1E.6060706@gmail.com>

Eric Dumazet wrote:

> Gilad Ben-Yossef wrote:
>   
>> From: Ori Finkelman <ori@comsleep.com>
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 5200aab..fcd278a 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock
>> *tp)
>> #define OPTION_SACK_ADVERTISE  (1 << 0)
>> #define OPTION_TS              (1 << 1)
>> #define OPTION_MD5             (1 << 2)
>> +#define OPTION_WSCALE          (1 << 3)
>>     
>
> I manually applied your patch and tested it.
>
> So far so good, it works well.
>   

Glad to hear. Thank you both Eric and Ilpo for the review.
> But you'll need to find the correct way to submit a patch so that your mailer doesn't
> mangle the content.
>
> File Documentation/email-clients.txt contains useful tips.
>
>
>   
Arrggghh... I thought I had subdued Thunderbird, but it tricked me. My
prefs.js got rewritten somehow. My sincere apologies. This is not my
week for MUAs.

I also noticed I put the signed-off-by in the wrong place...

Next email is version 3 - same content, hopefully correct formatting.

Thanks again.
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Now the world has gone to bed
	 Darkness won't engulf my head
	 I can see by infra-red
	 How I hate the night."



* [PATCH] make TLLAO option for NA packets configurable
From: Cosmin Ratiu @ 2009-10-01 16:16 UTC (permalink / raw)
  To: netdev; +Cc: Octavian Purdila

Hello,

This is a patch that adds a sysctl to control the sending of the Target Link
Layer Address Option (TLLAO) with Neighbor Advertisements responding to
unicast NS. The patch was made for kernel 2.6.7 (yes, it is ancient), but the
code is similar in the current kernel, and I can rework it if you want it in.

RFC 2461, page 24 suggests that this option should be included with NAs to
avoid a race with the sender clearing its cache after sending a unicast NS,
but before receiving an NA.

It seems there are some Juniper routers (MX series) that expect this option to 
be included with all NAs.

Another solution is to always send this option, as it has little overhead.
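To put a number on that overhead, here is a sketch of the NA length computation using the NDISC_OPT_SPACE() macro from the patch. It assumes, as ndisc_recv_ns() did in kernels of that era, that the option is requested only when the NS was sent to a multicast address. na_len() and NA_BASE_LEN are hypothetical names. The 24-byte base is the ICMPv6 NA header plus the target address, per RFC 2461/4861, section 4.4.

```c
/* Option space macro from ndisc.c: 2-byte type/length header, 8-byte units. */
#define NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* ICMPv6 NA without options: type, code, checksum (4) + flags/reserved (4)
 * + target address (16). */
#define NA_BASE_LEN 24

/*
 * Hypothetical helper: ICMPv6 length of an NA. Assumes the option is
 * requested only for replies to multicast NS, unless force_tllao (the
 * proposed sysctl) is set. Devices without a link-layer address never
 * get the option.
 */
static int na_len(int addr_len, int ns_was_multicast, int force_tllao)
{
	int inc_opt = ns_was_multicast || force_tllao;
	int len = NA_BASE_LEN;

	if (inc_opt && addr_len)
		len += NDISC_OPT_SPACE(addr_len);
	return len;
}
```

For Ethernet (6-byte addresses), the option costs 8 bytes, taking the ICMPv6 message from 24 to 32 bytes.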

Please let me know what you think,
Cosmin.

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
--- //packages/linux_2.6.7/main/src/include/linux/sysctl.h
+++ /home/z/w1/packages/linux_2.6.7/main/src/include/linux/sysctl.h
@@ -444,6 +444,7 @@                                                
        NET_IPV6_IP6FRAG_TIME=23,                                  
        NET_IPV6_IP6FRAG_SECRET_INTERVAL=24,                       
        NET_IPV6_MLD_MAX_MSF=25,                                   
+       NET_IPV6_NDISC_FORCE_TLLAO=26,                             
 };                                                                
                                                                   
 enum {                                                            
--- //packages/linux_2.6.7/main/src/include/net/ipv6.h             
+++ /home/z/w1/packages/linux_2.6.7/main/src/include/net/ipv6.h
@@ -479,6 +479,7 @@
 extern int sysctl_ip6frag_low_thresh;
 extern int sysctl_ip6frag_time;
 extern int sysctl_ip6frag_secret_interval;
+extern int sysctl_ndisc_force_tllao;

 #endif /* __KERNEL__ */
 #endif /* _NET_IPV6_H */
--- //packages/linux_2.6.7/main/src/net/ipv6/ndisc.c
+++ /home/z/w1/packages/linux_2.6.7/main/src/net/ipv6/ndisc.c
@@ -169,6 +169,8 @@

 #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)

+int sysctl_ndisc_force_tllao;
+
 static u8 *ndisc_fill_option(u8 *opt, int type, void *data, int data_len)
 {
        int space = NDISC_OPT_SPACE(data_len);
@@ -399,6 +401,9 @@
                return;
        }

+       if (sysctl_ndisc_force_tllao)
+               inc_opt = 1;
+
        if (inc_opt) {
                if (dev->addr_len)
                        len += NDISC_OPT_SPACE(dev->addr_len);
--- //packages/linux_2.6.7/main/src/net/ipv6/sysctl_net_ipv6.c
+++ /home/z/w1/packages/linux_2.6.7/main/src/net/ipv6/sysctl_net_ipv6.c
@@ -84,6 +84,14 @@
                .mode           = 0644,
                .proc_handler   = &proc_dointvec
        },
+       {
+               .ctl_name       = NET_IPV6_NDISC_FORCE_TLLAO,
+               .procname       = "ndisc_force_tllao",
+               .data           = &sysctl_ndisc_force_tllao,
+               .maxlen         = sizeof(int),
+               .mode           = 0644,
+               .proc_handler   = &proc_dointvec,
+       },
        { .ctl_name = 0 }
 };



* Re: [PATCH 0/2] cfg80211: firmware and hardware version
From: Kalle Valo @ 2009-10-01 16:20 UTC (permalink / raw)
  To: John W. Linville; +Cc: Luis R. Rodriguez, linux-wireless, netdev
In-Reply-To: <20091001151820.GA2895@tuxdriver.com>

"John W. Linville" <linville@tuxdriver.com> writes:

> On Thu, Oct 01, 2009 at 05:18:33PM +0300, Kalle Valo wrote:
>> 
>> I'm not worried about the implementation complexity, and as your
>> patches show it was easy. My concern is the overall design for
>> wireless devices. Instead of using nl80211 for everything, with some
>> features we would use nl80211/iw and with some ethtool. That's just
>> confusing and I don't like that. I would prefer that nl80211 provides
>> everything, it makes things so much easier.
>
> Well, if the hw/fw version numbers were the only thing then I'd
> probably say it's not a big deal.  But having ethtool support is nice
> in that it makes a familiar tool work for us.  Among other things,
> this probably helps with some distro scripts that don't work quite
> right without it.  Plus, there is lots of debugging stuff that could
> be turned-on without having to write new tools.

Agreed, except maybe for the distro scripts part. To me that just sounds
like a bug in the scripts.

> I suppose I understand the 'one API' idea, but why duplicate
> functionality?

Simply because the overlap in functionality isn't big enough in this case.
I'm worried that we will use 10% of the functionality in ethtool and
the remaining 90% will be something we can't use, so we'll have to implement
the rest in nl80211 anyway.

> Anyway, adding a couple of ioctl calls isn't a big deal.

Sure, but we need to support this forever. If, say after two years, we
decide that ethtool is not the way to go, it's very difficult to
remove it. The fewer interfaces we have, the easier it is to maintain
them.

> And don't forget, we are still network drivers too...

I hope ethtool isn't a strict requirement for a network driver, at
least I haven't heard about that.

>> One example is the hw version: ethtool only provides a u32 to userspace
>> and moves the burden of translating the hw id to the user. For us a string
>> is a much better choice, because when debugging we often (or always?)
>> need to know the chip version.
>
> Look at the way most drivers set the version (using each byte as a
> field).

Yes, that's how it is also with wl1251. A number like '0x7030101' is
just not that user friendly.
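To illustrate the byte-per-field convention, here is a rough sketch of turning a packed u32 such as 0x7030101 into a dotted string. It assumes, hypothetically, that the most significant byte is the first field and that each field is a plain number. Real drivers may order or size the fields differently, and `format_hw_version()` is a made-up name, not an ethtool function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: decode a version word where each byte is one
 * field, most significant byte first (0x07030101 -> "7.3.1.1").
 * Returns the snprintf() result so callers can detect truncation.
 */
static int format_hw_version(uint32_t ver, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned)((ver >> 24) & 0xff),
			(unsigned)((ver >> 16) & 0xff),
			(unsigned)((ver >> 8) & 0xff),
			(unsigned)(ver & 0xff));
}
```

A userland parser keyed on the driver name could apply a chip-specific layout like this one, so the kernel interface can stay a plain u32.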

> If you want prettier output, adding a parser to the userland ethtool
> is fairly trivial. It looks something like the patch below...

Oh wow, that's cool and a truly useful feature. One less complaint
from me :)

>>         ethtool -c|--show-coalesce DEVNAME      Show coalesce options
>>         ethtool -C|--coalesce DEVNAME   Set coalesce options
>>                 [adaptive-rx on|off]
>>                 [adaptive-tx on|off]
>>                 [rx-usecs N]
>>                 [rx-frames N]
>>                 [rx-usecs-irq N]
>>                 [rx-frames-irq N]
>>                 [tx-usecs N]
>>                 [tx-frames N]
>>                 [tx-usecs-irq N]
>>                 [tx-frames-irq N]
>>                 [stats-block-usecs N]
>>                 [pkt-rate-low N]
>>                 [rx-usecs-low N]
>>                 [rx-frames-low N]
>>                 [tx-usecs-low N]
>>                 [tx-frames-low N]
>>                 [pkt-rate-high N]
>>                 [rx-usecs-high N]
>>                 [rx-frames-high N]
>>                 [tx-usecs-high N]
>>                 [tx-frames-high N]
>>                 [sample-interval N]
>
> These _could_ be useful if wireless becomes more
> performance-oriented...

Maybe, or maybe not. We will only find out within the next few years.

And what will we do if the parameters are actually a bit different? Is
it ok to extend ethtool for supporting wireless or do we later on have
to add separate support to nl80211? The latter would suck big time.

>>         ethtool -g|--show-ring DEVNAME  Query RX/TX ring parameters
>>         ethtool -G|--set-ring DEVNAME   Set RX/TX ring parameters
>>                 [ rx N ]
>>                 [ rx-mini N ]
>>                 [ rx-jumbo N ]
>>                 [ tx N ]
>
> Wireless devices have ring buffers, no?

Yes, there is hardware that has them, but again the question is whether
this is relevant for wireless devices. In Ethernet the hardware is the
bottleneck, but in 802.11 the wireless medium is the bottleneck, so the
parameters we need to configure are usually different.

>>         ethtool -r|--negotiate DEVNAME  Restart N-WAY negotation
>
> Ethernet-specific...might could be overloaded for wireless to trigger
> reassoc...?

Please no, I don't want to see any reassociation or anything else
802.11-state-related in ethtool; nl80211 was created for this. This is
something I would object to loudly :)

> Anyway, it doesn't really matter if we don't use the whole API -- many
> older ethernet devices don't support all these features.  The point
> is that the API exists and has some overlap with our needs.  It is a
> driver-oriented API, with nitty-gritty stuff that need not clutter a
> configuration API like cfg80211.  There is even the potential of us
> adding our own extensions (e.g. WoW) that are also device-oriented.
>
> Anyway, between the link detection and making distro scripts work
> plus enabling a familiar tool for basic driver info I think this is
> a win.  So much the better if some drivers move to ethtool for register
> dumping, setting message verbosity, querying/changing eeprom values,
> etc, etc...

Sounds good enough. As I said in my earlier email, I'm not going to argue
about this for too long. You know this better than I do. So let's go
forward with ethtool.

Thanks for listening to my concerns.

-- 
Kalle Valo


* Re: [PATCH] make TLLAO option for NA packets configurable
From: Stephen Hemminger @ 2009-10-01 16:21 UTC
  To: Cosmin Ratiu; +Cc: netdev, Octavian Purdila
In-Reply-To: <200910011916.40908.cratiu@ixiacom.com>

On Thu, 1 Oct 2009 19:16:40 +0300
Cosmin Ratiu <cratiu@ixiacom.com> wrote:

> Hello,
> 
> This is a patch that adds a sysctl to control the sending of the Target Link
> Layer Address Option (TLLAO) with Neighbor Advertisements responding to
> unicast NS. The patch was made for kernel 2.6.7 (yes, it is ancient), but the
> code is similar in the current kernel and I can rework it if you want it in.
> 
> RFC 2461, page 24 suggests that this option should be included with NAs to
> avoid a race with the sender clearing its cache after sending a unicast NS
> but before receiving an NA.
> 
> It seems there are some Juniper routers (MX series) that expect this option to 
> be included with all NAs.
> 
> Another solution is to always send this option, as it has little overhead.
> 
> Please let me know what you think,
> Cosmin.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
> --- //packages/linux_2.6.7/main/src/include/linux/sysctl.h
> +++ /home/z/w1/packages/linux_2.6.7/main/src/include/linux/sysctl.h
> @@ -444,6 +444,7 @@
>         NET_IPV6_IP6FRAG_TIME=23,
>         NET_IPV6_IP6FRAG_SECRET_INTERVAL=24,
>         NET_IPV6_MLD_MAX_MSF=25,
> +       NET_IPV6_NDISC_FORCE_TLLAO=26,

Since numbered sysctl values are deprecated, can you use CTL_UNNUMBERED
to avoid having to add yet another value?
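The following is a userspace mock of what an unnumbered entry could look like in a rework against a current tree. `struct ctl_table_mock` and `find_by_procname()` are hypothetical stand-ins, not kernel API. The sketch assumes, as in 2.6.2x-era `sysctl.h`, that `CTL_UNNUMBERED` expands to 0 (`CTL_NONE`). Because unnumbered entries then share `ctl_name == 0` with the table terminator, any walk over the table has to stop only when both the number and the name are absent. As far as we understand, the kernel's own table loops of that era follow the same rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: mirrors the kernel's CTL_UNNUMBERED == CTL_NONE == 0. */
#define CTL_UNNUMBERED 0

/* Hypothetical subset of struct ctl_table. */
struct ctl_table_mock {
	int ctl_name;         /* binary sysctl number, or CTL_UNNUMBERED */
	const char *procname; /* name under /proc/sys */
	int *data;            /* backing variable */
};

static int sysctl_ndisc_force_tllao;

static struct ctl_table_mock ipv6_table_mock[] = {
	{
		.ctl_name = CTL_UNNUMBERED, /* no new enum value needed */
		.procname = "ndisc_force_tllao",
		.data     = &sysctl_ndisc_force_tllao,
	},
	{ .ctl_name = 0 } /* terminator: neither a number nor a name */
};

/* Look up an entry by its /proc name; stop at the all-empty terminator. */
static int *find_by_procname(struct ctl_table_mock *t, const char *name)
{
	assert(t != NULL && name != NULL);
	for (; t->ctl_name || t->procname; t++)
		if (t->procname && strcmp(t->procname, name) == 0)
			return t->data;
	return NULL;
}
```

The entry stays reachable by its `/proc/sys` name even though it has no binary number, which is the point of not growing the `NET_IPV6_*` enum.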


* Re: tg3: Badness at kernel/mutex.c:207
From: Matt Carlson @ 2009-10-01 16:25 UTC
  To: Felix Radensky; +Cc: Matthew Carlson, netdev@vger.kernel.org
In-Reply-To: <4AC47826.1000809@embedded-sol.com>

On Thu, Oct 01, 2009 at 02:36:38AM -0700, Felix Radensky wrote:
> Hi, Matt
> 
> Matt Carlson wrote:
> > On Sat, Sep 26, 2009 at 02:20:57PM -0700, Felix Radensky wrote:
> >   
> >> Hi,
> >>
> >> I'm running linux-2.6.31 on a custom MPC8536 based board with BCM57760 chip.
> >> Both tg3 driver, and Broadcom PHY driver are modules.
> >>
> >> Each time I run ifconfig eth2 up, I get the following error message:
> >>
> >> Badness at kernel/mutex.c:207
> >> NIP: c025132c LR: c0251314 CTR: c0251334
> >> REGS: efbedbd0 TRAP: 0700   Not tainted  (2.6.31)
> >> MSR: 00029000 <EE,ME,CE>  CR: 24020422  XER: 00000000
> >> TASK = efacce10[1080] 'ifconfig' THREAD: efbec000
> >> GPR00: 00000000 efbedc80 efacce10 00000001 00007020 00000002 00000000 00000200
> >> GPR08: 00029000 c0350000 c0330000 00000001 24020424 10057d94 000002a0 1000d82c
> >> GPR16: 1000d81c 1000d814 10010000 10050000 ef897a0c efbede18 ffff8914 ef897a00
> >> GPR24: 00008000 c034b480 efbec000 efb0122c c0350000 efacce10 ef82d2c0 efb01228
> >> NIP [c025132c] __mutex_lock_slowpath+0x1f0/0x1f8
> >> LR [c0251314] __mutex_lock_slowpath+0x1d8/0x1f8
> >> Call Trace:
> >> [efbedcd0] [c025134c] mutex_lock+0x18/0x34
> >> [efbedcf0] [f534a228] tg3_chip_reset+0x7cc/0x9f8 [tg3]
> >> [efbedd20] [f534a8f0] tg3_reset_hw+0x58/0x2360 [tg3]
> >> [efbedd70] [f5351dd4] tg3_open+0x610/0x910 [tg3]
> >> [efbeddb0] [c01e1c6c] dev_open+0x100/0x138
> >> [efbeddd0] [c01dff20] dev_change_flags+0x80/0x1ac
> >> [efbeddf0] [c02232cc] devinet_ioctl+0x648/0x824
> >> [efbede60] [c0223de4] inet_ioctl+0xcc/0xf8
> >> [efbede70] [c01cdf44] sock_ioctl+0x60/0x300
> >> [efbede90] [c008a35c] vfs_ioctl+0x34/0x8c
> >> [efbedea0] [c008a580] do_vfs_ioctl+0x88/0x724
> >> [efbedf10] [c008ac5c] sys_ioctl+0x40/0x74
> >> [efbedf40] [c000f814] ret_from_syscall+0x0/0x3c
> >> Instruction dump:
> >> 0fe00000 4bfffe80 801a000c 5409016f 4182fe60 4bf0f6d9 2f830000 41befe54
> >> 3d20c035 8009c2c0 2f800000 40befe44 <0fe00000> 4bfffe3c 9421ffe0 7c0802a6
> >>
> >> Does it indicate a real problem, or something that can be ignored ?
> >>
> >> Additional information from kernel log:
> >>
> >> tg3.c:v3.99 (April 20, 2009)
> >> tg3 0002:05:00.0: enabling bus mastering
> >> tg3 0002:05:00.0: PME# disabled
> >> tg3 mdio bus: probed
> >> eth2: Tigon3 [partno(BCM57760) rev 57780001] (PCI Express) MAC address 00:10:18:00:00:00
> >> eth2: attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=500:01)
> >> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> >> eth2: dma_rwctrl[76180000] dma_mask[64-bit]
> >> tg3 0002:05:00.0: PME# disabled
> >>     
> >
> > Yes, this is a real problem.  The driver is taking the MDIO bus lock
> > while holding the device's own spinlock.  I think I may have a
> > workaround.  Let me test it and get back to you.
> >   
> 
> Did you have a chance to look into it ?

Yes, and the fix seems to work.  The patch changes the locking behavior
of the driver, so I'm being extra careful and checking for possible side
effects.  That's why it's taking so long.  Sorry for the delay.
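Matt's actual patch is not shown in this thread. The sketch below is only a userspace model of why the report fires. A mutex such as the MDIO bus lock may sleep, and sleeping while a spinlock is held is not allowed. Every name in it is hypothetical: a counter stands in for "spinlock held", and `mdio_lock_checked()` refuses to "sleep" in that state, loosely mirroring a debug check rather than reproducing the exact one at `kernel/mutex.c:207`. The "fixed" variant shows one common pattern, dropping the spinlock around the MDIO access. That pattern is an assumption, not the tg3 fix.

```c
#include <assert.h>
#include <stdbool.h>

static int atomic_depth;  /* > 0 models "device spinlock held" */
static bool mdio_locked;  /* models the MDIO bus mutex */

static void dev_spin_lock(void) { atomic_depth++; }

static void dev_spin_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Returns false (a "badness" report) if called with the spinlock held. */
static bool mdio_lock_checked(void)
{
	if (atomic_depth > 0)
		return false; /* a real mutex may sleep: not allowed here */
	mdio_locked = true;
	return true;
}

static void mdio_unlock(void) { mdio_locked = false; }

/* Problematic ordering: MDIO mutex taken under the device spinlock. */
static bool phy_access_buggy(void)
{
	dev_spin_lock();
	bool ok = mdio_lock_checked();
	if (ok)
		mdio_unlock();
	dev_spin_unlock();
	return ok;
}

/* One possible pattern: release the spinlock before the MDIO access and
 * retake it afterwards; callers must tolerate state changing meanwhile. */
static bool phy_access_fixed(void)
{
	dev_spin_lock();
	/* ... work that genuinely needs the spinlock ... */
	dev_spin_unlock();
	bool ok = mdio_lock_checked();
	if (ok)
		mdio_unlock();
	dev_spin_lock();
	/* ... remaining work under the spinlock ... */
	dev_spin_unlock();
	return ok;
}
```

The caveat in the last comment is why a change like this needs careful review: anything the spinlock protected can change while it is dropped.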



* Re: [PATCH] pktgen: Fix delay handling
From: David Miller @ 2009-10-01 16:29 UTC
  To: eric.dumazet; +Cc: shemminger, jdb, robert, netdev
In-Reply-To: <4AC47EB9.6070809@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 01 Oct 2009 12:04:41 +0200

> But it appears net/core/pktgen.c is different on net-next-2.6
> 
> Stephen, David, I am a bit lost here, something went wrong in a merge process ?
> 

net-next-2.6 is just a stale old tree, there is no new networking
work in there and it is simply Linus's tree as of a few weeks
ago.

It's only there so Stephen Rothwell has something to do a 'nop'
pull from into his linux-next tree.

I'll apply your fix, thanks!


