Netdev List

* [PATCH 03/31] mm: expose gfp_to_alloc_flags()
From: Suresh Jayaraman @ 2009-10-01 14:05 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: Peter Zijlstra <a.p.zijlstra@chello.nl> 

Expose the gfp to alloc_flags mapping, so we can use it in other parts
of the vm.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 mm/internal.h   |   15 +++++++++++++++
 mm/page_alloc.c |   16 +---------------
 2 files changed, 16 insertions(+), 15 deletions(-)

Index: mmotm/mm/internal.h
===================================================================
--- mmotm.orig/mm/internal.h
+++ mmotm/mm/internal.h
@@ -194,6 +194,21 @@ static inline struct page *mem_map_next(
 #define __paginginit __init
 #endif
 
+/* The ALLOC_WMARK bits are used as an index to zone->watermark */
+#define ALLOC_WMARK_MIN		WMARK_MIN
+#define ALLOC_WMARK_LOW		WMARK_LOW
+#define ALLOC_WMARK_HIGH	WMARK_HIGH
+#define ALLOC_NO_WATERMARKS	0x04 /* don't check watermarks at all */
+
+/* Mask to get the watermark bits */
+#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
+
+#define ALLOC_HARDER		0x10 /* try to alloc harder */
+#define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
+#define ALLOC_CPUSET		0x40 /* check for correct cpuset */
+
+int gfp_to_alloc_flags(gfp_t gfp_mask);
+
 /* Memory initialisation debug and verification */
 enum mminit_level {
 	MMINIT_WARNING,
Index: mmotm/mm/page_alloc.c
===================================================================
--- mmotm.orig/mm/page_alloc.c
+++ mmotm/mm/page_alloc.c
@@ -1190,19 +1190,6 @@ failed:
 	return NULL;
 }
 
-/* The ALLOC_WMARK bits are used as an index to zone->watermark */
-#define ALLOC_WMARK_MIN		WMARK_MIN
-#define ALLOC_WMARK_LOW		WMARK_LOW
-#define ALLOC_WMARK_HIGH	WMARK_HIGH
-#define ALLOC_NO_WATERMARKS	0x04 /* don't check watermarks at all */
-
-/* Mask to get the watermark bits */
-#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
-
-#define ALLOC_HARDER		0x10 /* try to alloc harder */
-#define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
-#define ALLOC_CPUSET		0x40 /* check for correct cpuset */
-
 #ifdef CONFIG_FAIL_PAGE_ALLOC
 
 static struct fail_page_alloc_attr {
@@ -1691,8 +1678,7 @@ void wake_all_kswapd(unsigned int order,
 		wakeup_kswapd(zone, order);
 }
 
-static inline int
-gfp_to_alloc_flags(gfp_t gfp_mask)
+int gfp_to_alloc_flags(gfp_t gfp_mask)
 {
 	struct task_struct *p = current;
 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
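A minimal userspace sketch of how the exposed ALLOC_* bits might be consumed by another part of the vm follows. It is not kernel code: it assumes WMARK_MIN, WMARK_LOW and WMARK_HIGH are 0, 1 and 2 (so that ALLOC_WMARK_MASK, which evaluates to 3, isolates them), and wmark_index() and skips_watermarks() are hypothetical helpers invented for the example.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's watermark enum; values are illustrative. */
enum { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

/* Copied from the mm/internal.h hunk in this patch. */
#define ALLOC_WMARK_MIN		WMARK_MIN
#define ALLOC_WMARK_LOW		WMARK_LOW
#define ALLOC_WMARK_HIGH	WMARK_HIGH
#define ALLOC_NO_WATERMARKS	0x04 /* don't check watermarks at all */
#define ALLOC_WMARK_MASK	(ALLOC_NO_WATERMARKS-1)
#define ALLOC_HARDER		0x10 /* try to alloc harder */
#define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
#define ALLOC_CPUSET		0x40 /* check for correct cpuset */

/* Hypothetical helper: the low bits index zone->watermark[]. */
static int wmark_index(int alloc_flags)
{
	return alloc_flags & ALLOC_WMARK_MASK;
}

/* Hypothetical helper: true when watermark checks are skipped entirely. */
static bool skips_watermarks(int alloc_flags)
{
	return (alloc_flags & ALLOC_NO_WATERMARKS) != 0;
}
```

Given ALLOC_WMARK_LOW | ALLOC_CPUSET | ALLOC_HARDER, wmark_index() yields WMARK_LOW, and the modifier bits remain available for the caller to test one by one.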

* [PATCH 02/31] swap over network documentation
From: Suresh Jayaraman @ 2009-10-01 14:04 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: Neil Brown <neilb@suse.de>

Document describing the problem and proposed solution

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 Documentation/network-swap.txt |  270 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 270 insertions(+)

Index: mmotm/Documentation/network-swap.txt
===================================================================
--- /dev/null
+++ mmotm/Documentation/network-swap.txt
@@ -0,0 +1,270 @@
+
+Problem:
+   When Linux needs to allocate memory it may find that there is
+   insufficient free memory so it needs to reclaim space that is in
+   use but not needed at the moment.  There are several options:
+
+   1/ Shrink a kernel cache such as the inode or dentry cache.  This
+      is fairly easy but provides limited returns.
+   2/ Discard 'clean' pages from the page cache.  This is easy, and
+      works well as long as there are clean pages in the page cache.
+      Similarly clean 'anonymous' pages can be discarded - if there
+      are any.
+   3/ Write out some dirty page-cache pages so that they become clean.
+      The VM limits the number of dirty page-cache pages to e.g. 40%
+      of available memory so that (among other reasons) a "sync" will
+      not take excessively long.  So there should never be excessive
+      amounts of dirty pagecache.
+      Writing out dirty page-cache pages involves work by the
+      filesystem which may need to allocate memory itself.  To avoid
+      deadlock, filesystems use GFP_NOFS when allocating memory on the
+      write-out path.  When this is used, cleaning dirty page-cache
+      pages is not an option so if the filesystem finds that memory
+      is tight, another option must be found.
+   4/ Write out dirty anonymous pages to the "Swap" partition/file.
+      This is the most interesting for a couple of reasons.
+      a/ Unlike dirty page-cache pages, there is no need to write anon
+         pages out unless we are actually short of memory.  Thus they
+         tend to be left to last.
+      b/ Anon pages tend to be updated randomly and unpredictably, and
+         flushing them out of memory can have a very significant
+         performance impact on the process using them.  This contrasts
+         with page-cache pages which are often written sequentially
+         and often treated as "write-once, read-many".
+      So anon pages tend to be left until last to be cleaned, and may
+      be the only cleanable pages while there are still some dirty
+      page-cache pages (which are waiting on a GFP_NOFS allocation).
+
+[I don't find the above wholly satisfying.  There seems to be too much
+ hand-waving.  If someone can provide better text explaining why
+ swapout is a special case, that would be great.]
+
+So we need to be able to write to the swap file/partition without
+needing to allocate any memory ... or only a small well controlled
+amount.
+
+The VM reserves a small amount of memory that can only be allocated
+for use as part of the swap-out procedure.  It is only available to
+processes with the PF_MEMALLOC flag set, which is typically just the
+memory cleaner.
+
+Traditionally swap-out is performed directly to block devices (swap
+files on block-device filesystems are supported by examining the
+mapping from file offset to device offset in advance, and then using
+the device offsets to write directly to the device).  Block devices
+are (required to be) written to pre-allocate any memory that might be
+needed during write-out, and to block when the pre-allocated memory is
+exhausted and no other memory is available.  They can be sure not to
+block forever as the pre-allocated memory will be returned as soon as
+the data it is being used for has been written out.  The primary
+mechanism for pre-allocating memory is called "mempools".
+
+This approach does not work for writing anonymous pages
+(i.e. swapping) over a network, using e.g. NFS or NBD or iSCSI.
+
+
+The main reason that it does not work is that when data from an anon
+page is written to the network, we must wait for a reply to confirm
+the data is safe.  Receiving that reply will consume memory and,
+significantly, we need to allocate memory to an incoming packet before
+we can tell if it is the reply we are waiting for or not.
+
+The secondary reason is that the network code is not written to use
+mempools and in most cases does not need to use them.  Changing all
+allocations in the networking layer to use mempools would be quite
+intrusive, and would waste memory, and probably cause a slow-down in
+the common case of not swapping over the network.
+
+These problems are addressed by enhancing the system of memory
+reserves used by PF_MEMALLOC and requiring any in-kernel networking
+client that is used for swap-out to indicate which sockets are used
+for swapout so they can be handled specially in low memory situations.
+
+There are several major parts to this enhancement:
+
+1/ page->reserve, GFP_MEMALLOC
+
+  To handle low memory conditions we need to know when those
+  conditions exist.  Having a global "low on memory" flag seems easy,
+  but its implementation is problematic.  Instead we make it possible
+  to tell if a recent memory allocation required use of the emergency
+  memory pool.
+  For pages returned by alloc_page, the new page->reserve flag
+  can be tested.  If this is set, then a low memory condition was
+  current when the page was allocated, so the memory should be used
+  carefully. (Because low memory conditions are transient, this
+  state is kept in an overloaded member instead of in page flags, which
+  would suggest a more permanent state.)
+
+  For memory allocated using slab/slub: If a page that is added to a
+  kmem_cache is found to have page->reserve set, then a s->reserve
+  flag is set for the whole kmem_cache.  Further allocations will only
+  be returned from that page (or any other page in the cache) if they
+  are emergency allocations (i.e. PF_MEMALLOC or GFP_MEMALLOC is set).
+  Non-emergency allocations will block in alloc_page until a
+  non-reserve page is available.  Once a non-reserve page has been
+  added to the cache, the s->reserve flag on the cache is removed.
+
+  Because slab objects have no individual state, it's hard to pass
+  reserve state along; the current code relies on a regular alloc
+  failing.  There are various allocation wrappers to help here.
+
+  This allows us to
+   a/ request use of the emergency pool when allocating memory
+     (GFP_MEMALLOC), and
+   b/ to find out if the emergency pool was used.
+
+2/ SK_MEMALLOC, sk_buff->emergency.
+
+  When memory from the reserve is used to store incoming network
+  packets, the memory must be freed (and the packet dropped) as soon
+  as we find out that the packet is not for a socket that is used for
+  swap-out.
+  To achieve this we have an ->emergency flag for skbs, and an
+  SK_MEMALLOC flag for sockets.
+  When memory is allocated for an skb, it is allocated with
+  GFP_MEMALLOC (if we are currently swapping over the network at
+  all).  If a subsequent test shows that the emergency pool was used,
+  ->emergency is set.
+  When the skb is finally attached to its destination socket, the
+  SK_MEMALLOC flag on the socket is tested.  If the skb has
+  ->emergency set, but the socket does not have SK_MEMALLOC set, then
+  the skb is immediately freed and the packet is dropped.
+  This ensures that reserve memory is never queued on a socket that is
+  not used for swapout.
+
+  Similarly, if an skb is ever queued for delivery to user-space for
+  example by netfilter, the ->emergency flag is tested and the skb is
+  released if ->emergency is set. (so obviously the storage route may
+  not pass through a userspace helper, otherwise the packets will never
+  arrive and we'll deadlock)
+
+  This ensures that memory from the emergency reserve can be used to
+  allow swapout to proceed, but will not get caught up in any other
+  network queue.
+
+
+3/ pages_emergency
+
+  The above would be sufficient if the total memory below the lowest
+  memory watermark (i.e. the size of the emergency reserve) were known
+  to be enough to hold all transient allocations needed for writeout.
+  I'm a little blurry on how big the current emergency pool is, but it
+  isn't big and certainly hasn't been sized to allow network traffic
+  to consume any.
+
+  We could simply make the size of the reserve bigger. However in the
+  common case that we are not swapping over the network, that would be
+  a waste of memory.
+
+  So a new "watermark" is defined: pages_emergency.  This is
+  effectively added to the current low water marks, so that pages from
+  this emergency pool can only be allocated if one of PF_MEMALLOC or
+  GFP_MEMALLOC is set.
+
+  pages_emergency can be changed dynamically based on need.  When
+  swapout over the network is required, pages_emergency is increased
+  to cover the maximum expected load.  When network swapout is
+  disabled, pages_emergency is decreased.
+
+  To determine how much to increase it by, we introduce reservation
+  groups....
+
+3a/ reservation groups
+
+  The memory used transiently for swapout can be in a number of
+  different places.  e.g. the network route cache, the network
+  fragment cache, in transit between network card and socket, or (in
+  the case of NFS) in sunrpc data structures awaiting a reply.
+  We need to ensure each of these is limited in the amount of memory
+  they use, and that the maximum is included in the reserve.
+
+  The memory required by the network layer only needs to be reserved
+  once, even if there are multiple swapout paths using the network
+  (e.g. NFS and NBD and iSCSI, though using all three for swapout at
+  the same time would be unusual).
+
+  So we create a tree of reservation groups.  The network might
+  register a collection of reservations, but not mark them as being in
+  use.  NFS and sunrpc might similarly register a collection of
+  reservations, and attach it to the network reservations as it
+  depends on them.
+  When swapout over NFS is requested, the NFS/sunrpc reservations are
+  activated which implicitly activates the network reservations.
+
+  The total new reservation is added to pages_emergency.
+
+  Provided each memory usage stays beneath the registered limit (at
+  least when allocating memory from reserves), the system will never
+  run out of emergency memory, and swapout will not deadlock.
+
+  It is worth noting here that it is not critical that each usage
+  stays beneath the limit 100% of the time.  Occasional excess is
+  acceptable provided that the memory will be freed again within a
+  short amount of time that does *not* require waiting for any event
+  that itself might require memory.
+  This is because, at all stages of transmit and receive, it is
+  acceptable to discard all transient memory associated with a
+  particular writeout and try again later.  On transmit, the page can
+  be re-queued for later transmission.  On receive, the packet can be
+  dropped assuming that the peer will resend after a timeout.
+
+  Thus allocations that are truly transient and will be freed without
+  blocking do not strictly need to be reserved for.  Doing so might
+  still be a good idea to ensure forward progress doesn't take too
+  long.
+
+4/ low-mem accounting
+
+  Most places that might hold on to emergency memory (e.g. route
+  cache, fragment cache etc) already place a limit on the amount of
+  memory that they can use.  This limit can simply be reserved using
+  the above mechanism and no more needs to be done.
+
+  However some memory usage might not be accounted with sufficient
+  firmness to allow an appropriate emergency reservation.  The
+  in-flight skbs for incoming packets are one such example.
+
+  To support this, a low-overhead mechanism for accounting memory
+  usage against the reserves is provided.  This mechanism uses the
+  same data structure that is used to store the emergency memory
+  reservations through the addition of a 'usage' field.
+
+  Before we attempt allocation from the memory reserves, we must check
+  if the resulting 'usage' is below the reservation. If so, we increase
+  the usage and attempt the allocation (which should succeed). If
+  the projected 'usage' exceeds the reservation we'll either fail the
+  allocation, or wait for 'usage' to decrease enough so that it would
+  succeed, depending on __GFP_WAIT.
+
+  When memory that was allocated for that purpose is freed, the
+  'usage' field is checked again.  If it is non-zero, then the size of
+  the freed memory is subtracted from the usage, making sure the usage
+  never becomes less than zero.
+
+  This provides adequate accounting with minimal overheads when not in
+  a low memory condition.  When a low memory condition is encountered
+  it does add the cost of a spin lock necessary to serialise updates
+  to 'usage'.
+
+
+
+5/ swapon/swapoff/swap_out/swap_in
+
+  So that a filesystem (e.g. NFS) can know when to set SK_MEMALLOC on
+  any network socket that it uses, and can know when to account
+  reserve memory carefully, new address_space_operations are
+  available.
+  "swapon" requests that an address space (i.e. a file) be made ready
+  for swapout.  swap_out and swap_in request the actual IO.  They
+  together must ensure that each swap_out request can succeed without
+  allocating more emergency memory than was reserved by swapon. swapoff
+  is used to reverse the state changes caused by swapon when we disable
+  the swap file.
+
+
+Thanks for reading this far.  I hope it made sense :-)
+
+Neil Brown (with updates from Peter Zijlstra)
+
+
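The low-mem accounting described in section 4 of the document can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: struct mem_reserve_sketch, reserve_charge() and reserve_uncharge() are hypothetical names rather than the interface added by this series, "below the reservation" is read as "does not exceed it", and the spin lock that serialises updates to 'usage' is left out, so the sketch is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reservation that carries a 'usage' field. */
struct mem_reserve_sketch {
	size_t limit;	/* bytes reserved for this user */
	size_t usage;	/* bytes currently charged against the reserve */
};

/*
 * Charge 'size' bytes if the projected usage stays within the
 * reservation.  A false return means the caller should fail the
 * allocation or wait for usage to drop, depending on its GFP flags.
 */
static bool reserve_charge(struct mem_reserve_sketch *res, size_t size)
{
	if (res->usage > res->limit || size > res->limit - res->usage)
		return false;
	res->usage += size;
	return true;
}

/*
 * Return 'size' bytes to the reserve.  Nothing is done when usage is
 * already zero, and usage is clamped so it never drops below zero.
 */
static void reserve_uncharge(struct mem_reserve_sketch *res, size_t size)
{
	if (res->usage == 0)
		return;
	res->usage = (size > res->usage) ? 0 : res->usage - size;
}
```

A caller would pair each successful reserve_charge() with a later reserve_uncharge() of the same size; the clamp only matters when memory charged before the low-memory condition is freed during it.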

* Re: [PATCH] net: fix NOHZ: local_softirq_pending 08
From: Michael Buesch @ 2009-10-01 14:04 UTC
  To: David Miller
  Cc: oliver, johannes, kalle.valo, linville, linux-wireless, netdev
In-Reply-To: <20090930.163333.234658158.davem@davemloft.net>

On Thursday 01 October 2009 01:33:33 David Miller wrote:

> I'm not applying this until all of these details are sorted out 

John, please apply my fix to wireless-testing to get rid of the regression.
You can revert it later, if there's a better fix available.

-- 
Greetings, Michael.

* [PATCH 01/31] mm: serialize access to min_free_kbytes
From: Suresh Jayaraman @ 2009-10-01 14:04 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust, Suresh Jayaraman

From: Peter Zijlstra <a.p.zijlstra@chello.nl> 

There is a small race between the procfs caller and the memory hotplug caller
of setup_per_zone_wmarks(). Not a big deal, but the next patch will add yet
another caller. Time to close the gap.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 mm/page_alloc.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

Index: mmotm/mm/page_alloc.c
===================================================================
--- mmotm.orig/mm/page_alloc.c
+++ mmotm/mm/page_alloc.c
@@ -121,6 +121,7 @@ static char * const zone_names[MAX_NR_ZO
 	 "Movable",
 };
 
+static DEFINE_SPINLOCK(min_free_lock);
 int min_free_kbytes = 1024;
 
 unsigned long __meminitdata nr_kernel_pages;
@@ -4448,13 +4449,13 @@ static void setup_per_zone_lowmem_reserv
 }
 
 /**
- * setup_per_zone_wmarks - called when min_free_kbytes changes
+ * __setup_per_zone_wmarks - called when min_free_kbytes changes
  * or when memory is hot-{added|removed}
  *
  * Ensures that the watermark[min,low,high] values for each zone are set
  * correctly with respect to min_free_kbytes.
  */
-void setup_per_zone_wmarks(void)
+static void __setup_per_zone_wmarks(void)
 {
 	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
 	unsigned long lowmem_pages = 0;
@@ -4552,6 +4553,15 @@ static void __init setup_per_zone_inacti
 		calculate_zone_inactive_ratio(zone);
 }
 
+void setup_per_zone_wmarks(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&min_free_lock, flags);
+	__setup_per_zone_wmarks();
+	spin_unlock_irqrestore(&min_free_lock, flags);
+}
+
 /*
  * Initialise min_free_kbytes.
  *
@@ -4587,7 +4597,7 @@ static int __init init_per_zone_wmark_mi
 		min_free_kbytes = 128;
 	if (min_free_kbytes > 65536)
 		min_free_kbytes = 65536;
-	setup_per_zone_wmarks();
+	__setup_per_zone_wmarks();
 	setup_per_zone_lowmem_reserve();
 	setup_per_zone_inactive_ratio();
 	return 0;
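The shape of this change, an unlocked worker plus a locked public wrapper, can be sketched in userspace C as below. The sketch is not the kernel implementation: a C11 atomic_flag stands in for the spinlock (with no IRQ handling), PAGE_SHIFT_SKETCH assumes 4 KiB pages, and setup_wmarks_unlocked() and setup_wmarks_sketch() are hypothetical names for __setup_per_zone_wmarks() and setup_per_zone_wmarks().

```c
#include <stdatomic.h>

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KiB pages */

/* Toy stand-in for min_free_lock; a real spinlock also handles IRQs. */
static atomic_flag min_free_lock = ATOMIC_FLAG_INIT;
static int min_free_kbytes = 1024;
static unsigned long pages_min;

/*
 * Unlocked worker.  The caller either holds min_free_lock or knows
 * that nothing else can run concurrently.
 */
static void setup_wmarks_unlocked(void)
{
	/* Convert kilobytes to pages, as the patched function does. */
	pages_min = (unsigned long)min_free_kbytes >> (PAGE_SHIFT_SKETCH - 10);
}

/* Locked entry point for runtime callers. */
static void setup_wmarks_sketch(void)
{
	while (atomic_flag_test_and_set_explicit(&min_free_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	setup_wmarks_unlocked();
	atomic_flag_clear_explicit(&min_free_lock, memory_order_release);
}
```

Keeping the worker separate lets a caller that needs no locking use it directly, which is what the __init path in the last hunk does.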

* [PATCH 00/31] Swap over NFS -v20
From: Suresh Jayaraman @ 2009-10-01 14:04 UTC
  To: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm
  Cc: netdev, Neil Brown, Miklos Szeredi, Wouter Verhelst,
	Peter Zijlstra, trond.myklebust

Hi,

Here's the latest version of the swap over NFS series, the first since -v19
was posted last October by Peter Zijlstra. Peter does not have time to pursue
this further (though he has not lost interest), so I have taken over the
patchset and will try to get it merged upstream.

The patches are against the current mmotm. They do not support SLQB yet.
These patches can also be found online here:
	http://www.suse.de/~sjayaraman/patches/swap-over-nfs/

The swap over NFS patches have been shipping with openSUSE 11.1 and SLE 11 (with
CONFIG_NFS_SWAP enabled by default) for several months now. No bugs caused by
these patches have been reported so far, and they have proven stable.

Changes since -v19:
 - rebased patches against current -mm
 - adapted changes pertaining to using zone->watermarks array
 - dropped cleanup patches/fixes that have already made it upstream
 - dropped the patch that removes nfs mempools
 - fixed racy nature of sync_page in swap_sync_page (NeilBrown)
 - fixed use of uninitialized variable in cache_grow() (Miklos Szeredi)
 - fixed a bug in bnx2 driver (Jiri Bohac)
 - fixed null-pointer dereferences in swapfile code path when s_bdev is NULL

Thanks,
Suresh Jayaraman

--

Peter Zijlstra (26)
 mm: serialize access to min_free_kbytes
 mm: expose gfp_to_alloc_flags()
 mm: tag reseve pages
 mm: sl[au]b: add knowledge of reserve pages
 mm: kmem_alloc_estimate()
 mm: allow PF_MEMALLOC from softirq context
 mm: emergency pool
 mm: system wide ALLOC_NO_WATERMARK
 mm: __GFP_MEMALLOC
 mm: memory reserve management
 mm: add support for non block device backed swap files
 mm: methods for teaching filesystems about PG_swapcache pages
 net: packet split receive api
 net: sk_allocation() - concentrate socket related allocations
 selinux: tag avc cache alloc as non-critical
 netvm: network reserve infrastructure
 netvm: INET reserves
 netvm: hook skb allocation to reserves
 netvm: filter emergency skbs
 netvm: prevent a stream specific deadlock
 netvm: skb processing
 netfilter: NF_QUEUE vs emergency skbs
 nfs: teach the NFS client how to treat PG_swapcache pages
 nfs: disable data cache revalidation for swapfiles
 nfs: enable swap on NFS
 nfs: fix various memory recursions possible with swap over NFS

Jeff Mahoney (1)
 Fix initialization of ipv4_route_lock

Neil Brown (2)
 swap over network documentation
 Cope with racy nature of sync_page in swap_sync_page

Miklos Szeredi (1)
 Fix use of uninitialized variable in cache_grow()

Suresh Jayaraman (1)
 swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL


 fs/nfs/file.c                           |   18 
 fs/nfs/pagelist.c                       |    2 
 fs/nfs/write.c                          |   99 ++++
 include/linux/mm_types.h                |    1 
 include/linux/skbuff.h                  |   28 +
 include/linux/slab.h                    |   19 
 include/net/sock.h                      |   55 ++
 mm/page_alloc.c                         |  120 ++++--
 mm/page_io.c                            |    2 
 mm/slab.c                               |   80 +++-
 mm/slob.c                               |   67 +++
 mm/slub.c                               |   89 ++++
 mm/swapfile.c                           |   53 ++
 Documentation/filesystems/Locking	 |   22 +
 Documentation/filesystems/vfs.txt	 |   18 
 Documentation/network-swap.txt		 |  270 +++++++++++++
 drivers/net/bnx2.c               	 |    9 
 drivers/net/e1000e/netdev.c      	 |    7 
 drivers/net/igb/igb_main.c        	 |    9 
 drivers/net/ixgbe/ixgbe_main.c    	 |   14 
 drivers/net/sky2.c                	 |   16 
 fs/nfs/Kconfig                    	 |   10 
 fs/nfs/file.c                     	 |    6 
 fs/nfs/inode.c                    	 |    6 
 fs/nfs/internal.h                  	 |    7 
 fs/nfs/pagelist.c                 	 |    6 
 fs/nfs/read.c                     	 |    6 
 fs/nfs/write.c                    	 |   53 +-
 include/linux/buffer_head.h       	 |    1 
 include/linux/fs.h                	 |    9 
 include/linux/gfp.h               	 |    3 
 include/linux/mm.h                	 |   25 +
 include/linux/mm_types.h          	 |    1 
 include/linux/mmzone.h            	 |    3 
 include/linux/nfs_fs.h            	 |    2 
 include/linux/pagemap.h           	 |    5 
 include/linux/reserve.h           	 |  198 +++++++++
 include/linux/sched.h             	 |    7 
 include/linux/skbuff.h            	 |    3 
 include/linux/slab.h              	 |    4 
 include/linux/slub_def.h          	 |    1 
 include/linux/sunrpc/xprt.h       	 |    5 
 include/linux/swap.h              	 |    4 
 include/net/inet_frag.h           	 |    7 
 include/net/netns/ipv6.h          	 |    4 
 include/net/sock.h                	 |    5 
 kernel/softirq.c                  	 |    3 
 mm/Makefile                       	 |    2 
 mm/internal.h                     	 |   15 
 mm/page_alloc.c                   	 |   16 
 mm/page_io.c                      	 |   51 ++
 mm/reserve.c                      	 |  637 ++++++++++++++++++++++++++++++++
 mm/slab.c                         	 |   61 ++-
 mm/slob.c                         	 |   16 
 mm/slub.c                          	 |   43 +-
 mm/swap_state.c                   	 |    4 
 mm/swapfile.c                     	 |   30 +
 mm/vmstat.c                       	 |    6 
 net/Kconfig                       	 |    3 
 net/core/dev.c                    	 |   57 ++
 net/core/filter.c                 	 |    3 
 net/core/skbuff.c                 	 |  137 +++++-
 net/core/sock.c                   	 |  107 +++++
 net/ipv4/inet_fragment.c          	 |    3 
 net/ipv4/ip_fragment.c            	 |   86 ++++
 net/ipv4/route.c                  	 |   70 +++
 net/ipv4/tcp.c                    	 |    3 
 net/ipv4/tcp_input.c              	 |   12 
 net/ipv4/tcp_output.c             	 |   12 
 net/ipv6/reassembly.c             	 |   85 ++++
 net/ipv6/route.c                  	 |   77 +++
 net/ipv6/tcp_ipv6.c               	 |   15 
 net/netfilter/core.c              	 |    3 
 net/sctp/ulpevent.c               	 |    2 
 net/sunrpc/Kconfig                	 |    5 
 net/sunrpc/sched.c                	 |    9 
 net/sunrpc/xprtsock.c             	 |   68 +++
 security/selinux/avc.c            	 |    2 
 net/core/sock.c                         |   18 
 net/ipv4/route.c                        |    2 

 80 files changed, 2797 insertions(+), 245 deletions(-)

* Re: r8169c: Support for Realtek 8168DP chip?
From: David Dillow @ 2009-10-01 13:38 UTC
  To: Rainer Koenig; +Cc: netdev
In-Reply-To: <4AC494EC.8050405@ts.fujitsu.com>

On Thu, 2009-10-01 at 13:39 +0200, Rainer Koenig wrote:
> The reason why is easy to decode when looking at the source: The
> TxConfig register returns 2b800000 and there is no MAC_VERSION in the
> list of valid versions. That means no PHY initialization code is
> executed, and so there is no working device. :-(

Francois Romieu posted a patch yesterday (today, his time) to the thread
"r8169 chips on some Intel D945GSEJT boards fail to work after PXE boot"

It looks to add MAC support for your card; you should be able to find it
at any of your favorite mail archives, Google, or better yet,
http://patchwork.ozlabs.org/project/netdev/list/

Hmm, patchwork doesn't seem to have picked it up, yet.

Please test that and let us know how it works.
Dave


* Re: [PATCH] TI DaVinci EMAC: Minor macro related updates
From: Sergei Shtylyov @ 2009-10-01 12:11 UTC
  To: Chaithrika U S; +Cc: netdev, davinci-linux-open-source, davem
In-Reply-To: <1254428719-13960-1-git-send-email-chaithrika@ti.com>

Hello.

Chaithrika U S wrote:

> Use BIT for macro definitions wherever possible, remove
> unused and redundant macros.
> 
> Signed-off-by: Chaithrika U S <chaithrika@ti.com>
[...]
> diff --git a/drivers/net/davinci_emac.c b/drivers/net/davinci_emac.c
> index 65a2d0b..a421ec0 100644
> --- a/drivers/net/davinci_emac.c
> +++ b/drivers/net/davinci_emac.c
> @@ -164,16 +164,14 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
>  # define EMAC_MBP_MCASTCHAN(ch)		((ch) & 0x7)
>  
>  /* EMAC mac_control register */
> -#define EMAC_MACCONTROL_TXPTYPE		(0x200)
> -#define EMAC_MACCONTROL_TXPACEEN	(0x40)
> -#define EMAC_MACCONTROL_MIIEN		(0x20)
> -#define EMAC_MACCONTROL_GIGABITEN	(0x80)
> -#define EMAC_MACCONTROL_GIGABITEN_SHIFT (7)
> -#define EMAC_MACCONTROL_FULLDUPLEXEN	(0x1)
> +#define EMAC_MACCONTROL_TXPTYPE		BIT(9)
> +#define EMAC_MACCONTROL_TXPACEEN	BIT(6)
> +#define EMAC_MACCONTROL_GMIIEN		BIT(5)
> +#define EMAC_MACCONTROL_GIGABITEN	BIT(7)
> +#define EMAC_MACCONTROL_FULLDUPLEXEN	BIT(0)
>  #define EMAC_MACCONTROL_RMIISPEED_MASK	BIT(15)

    Can we have these properly sorted by value, while you're at it?

>  
>  /* GIGABIT MODE related bits */
> -#define EMAC_DM646X_MACCONTORL_GMIIEN	BIT(5)
>  #define EMAC_DM646X_MACCONTORL_GIG	BIT(7)
>  #define EMAC_DM646X_MACCONTORL_GIGFORCE	BIT(17)
>  
> @@ -192,10 +190,10 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
>  #define EMAC_RX_BUFFER_OFFSET_MASK	(0xFFFF)
>  
>  /* MAC_IN_VECTOR (0x180) register bit fields */
> -#define EMAC_DM644X_MAC_IN_VECTOR_HOST_INT	      (0x20000)
> -#define EMAC_DM644X_MAC_IN_VECTOR_STATPEND_INT	      (0x10000)
> -#define EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC	      (0x0100)
> -#define EMAC_DM644X_MAC_IN_VECTOR_TX_INT_VEC	      (0x01)
> +#define EMAC_DM644X_MAC_IN_VECTOR_HOST_INT	BIT(17)
> +#define EMAC_DM644X_MAC_IN_VECTOR_STATPEND_INT	BIT(16)
> +#define EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC	BIT(8)
> +#define EMAC_DM644X_MAC_IN_VECTOR_TX_INT_VEC	BIT(0)
>  
>  /** NOTE:: For DM646x the IN_VECTOR has changed */
>  #define EMAC_DM646X_MAC_IN_VECTOR_RX_INT_VEC	BIT(EMAC_DEF_RX_CH)
> @@ -203,7 +201,6 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
>  #define EMAC_DM646X_MAC_IN_VECTOR_HOST_INT	BIT(26)
>  #define EMAC_DM646X_MAC_IN_VECTOR_STATPEND_INT	BIT(27)
>  
> -
>  /* CPPI bit positions */
>  #define EMAC_CPPI_SOP_BIT		BIT(31)
>  #define EMAC_CPPI_EOP_BIT		BIT(30)
> @@ -747,8 +744,7 @@ static void emac_update_phystatus(struct emac_priv *priv)
>  
>  	if (priv->speed == SPEED_1000 && (priv->version == EMAC_VERSION_2)) {
>  		mac_control = emac_read(EMAC_MACCONTROL);
> -		mac_control |= (EMAC_DM646X_MACCONTORL_GMIIEN |
> -				EMAC_DM646X_MACCONTORL_GIG |
> +		mac_control |= (EMAC_DM646X_MACCONTORL_GIG |
>  				EMAC_DM646X_MACCONTORL_GIGFORCE);
>  	} else {
>  		/* Clear the GIG bit and GIGFORCE bit */
> @@ -2105,7 +2101,7 @@ static int emac_hw_enable(struct emac_priv *priv)
>  
>  	/* Enable MII */
>  	val = emac_read(EMAC_MACCONTROL);
> -	val |= (EMAC_MACCONTROL_MIIEN);
> +	val |= (EMAC_MACCONTROL_GMIIEN);

    Parens not needed.

>  	emac_write(EMAC_MACCONTROL, val);
>  
>  	/* Enable NAPI and interrupts */

WBR, Sergei
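As a quick sanity check on the conversion in the quoted patch, the standalone sketch below verifies at compile time that each BIT() form equals the hex literal it replaces. BIT() is redefined locally as a userspace stand-in for the kernel macro, the hex values are the ones on the removed lines, and set_gmiien_sketch() is a hypothetical helper that mirrors the emac_hw_enable() hunk without the extra parentheses.

```c
/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr)	(1UL << (nr))

/* mac_control register: old literal versus new BIT() form. */
_Static_assert(BIT(9) == 0x200, "EMAC_MACCONTROL_TXPTYPE");
_Static_assert(BIT(7) == 0x80,  "EMAC_MACCONTROL_GIGABITEN");
_Static_assert(BIT(6) == 0x40,  "EMAC_MACCONTROL_TXPACEEN");
_Static_assert(BIT(5) == 0x20,  "EMAC_MACCONTROL_GMIIEN (was MIIEN)");
_Static_assert(BIT(0) == 0x1,   "EMAC_MACCONTROL_FULLDUPLEXEN");

/* MAC_IN_VECTOR register bit fields for DM644x. */
_Static_assert(BIT(17) == 0x20000, "HOST_INT");
_Static_assert(BIT(16) == 0x10000, "STATPEND_INT");
_Static_assert(BIT(8)  == 0x0100,  "RX_INT_VEC");
_Static_assert(BIT(0)  == 0x01,    "TX_INT_VEC");

#define EMAC_MACCONTROL_GMIIEN	BIT(5)

/* Hypothetical helper: set the (G)MII enable bit in a register value. */
static unsigned long set_gmiien_sketch(unsigned long val)
{
	return val | EMAC_MACCONTROL_GMIIEN;
}
```

If any replacement had changed a value, the translation unit would fail to compile, so the check costs nothing at run time.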

* r8169c: Support for Realtek 8168DP chip?
From: Rainer Koenig @ 2009-10-01 11:39 UTC
  To: netdev

Hi there,

I got several new workstation models that come with the Realtek 8168DP
chip (8168 with DASH capabilities).
When trying to use this chip with the r8169 driver module I get the
following errors:

localhost kernel: r8169 0000:05:00.0: unknown MAC (2b800600)
localhost kernel: eth0: RTL8169 at 0xffffc2000004c000, 9:65:d3:9f, XID
28800000 IRQ 138
localhost kernel: eth0: PHY reset failed.
localhost kernel: r8169: eth0: TBI auto-negotiating
localhost kernel: r8169: eth0: unknown chipset (mac_version = 1).
localhost kernel: r8169: eth0: link down
localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready

The distribution running is RHEL 5.4, but actually that doesn't matter
since I didn't see the necessary code lines even in the latest blob from
git.kernel.org.

The reason is easy to see when looking at the source: the
TxConfig register returns 2b800000 and there is no matching MAC_VERSION
in the list of valid versions. That means no PHY initialization code is
executed, so there is no working device. :-(
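
The failing lookup described above can be sketched as a mask/value table
scan. This is only an illustration under stated assumptions, not the r8169
source: the names (demo_chip_entry, demo_identify_chip), the version
numbers and the table entries are hypothetical, and the mask 0x7c800000 is
an assumption, chosen because it maps the logged value 2b800600 to the XID
28800000 shown in the log.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAC_UNKNOWN (-1)

/* Hypothetical table entry: a chip matches when (reg & mask) == val. */
struct demo_chip_entry {
	uint32_t mask;
	uint32_t val;
	int mac_version;
};

/* Illustrative entries only; not copied from any real driver. */
static const struct demo_chip_entry demo_chip_table[] = {
	{ 0x7c800000u, 0x3c000000u, 2 },
	{ 0x7c800000u, 0x38000000u, 3 },
};

/* Return the version of the first matching entry, or DEMO_MAC_UNKNOWN. */
static int demo_identify_chip(uint32_t txconfig)
{
	size_t n = sizeof(demo_chip_table) / sizeof(demo_chip_table[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_chip_entry *e = &demo_chip_table[i];

		if ((txconfig & e->mask) == e->val)
			return e->mac_version;
	}
	return DEMO_MAC_UNKNOWN;
}
```

With this table, demo_identify_chip(0x2b800600) matches no entry, so a
caller would skip any version-specific PHY setup. In such a scheme,
supporting a new chip means adding a table entry together with its
initialization code.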

The latest OEM download from Realtek
http://218.210.127.131/downloads/RedirectFTPSite.aspx?SiteID=1&DownTypeID=3&DownID=332&PFid=5&Conn=4
compiles and works. Looking at the source of this driver it shows code
for this TxConfig value and it has a special part for the PHY
initialization.

So the questions are:
- Will there be a patch for the 8168DP chip in the r8169 driver soon?
- What is necessary to get a patch?
- Are the maintainers of r8169 talking to the people that do the OEM
  driver or is r8169 just a reverse engineered driver?

Best regards
Rainer
-- 
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Business Clients
Dept. TSP CLI E SW OSE

Fujitsu Technology Solutions
Bürgermeister-Ullrich-Str. 100
86199 Augsburg
Germany

Telephone: +49-821-804-3321
Telefax:   +49-821-804-2131
Mail:      mailto:Rainer.Koenig@ts.fujitsu.com

Internet         ts.fujitsu.com
Company Details  ts.fujitsu.com/imprint.html

^ permalink raw reply

* [PATCH 7/7] mlx4_en: Updated driver version and date
From: Yevgeny Petrilin @ 2009-10-01 14:34 UTC (permalink / raw)
  To: davem; +Cc: netdev


Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/mlx4_en.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h
index 8655624..89ca376 100644
--- a/drivers/net/mlx4/mlx4_en.h
+++ b/drivers/net/mlx4/mlx4_en.h
@@ -50,8 +50,8 @@
 #include "en_port.h"
 
 #define DRV_NAME	"mlx4_en"
-#define DRV_VERSION	"1.4.1.1"
-#define DRV_RELDATE	"June 2009"
+#define DRV_VERSION	"1.4.2.1"
+#define DRV_RELDATE	"Oct 2009"
 
 #define MLX4_EN_MSG_LEVEL	(NETIF_MSG_LINK | NETIF_MSG_IFDOWN)
 
-- 
1.6.1.3



^ permalink raw reply related

* [PATCH 6/7] mlx4_en: performing CLOSE_PORT at the end of tear-down process
From: Yevgeny Petrilin @ 2009-10-01 14:34 UTC (permalink / raw)
  To: davem; +Cc: netdev

As required by ConnectX PRM.
Not doing so might cause races in the HW during the tear-down process.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/en_netdev.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index 13fc6f0..b1e80d8 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -711,9 +711,8 @@ void mlx4_en_stop_port(struct net_device *dev)
 	netif_tx_stop_all_queues(dev);
 	netif_tx_unlock_bh(dev);
 
-	/* close port*/
+	/* Set port as not active */
 	priv->port_up = false;
-	mlx4_CLOSE_PORT(mdev->dev, priv->port);
 
 	/* Unregister Mac address for the port */
 	mlx4_unregister_mac(mdev->dev, priv->port, priv->mac_index);
@@ -738,6 +737,9 @@ void mlx4_en_stop_port(struct net_device *dev)
 			msleep(1);
 		mlx4_en_deactivate_cq(priv, &priv->rx_cq[i]);
 	}
+
+	/* close port*/
+	mlx4_CLOSE_PORT(mdev->dev, priv->port);
 }
 
 static void mlx4_en_restart(struct work_struct *work)
-- 
1.6.1.3



^ permalink raw reply related

* [PATCH 5/7] mlx4_en: Setting dev->perm_addr field
From: Yevgeny Petrilin @ 2009-10-01 14:34 UTC (permalink / raw)
  To: davem; +Cc: netdev

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/en_netdev.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index 922ae1f..13fc6f0 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -1030,9 +1030,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 
 	/* Set defualt MAC */
 	dev->addr_len = ETH_ALEN;
-	for (i = 0; i < ETH_ALEN; i++)
-		dev->dev_addr[ETH_ALEN - 1 - i] =
-		(u8) (priv->mac >> (8 * i));
+	for (i = 0; i < ETH_ALEN; i++) {
+		dev->dev_addr[ETH_ALEN - 1 - i] = (u8) (priv->mac >> (8 * i));
+		dev->perm_addr[ETH_ALEN - 1 - i] = (u8) (priv->mac >> (8 * i));
+	}
 
 	/*
 	 * Set driver features
-- 
1.6.1.3



^ permalink raw reply related

* [PATCH 4/7] mlx4_en: Added self diagnostics test implementation
From: Yevgeny Petrilin @ 2009-10-01 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev

The self-test consists of 5 tests:
1. Interrupt test: Execute commands and receive command completions
   on all our interrupt vectors.
2. Link test: Verify that we are connected to a valid link partner.
3. Speed test: Check that we negotiated the link speed correctly.
4. Registers test: Run the HW health check command.
5. Loopback test: Send a packet on the loopback interface and catch it
   on the RX side.
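
How the five results turn into an overall verdict can be sketched in plain
C. This is a minimal userspace illustration, not the driver code: the
DEMO_* names and demo_collect_results() are hypothetical, and the flag
values are assumptions made for the example. The convention it shows is
the one used in the patch below: one 64-bit slot per test, where a nonzero
slot means that test failed and any failure sets a "failed" flag.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_TESTS	5
#define DEMO_FL_OFFLINE	(1u << 0)	/* caller asked for offline tests */
#define DEMO_FL_FAILED	(1u << 1)	/* set when at least one test failed */

/*
 * Scan the per-test result slots. A nonzero slot counts as a failure.
 * Returns the number of failed tests and sets DEMO_FL_FAILED in *flags
 * if there was at least one.
 */
static unsigned int demo_collect_results(const uint64_t *results, size_t n,
					 uint32_t *flags)
{
	unsigned int failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (results[i])
			failed++;
	}
	if (failed)
		*flags |= DEMO_FL_FAILED;
	return failed;
}
```

A caller would zero the result array first, run each test into its own
slot, and then inspect the flag word rather than the individual slots to
decide whether the run passed.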

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/Makefile      |    2 +-
 drivers/net/mlx4/en_ethtool.c  |   79 ++++++++++++------
 drivers/net/mlx4/en_netdev.c   |    2 +-
 drivers/net/mlx4/en_port.c     |   32 +++++++
 drivers/net/mlx4/en_port.h     |   14 +++
 drivers/net/mlx4/en_rx.c       |    7 ++
 drivers/net/mlx4/en_selftest.c |  182 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/en_tx.c       |   16 ++++
 drivers/net/mlx4/mlx4_en.h     |   21 +++++-
 9 files changed, 327 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/mlx4/en_selftest.c

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 1fd068e..d1aa45a 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -6,4 +6,4 @@ mlx4_core-y :=	alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \
 obj-$(CONFIG_MLX4_EN)               += mlx4_en.o
 
 mlx4_en-y := 	en_main.o en_tx.o en_rx.o en_ethtool.o en_port.o en_cq.o \
-		en_resources.o en_netdev.o
+		en_resources.o en_netdev.o en_selftest.o
diff --git a/drivers/net/mlx4/en_ethtool.c b/drivers/net/mlx4/en_ethtool.c
index 86467b4..745a204 100644
--- a/drivers/net/mlx4/en_ethtool.c
+++ b/drivers/net/mlx4/en_ethtool.c
@@ -125,6 +125,14 @@ static const char main_strings[][ETH_GSTRING_LEN] = {
 #define NUM_MAIN_STATS	21
 #define NUM_ALL_STATS	(NUM_MAIN_STATS + NUM_PORT_STATS + NUM_PKT_STATS + NUM_PERF_STATS)
 
+static const char mlx4_en_test_names[][ETH_GSTRING_LEN] = {
+	"Interupt Test",
+	"Link Test",
+	"Speed Test",
+	"Register Test",
+	"Loopback Test",
+};
+
 static u32 mlx4_en_get_msglevel(struct net_device *dev)
 {
 	return ((struct mlx4_en_priv *) netdev_priv(dev))->msg_enable;
@@ -148,10 +156,15 @@ static int mlx4_en_get_sset_count(struct net_device *dev, int sset)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 
-	if (sset != ETH_SS_STATS)
+	switch (sset) {
+	case ETH_SS_STATS:
+		return NUM_ALL_STATS +
+			(priv->tx_ring_num + priv->rx_ring_num) * 2;
+	case ETH_SS_TEST:
+		return MLX4_EN_NUM_SELF_TEST - !(priv->mdev->dev->caps.loopback_support) * 2;
+	default:
 		return -EOPNOTSUPP;
-
-	return NUM_ALL_STATS + (priv->tx_ring_num + priv->rx_ring_num) * 2;
+	}
 }
 
 static void mlx4_en_get_ethtool_stats(struct net_device *dev,
@@ -183,6 +196,12 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev,
 
 }
 
+static void mlx4_en_self_test(struct net_device *dev,
+			      struct ethtool_test *etest, u64 *buf)
+{
+	mlx4_en_ex_selftest(dev, &etest->flags, buf);
+}
+
 static void mlx4_en_get_strings(struct net_device *dev,
 				uint32_t stringset, uint8_t *data)
 {
@@ -190,30 +209,39 @@ static void mlx4_en_get_strings(struct net_device *dev,
 	int index = 0;
 	int i;
 
-	if (stringset != ETH_SS_STATS)
-		return;
-
-	/* Add main counters */
-	for (i = 0; i < NUM_MAIN_STATS; i++)
-		strcpy(data + (index++) * ETH_GSTRING_LEN, main_strings[i]);
-	for (i = 0; i < NUM_PORT_STATS; i++)
-		strcpy(data + (index++) * ETH_GSTRING_LEN,
+	switch (stringset) {
+	case ETH_SS_TEST:
+		for (i = 0; i < MLX4_EN_NUM_SELF_TEST - 2; i++)
+			strcpy(data + i * ETH_GSTRING_LEN, mlx4_en_test_names[i]);
+		if (priv->mdev->dev->caps.loopback_support)
+			for (; i < MLX4_EN_NUM_SELF_TEST; i++)
+				strcpy(data + i * ETH_GSTRING_LEN, mlx4_en_test_names[i]);
+		break;
+
+	case ETH_SS_STATS:
+		/* Add main counters */
+		for (i = 0; i < NUM_MAIN_STATS; i++)
+			strcpy(data + (index++) * ETH_GSTRING_LEN, main_strings[i]);
+		for (i = 0; i < NUM_PORT_STATS; i++)
+			strcpy(data + (index++) * ETH_GSTRING_LEN,
 			main_strings[i + NUM_MAIN_STATS]);
-	for (i = 0; i < priv->tx_ring_num; i++) {
-		sprintf(data + (index++) * ETH_GSTRING_LEN,
-			"tx%d_packets", i);
-		sprintf(data + (index++) * ETH_GSTRING_LEN,
-			"tx%d_bytes", i);
-	}
-	for (i = 0; i < priv->rx_ring_num; i++) {
-		sprintf(data + (index++) * ETH_GSTRING_LEN,
-			"rx%d_packets", i);
-		sprintf(data + (index++) * ETH_GSTRING_LEN,
-			"rx%d_bytes", i);
-	}
-	for (i = 0; i < NUM_PKT_STATS; i++)
-		strcpy(data + (index++) * ETH_GSTRING_LEN,
+		for (i = 0; i < priv->tx_ring_num; i++) {
+			sprintf(data + (index++) * ETH_GSTRING_LEN,
+				"tx%d_packets", i);
+			sprintf(data + (index++) * ETH_GSTRING_LEN,
+				"tx%d_bytes", i);
+		}
+		for (i = 0; i < priv->rx_ring_num; i++) {
+			sprintf(data + (index++) * ETH_GSTRING_LEN,
+				"rx%d_packets", i);
+			sprintf(data + (index++) * ETH_GSTRING_LEN,
+				"rx%d_bytes", i);
+		}
+		for (i = 0; i < NUM_PKT_STATS; i++)
+			strcpy(data + (index++) * ETH_GSTRING_LEN,
 			main_strings[i + NUM_MAIN_STATS + NUM_PORT_STATS]);
+		break;
+	}
 }
 
 static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
@@ -407,6 +435,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_strings = mlx4_en_get_strings,
 	.get_sset_count = mlx4_en_get_sset_count,
 	.get_ethtool_stats = mlx4_en_get_ethtool_stats,
+	.self_test = mlx4_en_self_test,
 	.get_wol = mlx4_en_get_wol,
 	.get_msglevel = mlx4_en_get_msglevel,
 	.set_msglevel = mlx4_en_set_msglevel,
diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index c48b0f4..922ae1f 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -108,7 +108,7 @@ static void mlx4_en_vlan_rx_kill_vid(struct net_device *dev, unsigned short vid)
 	mutex_unlock(&mdev->state_lock);
 }
 
-static u64 mlx4_en_mac_to_u64(u8 *addr)
+u64 mlx4_en_mac_to_u64(u8 *addr)
 {
 	u64 mac = 0;
 	int i;
diff --git a/drivers/net/mlx4/en_port.c b/drivers/net/mlx4/en_port.c
index a29abe8..aa3ef2a 100644
--- a/drivers/net/mlx4/en_port.c
+++ b/drivers/net/mlx4/en_port.c
@@ -142,6 +142,38 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_dev *dev, u8 port, u32 base_qpn,
 	return err;
 }
 
+int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port)
+{
+	struct mlx4_en_query_port_context *qport_context;
+	struct mlx4_en_priv *priv = netdev_priv(mdev->pndev[port]);
+	struct mlx4_en_port_state *state = &priv->port_state;
+	struct mlx4_cmd_mailbox *mailbox;
+	int err;
+
+	mailbox = mlx4_alloc_cmd_mailbox(mdev->dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+	memset(mailbox->buf, 0, sizeof(*qport_context));
+	err = mlx4_cmd_box(mdev->dev, 0, mailbox->dma, port, 0,
+			   MLX4_CMD_QUERY_PORT, MLX4_CMD_TIME_CLASS_B);
+	if (err)
+		goto out;
+	qport_context = mailbox->buf;
+
+	/* This command is always accessed from Ethtool context
+	 * already synchronized, no need in locking */
+	state->link_state = !!(qport_context->link_up & MLX4_EN_LINK_UP_MASK);
+	if ((qport_context->link_speed & MLX4_EN_SPEED_MASK) ==
+	    MLX4_EN_1G_SPEED)
+		state->link_speed = 1000;
+	else
+		state->link_speed = 10000;
+	state->transciver = qport_context->transceiver;
+
+out:
+	mlx4_free_cmd_mailbox(mdev->dev, mailbox);
+	return err;
+}
 
 int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 {
diff --git a/drivers/net/mlx4/en_port.h b/drivers/net/mlx4/en_port.h
index e6477f1..f6511aa 100644
--- a/drivers/net/mlx4/en_port.h
+++ b/drivers/net/mlx4/en_port.h
@@ -84,6 +84,20 @@ enum {
 	MLX4_MCAST_ENABLE       = 2,
 };
 
+struct mlx4_en_query_port_context {
+	u8 link_up;
+#define MLX4_EN_LINK_UP_MASK	0x80
+	u8 reserved;
+	__be16 mtu;
+	u8 reserved2;
+	u8 link_speed;
+#define MLX4_EN_SPEED_MASK	0x3
+#define MLX4_EN_1G_SPEED	0x2
+	u16 reserved3[5];
+	__be64 mac;
+	u8 transceiver;
+};
+
 
 struct mlx4_en_stat_out_mbox {
 	/* Received frames with a length of 64 octets */
diff --git a/drivers/net/mlx4/en_rx.c b/drivers/net/mlx4/en_rx.c
index 03b781a..247a408 100644
--- a/drivers/net/mlx4/en_rx.c
+++ b/drivers/net/mlx4/en_rx.c
@@ -654,6 +654,13 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			goto next;
 		}
 
+		if (unlikely(priv->validate_loopback)) {
+			priv->loopback_ok =
+				!strcmp((skb->data + ETH_HLEN), "MLX4 Loopback");
+			dev_kfree_skb_any(skb);
+			goto next;
+		}
+
 		skb->ip_summed = ip_summed;
 		skb->protocol = eth_type_trans(skb, dev);
 		skb_record_rx_queue(skb, cq->ring);
diff --git a/drivers/net/mlx4/en_selftest.c b/drivers/net/mlx4/en_selftest.c
new file mode 100644
index 0000000..8e2042e
--- /dev/null
+++ b/drivers/net/mlx4/en_selftest.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright (c) 2007 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+#include <linux/delay.h>
+#include <linux/mlx4/driver.h>
+
+#include "mlx4_en.h"
+
+
+static int mlx4_en_test_registers(struct mlx4_en_priv *priv)
+{
+	return mlx4_cmd(priv->mdev->dev, 0, 0, 0, MLX4_CMD_HW_HEALTH_CHECK,
+			MLX4_CMD_TIME_CLASS_A);
+}
+
+static int mlx4_en_test_loopback_xmit(struct mlx4_en_priv *priv)
+{
+	struct sk_buff *skb;
+	struct ethhdr *ethh;
+	unsigned char *packet;
+	unsigned int packet_size = MLX4_LOOPBACK_TEST_PAYLOAD;
+	unsigned int i;
+	int err;
+
+
+	/* build the pkt before xmit */
+	skb = netdev_alloc_skb(priv->dev,
+			MLX4_LOOPBACK_TEST_PAYLOAD + ETH_HLEN + NET_IP_ALIGN);
+	if (!skb) {
+		en_err(priv, "failed to allocate skb for xmit\n");
+		return -ENOMEM;
+	}
+	skb_reserve(skb, NET_IP_ALIGN);
+
+	ethh = (struct ethhdr *)skb_put(skb, sizeof(struct ethhdr));
+	packet	= (unsigned char *)skb_put(skb, packet_size);
+	memcpy(ethh->h_dest, priv->dev->dev_addr, ETH_ALEN);
+	memset(ethh->h_source, 0, ETH_ALEN);
+	ethh->h_proto = htons(ETH_P_ARP);
+	skb_set_mac_header(skb, 0);
+	for (i = 0; i < packet_size; ++i)	/* fill our packet */
+		sprintf(packet, "MLX4 Loopback");
+
+	/* xmit the pkt */
+	err = mlx4_en_xmit(skb, priv->dev);
+	return err;
+}
+
+static int mlx4_en_test_loopback(struct mlx4_en_priv *priv)
+{
+	u32 loopback_ok = 0;
+	int i;
+
+	priv->loopback_ok = 0;
+	priv->validate_loopback = 1;
+
+	/* xmit */
+	if (mlx4_en_test_loopback_xmit(priv)) {
+		en_err(priv, "Transmitting loopback packet failed\n");
+		goto mlx4_en_test_loopback_exit;
+	}
+
+	/* polling for result */
+	for (i = 0; i < MLX4_EN_LOOPBACK_RETRIES; ++i) {
+		msleep(MLX4_EN_LOOPBACK_TIMEOUT);
+		if (priv->loopback_ok) {
+			loopback_ok = 1;
+			break;
+		}
+	}
+	if (!loopback_ok)
+		en_err(priv, "Loopback packet didn't arrive\n");
+
+mlx4_en_test_loopback_exit:
+
+	priv->validate_loopback = 0;
+	return !loopback_ok;
+}
+
+
+static int mlx4_en_test_link(struct mlx4_en_priv *priv)
+{
+	if (mlx4_en_QUERY_PORT(priv->mdev, priv->port))
+		return -ENOMEM;
+	if (priv->port_state.link_state == 1)
+		return 0;
+	else
+		return 1;
+}
+
+static int mlx4_en_test_speed(struct mlx4_en_priv *priv)
+{
+
+	if (mlx4_en_QUERY_PORT(priv->mdev, priv->port))
+		return -ENOMEM;
+
+	/* The device currently only supports 10G speed */
+	if (priv->port_state.link_speed != SPEED_10000)
+		return priv->port_state.link_speed;
+	return 0;
+}
+
+
+void mlx4_en_ex_selftest(struct net_device *dev, u32 *flags, u64 *buf)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+	struct mlx4_en_tx_ring *tx_ring;
+	int i, running;
+
+	memset(buf, 0, sizeof(u64) * MLX4_EN_NUM_SELF_TEST);
+
+	if (*flags & ETH_TEST_FL_OFFLINE) {
+		/* disable the interface */
+		running = netif_running(dev);
+
+		if (running) {
+			netif_tx_disable(dev);
+			dev->trans_start = jiffies;
+		}
+retry_tx:
+		/* Wait until all tx queues are empty.
+		 * there should not be any additional incoming traffic
+		 * since we turned the carrier off */
+		msleep(200);
+		for (i = 0; i < priv->tx_ring_num && running; i++) {
+			tx_ring = &priv->tx_ring[i];
+			if (tx_ring->prod != (tx_ring->cons + tx_ring->last_nr_txbb))
+				goto retry_tx;
+		}
+
+		if (priv->mdev->dev->caps.loopback_support) {
+			buf[3] = mlx4_en_test_registers(priv);
+			buf[4] = mlx4_en_test_loopback(priv);
+		}
+
+		if (running)
+			netif_tx_wake_all_queues(dev);
+
+	}
+	buf[0] = mlx4_test_interrupts(mdev->dev);
+	buf[1] = mlx4_en_test_link(priv);
+	buf[2] = mlx4_en_test_speed(priv);
+
+	for (i = 0; i < MLX4_EN_NUM_SELF_TEST; i++) {
+		if (buf[i])
+			*flags |= ETH_TEST_FL_FAILED;
+	}
+}
diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c
index 8c72799..47de091 100644
--- a/drivers/net/mlx4/en_tx.c
+++ b/drivers/net/mlx4/en_tx.c
@@ -599,6 +599,9 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct mlx4_wqe_data_seg *data;
 	struct skb_frag_struct *frag;
 	struct mlx4_en_tx_info *tx_info;
+	struct ethhdr *ethh;
+	u64 mac;
+	u32 mac_l, mac_h;
 	int tx_ind = 0;
 	int nr_txbb;
 	int desc_size;
@@ -675,6 +678,19 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		priv->port_stats.tx_chksum_offload++;
 	}
 
+	if (unlikely(priv->validate_loopback)) {
+		/* Copy dst mac address to wqe */
+		skb_reset_mac_header(skb);
+		ethh = eth_hdr(skb);
+		if (ethh && ethh->h_dest) {
+			mac = mlx4_en_mac_to_u64(ethh->h_dest);
+			mac_h = (u32) ((mac & 0xffff00000000) >> 16);
+			mac_l = (u32) (mac & 0xffffffff);
+			tx_desc->ctrl.srcrb_flags |= cpu_to_be32(mac_h);
+			tx_desc->ctrl.imm = cpu_to_be32(mac_l);
+		}
+	}
+
 	/* Handle LSO (TSO) packets */
 	if (lso_header_size) {
 		/* Mark opcode as LSO */
diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h
index 4376147..8655624 100644
--- a/drivers/net/mlx4/mlx4_en.h
+++ b/drivers/net/mlx4/mlx4_en.h
@@ -45,6 +45,7 @@
 #include <linux/mlx4/cq.h>
 #include <linux/mlx4/srq.h>
 #include <linux/mlx4/doorbell.h>
+#include <linux/mlx4/cmd.h>
 
 #include "en_port.h"
 
@@ -52,7 +53,6 @@
 #define DRV_VERSION	"1.4.1.1"
 #define DRV_RELDATE	"June 2009"
 
-
 #define MLX4_EN_MSG_LEVEL	(NETIF_MSG_LINK | NETIF_MSG_IFDOWN)
 
 #define en_print(level, priv, format, arg...)			\
@@ -171,10 +171,14 @@ enum {
 
 #define SMALL_PACKET_SIZE      (256 - NET_IP_ALIGN)
 #define HEADER_COPY_SIZE       (128 - NET_IP_ALIGN)
+#define MLX4_LOOPBACK_TEST_PAYLOAD (HEADER_COPY_SIZE - ETH_HLEN)
 
 #define MLX4_EN_MIN_MTU		46
 #define ETH_BCAST		0xffffffffffffULL
 
+#define MLX4_EN_LOOPBACK_RETRIES	5
+#define MLX4_EN_LOOPBACK_TIMEOUT	100
+
 #ifdef MLX4_EN_PERF_STAT
 /* Number of samples to 'average' */
 #define AVG_SIZE			128
@@ -389,6 +393,12 @@ struct mlx4_en_rss_context {
 	__be32 rss_key[10];
 };
 
+struct mlx4_en_port_state {
+	int link_state;
+	int link_speed;
+	int transciver;
+};
+
 struct mlx4_en_pkt_stats {
 	unsigned long broadcast;
 	unsigned long rx_prio[8];
@@ -437,6 +447,7 @@ struct mlx4_en_priv {
 	struct vlan_group *vlgrp;
 	struct net_device_stats stats;
 	struct net_device_stats ret_stats;
+	struct mlx4_en_port_state port_state;
 	spinlock_t stats_lock;
 
 	unsigned long last_moder_packets;
@@ -455,6 +466,8 @@ struct mlx4_en_priv {
 	u16 sample_interval;
 	u16 adaptive_rx_coal;
 	u32 msg_enable;
+	u32 loopback_ok;
+	u32 validate_loopback;
 
 	struct mlx4_hwq_resources res;
 	int link_state;
@@ -494,6 +507,7 @@ struct mlx4_en_priv {
 	struct mlx4_en_port_stats port_stats;
 	struct dev_mc_list *mc_list;
 	struct mlx4_en_stat_out_mbox hw_stats;
+
 };
 
 
@@ -562,6 +576,11 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_dev *dev, u8 port, u32 base_qpn,
 			   u8 promisc);
 
 int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset);
+int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port);
+
+#define MLX4_EN_NUM_SELF_TEST	5
+void mlx4_en_ex_selftest(struct net_device *dev, u32 *flags, u64 *buf);
+u64 mlx4_en_mac_to_u64(u8 *addr);
 
 /*
  * Globals
-- 
1.6.1.3



^ permalink raw reply related

* [PATCH 3/7] mlx4: Added HW_HEALTH_CHECK command opcode
From: Yevgeny Petrilin @ 2009-10-01 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev

When the command is executed, the firmware checks the HW state and
configuration registers and returns a status.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 include/linux/mlx4/cmd.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index 0f82293..78a1b96 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -56,6 +56,7 @@ enum {
 	MLX4_CMD_QUERY_HCA	 = 0xb,
 	MLX4_CMD_QUERY_PORT	 = 0x43,
 	MLX4_CMD_SENSE_PORT	 = 0x4d,
+	MLX4_CMD_HW_HEALTH_CHECK = 0x50,
 	MLX4_CMD_SET_PORT	 = 0xc,
 	MLX4_CMD_ACCESS_DDR	 = 0x2e,
 	MLX4_CMD_MAP_ICM	 = 0xffa,
-- 
1.6.1.3


^ permalink raw reply related

* [PATCH 2/7] mlx4: Query for loopback support
From: Yevgeny Petrilin @ 2009-10-01 14:33 UTC (permalink / raw)
  To: davem; +Cc: netdev

The firmware reports whether loopback capabilities are enabled.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/fw.c       |    3 +++
 drivers/net/mlx4/fw.h       |    1 +
 drivers/net/mlx4/main.c     |    1 +
 include/linux/mlx4/device.h |    1 +
 4 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c
index cee199c..50f1ee8 100644
--- a/drivers/net/mlx4/fw.c
+++ b/drivers/net/mlx4/fw.c
@@ -176,6 +176,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_MAX_GID_OFFSET		0x3b
 #define QUERY_DEV_CAP_RATE_SUPPORT_OFFSET	0x3c
 #define QUERY_DEV_CAP_MAX_PKEY_OFFSET		0x3f
+#define QUERY_DEV_CAP_ETH_UC_LOOPBACK_OFFSET	0x43
 #define QUERY_DEV_CAP_FLAGS_OFFSET		0x44
 #define QUERY_DEV_CAP_RSVD_UAR_OFFSET		0x48
 #define QUERY_DEV_CAP_UAR_SZ_OFFSET		0x49
@@ -266,6 +267,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev_cap->max_msg_sz = 1 << (field & 0x1f);
 	MLX4_GET(stat_rate, outbox, QUERY_DEV_CAP_RATE_SUPPORT_OFFSET);
 	dev_cap->stat_rate_support = stat_rate;
+	MLX4_GET(field, outbox, QUERY_DEV_CAP_ETH_UC_LOOPBACK_OFFSET);
+	dev_cap->loopback_support = field & 0x1;
 	MLX4_GET(dev_cap->flags, outbox, QUERY_DEV_CAP_FLAGS_OFFSET);
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_UAR_OFFSET);
 	dev_cap->reserved_uars = field >> 4;
diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h
index 526d7f3..2cc1ba5 100644
--- a/drivers/net/mlx4/fw.h
+++ b/drivers/net/mlx4/fw.h
@@ -74,6 +74,7 @@ struct mlx4_dev_cap {
 	u64 def_mac[MLX4_MAX_PORTS + 1];
 	u16 eth_mtu[MLX4_MAX_PORTS + 1];
 	u16 stat_rate_support;
+	int loopback_support;
 	u32 flags;
 	int reserved_uars;
 	int uar_size;
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 348b09b..e291a5c 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -220,6 +220,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.bmme_flags	     = dev_cap->bmme_flags;
 	dev->caps.reserved_lkey	     = dev_cap->reserved_lkey;
 	dev->caps.stat_rate_support  = dev_cap->stat_rate_support;
+	dev->caps.loopback_support   = dev_cap->loopback_support;
 	dev->caps.max_gso_sz	     = dev_cap->max_gso_sz;
 
 	dev->caps.log_num_macs  = log_num_mac;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index e27a68d..7a423e7 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -228,6 +228,7 @@ struct mlx4_caps {
 	u32			bmme_flags;
 	u32			reserved_lkey;
 	u16			stat_rate_support;
+	int			loopback_support;
 	u8			port_width_cap[MLX4_MAX_PORTS + 1];
 	int			max_gso_sz;
 	int                     reserved_qps_cnt[MLX4_NUM_QP_REGION];
-- 
1.6.1.3


^ permalink raw reply related

* [PATCH 1/7] mlx4: Added interrupts test support
From: Yevgeny Petrilin @ 2009-10-01 14:32 UTC (permalink / raw)
  To: davem; +Cc: netdev

A test that verifies that we can accept interrupts on all
the irq vectors of the device.
Interrupts are checked using the NOP command.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/mlx4/eq.c       |   44 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mlx4/device.h |    1 +
 2 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c
index bffb799..81619a1 100644
--- a/drivers/net/mlx4/eq.c
+++ b/drivers/net/mlx4/eq.c
@@ -698,3 +698,47 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev)
 
 	kfree(priv->eq_table.uar_map);
 }
+
+/* A test that verifies that we can accept interrupts on all
+ * the irq vectors of the device.
+ * Interrupts are checked using the NOP command.
+ */
+int mlx4_test_interrupts(struct mlx4_dev *dev)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	int i;
+	int err;
+
+	err = mlx4_NOP(dev);
+	/* When not in MSI_X, there is only one irq to check */
+	if (!(dev->flags & MLX4_FLAG_MSI_X))
+		return err;
+
+	/* A loop over all completion vectors, for each vector we will check
+	 * whether it works by mapping command completions to that vector
+	 * and performing a NOP command
+	 */
+	for (i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) {
+		/* Temporarily use polling for command completions */
+		mlx4_cmd_use_polling(dev);
+
+		/* Map the new eq to handle all asynchronous events */
+		err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0,
+				  priv->eq_table.eq[i].eqn);
+		if (err) {
+			mlx4_warn(dev, "Failed mapping eq for interrupt test\n");
+			mlx4_cmd_use_events(dev);
+			break;
+		}
+
+		/* Go back to using events */
+		mlx4_cmd_use_events(dev);
+		err = mlx4_NOP(dev);
+	}
+
+	/* Return to default */
+	mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0,
+		    priv->eq_table.eq[dev->caps.num_comp_vectors].eqn);
+	return err;
+}
+EXPORT_SYMBOL(mlx4_test_interrupts);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index ce7cc6c..e27a68d 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -480,4 +480,5 @@ void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr,
 int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 
+int mlx4_test_interrupts(struct mlx4_dev *dev);
 #endif /* MLX4_DEVICE_H */
-- 
1.6.1.3


^ permalink raw reply related

* [PATCH] TI DaVinci EMAC: Minor macro related updates
From: Chaithrika U S @ 2009-10-01 20:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, davinci-linux-open-source, Chaithrika U S

Use BIT for macro definitions wherever possible, remove
unused and redundant macros.
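
The hex-to-BIT() conversion can be checked mechanically. The sketch below
is a userspace illustration, not part of the patch: BIT() is redefined
locally as a stand-in for the kernel macro, and single_bit_index() is a
hypothetical helper. The compile-time checks cover the constants changed
in this patch.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro (assumed equivalent). */
#define BIT(nr)	(1UL << (nr))

/* Old hex values from the patch versus their BIT() replacements. */
static_assert(BIT(9) == 0x200, "MACCONTROL_TXPTYPE");
static_assert(BIT(6) == 0x40, "MACCONTROL_TXPACEEN");
static_assert(BIT(5) == 0x20, "MACCONTROL_MIIEN/GMIIEN");
static_assert(BIT(7) == 0x80, "MACCONTROL_GIGABITEN");
static_assert(BIT(0) == 0x1, "MACCONTROL_FULLDUPLEXEN");
static_assert(BIT(17) == 0x20000, "DM644X HOST_INT");
static_assert(BIT(16) == 0x10000, "DM644X STATPEND_INT");
static_assert(BIT(8) == 0x0100, "DM644X RX_INT_VEC");

/*
 * Return the bit position if mask has exactly one bit set, or -1 if it
 * is zero or a multi-bit mask (which cannot be written as one BIT()).
 */
static int single_bit_index(unsigned long mask)
{
	int idx = 0;

	if (mask == 0 || (mask & (mask - 1)) != 0)
		return -1;
	while (!(mask & 1UL)) {
		mask >>= 1;
		idx++;
	}
	return idx;
}
```

Multi-bit masks such as EMAC_RX_BUFFER_OFFSET_MASK (0xFFFF) have no single
BIT() form, which is why single_bit_index() rejects them.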

Signed-off-by: Chaithrika U S <chaithrika@ti.com>
---
Applies to Linus' kernel tree

 drivers/net/davinci_emac.c |   26 +++++++++++---------------
 1 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/net/davinci_emac.c b/drivers/net/davinci_emac.c
index 65a2d0b..a421ec0 100644
--- a/drivers/net/davinci_emac.c
+++ b/drivers/net/davinci_emac.c
@@ -164,16 +164,14 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
 # define EMAC_MBP_MCASTCHAN(ch)		((ch) & 0x7)
 
 /* EMAC mac_control register */
-#define EMAC_MACCONTROL_TXPTYPE		(0x200)
-#define EMAC_MACCONTROL_TXPACEEN	(0x40)
-#define EMAC_MACCONTROL_MIIEN		(0x20)
-#define EMAC_MACCONTROL_GIGABITEN	(0x80)
-#define EMAC_MACCONTROL_GIGABITEN_SHIFT (7)
-#define EMAC_MACCONTROL_FULLDUPLEXEN	(0x1)
+#define EMAC_MACCONTROL_TXPTYPE		BIT(9)
+#define EMAC_MACCONTROL_TXPACEEN	BIT(6)
+#define EMAC_MACCONTROL_GMIIEN		BIT(5)
+#define EMAC_MACCONTROL_GIGABITEN	BIT(7)
+#define EMAC_MACCONTROL_FULLDUPLEXEN	BIT(0)
 #define EMAC_MACCONTROL_RMIISPEED_MASK	BIT(15)
 
 /* GIGABIT MODE related bits */
-#define EMAC_DM646X_MACCONTORL_GMIIEN	BIT(5)
 #define EMAC_DM646X_MACCONTORL_GIG	BIT(7)
 #define EMAC_DM646X_MACCONTORL_GIGFORCE	BIT(17)
 
@@ -192,10 +190,10 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
 #define EMAC_RX_BUFFER_OFFSET_MASK	(0xFFFF)
 
 /* MAC_IN_VECTOR (0x180) register bit fields */
-#define EMAC_DM644X_MAC_IN_VECTOR_HOST_INT	      (0x20000)
-#define EMAC_DM644X_MAC_IN_VECTOR_STATPEND_INT	      (0x10000)
-#define EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC	      (0x0100)
-#define EMAC_DM644X_MAC_IN_VECTOR_TX_INT_VEC	      (0x01)
+#define EMAC_DM644X_MAC_IN_VECTOR_HOST_INT	BIT(17)
+#define EMAC_DM644X_MAC_IN_VECTOR_STATPEND_INT	BIT(16)
+#define EMAC_DM644X_MAC_IN_VECTOR_RX_INT_VEC	BIT(8)
+#define EMAC_DM644X_MAC_IN_VECTOR_TX_INT_VEC	BIT(0)
 
 /** NOTE:: For DM646x the IN_VECTOR has changed */
 #define EMAC_DM646X_MAC_IN_VECTOR_RX_INT_VEC	BIT(EMAC_DEF_RX_CH)
@@ -203,7 +201,6 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
 #define EMAC_DM646X_MAC_IN_VECTOR_HOST_INT	BIT(26)
 #define EMAC_DM646X_MAC_IN_VECTOR_STATPEND_INT	BIT(27)
 
-
 /* CPPI bit positions */
 #define EMAC_CPPI_SOP_BIT		BIT(31)
 #define EMAC_CPPI_EOP_BIT		BIT(30)
@@ -747,8 +744,7 @@ static void emac_update_phystatus(struct emac_priv *priv)
 
 	if (priv->speed == SPEED_1000 && (priv->version == EMAC_VERSION_2)) {
 		mac_control = emac_read(EMAC_MACCONTROL);
-		mac_control |= (EMAC_DM646X_MACCONTORL_GMIIEN |
-				EMAC_DM646X_MACCONTORL_GIG |
+		mac_control |= (EMAC_DM646X_MACCONTORL_GIG |
 				EMAC_DM646X_MACCONTORL_GIGFORCE);
 	} else {
 		/* Clear the GIG bit and GIGFORCE bit */
@@ -2105,7 +2101,7 @@ static int emac_hw_enable(struct emac_priv *priv)
 
 	/* Enable MII */
 	val = emac_read(EMAC_MACCONTROL);
-	val |= (EMAC_MACCONTROL_MIIEN);
+	val |= (EMAC_MACCONTROL_GMIIEN);
 	emac_write(EMAC_MACCONTROL, val);
 
 	/* Enable NAPI and interrupts */
-- 
1.5.6


^ permalink raw reply related

* [PATCH] skge: use unique IRQ name
From: Michal Schmidt @ 2009-10-01 10:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20090922092826.5302225c@s6510>

Most network drivers request their IRQ when the interface is activated.
skge does it in ->probe() instead, because it can work with two-port
cards where the two net_devices use the same IRQ. This works fine most
of the time, except in some situations when the interface gets renamed.
Consider this example:

1. modprobe skge
   The card is detected as eth0 and requests IRQ 17. Directory
   /proc/irq/17/eth0 is created.
2. There is a udev rule which says this interface should be called
   eth1, so udev renames eth0 -> eth1.
3. modprobe 8139too
   The Realtek card is detected as eth0. It will be using IRQ 17 too.
4. ip link set eth0 up
   Now 8139too requests IRQ 17.

The result is:
WARNING: at fs/proc/generic.c:590 proc_register ...
proc_dir_entry '17/eth0' already registered
...
And "ls /proc/irq/17" shows two subdirectories, both called eth0.

Fix it by using a unique name for skge's IRQ, based on the PCI address.
The naming from the example then looks like this:
$ grep skge /proc/interrupts
 17:        169   IO-APIC-fasteoi   skge@0000:00:0a.0, eth0

irqbalance daemon will have to be taught to recognize "skge@" as an
Ethernet interrupt. This will be a one-liner addition in classify.c. I
will send a patch to irqbalance if this change is accepted.
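
The naming scheme can be tried out in userspace. The following is only a
minimal sketch, not the driver code: struct hw_sketch and hw_sketch_alloc()
are hypothetical stand-ins for struct skge_hw and the allocation in
skge_probe(), and calloc()/snprintf() replace kzalloc()/sprintf(). It shows
the name being stored in a trailing buffer sized as
strlen(driver) + strlen(PCI address) + 2.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct skge_hw: a fixed part
 * followed by a trailing buffer that holds the IRQ name. */
struct hw_sketch {
	int irq;
	char irq_name[];	/* flexible array member, sized at allocation */
};

/* Allocate the structure with room for "<drv>@<pci_addr>" plus NUL.
 * Returns NULL on allocation failure; the caller frees the result. */
struct hw_sketch *hw_sketch_alloc(const char *drv, const char *pci_addr,
				  int irq)
{
	/* +1 for the '@' separator, +1 for the terminating NUL */
	size_t len = strlen(drv) + strlen(pci_addr) + 2;
	struct hw_sketch *hw = calloc(1, sizeof(*hw) + len);

	if (!hw)
		return NULL;
	hw->irq = irq;
	snprintf(hw->irq_name, len, "%s@%s", drv, pci_addr);
	return hw;
}
```

Because the name depends only on the driver name and the PCI address, a
later interface rename cannot make it collide with another device's entry.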

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Index: kernel/drivers/net/skge.c
===================================================================
--- kernel.orig/drivers/net/skge.c
+++ kernel/drivers/net/skge.c
@@ -3895,6 +3895,7 @@ static int __devinit skge_probe(struct p
 	struct net_device *dev, *dev1;
 	struct skge_hw *hw;
 	int err, using_dac = 0;
+	size_t irq_name_len;

 	err = pci_enable_device(pdev);
 	if (err) {
@@ -3935,11 +3936,13 @@ static int __devinit skge_probe(struct p
 #endif

 	err = -ENOMEM;
-	hw = kzalloc(sizeof(*hw), GFP_KERNEL);
+	irq_name_len = strlen(DRV_NAME) + strlen(dev_name(&pdev->dev)) + 2;
+	hw = kzalloc(sizeof(*hw) + irq_name_len, GFP_KERNEL);
 	if (!hw) {
 		dev_err(&pdev->dev, "cannot allocate hardware struct\n");
 		goto err_out_free_regions;
 	}
+	sprintf(hw->irq_name, DRV_NAME "@%s", dev_name(&pdev->dev));

 	hw->pdev = pdev;
 	spin_lock_init(&hw->hw_lock);
@@ -3974,7 +3977,7 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_netdev;
 	}

-	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
+	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, hw->irq_name, hw);
 	if (err) {
 		dev_err(&pdev->dev, "%s: cannot assign irq %d\n",
 		       dev->name, pdev->irq);
Index: kernel/drivers/net/skge.h
===================================================================
--- kernel.orig/drivers/net/skge.h
+++ kernel/drivers/net/skge.h
@@ -2423,6 +2423,8 @@ struct skge_hw {
 	u16		     phy_addr;
 	spinlock_t	     phy_lock;
 	struct tasklet_struct phy_task;
+
+	char		     irq_name[0]; /* name for /proc/interrupts */
 };

 enum pause_control {

^ permalink raw reply

* Re: [PATCH] pktgen: Fix delay handling
From: Eric Dumazet @ 2009-10-01 10:04 UTC (permalink / raw)
  To: Stephen Hemminger, David S. Miller
  Cc: Jesper Dangaard Brouer, Robert Olsson, netdev
In-Reply-To: <4AC47AB6.9000501@gmail.com>

Eric Dumazet a écrit :
> After last pktgen changes, delay handling is wrong.
> 
> pktgen actually sends packets at full line speed.
> 
> Fix is to update pkt_dev->next_tx even if spin() returns early,
> so that next spin() calls have a chance to see a positive delay.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Oh well, I hit this bug on the linux-2.6 git tree, but I wrote the patch
against net-next-2.6.

But it appears net/core/pktgen.c is different in net-next-2.6.

Stephen, David, I am a bit lost here. Did something go wrong in a merge?

In any case, here is the patch against Linus' tree, where the bug is present.

Thanks

[PATCH] pktgen: Fix delay handling

After last pktgen changes, delay handling is wrong.

pktgen actually sends packets at full line speed.

Fix is to update pkt_dev->next_tx even if spin() returns early,
so that next spin() calls have a chance to see a positive delay.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4d11c28..b694552 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2105,15 +2105,17 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev)
 static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 {
 	ktime_t start_time, end_time;
-	s32 remaining;
+	s64 remaining;
 	struct hrtimer_sleeper t;

 	hrtimer_init_on_stack(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	hrtimer_set_expires(&t.timer, spin_until);

 	remaining = ktime_to_us(hrtimer_expires_remaining(&t.timer));
-	if (remaining <= 0)
+	if (remaining <= 0) {
+		pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
 		return;
+	}

 	start_time = ktime_now();
 	if (remaining < 100)

^ permalink raw reply related

* [PATCH] gigaset/CAPI: accept any number type/plan
From: Tilman Schmidt @ 2009-10-01  9:53 UTC (permalink / raw)
  To: Karsten Keil, Karsten Keil
  Cc: Hansjoerg Lipp, davem, i4ldeveloper, netdev, linux-kernel

Be more liberal in accepting CAPI CONNECT_REQ message parameters
Called Party Number and Calling Party Number:
* Accept Numbering plan "ISDN/Telephony" as supported.
* Ignore unsupported values for Type of number, Numbering plan,
  Presentation indicator and Screening indicator with a warning
  instead of rejecting the entire request.
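
The acceptance rules above can be sketched as three small predicates. This
is an illustrative userspace sketch only; the function names are made up
for this example and do not exist in the driver. The byte values are the
ones handled in the patch below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Called party number: type/plan bytes the patch treats as supported.
 * 0x80 = unknown type / unknown plan, 0x81 = unknown type / ISDN plan. */
bool called_type_plan_supported(uint8_t b)
{
	return b == 0x80 || b == 0x81;
}

/* Calling party number: the same two combinations, but with the top
 * bit clear because a presentation/screening byte follows. */
bool calling_type_plan_supported(uint8_t b)
{
	return b == 0x00 || b == 0x01;
}

/* Drop the Screening indicator (the two low bits) so that only the
 * presentation part of the byte takes part in the comparison. */
uint8_t presentation_bits(uint8_t b)
{
	return b & 0xfc;
}
```

A byte that fails one of the first two checks is now only reported with
dev_notice(); the request itself is still processed.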

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
---
A second small fix to the new Gigaset CAPI interface resulting from
testing with more applications. Please tell me if you'd prefer me
to reissue "[PATCH 12/12] gigaset: add Kernel CAPI interface" with
both fixes folded in.

 drivers/isdn/gigaset/capi.c |   29 ++++++++++++++++-------------
 1 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/isdn/gigaset/capi.c b/drivers/isdn/gigaset/capi.c
index 8afff37..c276a92 100644
--- a/drivers/isdn/gigaset/capi.c
+++ b/drivers/isdn/gigaset/capi.c
@@ -1236,12 +1236,14 @@ static void do_connect_req(struct gigaset_capi_ctr *iif,
 		goto error;
 	}
 	l = *pp++;
-	/* check number type/numbering plan byte */
-	if (*pp != 0x80) {
+	/* check type of number/numbering plan byte */
+	switch (*pp) {
+	case 0x80:	/* unknown type / unknown numbering plan */
+	case 0x81:	/* unknown type / ISDN/Telephony numbering plan */
+		break;
+	default:	/* others: warn about potential misinterpretation */
 		dev_notice(cs->dev, "%s: %s type/plan 0x%02x unsupported\n",
 			   "CONNECT_REQ", "Called party number", *pp);
-		info = CapiIllMessageParmCoding;
-		goto error;
 	}
 	pp++;
 	l--;
@@ -1266,26 +1268,28 @@ static void do_connect_req(struct gigaset_capi_ctr *iif,
 	if (pp != NULL && *pp > 0) {
 		l = *pp++;
 
-		/* check number type/numbering plan byte */
-		if (*pp) {
-			/* ToDo: allow for Ext=1? */
+		/* check type of number/numbering plan byte */
+		/* ToDo: handle Ext=1? */
+		switch (*pp) {
+		case 0x00:	/* unknown type / unknown numbering plan */
+		case 0x01:	/* unknown type / ISDN/Telephony num. plan */
+			break;
+		default:
 			dev_notice(cs->dev,
 				   "%s: %s type/plan 0x%02x unsupported\n",
 				   "CONNECT_REQ", "Calling party number", *pp);
-			info = CapiIllMessageParmCoding;
-			goto error;
 		}
 		pp++;
 		l--;
 
-		/* check presentation/screening indicator */
+		/* check presentation indicator */
 		if (!l) {
 			dev_notice(cs->dev, "%s: %s IE truncated\n",
 				   "CONNECT_REQ", "Calling party number");
 			info = CapiIllMessageParmCoding;
 			goto error;
 		}
-		switch (*pp) {
+		switch (*pp & 0xfc) { /* ignore Screening indicator */
 		case 0x80:	/* Presentation allowed */
 			s = "^SCLIP=1\r";
 			break;
@@ -1297,8 +1301,7 @@ static void do_connect_req(struct gigaset_capi_ctr *iif,
 				   "CONNECT_REQ",
 				   "Presentation/Screening indicator",
 				   *pp);
-			info = CapiIllMessageParmCoding;
-			goto error;
+			s = "^SCLIP=1\r";
 		}
 		commands[AT_CLIP] = kstrdup(s, GFP_KERNEL);
 		if (!commands[AT_CLIP])
-- 
1.6.2.1.214.ge986c

^ permalink raw reply related

* Re: [RFCv4 PATCH 2/2] net: Allow protocols to provide an unlocked_recvmsg socket method
From: Nir Tzachar @ 2009-10-01  9:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo,
	Linux Networking Development Mailing List
  Cc: Ziv Ayalon
In-Reply-To: <20090923043813.GA6464@ghostprotocols.net>

Hi Arnaldo

I have repeated the tests using net-next on top of Linus' git tree (I
hope I got it right) and the patches you sent me. Things did not get
better, and in most cases were even worse: the recvmmsg parts
distinctly showed better throughput, but the latency has more than
doubled.

The simplest test, using a batch size of 1, results in a recvmmsg
latency of over 1000 microseconds, while regular recvmsg is around 450
microseconds. (Note that to use 1 packet there is a small bug in
reg_recv which needs to be fixed: change ret = -1 to ret = 0.) On the
previous system config (part 0001 of the patch on top of 2.6.31) the
latency of a single-packet batch is 370 microseconds.

So there seems to be a regression either in the kernel tree I am using
or in part 0002 of the patch. I'll try running net-next with only
part 0001 of the patch and report back.

Cheers.

^ permalink raw reply

* [PATCH] pktgen: Fix delay handling
From: Eric Dumazet @ 2009-10-01  9:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jesper Dangaard Brouer, Robert Olsson, netdev, David S. Miller
In-Reply-To: <20090930172532.2c2d1d42@s6510>

After last pktgen changes, delay handling is wrong.

pktgen actually sends packets at full line speed.

Fix is to update pkt_dev->next_tx even if spin() returns early,
so that next spin() calls have a chance to see a positive delay.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/pktgen.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)
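
For illustration only, the effect of the early-return fix can be modelled
in userspace. struct pkt_sketch and spin_sketch() are hypothetical names,
times are plain nanosecond counters instead of ktime_t, and the waiting
path is simplified to "we woke up exactly at the deadline", which is an
assumption of this sketch rather than what the kernel does.

```c
#include <stdint.h>

/* Hypothetical model of the two pktgen_dev fields involved. */
struct pkt_sketch {
	int64_t next_tx_ns;	/* earliest time the next packet may go out */
	int64_t delay_ns;	/* configured inter-packet delay */
};

/* Model of spin(): returns how long we would have waited, in ns. */
int64_t spin_sketch(struct pkt_sketch *p, int64_t now_ns,
		    int64_t spin_until_ns)
{
	int64_t remaining = spin_until_ns - now_ns;

	if (remaining <= 0) {
		/* Deadline already passed: still move next_tx forward,
		 * otherwise every later call returns early as well. */
		p->next_tx_ns = spin_until_ns + p->delay_ns;
		return 0;
	}

	/* Simplified wait path: assume we wake up at the deadline. */
	p->next_tx_ns = spin_until_ns + p->delay_ns;
	return remaining;
}
```

Without the assignment in the early-return branch, next_tx_ns would stay
in the past and the configured delay would never be applied.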

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 0bcecbf..1a0682e 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2106,15 +2106,17 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev)
 static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 {
 	ktime_t start;
-	s32 remaining;
+	s64 remaining;
 	struct hrtimer_sleeper t;

 	hrtimer_init_on_stack(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	hrtimer_set_expires(&t.timer, spin_until);

 	remaining = ktime_to_us(hrtimer_expires_remaining(&t.timer));
-	if (remaining <= 0)
+	if (remaining <= 0) {
+		pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
 		return;
+	}

 	start = ktime_now();
 	if (remaining < 100)

^ permalink raw reply related

* [PATCHv2] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01  9:39 UTC (permalink / raw)
  To: Netdev; +Cc: Ori Finkalman, Ilpo Järvinen, Eric Dumazet

From: Ori Finkelman <ori@comsleep.com>

Acknowledge TCP window scale support by inserting the proper option in
SYN/ACK and SYN headers even if our window scale is zero.

This fixes the following observed behavior:

1. Client sends a SYN with the TCP window scaling option and a non-zero
window scale value to a Linux box.
2. Linux box notes the large receive window from the client.
3. Linux decides on a zero value of window scale for its part.
4. Because the decision to send the option is made by testing the window
scale value itself, Linux does not send the window scale TCP option
in the SYN/ACK at all.

With the following result:

The client box thinks TCP window scaling is not supported, since the
SYN/ACK had no TCP window scale option, while Linux thinks that TCP
window scaling is supported (and the scale might be non-zero), since the
SYN had the TCP window scale option. The client and server end up with
mismatched ideas about window sizes.

It probably also fixes the following bug (not observed in practice):

1. Linux box opens a TCP connection to some server.
2. Linux decides on a zero value of window scale.
3. Because the option is only sent when the computed window scale value
is non-zero, Linux does not set the window scale TCP option in the SYN.

With the expected result that the server OS does not use the window scale
option because it did not receive one in the SYN, leading to suboptimal
performance.
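
The core idea, deciding on a flag bit instead of on the scale value, can
be sketched in userspace as follows. The names are hypothetical and only
mirror the kernel ones; the result is kept in host byte order (no htonl())
so the word is easy to inspect. Kind 3 and length 3 are the standard
window scale option values, and kind 1 is the NOP used for padding.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OPTION_WSCALE	(1u << 3)	/* mirrors OPTION_WSCALE */
#define SKETCH_TCPOPT_NOP	1u
#define SKETCH_TCPOPT_WINDOW	3u
#define SKETCH_TCPOLEN_WINDOW	3u

/* Hypothetical, reduced version of struct tcp_out_options. */
struct opts_sketch {
	uint8_t options;	/* bit field of option flags */
	uint8_t ws;		/* window scale value; 0 is valid */
};

/* Write the window scale option word (host byte order) to *out when the
 * flag is set, regardless of the scale value. Returns words written. */
size_t write_wscale(const struct opts_sketch *o, uint32_t *out)
{
	if (!(o->options & SKETCH_OPTION_WSCALE))
		return 0;

	*out = (SKETCH_TCPOPT_NOP << 24) |
	       (SKETCH_TCPOPT_WINDOW << 16) |
	       (SKETCH_TCPOLEN_WINDOW << 8) |
	       o->ws;
	return 1;
}
```

With the flag set and ws == 0 the option is still emitted, which is the
case the old value-based test silently dropped.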

---

Original bug reported and patch written by Ori Finkelman from Comsleep 
Ltd. I've fixed the SYN header case based on feedback from Eric Dumazet 
and Ilpo Jarvinen, as part of trying to get the patch mainlined.

The SYN/ACK behavior was observed with a Windows box as the client and the
latest Debian kernel, but to the best of my understanding this can happen
with the latest kernel versions and other client OSes (probably also Linux)
as well.

The SYN/ACK scenario was tested on an x86 system. The SYN scenario was
only compile tested.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5200aab..fcd278a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -361,6 +361,7 @@ static inline int tcp_urg_mode(const struct tcp_sock *tp)
 #define OPTION_SACK_ADVERTISE  (1 << 0)
 #define OPTION_TS              (1 << 1)
 #define OPTION_MD5             (1 << 2)
+#define OPTION_WSCALE          (1 << 3)

 struct tcp_out_options {
        u8 options;             /* bit field of OPTION_* */
@@ -427,7 +428,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
                               TCPOLEN_SACK_PERM);
        }

-       if (unlikely(opts->ws)) {
+       if (unlikely(OPTION_WSCALE & opts->options)) {
                *ptr++ = htonl((TCPOPT_NOP << 24) |
                               (TCPOPT_WINDOW << 16) |
                               (TCPOLEN_WINDOW << 8) |
@@ -494,8 +495,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
        }
        if (likely(sysctl_tcp_window_scaling)) {
                opts->ws = tp->rx_opt.rcv_wscale;
-               if (likely(opts->ws))
-                       size += TCPOLEN_WSCALE_ALIGNED;
+               opts->options |= OPTION_WSCALE;
+               size += TCPOLEN_WSCALE_ALIGNED;
        }
        if (likely(sysctl_tcp_sack)) {
                opts->options |= OPTION_SACK_ADVERTISE;
@@ -537,8 +538,8 @@ static unsigned tcp_synack_options(struct sock *sk,

        if (likely(ireq->wscale_ok)) {
                opts->ws = ireq->rcv_wscale;
-               if (likely(opts->ws))
-                       size += TCPOLEN_WSCALE_ALIGNED;
+               opts->options |= OPTION_WSCALE;
+               size += TCPOLEN_WSCALE_ALIGNED;
        }
        if (likely(doing_ts)) {
                opts->options |= OPTION_TS;

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Now the world has gone to bed
	 Darkness won't engulf my head
	 I can see by infra-red
	 How I hate the night."

^ permalink raw reply related

* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Gilad Ben-Yossef @ 2009-10-01  9:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ilpo Järvinen, Netdev, Ori Finkalman
In-Reply-To: <4AC357D3.7080606@gmail.com>

Eric Dumazet wrote:

>
>>>
>>> Your version slows down the tcp_options_write() function, once per tx packet.
>>>       
>> Are you serious that anding would cost that much? :-/
>>     
>
> Not really :)
>   
LOL I was trying very hard to understand why you thought this was such 
an issue. My head was flying into all sorts of weird directions like 
cache effects and the like... ;-)

<snip>
> Yes, wscale 0 is RFC valid, but are we sure some equipment wont play funny games
> with such value ? At least sending "wscale 1-14" must be working...
>   
Well, there at least used to be routers that would actually zero the
WS value in transit while leaving the option set, but this is another
issue of course.

Anyway, I know Vista at least does set the window scale TCP option by
default. One assumes it occasionally sends a zero scale value. Not that
Vista is such a good benchmark to compare Linux to, but I tend to
believe the issue would have popped up by now if it were common enough.

I can craft a patch to introduce a route table option to set TCP window
scale minimum and maximum sizes, similar to the window size route option,
if there is a need for that. Personally, I think it is just overkill.
>
> My quick&dirty patch was only for discussion, I have no strong opinion on it,
> only that was on one place to patch instead of two/three/four I dont know yet.
>
> So please Gilad & Ori send us a new patch :)
>
>   
Revised patch follows in next email.

Gilad

   



^ permalink raw reply

* Re: tg3: Badness at kernel/mutex.c:207
From: Felix Radensky @ 2009-10-01  9:36 UTC (permalink / raw)
  To: Matt Carlson; +Cc: netdev@vger.kernel.org
In-Reply-To: <20090928205128.GA12652@xw6200.broadcom.net>

Hi, Matt

Matt Carlson wrote:
> On Sat, Sep 26, 2009 at 02:20:57PM -0700, Felix Radensky wrote:
>   
>> Hi,
>>
>> I'm running linux-2.6.31 on a custom MPC8536 based board with BCM57760 chip.
>> Both tg3 driver, and Broadcom PHY driver are modules.
>>
>> Each time I run ifconfig eth2 up, I get the following error message:
>>
>> Badness at kernel/mutex.c:207
>> NIP: c025132c LR: c0251314 CTR: c0251334
>> REGS: efbedbd0 TRAP: 0700   Not tainted  (2.6.31)
>> MSR: 00029000 <EE,ME,CE>  CR: 24020422  XER: 00000000
>> TASK = efacce10[1080] 'ifconfig' THREAD: efbec000
>> GPR00: 00000000 efbedc80 efacce10 00000001 00007020 00000002 00000000 
>> 00000200
>> GPR08: 00029000 c0350000 c0330000 00000001 24020424 10057d94 000002a0 
>> 1000d82c
>> GPR16: 1000d81c 1000d814 10010000 10050000 ef897a0c efbede18 ffff8914 
>> ef897a00
>> GPR24: 00008000 c034b480 efbec000 efb0122c c0350000 efacce10 ef82d2c0 
>> efb01228
>> NIP [c025132c] __mutex_lock_slowpath+0x1f0/0x1f8
>> LR [c0251314] __mutex_lock_slowpath+0x1d8/0x1f8
>> Call Trace:
>> [efbedcd0] [c025134c] mutex_lock+0x18/0x34
>> [efbedcf0] [f534a228] tg3_chip_reset+0x7cc/0x9f8 [tg3]
>> [efbedd20] [f534a8f0] tg3_reset_hw+0x58/0x2360 [tg3]
>> [efbedd70] [f5351dd4] tg3_open+0x610/0x910 [tg3]
>> [efbeddb0] [c01e1c6c] dev_open+0x100/0x138
>> [efbeddd0] [c01dff20] dev_change_flags+0x80/0x1ac
>> [efbeddf0] [c02232cc] devinet_ioctl+0x648/0x824
>> [efbede60] [c0223de4] inet_ioctl+0xcc/0xf8
>> [efbede70] [c01cdf44] sock_ioctl+0x60/0x300
>> [efbede90] [c008a35c] vfs_ioctl+0x34/0x8c
>> [efbedea0] [c008a580] do_vfs_ioctl+0x88/0x724
>> [efbedf10] [c008ac5c] sys_ioctl+0x40/0x74
>> [efbedf40] [c000f814] ret_from_syscall+0x0/0x3c
>> Instruction dump:
>> 0fe00000 4bfffe80 801a000c 5409016f 4182fe60 4bf0f6d9 2f830000 41befe54
>> 3d20c035 8009c2c0 2f800000 40befe44 <0fe00000> 4bfffe3c 9421ffe0 7c0802a6
>>
>> Does it indicate a real problem, or something that can be ignored ?
>>
>> Additional information from kernel log:
>>
>> tg3.c:v3.99 (April 20, 2009)
>> tg3 0002:05:00.0: enabling bus mastering
>> tg3 0002:05:00.0: PME# disabled
>> tg3 mdio bus: probed
>> eth2: Tigon3 [partno(BCM57760) rev 57780001] (PCI Express) MAC address 
>> 00:10:18:00:00:00
>> eth2: attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=500:01)
>> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
>> eth2: dma_rwctrl[76180000] dma_mask[64-bit]
>> tg3 0002:05:00.0: PME# disabled
>>     
>
> Yes, this is a real problem.  The driver is taking the MDIO bus lock
> while holding the device's own spinlock.  I think I may have a
> workaround.  Let me test it and get back to you.
>   

Did you have a chance to look into it ?

Thanks.

Felix.


^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-10-01  9:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Gregory Haskins, Ira W. Snyder, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4AC46989.7030502@redhat.com>

On Thu, Oct 01, 2009 at 10:34:17AM +0200, Avi Kivity wrote:
>> Second, I do not use ioeventfd anymore because it has too many problems
>> with the surrounding technology.  However, that is a topic for a
>> different thread.
>>    
>
> Please post your issues.  I see ioeventfd/irqfd as critical kvm interfaces.

I second that. AFAIK ioeventfd/irqfd got exposed to userspace in 2.6.32-rc1,
if there are issues we better nail them before 2.6.32 is out.
And yes, please start a different thread.

-- 
MST


^ permalink raw reply
