LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
From: Darren Hart @ 2010-09-01 20:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Stephen Rothwell, Michael Neuling, Gautham R Shenoy,
	Josh Triplett, linuxppc-dev, Will Schmidt, Paul Mackerras,
	Ankita Garg, Thomas Gleixner
In-Reply-To: <1283371161.2356.53.camel@gandalf.stny.rr.com>

On 09/01/2010 12:59 PM, Steven Rostedt wrote:
> On Wed, 2010-09-01 at 11:47 -0700, Darren Hart wrote:
> 
>> from tip/rt/2.6.33 causes the preempt_count() to change across the cede
>> call.  This patch appears to prevent the proxy preempt_count assignment
>> from happening. This non-local-cpu assignment to 0 would cause an
>> underrun of preempt_count() if the local-cpu had disabled preemption
>> prior to the assignment and then later tried to enable it. This appears
>> to be the case with the stack of __trace_hcall* calls preceding the
>> return from extended_cede_processor() in the latency format trace-cmd
>> report:
>>
>>   <idle>-0       1d....   201.252737: function:             .cpu_die
> 
> Note, the above 1d.... is a series of values. The first being the CPU,
> the next if interrupts are disabled, the next if the NEED_RESCHED flag
> is set, the next is softirqs enabled or disabled, next the
> preempt_count, and finally the lockdepth count.
> 
> Here we only care about the preempt_count, which is zero when '.' and a
> number if it is something else. It is the second to last field in that
> list.
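
As a rough illustration of the field layout described above, the flags column can be decoded in a few lines of user-space C. This is only a sketch: `struct lat_flags` and `decode_lat_flags()` are made-up names, a single-digit CPU number is assumed (as in these traces), and the preempt_count field is assumed to be printed in hex, which is inferred from the `ff` in the `1d.Hff.` line further down.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the decoded fields; the names are illustrative. */
struct lat_flags {
	int cpu;
	int irqs_off;
	int need_resched;
	char irq_ctx;			/* fourth character, kept as printed */
	unsigned long preempt_count;	/* 0 when the field is '.' */
	char lock_depth;		/* last character, kept as printed */
};

/*
 * Decode a flags column such as "1d..1." or "1d.Hff.".
 * Assumes a single-digit CPU number and a hex preempt_count field.
 * Returns 0 on success, -1 if the string has an unexpected length.
 */
static int decode_lat_flags(const char *s, struct lat_flags *f)
{
	char buf[16];
	size_t len, n;

	assert(s != NULL && f != NULL);

	len = strlen(s);
	if (len < 6)
		return -1;

	f->cpu = s[0] - '0';
	f->irqs_off = (s[1] == 'd');
	f->need_resched = (s[2] == 'N');
	f->irq_ctx = s[3];
	f->lock_depth = s[len - 1];

	/* Everything between the irq field and the last character. */
	n = len - 5;
	if (n >= sizeof(buf))
		return -1;
	memcpy(buf, s + 4, n);
	buf[n] = '\0';
	f->preempt_count = (buf[0] == '.') ? 0 : strtoul(buf, NULL, 16);

	return 0;
}
```

With this, `1d..1.` decodes to CPU 1 with interrupts disabled and a preempt_count of 1, and `1d.Hff.` decodes to a preempt_count of 0xff.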
> 
> 
>>   <idle>-0       1d....   201.252738: function:                .pseries_mach_cpu_die
>>   <idle>-0       1d....   201.252740: function:                   .idle_task_exit
>>   <idle>-0       1d....   201.252741: function:                      .switch_slb
>>   <idle>-0       1d....   201.252742: function:                   .xics_teardown_cpu
>>   <idle>-0       1d....   201.252743: function:                      .xics_set_cpu_priority
>>   <idle>-0       1d....   201.252744: function:             .__trace_hcall_entry
>>   <idle>-0       1d..1.   201.252745: function:                .probe_hcall_entry
> 
>                        ^
>                 preempt_count set to 1
> 
>>   <idle>-0       1d..1.   201.252746: function:             .__trace_hcall_exit
>>   <idle>-0       1d..2.   201.252747: function:                .probe_hcall_exit
>>   <idle>-0       1d....   201.252748: function:             .__trace_hcall_entry
>>   <idle>-0       1d..1.   201.252748: function:                .probe_hcall_entry
>>   <idle>-0       1d..1.   201.252750: function:             .__trace_hcall_exit
>>   <idle>-0       1d..2.   201.252751: function:                .probe_hcall_exit
>>   <idle>-0       1d....   201.252752: function:             .__trace_hcall_entry
>>   <idle>-0       1d..1.   201.252753: function:                .probe_hcall_entry
>                    ^   ^
>                   CPU  preempt_count
> 
> Entering the function probe_hcall_entry() the preempt_count is 1 (see
> below). But probe_hcall_entry does:
> 
> 	h = &get_cpu_var(hcall_stats)[opcode / 4];
> 
> Without doing the put (which it does in probe_hcall_exit())
> 
> So exiting the probe_hcall_entry() the preempt_count is 2.
> The trace_hcall_entry() will do a preempt_enable() making it leave as 1.
> 
> 
>>   offon.sh-3684  6.....   201.466488: bprint:               .smp_pSeries_kick_cpu : resetting pcnt to 0 for cpu 1
> 
> This is CPU 6, changing the preempt count from 1 to 0.
> 
>>
>> preempt_count() is reset from 1 to 0 by smp_startup_cpu() without the
>> QCSS_NOT_STOPPED check from the patch above.
>>
>>   <idle>-0       1d....   201.466503: function:             .__trace_hcall_exit
> 
> Note: __trace_hcall_exit() and __trace_hcall_entry() basically do:
> 
>  preempt_disable();
>  call probe();
>  preempt_enable();
> 
> 
>>   <idle>-0       1d..1.   201.466505: function:                .probe_hcall_exit
> 
> The preempt_count of 1 entering the probe_hcall_exit() is because of the
> preempt_disable() shown above. It should have been entered as a 2.
> 
> But then it does:
> 
> 
> 	put_cpu_var(hcall_stats);
> 
> making preempt_count 0.
> 
> But the preempt_enable() in the trace_hcall_exit() causes this to be -1.
> 
> 
>>   <idle>-0       1d.Hff.   201.466507: bprint:               .pseries_mach_cpu_die : after cede: ffffffff
>>
>> With the preempt_count() being one less than it should be, the final
>> preempt_enable() in the trace_hcall path drops preempt_count to
>> 0xffffffff, which of course is an illegal value and leads to a crash.
> 
> I'm confused as to how this works in mainline?

Turns out it didn't. 2.6.33.5 with CONFIG_PREEMPT=y sees this exact same
behavior. The following, part of the 2.6.33.6 stable release, prevents
this from happening:

aef40e87d866355ffd279ab21021de733242d0d5
powerpc/pseries: Only call start-cpu when a CPU is stopped

--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -82,6 +82,12 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)

        pcpu = get_hard_smp_processor_id(lcpu);

+       /* Check to see if the CPU out of FW already for kexec */
+       if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
+               cpu_set(lcpu, of_spin_map);
+               return 1;
+       }
+
        /* Fixup atomic count: it exited inside IRQ handler. */
        task_thread_info(paca[lcpu].__current)->preempt_count   = 0;
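
To see why skipping that assignment matters, the counter sequence from the trace can be replayed in user space. The following is a toy model, not kernel code: `pcnt`, `hcall_entry()`, `hcall_exit()` and `cede_cycle()` are names invented for this sketch, and a plain 32-bit counter stands in for preempt_count.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the per-task counter; not the kernel's data structure. */
static uint32_t pcnt;

static void preempt_disable(void) { pcnt++; }
static void preempt_enable(void)  { pcnt--; }

/* Models __trace_hcall_entry(): disable, probe takes the per-cpu var, enable. */
static void hcall_entry(void)
{
	preempt_disable();
	preempt_disable();	/* get_cpu_var() in probe_hcall_entry() */
	preempt_enable();
}

/* Models __trace_hcall_exit(): disable, probe releases the per-cpu var, enable. */
static void hcall_exit(void)
{
	preempt_disable();
	preempt_enable();	/* put_cpu_var() in probe_hcall_exit() */
	preempt_enable();
}

/*
 * Model one traced hcall on the dying CPU. If remote_reset is nonzero,
 * another CPU zeroes the counter between entry and exit, as the
 * unguarded assignment in smp_startup_cpu() does.
 * Returns the counter value after the hcall returns.
 */
static uint32_t cede_cycle(int remote_reset)
{
	assert(remote_reset == 0 || remote_reset == 1);

	pcnt = 0;
	hcall_entry();		/* leaves pcnt == 1 while ceded */
	if (remote_reset)
		pcnt = 0;	/* proxy assignment from the other CPU */
	hcall_exit();

	return pcnt;
}
```

In this model `cede_cycle(0)` returns 0, while `cede_cycle(1)` returns 0xffffffff, which matches the value printed after the cede in the trace.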

The question now is: is this the right fix? If so, perhaps we can update
the comment to be a bit more clear and not refer solely to kexec.

Michael Neuling, can you offer any thoughts here? We hit this EVERY
TIME, which makes me wonder if the offline/online path could do this
without calling smp_startup_cpu at all.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


* Memory allocation modifications in ibm_newemac driver
From: Jonathan Haws @ 2010-09-01 20:41 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

[-- Attachment #1: Type: text/plain, Size: 5555 bytes --]

I am not sure which list this is best addressed to, so I hope I find someone who can help me out.

I am new to kernel development and am working on a network driver.  The AppliedMicro 405EX chip provides dual EMAC ports and the driver for those is the ibm_newemac driver.  I discovered an issue last year with said driver that I am trying to fix now.

The problem is this - when I enable a large MTU (larger than a single page) and run data through the EMAC and also am reading and writing data to a disk, memory becomes so fragmented that allocating a new SKB fails.
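
For context, contiguous kernel allocations come in power-of-two page counts (orders), so a buffer just over two pages needs an order-2 block of four contiguous pages. Below is a small user-space sketch of that arithmetic. The 4 KiB page size and the 9000-byte buffer used in the note after it are assumptions for illustration, not figures taken from the driver, and `order_for_size()` is a hypothetical helper modelled on the idea of the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= size.
 * User-space sketch only; it does not call into the kernel.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(size > 0 && page_size > 0);

	/* Double the span until the requested size fits. */
	while (span < size) {
		span <<= 1;
		order++;
	}

	return order;
}
```

Under those assumptions, `order_for_size(1500, 4096)` is 0 (a single page is enough), while `order_for_size(9000, 4096)` is 2, so every such buffer needs four physically contiguous pages even when plenty of single pages are free.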

I have modified the driver to only ever deal with single pages, since the problem was not that there was a memory leak or that I was just plain out of memory - just that I had a whole ton of free single pages, but no buffers of order 2.  However, now I am getting the following BUG in the kernel:


PING 172.31.22.1 (172.31.22.1): 56 data bytes
64 bytes from 172.31.22.1: seq=0 ttl=128 time=1.147 ms
64 bytes from 172.31.22.1: seq=1 ttl=128 time=0.466 ms
64 bytes from 172.31.22.1: seq=2 ttl=128 time=0.448 ms
64 bytes from 172.31.22.1: seq=3 ttl=128 time=0.444 ms
64 bytes from 172.31.22.1: seq=4 ttl=128 time=0.443 ms
64 bytes from 172.31.22.1: seq=5 ttl=128 time=0.452 ms
64 bytes from 172.31.22.1: seq=6 ttl=128 time=0.445 ms
64 bytes from 172.31.22.1: seq=7 ttl=128 time=0.447 ms
64 bytes from 172.31.22.1: seq=8 ttl=128 time=0.443 ms
64 bytes from 172.31.22.1: seq=9 ttl=128 time=0.452 ms
64 bytes from 172.31.22.1: seq=10 ttl=128 time=0.444 ms
64 bytes from 172.31.22.1: seq=11 ttl=128 time=0.444 ms
64 bytes from 172.31.22.1: seq=12 ttl=128 time=0.448 ms
64 bytes from 172.31.22.1: seq=13 ttl=128 time=0.454 ms
64 bytes from 172.31.22.1: seq=14 ttl=128 time=0.445 ms
------------[ cut here ]------------
kernel BUG at mm/slub.c:2925!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT PowerPC 40x Platform
Modules linked in:
NIP: c0094024 LR: c01c86e4 CTR: 00000000
REGS: cc9679c0 TRAP: 0700   Not tainted  (2.6.31-rc5-walle-01329-geea77be-dirty)
MSR: 00029030 <EE,ME,CE,IR,DR>  CR: 22004024  XER: 2000005f
TASK = ce442940[291] 'ping' THREAD: cc966000
GPR00: 00080000 cc967a70 ce442940 cf0b0000 0000000e 00000000 cf011000 cf0b0024
GPR08: 003a97b1 00000001 c0348000 c0330000 22004022 100dc4d4 c0330000 c02c49fc
GPR16: c02c4a1c c02c4a3c c02c4a54 c02c38cc c02c49b4 cc966000 cf011310 00000001
GPR24: 0000000f 00000000 cca1c220 c01c86e4 cf0b0000 c0335ba4 cca1c180 c0529600
NIP [c0094024] kfree+0xdc/0xec
LR [c01c86e4] skb_release_data+0x78/0xd4
Call Trace:
[cc967a70] [cc966000] 0xcc966000 (unreliable)
[cc967a90] [c01c86e4] skb_release_data+0x78/0xd4
[cc967aa0] [c01c8334] __kfree_skb+0x18/0xe8
[cc967ab0] [c01d1968] netif_receive_skb+0x368/0x378
[cc967ae0] [c01a7394] emac_poll_rx+0x150/0x7b0
[cc967b40] [c01a2abc] mal_poll+0xe4/0x29c
[cc967b80] [c01d4a50] net_rx_action+0x9c/0x1b4
[cc967bb0] [c003b3c0] __do_softirq+0xc4/0x148
[cc967bf0] [c0004d18] do_softirq+0x78/0x80
[cc967c00] [c003b67c] local_bh_enable+0xc0/0xd8
[cc967c10] [c01d5ed8] dev_queue_xmit+0xfc/0x3e4
[cc967c30] [c01f2cd8] ip_finish_output+0xfc/0x31c
[cc967c50] [c01f2f7c] ip_local_out+0x34/0x48
[cc967c60] [c01f3228] ip_push_pending_frames+0x298/0x3d8
[cc967c80] [c0210980] raw_sendmsg+0x6e8/0x74c
[cc967d20] [c0219f44] inet_sendmsg+0x4c/0x78
[cc967d40] [c01c1684] sock_sendmsg+0xac/0xe4
[cc967e30] [c01c19fc] sys_sendto+0xbc/0xf0
[cc967f00] [c01c2450] sys_socketcall+0x140/0x1f8
[cc967f40] [c000f434] ret_from_syscall+0x0/0x3c
Instruction dump:
8009000c 813e0080 5400103a 7d3c012e 939e0080 4bffffc8 83ff000c 4bffff78
801f0000 7009c000 7d200026 55291ffe <0f090000> 7fe3fb78 4bfdf905 4bffffa4
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[cc967810] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[cc967850] [c0034d78] panic+0x94/0x170
[cc9678a0] [c000cdc0] die+0x17c/0x190
[cc9678c0] [c000d08c] _exception+0x174/0x1c4
[cc9679b0] [c000fa30] ret_from_except_full+0x0/0x4c
[cc967a70] [cc966000] 0xcc966000
[cc967a90] [c01c86e4] skb_release_data+0x78/0xd4
[cc967aa0] [c01c8334] __kfree_skb+0x18/0xe8
[cc967ab0] [c01d1968] netif_receive_skb+0x368/0x378
[cc967ae0] [c01a7394] emac_poll_rx+0x150/0x7b0
[cc967b40] [c01a2abc] mal_poll+0xe4/0x29c
[cc967b80] [c01d4a50] net_rx_action+0x9c/0x1b4
[cc967bb0] [c003b3c0] __do_softirq+0xc4/0x148
[cc967bf0] [c0004d18] do_softirq+0x78/0x80
[cc967c00] [c003b67c] local_bh_enable+0xc0/0xd8
[cc967c10] [c01d5ed8] dev_queue_xmit+0xfc/0x3e4
[cc967c30] [c01f2cd8] ip_finish_output+0xfc/0x31c
[cc967c50] [c01f2f7c] ip_local_out+0x34/0x48
[cc967c60] [c01f3228] ip_push_pending_frames+0x298/0x3d8
[cc967c80] [c0210980] raw_sendmsg+0x6e8/0x74c
[cc967d20] [c0219f44] inet_sendmsg+0x4c/0x78
[cc967d40] [c01c1684] sock_sendmsg+0xac/0xe4
[cc967e30] [c01c19fc] sys_sendto+0xbc/0xf0
[cc967f00] [c01c2450] sys_socketcall+0x140/0x1f8
[cc967f40] [c000f434] ret_from_syscall+0x0/0x3c
Rebooting in 180 seconds..


I cannot see why this is occurring.  I have made sure that I have the pages allocated when the driver starts up.  I outlined the driver according to how the e1000e driver accomplished the same task.

Has anyone seen behavior such as this before?  Can anyone point me in the direction I need to go to get this to work?  Any help is appreciated.  I have attached the modified source if anyone wants to take a look.  All the modifications are preceded by a comment such as /* JRH - some comment */.  Just grep JRH and you can see all my changes.

Thanks,

Jonathan


[-- Attachment #2: core.c --]
[-- Type: text/plain, Size: 85590 bytes --]

/*
 * drivers/net/ibm_newemac/core.c
 *
 * Driver for PowerPC 4xx on-chip ethernet controller.
 *
 * Copyright 2007 Benjamin Herrenschmidt, IBM Corp.
 *                <benh@kernel.crashing.org>
 *
 * Based on the arch/ppc version of the driver:
 *
 * Copyright (c) 2004, 2005 Zultys Technologies.
 * Eugene Surovegin <eugene.surovegin@zultys.com> or <ebs@ebshome.net>
 *
 * Based on original work by
 * 	Matt Porter <mporter@kernel.crashing.org>
 *	(c) 2003 Benjamin Herrenschmidt <benh@kernel.crashing.org>
 *      Armin Kuster <akuster@mvista.com>
 * 	Johnnie Peters <jpeters@mvista.com>
 *
 * This program is free software; you can redistribute  it and/or modify it
 * under  the terms of  the GNU General  Public License as published by the
 * Free Software Foundation;  either version 2 of the  License, or (at your
 * option) any later version.
 *
 */

#include <linux/sched.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/delay.h>
#include <linux/types.h>
#include <linux/pci.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/crc32.h>
#include <linux/ethtool.h>
#include <linux/mii.h>
#include <linux/bitops.h>
#include <linux/workqueue.h>
#include <linux/of.h>
#include <linux/highmem.h>

#include <asm/processor.h>
#include <asm/io.h>
#include <asm/dma.h>
#include <asm/uaccess.h>
#include <asm/dcr.h>
#include <asm/dcr-regs.h>

#include "core.h"

/*
 * Lack of dma_unmap_???? calls is intentional.
 *
 * API-correct usage requires additional support state information to be
 * maintained for every RX and TX buffer descriptor (BD). Unfortunately, due to
 * EMAC design (e.g. TX buffer passed from network stack can be split into
 * several BDs, dma_map_single/dma_map_page can be used to map particular BD),
 * maintaining such information will add additional overhead.
 * Current DMA API implementation for 4xx processors only ensures cache coherency
 * and dma_unmap_???? routines are empty and are likely to stay this way.
 * I decided to omit dma_unmap_??? calls because I don't want to add additional
 * complexity just for the sake of following some abstract API, when it doesn't
 * add any real benefit to the driver. I understand that this decision may be
 * controversial, but I really tried to make code API-correct and efficient
 * at the same time and didn't come up with code I liked :(.                --ebs
 */

#define DRV_NAME        "emac"
#define DRV_VERSION     "3.54"
#define DRV_DESC        "PPC 4xx OCP EMAC driver"

MODULE_DESCRIPTION(DRV_DESC);
MODULE_AUTHOR
("Eugene Surovegin <eugene.surovegin@zultys.com> or <ebs@ebshome.net>");
MODULE_LICENSE("GPL");

/*
 * PPC64 doesn't (yet) have a cacheable_memcpy
 */
#ifdef CONFIG_PPC64
#define cacheable_memcpy(d,s,n) memcpy((d),(s),(n))
#endif

/* minimum number of free TX descriptors required to wake up TX process */
#define EMAC_TX_WAKEUP_THRESH		(NUM_TX_BUFF / 4)

/* If packet size is less than this number, we allocate small skb and copy packet
 * contents into it instead of just sending original big skb up
 */
#define EMAC_RX_COPY_THRESH		CONFIG_IBM_NEW_EMAC_RX_COPY_THRESHOLD

/* Since multiple EMACs share MDIO lines in various ways, we need
 * to avoid re-using the same PHY ID in cases where the arch didn't
 * setup precise phy_map entries
 *
 * XXX This is something that needs to be reworked as we can have multiple
 * EMAC "sets" (multiple ASICs containing several EMACs) though we can
 * probably require in that case to have explicit PHY IDs in the device-tree
 */
static u32 busy_phy_map;
static DEFINE_MUTEX(emac_phy_map_lock);

/* This is the wait queue used to wait on any event related to probe, that
 * is discovery of MALs, other EMACs, ZMII/RGMIIs, etc...
 */
static DECLARE_WAIT_QUEUE_HEAD(emac_probe_wait);

/* Having stable interface names is a doomed idea. However, it would be nice
 * if we didn't have completely random interface names at boot too :-) It's
 * just a matter of making everybody's life easier. Since we are doing
 * threaded probing, it's a bit harder though. The base idea here is that
 * we make up a list of all emacs in the device-tree before we register the
 * driver. Every emac will then wait for the previous one in the list to
 * initialize before itself. We should also keep that list ordered by
 * cell_index.
 * That list is only 4 entries long, meaning that additional EMACs don't
 * get ordering guarantees unless EMAC_BOOT_LIST_SIZE is increased.
 */

#define EMAC_BOOT_LIST_SIZE	4
static struct device_node *emac_boot_list[EMAC_BOOT_LIST_SIZE];

/* How long should I wait for dependent devices ? */
#define EMAC_PROBE_DEP_TIMEOUT	(HZ * 5)

/* I don't want to litter system log with timeout errors
 * when we have brain-damaged PHY.
 */
static inline void emac_report_timeout_error(struct emac_instance *dev,
		const char *error)
{
	if (emac_has_feature(dev, EMAC_FTR_440GX_PHY_CLK_FIX |
				EMAC_FTR_460EX_PHY_CLK_FIX |
				EMAC_FTR_440EP_PHY_CLK_FIX))
		DBG(dev, "%s" NL, error);
	else if (net_ratelimit())
		printk(KERN_ERR "%s: %s\n", dev->ofdev->node->full_name, error);
}

/* EMAC PHY clock workaround:
 * 440EP/440GR has more sane SDR0_MFR register implementation than 440GX,
 * which allows controlling each EMAC clock
 */
static inline void emac_rx_clk_tx(struct emac_instance *dev)
{
#ifdef CONFIG_PPC_DCR_NATIVE
	if (emac_has_feature(dev, EMAC_FTR_440EP_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_MFR,
				0, SDR0_MFR_ECS >> dev->cell_index);
#endif
}

static inline void emac_rx_clk_default(struct emac_instance *dev)
{
#ifdef CONFIG_PPC_DCR_NATIVE
	if (emac_has_feature(dev, EMAC_FTR_440EP_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_MFR,
				SDR0_MFR_ECS >> dev->cell_index, 0);
#endif
}

/* PHY polling intervals */
#define PHY_POLL_LINK_ON	HZ
#define PHY_POLL_LINK_OFF	(HZ / 5)

/* Graceful stop timeouts in us.
 * We should allow up to 1 frame time (full-duplex, ignoring collisions)
 */
#define STOP_TIMEOUT_10		1230
#define STOP_TIMEOUT_100	124
#define STOP_TIMEOUT_1000	13
#define STOP_TIMEOUT_1000_JUMBO	73

static unsigned char default_mcast_addr[] =
{ 0x01, 0x80, 0xC2, 0x00, 0x00, 0x01 };

/* Please, keep in sync with struct ibm_emac_stats/ibm_emac_error_stats */
static const char emac_stats_keys[EMAC_ETHTOOL_STATS_COUNT][ETH_GSTRING_LEN] =
{ "rx_packets", "rx_bytes", "tx_packets", "tx_bytes", "rx_packets_csum",
		"tx_packets_csum", "tx_undo", "rx_dropped_stack", "rx_dropped_oom",
		"rx_dropped_error", "rx_dropped_resize", "rx_dropped_mtu",
		"rx_stopped", "rx_bd_errors", "rx_bd_overrun", "rx_bd_bad_packet",
		"rx_bd_runt_packet", "rx_bd_short_event", "rx_bd_alignment_error",
		"rx_bd_bad_fcs", "rx_bd_packet_too_long", "rx_bd_out_of_range",
		"rx_bd_in_range", "rx_parity", "rx_fifo_overrun", "rx_overrun",
		"rx_bad_packet", "rx_runt_packet", "rx_short_event",
		"rx_alignment_error", "rx_bad_fcs", "rx_packet_too_long",
		"rx_out_of_range", "rx_in_range", "tx_dropped", "tx_bd_errors",
		"tx_bd_bad_fcs", "tx_bd_carrier_loss", "tx_bd_excessive_deferral",
		"tx_bd_excessive_collisions", "tx_bd_late_collision",
		"tx_bd_multple_collisions", "tx_bd_single_collision", "tx_bd_underrun",
		"tx_bd_sqe", "tx_parity", "tx_underrun", "tx_sqe", "tx_errors" };

static irqreturn_t emac_irq(int irq, void *dev_instance);
static void emac_clean_tx_ring(struct emac_instance *dev);
static void __emac_set_multicast_list(struct emac_instance *dev);

static inline int emac_phy_supports_gige(int phy_mode)
{
	return phy_mode == PHY_MODE_GMII || phy_mode == PHY_MODE_RGMII || phy_mode
			== PHY_MODE_SGMII || phy_mode == PHY_MODE_TBI || phy_mode
			== PHY_MODE_RTBI;
}

static inline int emac_phy_gpcs(int phy_mode)
{
	return phy_mode == PHY_MODE_SGMII || phy_mode == PHY_MODE_TBI || phy_mode
			== PHY_MODE_RTBI;
}

static inline void emac_tx_enable(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r;

	DBG(dev, "tx_enable" NL);

	r = in_be32(&p->mr0);
	if (!(r & EMAC_MR0_TXE))
		out_be32(&p->mr0, r | EMAC_MR0_TXE);
}

static void emac_tx_disable(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r;

	DBG(dev, "tx_disable" NL);

	r = in_be32(&p->mr0);
	if (r & EMAC_MR0_TXE)
	{
		int n = dev->stop_timeout;
		out_be32(&p->mr0, r & ~EMAC_MR0_TXE);
		while (!(in_be32(&p->mr0) & EMAC_MR0_TXI) && n)
		{
			udelay(1);
			--n;
		}
		if (unlikely(!n))
			emac_report_timeout_error(dev, "TX disable timeout");
	}
}

static void emac_rx_enable(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r;

	if (unlikely(test_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags)))
		goto out;

	DBG(dev, "rx_enable" NL);

	r = in_be32(&p->mr0);
	if (!(r & EMAC_MR0_RXE))
	{
		if (unlikely(!(r & EMAC_MR0_RXI)))
		{
			/* Wait if previous async disable is still in progress */
			int n = dev->stop_timeout;
			while (!(r = in_be32(&p->mr0) & EMAC_MR0_RXI) && n)
			{
				udelay(1);
				--n;
			}
			if (unlikely(!n))
				emac_report_timeout_error(dev, "RX disable timeout");
		}
		out_be32(&p->mr0, r | EMAC_MR0_RXE);
	}
	out: ;
}

static void emac_rx_disable(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r;

	DBG(dev, "rx_disable" NL);

	r = in_be32(&p->mr0);
	if (r & EMAC_MR0_RXE)
	{
		int n = dev->stop_timeout;
		out_be32(&p->mr0, r & ~EMAC_MR0_RXE);
		while (!(in_be32(&p->mr0) & EMAC_MR0_RXI) && n)
		{
			udelay(1);
			--n;
		}
		if (unlikely(!n))
			emac_report_timeout_error(dev, "RX disable timeout");
	}
}

static inline void emac_netif_stop(struct emac_instance *dev)
{
	netif_tx_lock_bh(dev->ndev);
	netif_addr_lock(dev->ndev);
	dev->no_mcast = 1;
	netif_addr_unlock(dev->ndev);
	netif_tx_unlock_bh(dev->ndev);
	dev->ndev->trans_start = jiffies; /* prevent tx timeout */
	mal_poll_disable(dev->mal, &dev->commac);
	netif_tx_disable(dev->ndev);
}

static inline void emac_netif_start(struct emac_instance *dev)
{
	netif_tx_lock_bh(dev->ndev);
	netif_addr_lock(dev->ndev);
	dev->no_mcast = 0;
	if (dev->mcast_pending && netif_running(dev->ndev))
		__emac_set_multicast_list(dev);
	netif_addr_unlock(dev->ndev);
	netif_tx_unlock_bh(dev->ndev);

	netif_wake_queue(dev->ndev);

	/* NOTE: unconditional netif_wake_queue is only appropriate
	 * so long as all callers are assured to have free tx slots
	 * (taken from tg3... though the case where that is wrong is
	 *  not terribly harmful)
	 */
	mal_poll_enable(dev->mal, &dev->commac);
}

static inline void emac_rx_disable_async(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r;

	DBG(dev, "rx_disable_async" NL);

	r = in_be32(&p->mr0);
	if (r & EMAC_MR0_RXE)
		out_be32(&p->mr0, r & ~EMAC_MR0_RXE);
}

static int emac_reset(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	int n = 20;

	DBG(dev, "reset" NL);

	if (!dev->reset_failed)
	{
		/* 40x erratum suggests stopping RX channel before reset,
		 * we stop TX as well
		 */
		emac_rx_disable(dev);
		emac_tx_disable(dev);
	}

#ifdef CONFIG_PPC_DCR_NATIVE
	/* Enable internal clock source */
	if (emac_has_feature(dev, EMAC_FTR_460EX_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_ETH_CFG,
				0, SDR0_ETH_CFG_ECS << dev->cell_index);
#endif

	out_be32(&p->mr0, EMAC_MR0_SRST);
	while ((in_be32(&p->mr0) & EMAC_MR0_SRST) && n)
		--n;

#ifdef CONFIG_PPC_DCR_NATIVE
	/* Enable external clock source */
	if (emac_has_feature(dev, EMAC_FTR_460EX_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_ETH_CFG,
				SDR0_ETH_CFG_ECS << dev->cell_index, 0);
#endif

	if (n)
	{
		dev->reset_failed = 0;
		return 0;
	}
	else
	{
		emac_report_timeout_error(dev, "reset timeout");
		dev->reset_failed = 1;
		return -ETIMEDOUT;
	}
}

static void emac_hash_mc(struct emac_instance *dev)
{
	const int regs = EMAC_XAHT_REGS(dev);
	u32 *gaht_base = emac_gaht_base(dev);
	u32 gaht_temp[regs];
	struct dev_mc_list *dmi;
	int i;

	DBG(dev, "hash_mc %d" NL, dev->ndev->mc_count);

	memset(gaht_temp, 0, sizeof(gaht_temp));

	for (dmi = dev->ndev->mc_list; dmi; dmi = dmi->next)
	{
		int slot, reg, mask;
		DBG2(dev, "mc %pM" NL, dmi->dmi_addr);

		slot = EMAC_XAHT_CRC_TO_SLOT(dev, ether_crc(ETH_ALEN, dmi->dmi_addr));
		reg = EMAC_XAHT_SLOT_TO_REG(dev, slot);
		mask = EMAC_XAHT_SLOT_TO_MASK(dev, slot);

		gaht_temp[reg] |= mask;
	}

	for (i = 0; i < regs; i++)
		out_be32(gaht_base + i, gaht_temp[i]);
}

static inline u32 emac_iff2rmr(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	u32 r;

	r = EMAC_RMR_SP | EMAC_RMR_SFCS | EMAC_RMR_IAE | EMAC_RMR_BAE;

	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		r |= EMAC4_RMR_BASE;
	else
		r |= EMAC_RMR_BASE;

	if (ndev->flags & IFF_PROMISC)
		r |= EMAC_RMR_PME;
	else if (ndev->flags & IFF_ALLMULTI || (ndev->mc_count
			> EMAC_XAHT_SLOTS(dev)))
		r |= EMAC_RMR_PMME;
	else if (ndev->mc_count > 0)
		r |= EMAC_RMR_MAE;

	return r;
}

static u32 __emac_calc_base_mr1(struct emac_instance *dev, int tx_size,
		int rx_size)
{
	u32 ret = EMAC_MR1_VLE | EMAC_MR1_IST | EMAC_MR1_TR0_MULT;

	DBG2(dev, "__emac_calc_base_mr1" NL);

	switch (tx_size)
	{
	case 2048:
		ret |= EMAC_MR1_TFS_2K;
		break;
	default:
		/* The value reported here is the TX FIFO size, not the RX one */
		printk(KERN_WARNING "%s: Unknown Tx FIFO size %d\n",
				dev->ndev->name, tx_size);
	}

	switch (rx_size)
	{
	case 16384:
		ret |= EMAC_MR1_RFS_16K;
		break;
	case 4096:
		ret |= EMAC_MR1_RFS_4K;
		break;
	default:
		printk(KERN_WARNING "%s: Unknown Rx FIFO size %d\n",
				dev->ndev->name, rx_size);
	}

	return ret;
}

static u32 __emac4_calc_base_mr1(struct emac_instance *dev, int tx_size,
		int rx_size)
{
	u32 ret = EMAC_MR1_VLE | EMAC_MR1_IST | EMAC4_MR1_TR
			| EMAC4_MR1_OBCI(dev->opb_bus_freq / 1000000);

	DBG2(dev, "__emac4_calc_base_mr1" NL);

	switch (tx_size)
	{
	case 4096:
		ret |= EMAC4_MR1_TFS_4K;
		break;
	case 2048:
		ret |= EMAC4_MR1_TFS_2K;
		break;
	default:
		/* The value reported here is the TX FIFO size, not the RX one */
		printk(KERN_WARNING "%s: Unknown Tx FIFO size %d\n",
				dev->ndev->name, tx_size);
	}

	switch (rx_size)
	{
	case 16384:
		ret |= EMAC4_MR1_RFS_16K;
		break;
	case 4096:
		ret |= EMAC4_MR1_RFS_4K;
		break;
	case 2048:
		ret |= EMAC4_MR1_RFS_2K;
		break;
	default:
		printk(KERN_WARNING "%s: Unknown Rx FIFO size %d\n",
				dev->ndev->name, rx_size);
	}

	return ret;
}

static u32 emac_calc_base_mr1(struct emac_instance *dev, int tx_size,
		int rx_size)
{
	return emac_has_feature(dev, EMAC_FTR_EMAC4) ? __emac4_calc_base_mr1(dev,
			tx_size, rx_size) : __emac_calc_base_mr1(dev, tx_size, rx_size);
}

static inline u32 emac_calc_trtr(struct emac_instance *dev, unsigned int size)
{
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		return ((size >> 6) - 1) << EMAC_TRTR_SHIFT_EMAC4;
	else
		return ((size >> 6) - 1) << EMAC_TRTR_SHIFT;
}

static inline u32 emac_calc_rwmr(struct emac_instance *dev, unsigned int low,
		unsigned int high)
{
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		return (low << 22) | ((high & 0x3ff) << 6);
	else
		return (low << 23) | ((high & 0x1ff) << 7);
}

static int emac_configure(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	struct net_device *ndev = dev->ndev;
	int tx_size, rx_size, link = netif_carrier_ok(dev->ndev);
	u32 r, mr1 = 0;

	DBG(dev, "configure" NL);

	if (!link)
	{
		out_be32(&p->mr1, in_be32(&p->mr1) | EMAC_MR1_FDE | EMAC_MR1_ILE);
		udelay(100);
	}
	else if (emac_reset(dev) < 0)
		return -ETIMEDOUT;

	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH))
		tah_reset(dev->tah_dev);

	DBG(dev, " link = %d duplex = %d, pause = %d, asym_pause = %d\n",
			link, dev->phy.duplex, dev->phy.pause, dev->phy.asym_pause);

	/* Default fifo sizes */
	tx_size = dev->tx_fifo_size;
	rx_size = dev->rx_fifo_size;

	/* No link, force loopback */
	if (!link)
		mr1 = EMAC_MR1_FDE | EMAC_MR1_ILE;

	/* Check for full duplex */
	else if (dev->phy.duplex == DUPLEX_FULL)
		mr1 |= EMAC_MR1_FDE | EMAC_MR1_MWSW_001;

	/* Adjust fifo sizes, mr1 and timeouts based on link speed */
	dev->stop_timeout = STOP_TIMEOUT_10;
	switch (dev->phy.speed)
	{
	case SPEED_1000:
		if (emac_phy_gpcs(dev->phy.mode))
		{
			mr1 |= EMAC_MR1_MF_1000GPCS | EMAC_MR1_MF_IPPA(
					(dev->phy.gpcs_address != 0xffffffff) ?
					dev->phy.gpcs_address : dev->phy.address);

			/* Put some arbitrary OUI, Manuf & Rev IDs so we can
			 * identify this GPCS PHY later.
			 */
			out_be32(&p->u1.emac4.ipcr, 0xdeadbeef);
		}
		else
			mr1 |= EMAC_MR1_MF_1000;

		/* Extended fifo sizes */
		tx_size = dev->tx_fifo_size_gige;
		rx_size = dev->rx_fifo_size_gige;

		if (dev->ndev->mtu > ETH_DATA_LEN)
		{
			if (emac_has_feature(dev, EMAC_FTR_EMAC4))
				mr1 |= EMAC4_MR1_JPSM;
			else
				mr1 |= EMAC_MR1_JPSM;
			dev->stop_timeout = STOP_TIMEOUT_1000_JUMBO;
		}
		else
			dev->stop_timeout = STOP_TIMEOUT_1000;
		break;
	case SPEED_100:
		mr1 |= EMAC_MR1_MF_100;
		dev->stop_timeout = STOP_TIMEOUT_100;
		break;
	default: /* make gcc happy */
		break;
	}

	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_set_speed(dev->rgmii_dev, dev->rgmii_port,
				dev->phy.speed);
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_set_speed(dev->zmii_dev, dev->zmii_port, dev->phy.speed);

	/* on 40x erratum forces us to NOT use integrated flow control,
	 * let's hope it works on 44x ;)
	 */
	if (!emac_has_feature(dev, EMAC_FTR_NO_FLOW_CONTROL_40x) && dev->phy.duplex
			== DUPLEX_FULL)
	{
		if (dev->phy.pause)
			mr1 |= EMAC_MR1_EIFC | EMAC_MR1_APP;
		else if (dev->phy.asym_pause)
			mr1 |= EMAC_MR1_APP;
	}

	/* Add base settings & fifo sizes & program MR1 */
	mr1 |= emac_calc_base_mr1(dev, tx_size, rx_size);
	out_be32(&p->mr1, mr1);

	/* Set individual MAC address */
	out_be32(&p->iahr, (ndev->dev_addr[0] << 8) | ndev->dev_addr[1]);
	out_be32(&p->ialr, (ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16)
			| (ndev->dev_addr[4] << 8) | ndev->dev_addr[5]);

	/* VLAN Tag Protocol ID */
	out_be32(&p->vtpid, 0x8100);

	/* Receive mode register */
	r = emac_iff2rmr(ndev);
	if (r & EMAC_RMR_MAE)
		emac_hash_mc(dev);
	out_be32(&p->rmr, r);

	/* FIFOs thresholds */
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		r = EMAC4_TMR1((dev->mal_burst_size / dev->fifo_entry_size) + 1,
				tx_size / 2 / dev->fifo_entry_size);
	else
		r = EMAC_TMR1((dev->mal_burst_size / dev->fifo_entry_size) + 1,
				tx_size / 2 / dev->fifo_entry_size);
	out_be32(&p->tmr1, r);
	out_be32(&p->trtr, emac_calc_trtr(dev, tx_size / 2));

	/* PAUSE frame is sent when RX FIFO reaches its high-water mark,
	 there should be still enough space in FIFO to allow our link
	 partner time to process this frame and also time to send PAUSE
	 frame itself.

	 Here is the worst case scenario for the RX FIFO "headroom"
	 (from "The Switch Book") (100Mbps, without preamble, inter-frame gap):

	 1) One maximum-length frame on TX                    1522 bytes
	 2) One PAUSE frame time                                64 bytes
	 3) PAUSE frame decode time allowance                   64 bytes
	 4) One maximum-length frame on RX                    1522 bytes
	 5) Round-trip propagation delay of the link (100Mb)    15 bytes
	 ----------
	 3187 bytes

	 I chose to set high-water mark to RX_FIFO_SIZE / 4 (1024 bytes)
	 low-water mark  to RX_FIFO_SIZE / 8 (512 bytes)
	 */
	r = emac_calc_rwmr(dev, rx_size / 8 / dev->fifo_entry_size, rx_size / 4
			/ dev->fifo_entry_size);
	out_be32(&p->rwmr, r);

	/* Set PAUSE timer to the maximum */
	out_be32(&p->ptr, 0xffff);

	/* IRQ sources */
	r = EMAC_ISR_OVR | EMAC_ISR_BP | EMAC_ISR_SE | EMAC_ISR_ALE | EMAC_ISR_BFCS
			| EMAC_ISR_PTLE | EMAC_ISR_ORE | EMAC_ISR_IRE | EMAC_ISR_TE;
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		r |= EMAC4_ISR_TXPE | EMAC4_ISR_RXPE /* | EMAC4_ISR_TXUE |
		 EMAC4_ISR_RXOE | */;
	out_be32(&p->iser, r);

	/* We need to take GPCS PHY out of isolate mode after EMAC reset */
	if (emac_phy_gpcs(dev->phy.mode))
	{
		if (dev->phy.gpcs_address != 0xffffffff)
			emac_mii_reset_gpcs(&dev->phy);
		else
			emac_mii_reset_phy(&dev->phy);
	}

	return 0;
}

static void emac_reinitialize(struct emac_instance *dev)
{
	DBG(dev, "reinitialize" NL);

	emac_netif_stop(dev);
	if (!emac_configure(dev))
	{
		emac_tx_enable(dev);
		emac_rx_enable(dev);
	}
	emac_netif_start(dev);
}

static void emac_full_tx_reset(struct emac_instance *dev)
{
	DBG(dev, "full_tx_reset" NL);

	emac_tx_disable(dev);
	mal_disable_tx_channel(dev->mal, dev->mal_tx_chan);
	emac_clean_tx_ring(dev);
	dev->tx_cnt = dev->tx_slot = dev->ack_slot = 0;

	emac_configure(dev);

	mal_enable_tx_channel(dev->mal, dev->mal_tx_chan);
	emac_tx_enable(dev);
	emac_rx_enable(dev);
}

static void emac_reset_work(struct work_struct *work)
{
	struct emac_instance *dev =
			container_of(work, struct emac_instance, reset_work);

	DBG(dev, "reset_work" NL);

	mutex_lock(&dev->link_lock);
	if (dev->opened)
	{
		emac_netif_stop(dev);
		emac_full_tx_reset(dev);
		emac_netif_start(dev);
	}
	mutex_unlock(&dev->link_lock);
}

static void emac_tx_timeout(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);

	DBG(dev, "tx_timeout" NL);

	schedule_work(&dev->reset_work);
}

static inline int emac_phy_done(struct emac_instance *dev, u32 stacr)
{
	int done = !!(stacr & EMAC_STACR_OC);

	if (emac_has_feature(dev, EMAC_FTR_STACR_OC_INVERT))
		done = !done;

	return done;
}
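
/*
 * Illustrative sketch only, not part of the original driver: the
 * completion test in emac_phy_done() reduced to plain integers. The
 * helper name and the oc_mask/invert parameters are hypothetical
 * stand-ins for EMAC_STACR_OC and EMAC_FTR_STACR_OC_INVERT.
 */
static inline int emac_example_phy_done(unsigned int stacr,
		unsigned int oc_mask, int invert)
{
	int done = !!(stacr & oc_mask);	/* normalise the masked bit to 0 or 1 */

	/* On cores with inverted polarity a clear bit means "complete" */
	return invert ? !done : done;
}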

static int __emac_mdio_read(struct emac_instance *dev, u8 id, u8 reg)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r = 0;
	int n, err = -ETIMEDOUT;

	mutex_lock(&dev->mdio_lock);

	DBG2(dev, "mdio_read(%02x,%02x)" NL, id, reg);

	/* Enable proper MDIO port */
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_get_mdio(dev->zmii_dev, dev->zmii_port);
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_get_mdio(dev->rgmii_dev, dev->rgmii_port);

	/* Wait for management interface to become idle */
	n = 20;
	while (!emac_phy_done(dev, in_be32(&p->stacr)))
	{
		udelay(1);
		if (!--n)
		{
			DBG2(dev, " -> timeout wait idle\n");
			goto bail;
		}
	}

	/* Issue read command */
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		r = EMAC4_STACR_BASE(dev->opb_bus_freq);
	else
		r = EMAC_STACR_BASE(dev->opb_bus_freq);
	if (emac_has_feature(dev, EMAC_FTR_STACR_OC_INVERT))
		r |= EMAC_STACR_OC;
	if (emac_has_feature(dev, EMAC_FTR_HAS_NEW_STACR))
		r |= EMACX_STACR_STAC_READ;
	else
		r |= EMAC_STACR_STAC_READ;
	r |= (reg & EMAC_STACR_PRA_MASK)
			| ((id & EMAC_STACR_PCDA_MASK) << EMAC_STACR_PCDA_SHIFT);
	out_be32(&p->stacr, r);

	/* Wait for read to complete */
	n = 200;
	while (!emac_phy_done(dev, (r = in_be32(&p->stacr))))
	{
		udelay(1);
		if (!--n)
		{
			DBG2(dev, " -> timeout wait complete\n");
			goto bail;
		}
	}

	if (unlikely(r & EMAC_STACR_PHYE))
	{
		DBG(dev, "mdio_read(%02x, %02x) failed" NL, id, reg);
		err = -EREMOTEIO;
		goto bail;
	}

	r = ((r >> EMAC_STACR_PHYD_SHIFT) & EMAC_STACR_PHYD_MASK);

	DBG2(dev, "mdio_read -> %04x" NL, r);
	err = 0;
bail:
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_put_mdio(dev->rgmii_dev, dev->rgmii_port);
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_put_mdio(dev->zmii_dev, dev->zmii_port);
	mutex_unlock(&dev->mdio_lock);

	return err == 0 ? r : err;
}

static void __emac_mdio_write(struct emac_instance *dev, u8 id, u8 reg, u16 val)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 r = 0;
	int n, err = -ETIMEDOUT;

	mutex_lock(&dev->mdio_lock);

	DBG2(dev, "mdio_write(%02x,%02x,%04x)" NL, id, reg, val);

	/* Enable proper MDIO port */
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_get_mdio(dev->zmii_dev, dev->zmii_port);
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_get_mdio(dev->rgmii_dev, dev->rgmii_port);

	/* Wait for management interface to be idle */
	n = 20;
	while (!emac_phy_done(dev, in_be32(&p->stacr)))
	{
		udelay(1);
		if (!--n)
		{
			DBG2(dev, " -> timeout wait idle\n");
			goto bail;
		}
	}

	/* Issue write command */
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		r = EMAC4_STACR_BASE(dev->opb_bus_freq);
	else
		r = EMAC_STACR_BASE(dev->opb_bus_freq);
	if (emac_has_feature(dev, EMAC_FTR_STACR_OC_INVERT))
		r |= EMAC_STACR_OC;
	if (emac_has_feature(dev, EMAC_FTR_HAS_NEW_STACR))
		r |= EMACX_STACR_STAC_WRITE;
	else
		r |= EMAC_STACR_STAC_WRITE;
	/* Cast val before shifting so the u16 is not shifted as a signed int */
	r |= (reg & EMAC_STACR_PRA_MASK)
			| ((id & EMAC_STACR_PCDA_MASK) << EMAC_STACR_PCDA_SHIFT)
			| ((u32) val << EMAC_STACR_PHYD_SHIFT);
	out_be32(&p->stacr, r);

	/* Wait for write to complete */
	n = 200;
	while (!emac_phy_done(dev, in_be32(&p->stacr)))
	{
		udelay(1);
		if (!--n)
		{
			DBG2(dev, " -> timeout wait complete\n");
			goto bail;
		}
	}
	err = 0;
bail:
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_put_mdio(dev->rgmii_dev, dev->rgmii_port);
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_put_mdio(dev->zmii_dev, dev->zmii_port);
	mutex_unlock(&dev->mdio_lock);
}

static int emac_mdio_read(struct net_device *ndev, int id, int reg)
{
	struct emac_instance *dev = netdev_priv(ndev);
	int res;

	res = __emac_mdio_read(
			(dev->mdio_instance && dev->phy.gpcs_address != id)
					? dev->mdio_instance : dev,
			(u8) id, (u8) reg);
	return res;
}

static void emac_mdio_write(struct net_device *ndev, int id, int reg, int val)
{
	struct emac_instance *dev = netdev_priv(ndev);

	__emac_mdio_write(
			(dev->mdio_instance && dev->phy.gpcs_address != id)
					? dev->mdio_instance : dev,
			(u8) id, (u8) reg, (u16) val);
}

/* Tx lock BH */
static void __emac_set_multicast_list(struct emac_instance *dev)
{
	struct emac_regs __iomem *p = dev->emacp;
	u32 rmr = emac_iff2rmr(dev->ndev);

	DBG(dev, "__multicast %08x" NL, rmr);

	/* I decided to relax register access rules here to avoid
	 * full EMAC reset.
	 *
	 * There is a real problem with EMAC4 core if we use MWSW_001 bit
	 * in MR1 register and do a full EMAC reset.
	 * One TX BD status update is delayed and, after EMAC reset, it
	 * never happens, resulting in a TX hang (it'll be recovered by the
	 * TX timeout handler eventually, but this is just gross).
	 * So we either have to do full TX reset or try to cheat here :)
	 *
	 * The only required change is to RX mode register, so I *think* all
	 * we need is just to stop RX channel. This seems to work on all
	 * tested SoCs.                                                --ebs
	 *
	 * If we need the full reset, we might just trigger the workqueue
	 * and do it async... a bit nasty but should work --BenH
	 */
	dev->mcast_pending = 0;
	emac_rx_disable(dev);
	if (rmr & EMAC_RMR_MAE)
		emac_hash_mc(dev);
	out_be32(&p->rmr, rmr);
	emac_rx_enable(dev);
}

/* Tx lock BH */
static void emac_set_multicast_list(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);

	DBG(dev, "multicast" NL);

	BUG_ON(!netif_running(dev->ndev));

	if (dev->no_mcast)
	{
		dev->mcast_pending = 1;
		return;
	}
	__emac_set_multicast_list(dev);
}

static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
{
	int rx_sync_size = emac_rx_sync_size(new_mtu);
	int rx_skb_size = emac_rx_skb_size(new_mtu);
	int i, ret = 0;

	mutex_lock(&dev->link_lock);
	emac_netif_stop(dev);
	emac_rx_disable(dev);
	mal_disable_rx_channel(dev->mal, dev->mal_rx_chan);

	if (dev->rx_sg_skb)
	{
		++dev->estats.rx_dropped_resize;
		dev_kfree_skb(dev->rx_sg_skb);
		dev->rx_sg_skb = NULL;
	}

	/* Make a first pass over RX ring and mark BDs ready, dropping
	 * non-processed packets on the way. We need this as a separate pass
	 * to simplify error recovery in the case of allocation failure later.
	 */
	for (i = 0; i < NUM_RX_BUFF; ++i)
	{
		if (dev->rx_desc[i].ctrl & MAL_RX_CTRL_FIRST)
			++dev->estats.rx_dropped_resize;

		dev->rx_desc[i].data_len = 0;
		dev->rx_desc[i].ctrl = MAL_RX_CTRL_EMPTY
				| (i == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
	}

	/* Reallocate RX ring only if bigger skb buffers are required */
	if (rx_skb_size <= dev->rx_skb_size)
		goto skip;

	/* Second pass, allocate new skbs */
	for (i = 0; i < NUM_RX_BUFF; ++i)
	{
		struct sk_buff *skb = alloc_skb(rx_skb_size, GFP_ATOMIC);
		if (!skb)
		{
			ret = -ENOMEM;
			goto oom;
		}

		BUG_ON(!dev->rx_skb[i]);
		dev_kfree_skb(dev->rx_skb[i]);

		skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
		/*JRH - map a page */
		/*dev->rx_desc[i].data_ptr = dma_map_single(&dev->ofdev->dev,
				skb->data - 2, rx_sync_size, DMA_FROM_DEVICE) + 2;*/
		dev->rx_desc[i].data_ptr = dma_map_page(&dev->ofdev->dev, dev->page[i],
				0, rx_sync_size, DMA_FROM_DEVICE);
		dev->rx_skb[i] = skb;
	}
	skip:
	/* Check if we need to change "Jumbo" bit in MR1 */
	if ((new_mtu > ETH_DATA_LEN) ^ (dev->ndev->mtu > ETH_DATA_LEN))
	{
		/* This is to prevent starting RX channel in emac_rx_enable() */
		set_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags);

		dev->ndev->mtu = new_mtu;
		emac_full_tx_reset(dev);
	}

	mal_set_rcbs(dev->mal, dev->mal_rx_chan, emac_rx_size(new_mtu));
	oom:
	/* Restart RX */
	clear_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags);
	dev->rx_slot = 0;
	mal_enable_rx_channel(dev->mal, dev->mal_rx_chan);
	emac_rx_enable(dev);
	emac_netif_start(dev);
	mutex_unlock(&dev->link_lock);

	return ret;
}

/* Process ctx, rtnl_lock semaphore */
static int emac_change_mtu(struct net_device *ndev, int new_mtu)
{
	struct emac_instance *dev = netdev_priv(ndev);
	int ret = 0;

	if (new_mtu < EMAC_MIN_MTU || new_mtu > dev->max_mtu)
		return -EINVAL;

	DBG(dev, "change_mtu(%d)" NL, new_mtu);

	if (netif_running(ndev))
	{
		/* Check if we really need to reinitialize RX ring */
		if (emac_rx_skb_size(ndev->mtu) != emac_rx_skb_size(new_mtu))
			ret = emac_resize_rx_ring(dev, new_mtu);
	}

	if (!ret)
	{
		ndev->mtu = new_mtu;
		dev->rx_skb_size = emac_rx_skb_size(new_mtu);
		dev->rx_sync_size = emac_rx_sync_size(new_mtu);
	}

	return ret;
}

static void emac_clean_tx_ring(struct emac_instance *dev)
{
	int i;

	for (i = 0; i < NUM_TX_BUFF; ++i)
	{
		if (dev->tx_skb[i])
		{
			dev_kfree_skb(dev->tx_skb[i]);
			dev->tx_skb[i] = NULL;
			if (dev->tx_desc[i].ctrl & MAL_TX_CTRL_READY)
				++dev->estats.tx_dropped;
		}
		dev->tx_desc[i].ctrl = 0;
		dev->tx_desc[i].data_ptr = 0;
	}
}

static void emac_clean_rx_ring(struct emac_instance *dev)
{
	int i;

	for (i = 0; i < NUM_RX_BUFF; ++i)
		if (dev->rx_skb[i])
		{
			dev->rx_desc[i].ctrl = 0;
			dev_kfree_skb(dev->rx_skb[i]);
			dev->rx_skb[i] = NULL;
			dev->rx_desc[i].data_ptr = 0;
		}

	if (dev->rx_sg_skb)
	{
		dev_kfree_skb(dev->rx_sg_skb);
		dev->rx_sg_skb = NULL;
	}
}

static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
		gfp_t flags)
{
	struct sk_buff *skb = alloc_skb(dev->rx_skb_size, flags);
	if (unlikely(!skb))
		return -ENOMEM;

	dev->rx_skb[slot] = skb;
	dev->rx_desc[slot].data_len = 0;

	skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
	/*JRH - map a page */
	/*dev->rx_desc[slot].data_ptr = dma_map_single(&dev->ofdev->dev,
			skb->data - 2, dev->rx_sync_size, DMA_FROM_DEVICE) + 2;*/
	dev->rx_desc[slot].data_ptr = dma_map_page(&dev->ofdev->dev,
			dev->page[slot], 0, dev->rx_sync_size, DMA_FROM_DEVICE) + 2;
	wmb();
	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY
			| (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);

	return 0;
}

static void emac_print_link_status(struct emac_instance *dev)
{
	if (netif_carrier_ok(dev->ndev))
		printk(KERN_INFO "%s: link is up, %d %s%s\n",
				dev->ndev->name, dev->phy.speed,
				dev->phy.duplex == DUPLEX_FULL ? "FDX" : "HDX",
				dev->phy.pause ? ", pause enabled" :
				dev->phy.asym_pause ? ", asymmetric pause enabled" : "");
	else
		printk(KERN_INFO "%s: link is down\n", dev->ndev->name);
}

/* Process ctx, rtnl_lock semaphore */
static int emac_open(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	int err, i;

	DBG(dev, "open" NL);

	/* Setup error IRQ handler */
	err = request_irq(dev->emac_irq, emac_irq, 0, "EMAC", dev);
	if (err)
	{
		printk(KERN_ERR "%s: failed to request IRQ %d\n",
				ndev->name, dev->emac_irq);
		return err;
	}

	/* Allocate RX ring */
	for (i = 0; i < NUM_RX_BUFF; ++i)
	{
		dev->page[i] = alloc_page(GFP_ATOMIC);
		if (!dev->page[i])
		{
			printk(KERN_ERR "%s: failed to allocate pages in RX ring\n",
					ndev->name);
			/* Do not go on to DMA-map a NULL page */
			goto oom;
		}
		if (emac_alloc_rx_skb(dev, i, GFP_KERNEL))
		{
			printk(KERN_ERR "%s: failed to allocate RX ring\n",
					ndev->name);
			goto oom;
		}
	}

	dev->tx_cnt = dev->tx_slot = dev->ack_slot = dev->rx_slot = 0;
	clear_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags);
	dev->rx_sg_skb = NULL;

	mutex_lock(&dev->link_lock);
	dev->opened = 1;

	/* Start PHY polling now */
	if (dev->phy.address >= 0)
	{
		int link_poll_interval;
		if (dev->phy.def->ops->poll_link(&dev->phy))
		{
			dev->phy.def->ops->read_link(&dev->phy);
			emac_rx_clk_default(dev);
			netif_carrier_on(dev->ndev);
			link_poll_interval = PHY_POLL_LINK_ON;
		}
		else
		{
			emac_rx_clk_tx(dev);
			netif_carrier_off(dev->ndev);
			link_poll_interval = PHY_POLL_LINK_OFF;
		}
		dev->link_polling = 1;
		wmb();
		schedule_delayed_work(&dev->link_work, link_poll_interval);
		emac_print_link_status(dev);
	}
	else
		netif_carrier_on(dev->ndev);

	/* Required for Pause packet support in EMAC */
	dev_mc_add(ndev, default_mcast_addr, sizeof(default_mcast_addr), 1);

	emac_configure(dev);
	mal_poll_add(dev->mal, &dev->commac);
	mal_enable_tx_channel(dev->mal, dev->mal_tx_chan);
	mal_set_rcbs(dev->mal, dev->mal_rx_chan, emac_rx_size(ndev->mtu));
	mal_enable_rx_channel(dev->mal, dev->mal_rx_chan);
	emac_tx_enable(dev);
	emac_rx_enable(dev);
	emac_netif_start(dev);

	mutex_unlock(&dev->link_lock);

	return 0;
oom:
	emac_clean_rx_ring(dev);
	free_irq(dev->emac_irq, dev);

	return -ENOMEM;
}

/* BHs disabled */
#if 0
static int emac_link_differs(struct emac_instance *dev)
{
	u32 r = in_be32(&dev->emacp->mr1);

	int duplex = r & EMAC_MR1_FDE ? DUPLEX_FULL : DUPLEX_HALF;
	int speed, pause, asym_pause;

	if (r & EMAC_MR1_MF_1000)
		speed = SPEED_1000;
	else if (r & EMAC_MR1_MF_100)
		speed = SPEED_100;
	else
		speed = SPEED_10;

	switch (r & (EMAC_MR1_EIFC | EMAC_MR1_APP))
	{
	case (EMAC_MR1_EIFC | EMAC_MR1_APP):
		pause = 1;
		asym_pause = 0;
		break;
	case EMAC_MR1_APP:
		pause = 0;
		asym_pause = 1;
		break;
	default:
		pause = asym_pause = 0;
	}
	return speed != dev->phy.speed || duplex != dev->phy.duplex ||
		pause != dev->phy.pause || asym_pause != dev->phy.asym_pause;
}
#endif

static void emac_link_timer(struct work_struct *work)
{
	struct emac_instance *dev =
			container_of(to_delayed_work(work),
					struct emac_instance, link_work);
	int link_poll_interval;

	mutex_lock(&dev->link_lock);
	DBG2(dev, "link timer" NL);

	if (!dev->opened)
		goto bail;

	if (dev->phy.def->ops->poll_link(&dev->phy))
	{
		if (!netif_carrier_ok(dev->ndev))
		{
			emac_rx_clk_default(dev);
			/* Get new link parameters */
			dev->phy.def->ops->read_link(&dev->phy);

			netif_carrier_on(dev->ndev);
			emac_netif_stop(dev);
			emac_full_tx_reset(dev);
			emac_netif_start(dev);
			emac_print_link_status(dev);
		}
		link_poll_interval = PHY_POLL_LINK_ON;
	}
	else
	{
		if (netif_carrier_ok(dev->ndev))
		{
			emac_rx_clk_tx(dev);
			netif_carrier_off(dev->ndev);
			netif_tx_disable(dev->ndev);
			emac_reinitialize(dev);
			emac_print_link_status(dev);
		}
		link_poll_interval = PHY_POLL_LINK_OFF;
	}
	schedule_delayed_work(&dev->link_work, link_poll_interval);
bail:
	mutex_unlock(&dev->link_lock);
}

static void emac_force_link_update(struct emac_instance *dev)
{
	netif_carrier_off(dev->ndev);
	smp_rmb();
	if (dev->link_polling)
	{
		cancel_rearming_delayed_work(&dev->link_work);
		if (dev->link_polling)
			schedule_delayed_work(&dev->link_work, PHY_POLL_LINK_OFF);
	}
}

/* Process ctx, rtnl_lock semaphore */
static int emac_close(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);

	DBG(dev, "close" NL);

	if (dev->phy.address >= 0)
	{
		dev->link_polling = 0;
		cancel_rearming_delayed_work(&dev->link_work);
	}
	mutex_lock(&dev->link_lock);
	emac_netif_stop(dev);
	dev->opened = 0;
	mutex_unlock(&dev->link_lock);

	emac_rx_disable(dev);
	emac_tx_disable(dev);
	mal_disable_rx_channel(dev->mal, dev->mal_rx_chan);
	mal_disable_tx_channel(dev->mal, dev->mal_tx_chan);
	mal_poll_del(dev->mal, &dev->commac);

	emac_clean_tx_ring(dev);
	emac_clean_rx_ring(dev);

	free_irq(dev->emac_irq, dev);

	return 0;
}

static inline u16 emac_tx_csum(struct emac_instance *dev, struct sk_buff *skb)
{
	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH)
			&& (skb->ip_summed == CHECKSUM_PARTIAL))
	{
		++dev->stats.tx_packets_csum;
		return EMAC_TX_CTRL_TAH_CSUM;
	}
	return 0;
}

static inline int emac_xmit_finish(struct emac_instance *dev, int len)
{
	struct emac_regs __iomem *p = dev->emacp;
	struct net_device *ndev = dev->ndev;

	/* Send the packet out. If this branch makes a significant
	 * performance difference, then we can store the TMR0 value in
	 * "dev" instead.
	 */
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		out_be32(&p->tmr0, EMAC4_TMR0_XMIT);
	else
		out_be32(&p->tmr0, EMAC_TMR0_XMIT);

	if (unlikely(++dev->tx_cnt == NUM_TX_BUFF))
	{
		netif_stop_queue(ndev);
		DBG2(dev, "stopped TX queue" NL);
	}

	ndev->trans_start = jiffies;
	++dev->stats.tx_packets;
	dev->stats.tx_bytes += len;

	return 0;
}

/* Tx lock BH */
static int emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	unsigned int len = skb->len;
	int slot;

	u16 ctrl = EMAC_TX_CTRL_GFCS | EMAC_TX_CTRL_GP | MAL_TX_CTRL_READY
			| MAL_TX_CTRL_LAST | emac_tx_csum(dev, skb);

	slot = dev->tx_slot++;
	if (dev->tx_slot == NUM_TX_BUFF)
	{
		dev->tx_slot = 0;
		ctrl |= MAL_TX_CTRL_WRAP;
	}

	DBG2(dev, "xmit(%u) %d" NL, len, slot);

	dev->tx_skb[slot] = skb;
	dev->tx_desc[slot].data_ptr = dma_map_single(&dev->ofdev->dev, skb->data,
			len, DMA_TO_DEVICE);
	dev->tx_desc[slot].data_len = (u16) len;
	wmb();
	dev->tx_desc[slot].ctrl = ctrl;

	return emac_xmit_finish(dev, len);
}

static inline int emac_xmit_split(struct emac_instance *dev, int slot, u32 pd,
		int len, int last, u16 base_ctrl)
{
	while (1)
	{
		u16 ctrl = base_ctrl;
		int chunk = min(len, MAL_MAX_TX_SIZE);
		len -= chunk;

		slot = (slot + 1) % NUM_TX_BUFF;

		if (last && !len)
			ctrl |= MAL_TX_CTRL_LAST;
		if (slot == NUM_TX_BUFF - 1)
			ctrl |= MAL_TX_CTRL_WRAP;

		dev->tx_skb[slot] = NULL;
		dev->tx_desc[slot].data_ptr = pd;
		dev->tx_desc[slot].data_len = (u16) chunk;
		dev->tx_desc[slot].ctrl = ctrl;
		++dev->tx_cnt;

		if (!len)
			break;

		pd += chunk;
	}
	return slot;
}
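
/*
 * Illustrative sketch only, not part of the original driver: how many
 * descriptors a buffer of len bytes occupies when it is cut into pieces
 * of at most max_chunk bytes, which is the splitting emac_xmit_split()
 * performs with MAL_MAX_TX_SIZE. The helper name is hypothetical.
 */
static inline int emac_example_tx_chunks(int len, int max_chunk)
{
	/* Round up: a partial trailing chunk still needs its own descriptor */
	return (len + max_chunk - 1) / max_chunk;
}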

/* Tx lock BH disabled (SG version for TAH equipped EMACs) */
static int emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	int nr_frags = skb_shinfo(skb)->nr_frags;
	int len = skb->len, chunk;
	int slot, i;
	u16 ctrl;
	u32 pd;

	/* This is common "fast" path */
	if (likely(!nr_frags && len <= MAL_MAX_TX_SIZE))
		return emac_start_xmit(skb, ndev);

	len -= skb->data_len;

	/* Note, this is only an *estimation*, we can still run out of empty
	 * slots because of the additional fragmentation into
	 * MAL_MAX_TX_SIZE-sized chunks
	 */
	if (unlikely(dev->tx_cnt + nr_frags + mal_tx_chunks(len) > NUM_TX_BUFF))
		goto stop_queue;

	ctrl = EMAC_TX_CTRL_GFCS | EMAC_TX_CTRL_GP | MAL_TX_CTRL_READY
			| emac_tx_csum(dev, skb);
	slot = dev->tx_slot;

	/* skb data */
	dev->tx_skb[slot] = NULL;
	chunk = min(len, MAL_MAX_TX_SIZE);
	dev->tx_desc[slot].data_ptr = pd = dma_map_single(&dev->ofdev->dev,
			skb->data, len, DMA_TO_DEVICE);
	dev->tx_desc[slot].data_len = (u16) chunk;
	len -= chunk;
	if (unlikely(len))
		slot = emac_xmit_split(dev, slot, pd + chunk, len, !nr_frags, ctrl);
	/* skb fragments */
	for (i = 0; i < nr_frags; ++i)
	{
		struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i];
		len = frag->size;

		if (unlikely(dev->tx_cnt + mal_tx_chunks(len) >= NUM_TX_BUFF))
			goto undo_frame;

		pd = dma_map_page(&dev->ofdev->dev, frag->page, frag->page_offset, len,
				DMA_TO_DEVICE);

		slot = emac_xmit_split(dev, slot, pd, len, i == nr_frags - 1, ctrl);
	}

	DBG2(dev, "xmit_sg(%u) %d - %d" NL, skb->len, dev->tx_slot, slot);

	/* Attach skb to the last slot so we don't release it too early */
	dev->tx_skb[slot] = skb;

	/* Send the packet out */
	if (dev->tx_slot == NUM_TX_BUFF - 1)
		ctrl |= MAL_TX_CTRL_WRAP;
	wmb();
	dev->tx_desc[dev->tx_slot].ctrl = ctrl;
	dev->tx_slot = (slot + 1) % NUM_TX_BUFF;

	return emac_xmit_finish(dev, skb->len);

	undo_frame:
	/* Well, too bad. Our previous estimation was overly optimistic.
	 * Undo everything.
	 */
	while (slot != dev->tx_slot)
	{
		dev->tx_desc[slot].ctrl = 0;
		--dev->tx_cnt;
		if (--slot < 0)
			slot = NUM_TX_BUFF - 1;
	}
	++dev->estats.tx_undo;

stop_queue:
	netif_stop_queue(ndev);
	DBG2(dev, "stopped TX queue" NL);
	return NETDEV_TX_BUSY;
}
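
/*
 * Illustrative sketch only, not part of the original driver: the ring
 * index arithmetic used by the TX paths above, with a hypothetical
 * ring_size standing in for NUM_TX_BUFF. Advancing wraps from the last
 * slot to 0; the undo loop steps back and wraps from 0 to the last slot.
 */
static inline int emac_example_ring_next(int slot, int ring_size)
{
	return (slot + 1) % ring_size;
}

static inline int emac_example_ring_prev(int slot, int ring_size)
{
	return slot > 0 ? slot - 1 : ring_size - 1;
}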

/* Tx lock BHs */
static void emac_parse_tx_error(struct emac_instance *dev, u16 ctrl)
{
	struct emac_error_stats *st = &dev->estats;

	DBG(dev, "BD TX error %04x" NL, ctrl);

	++st->tx_bd_errors;
	if (ctrl & EMAC_TX_ST_BFCS)
		++st->tx_bd_bad_fcs;
	if (ctrl & EMAC_TX_ST_LCS)
		++st->tx_bd_carrier_loss;
	if (ctrl & EMAC_TX_ST_ED)
		++st->tx_bd_excessive_deferral;
	if (ctrl & EMAC_TX_ST_EC)
		++st->tx_bd_excessive_collisions;
	if (ctrl & EMAC_TX_ST_LC)
		++st->tx_bd_late_collision;
	if (ctrl & EMAC_TX_ST_MC)
		++st->tx_bd_multple_collisions;
	if (ctrl & EMAC_TX_ST_SC)
		++st->tx_bd_single_collision;
	if (ctrl & EMAC_TX_ST_UR)
		++st->tx_bd_underrun;
	if (ctrl & EMAC_TX_ST_SQE)
		++st->tx_bd_sqe;
}

static void emac_poll_tx(void *param)
{
	struct emac_instance *dev = param;
	u32 bad_mask;

	DBG2(dev, "poll_tx, %d %d" NL, dev->tx_cnt, dev->ack_slot);

	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH))
		bad_mask = EMAC_IS_BAD_TX_TAH;
	else
		bad_mask = EMAC_IS_BAD_TX;

	netif_tx_lock_bh(dev->ndev);
	if (dev->tx_cnt)
	{
		u16 ctrl;
		int slot = dev->ack_slot, n = 0;
again:
		ctrl = dev->tx_desc[slot].ctrl;
		if (!(ctrl & MAL_TX_CTRL_READY))
		{
			struct sk_buff *skb = dev->tx_skb[slot];
			++n;

			if (skb)
			{
				dev_kfree_skb(skb);
				dev->tx_skb[slot] = NULL;
			}
			slot = (slot + 1) % NUM_TX_BUFF;

			if (unlikely(ctrl & bad_mask))
				emac_parse_tx_error(dev, ctrl);

			if (--dev->tx_cnt)
				goto again;
		}
		if (n)
		{
			dev->ack_slot = slot;
			if (netif_queue_stopped(dev->ndev)
					&& dev->tx_cnt < EMAC_TX_WAKEUP_THRESH)
				netif_wake_queue(dev->ndev);

			DBG2(dev, "tx %d pkts" NL, n);
		}
	}
	netif_tx_unlock_bh(dev->ndev);
}

static inline void emac_recycle_rx_skb(struct emac_instance *dev, int slot,
		int len)
{
	/*struct sk_buff *skb = dev->rx_skb[slot];*/

	DBG2(dev, "recycle %d %d" NL, slot, len);

	/*JRH - map a page */
	if (len)
		dma_map_page(&dev->ofdev->dev, dev->page[slot], 0,
				EMAC_DMA_ALIGN(len + 2), DMA_FROM_DEVICE);
		/*dma_map_single(&dev->ofdev->dev, skb->data - 2,
				EMAC_DMA_ALIGN(len + 2), DMA_FROM_DEVICE);*/

	dev->rx_desc[slot].data_len = 0;
	wmb();
	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY
			| (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
}

static void emac_parse_rx_error(struct emac_instance *dev, u16 ctrl)
{
	struct emac_error_stats *st = &dev->estats;

	DBG(dev, "BD RX error %04x" NL, ctrl);

	++st->rx_bd_errors;
	if (ctrl & EMAC_RX_ST_OE)
		++st->rx_bd_overrun;
	if (ctrl & EMAC_RX_ST_BP)
		++st->rx_bd_bad_packet;
	if (ctrl & EMAC_RX_ST_RP)
		++st->rx_bd_runt_packet;
	if (ctrl & EMAC_RX_ST_SE)
		++st->rx_bd_short_event;
	if (ctrl & EMAC_RX_ST_AE)
		++st->rx_bd_alignment_error;
	if (ctrl & EMAC_RX_ST_BFCS)
		++st->rx_bd_bad_fcs;
	if (ctrl & EMAC_RX_ST_PTL)
		++st->rx_bd_packet_too_long;
	if (ctrl & EMAC_RX_ST_ORE)
		++st->rx_bd_out_of_range;
	if (ctrl & EMAC_RX_ST_IRE)
		++st->rx_bd_in_range;
}

static inline void emac_rx_csum(struct emac_instance *dev, struct sk_buff *skb,
		u16 ctrl)
{
#ifdef CONFIG_IBM_NEW_EMAC_TAH
	if (!ctrl && dev->tah_dev)
	{
		skb->ip_summed = CHECKSUM_UNNECESSARY;
		++dev->stats.rx_packets_csum;
	}
#endif
}

static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
{
	if (likely(dev->rx_sg_skb != NULL))
	{
		int len = dev->rx_desc[slot].data_len;
		int tot_len = dev->rx_sg_skb->len + len;

		if (unlikely(tot_len + 2 > dev->rx_skb_size))
		{
			++dev->estats.rx_dropped_mtu;
			dev_kfree_skb(dev->rx_sg_skb);
			dev->rx_sg_skb = NULL;
		}
		else
		{
			cacheable_memcpy(skb_tail_pointer(dev->rx_sg_skb),
					dev->rx_skb[slot]->data, len);
			skb_put(dev->rx_sg_skb, len);
			emac_recycle_rx_skb(dev, slot, len);
			return 0;
		}
	}
	emac_recycle_rx_skb(dev, slot, 0);
	return -1;
}

/* JRH - Consume an allocated page */
static void emac_consume_page(struct emac_instance *dev, int slot,
		struct sk_buff *skb, int len)
{
	dev->page[slot] = NULL;
	skb->len += len;
	skb->data_len += len;
	skb->truesize += len;

	/* Allocate a new page to replace the one consumed */
	dev->page[slot] = alloc_page(GFP_ATOMIC);
	if (!dev->page[slot])
	{
		printk(KERN_ERR "%s: failed to allocate new page after consumption\n",
				__func__);
		return;
	}

	/* Map the page */
	dev->rx_desc[slot].data_ptr = dma_map_page(&dev->ofdev->dev,
			dev->page[slot], 0, dev->rx_sync_size, DMA_FROM_DEVICE);
}

/* JRH - rewrote append routine to deal with single pages only
 *     - moved away from linear SKBs and using paged SKBs */
static inline int emac_rx_sg_append2(struct emac_instance *dev, int slot)
{
	if (likely(dev->rx_sg_skb != NULL))
	{
		int len = dev->rx_desc[slot].data_len;
		int idx = skb_shinfo(dev->rx_sg_skb)->nr_frags;
		struct page *page = dev->page[slot];
		struct sk_buff *skb = dev->rx_sg_skb;

		/* Append the page to the end of the SG chain */
		skb_fill_page_desc(skb, idx, page, 0, len);
		emac_consume_page(dev, slot, skb, len);

		return 0;
	}
	emac_recycle_rx_skb(dev, slot, 0);
	return -1;
}

/* NAPI poll context */
static int emac_poll_rx(void *param, int budget)
{
	struct emac_instance *dev = param;
	int slot = dev->rx_slot, received = 0;
	int len;
	struct sk_buff *skb;

	DBG2(dev, "poll_rx(%d)" NL, budget);

again:
	while (budget > 0)
	{
		u16 ctrl = dev->rx_desc[slot].ctrl;

		if (ctrl & MAL_RX_CTRL_EMPTY)
			break;

		skb = dev->rx_skb[slot];
		mb();
		len = dev->rx_desc[slot].data_len;

		if (unlikely(!MAL_IS_SINGLE_RX(ctrl)))
			goto sg;

		ctrl &= EMAC_BAD_RX_MASK;
		if (unlikely(ctrl && ctrl != EMAC_RX_TAH_BAD_CSUM))
		{
			DBG(dev, "bad RX descriptor status" NL);
			emac_parse_rx_error(dev, ctrl);
			++dev->estats.rx_dropped_error;
			emac_recycle_rx_skb(dev, slot, 0);
			len = 0;
			goto next;
		}

		if (len < ETH_HLEN)
		{
			DBG(dev, "length less than header length\n");
			++dev->estats.rx_dropped_stack;
			emac_recycle_rx_skb(dev, slot, len);
			goto next;
		}

		if (len && len < EMAC_RX_COPY_THRESH)
		{
			/* JRH - move to pages */
			/*struct sk_buff *copy_skb = alloc_skb(
					len + EMAC_RX_SKB_HEADROOM + 2, GFP_ATOMIC);
			if (unlikely(!copy_skb))
				goto oom;

			skb_reserve(copy_skb, EMAC_RX_SKB_HEADROOM + 2);
			cacheable_memcpy(copy_skb->data - 2, skb->data - 2, len + 2);
			emac_recycle_rx_skb(dev, slot, len);
			skb = copy_skb;*/
			u8 *vaddr;
			DBG(dev, "Copying page to skb\n");
			vaddr = (u8 *) kmap_atomic(dev->page[slot], KM_SKB_DATA_SOFTIRQ);
			cacheable_memcpy(skb_tail_pointer(skb), vaddr, len);
			kunmap_atomic(vaddr, KM_SKB_DATA_SOFTIRQ);
			skb_put(skb, len);
		}
		else /*if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC)))
			goto oom;*/
		{
			DBG(dev, "placing page in skb\n");
			skb_fill_page_desc(skb, 0, dev->page[slot], 0, len);
			emac_consume_page(dev, slot, skb, len);
		}

		/* JRH - do not call skb_put() on a paged SKB! */
		/*skb_put(skb, len);*/

push_packet:
		skb->dev = dev->ndev;
		skb->protocol = eth_type_trans(skb, dev->ndev);
		emac_rx_csum(dev, skb, ctrl);

		if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
			++dev->estats.rx_dropped_stack;
next:
		++dev->stats.rx_packets;
skip:
		dev->stats.rx_bytes += len;
		slot = (slot + 1) % NUM_RX_BUFF;
		--budget;
		++received;
		continue;
sg:
		if (ctrl & MAL_RX_CTRL_FIRST)
		{
			BUG_ON(dev->rx_sg_skb);
			if (unlikely(emac_alloc_rx_skb(dev, slot, GFP_ATOMIC)))
			{
				DBG(dev, "rx OOM %d" NL, slot);
				++dev->estats.rx_dropped_oom;
				emac_recycle_rx_skb(dev, slot, 0);
			}
			else
			{
				/* JRH - changing to SG chain, instead of linear SKB */
				dev->rx_sg_skb = skb;
				/*skb_put(skb, len);*/
				skb_fill_page_desc(dev->rx_sg_skb, 0, dev->page[slot], 0, len);
				emac_consume_page(dev, slot, dev->rx_sg_skb, len);
			}
		}
		else if (!emac_rx_sg_append2(dev, slot) && (ctrl & MAL_RX_CTRL_LAST))
		{

			skb = dev->rx_sg_skb;
			dev->rx_sg_skb = NULL;

			ctrl &= EMAC_BAD_RX_MASK;
			if (unlikely(ctrl && ctrl != EMAC_RX_TAH_BAD_CSUM))
			{
				emac_parse_rx_error(dev, ctrl);
				++dev->estats.rx_dropped_error;
				dev_kfree_skb(skb);
				len = 0;
			}
			else
			{
				/* JRH - added pskb_may_pull for eth_type_trans() call above */
				/* eth_type_trans needs skb->data to point at something */
				if (!pskb_may_pull(skb, ETH_HLEN))
				{
					DBG(dev, "pskb_may_pull failed.\n");
					dev_kfree_skb(skb);
					goto skip;
				}
				goto push_packet;
			}
		}
		goto skip;
/*oom:*/
		DBG(dev, "rx OOM %d" NL, slot);
		/* Drop the packet and recycle skb */
		++dev->estats.rx_dropped_oom;
		emac_recycle_rx_skb(dev, slot, 0);
		goto next;
	}

	if (received)
	{
		DBG2(dev, "rx %d BDs" NL, received);
		dev->rx_slot = slot;
	}

	if (unlikely(budget && test_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags)))
	{
		mb();
		if (!(dev->rx_desc[slot].ctrl & MAL_RX_CTRL_EMPTY))
		{
			DBG2(dev, "rx restart" NL);
			received = 0;
			goto again;
		}

		if (dev->rx_sg_skb)
		{
			DBG2(dev, "dropping partial rx packet" NL);
			++dev->estats.rx_dropped_error;
			dev_kfree_skb(dev->rx_sg_skb);
			dev->rx_sg_skb = NULL;
		}

		clear_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags);
		mal_enable_rx_channel(dev->mal, dev->mal_rx_chan);
		emac_rx_enable(dev);
		dev->rx_slot = 0;
	}
	return received;
}

/* NAPI poll context */
static int emac_peek_rx(void *param)
{
	struct emac_instance *dev = param;

	return !(dev->rx_desc[dev->rx_slot].ctrl & MAL_RX_CTRL_EMPTY);
}

/* NAPI poll context */
static int emac_peek_rx_sg(void *param)
{
	struct emac_instance *dev = param;

	int slot = dev->rx_slot;
	while (1)
	{
		u16 ctrl = dev->rx_desc[slot].ctrl;
		if (ctrl & MAL_RX_CTRL_EMPTY)
			return 0;
		else if (ctrl & MAL_RX_CTRL_LAST)
			return 1;

		slot = (slot + 1) % NUM_RX_BUFF;

		/* I'm just being paranoid here :) */
		if (unlikely(slot == dev->rx_slot))
			return 0;
	}
}
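
/*
 * Illustrative sketch only, not part of the original driver: the scan
 * done by emac_peek_rx_sg(), run over a plain array of descriptor
 * control words. The helper name and the empty/last mask parameters are
 * hypothetical stand-ins for MAL_RX_CTRL_EMPTY and MAL_RX_CTRL_LAST. A
 * complete frame is pending only if a LAST descriptor is reached before
 * an EMPTY one and before the walk comes back to its starting slot.
 */
static inline int emac_example_frame_ready(const unsigned short *ctrl,
		int ring_size, int start, unsigned short empty,
		unsigned short last)
{
	int slot = start;

	do
	{
		if (ctrl[slot] & empty)
			return 0;	/* hardware has not filled this slot yet */
		if (ctrl[slot] & last)
			return 1;	/* end of a frame has been received */
		slot = (slot + 1) % ring_size;
	} while (slot != start);

	return 0;	/* walked the whole ring without finding LAST */
}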

/* Hard IRQ */
static void emac_rxde(void *param)
{
	struct emac_instance *dev = param;

	++dev->estats.rx_stopped;
	emac_rx_disable_async(dev);
}

/* Hard IRQ */
static irqreturn_t emac_irq(int irq, void *dev_instance)
{
	struct emac_instance *dev = dev_instance;
	struct emac_regs __iomem *p = dev->emacp;
	struct emac_error_stats *st = &dev->estats;
	u32 isr;

	spin_lock(&dev->lock);

	isr = in_be32(&p->isr);
	out_be32(&p->isr, isr);

	DBG(dev, "isr = %08x" NL, isr);

	if (isr & EMAC4_ISR_TXPE)
		++st->tx_parity;
	if (isr & EMAC4_ISR_RXPE)
		++st->rx_parity;
	if (isr & EMAC4_ISR_TXUE)
		++st->tx_underrun;
	if (isr & EMAC4_ISR_RXOE)
		++st->rx_fifo_overrun;
	if (isr & EMAC_ISR_OVR)
		++st->rx_overrun;
	if (isr & EMAC_ISR_BP)
		++st->rx_bad_packet;
	if (isr & EMAC_ISR_RP)
		++st->rx_runt_packet;
	if (isr & EMAC_ISR_SE)
		++st->rx_short_event;
	if (isr & EMAC_ISR_ALE)
		++st->rx_alignment_error;
	if (isr & EMAC_ISR_BFCS)
		++st->rx_bad_fcs;
	if (isr & EMAC_ISR_PTLE)
		++st->rx_packet_too_long;
	if (isr & EMAC_ISR_ORE)
		++st->rx_out_of_range;
	if (isr & EMAC_ISR_IRE)
		++st->rx_in_range;
	if (isr & EMAC_ISR_SQE)
		++st->tx_sqe;
	if (isr & EMAC_ISR_TE)
		++st->tx_errors;

	spin_unlock(&dev->lock);

	return IRQ_HANDLED;
}

static struct net_device_stats *emac_stats(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	struct emac_stats *st = &dev->stats;
	struct emac_error_stats *est = &dev->estats;
	struct net_device_stats *nst = &dev->nstats;
	unsigned long flags;

	DBG2(dev, "stats" NL);

	/* Compute "legacy" statistics */
	spin_lock_irqsave(&dev->lock, flags);
	nst->rx_packets = (unsigned long) st->rx_packets;
	nst->rx_bytes = (unsigned long) st->rx_bytes;
	nst->tx_packets = (unsigned long) st->tx_packets;
	nst->tx_bytes = (unsigned long) st->tx_bytes;
	nst->rx_dropped = (unsigned long) (est->rx_dropped_oom
			+ est->rx_dropped_error + est->rx_dropped_resize
			+ est->rx_dropped_mtu);
	nst->tx_dropped = (unsigned long) est->tx_dropped;

	nst->rx_errors = (unsigned long) est->rx_bd_errors;
	nst->rx_fifo_errors = (unsigned long) (est->rx_bd_overrun
			+ est->rx_fifo_overrun + est->rx_overrun);
	nst->rx_frame_errors = (unsigned long) (est->rx_bd_alignment_error
			+ est->rx_alignment_error);
	nst->rx_crc_errors = (unsigned long) (est->rx_bd_bad_fcs + est->rx_bad_fcs);
	nst->rx_length_errors = (unsigned long) (est->rx_bd_runt_packet
			+ est->rx_bd_short_event + est->rx_bd_packet_too_long
			+ est->rx_bd_out_of_range + est->rx_bd_in_range
			+ est->rx_runt_packet + est->rx_short_event
			+ est->rx_packet_too_long + est->rx_out_of_range
			+ est->rx_in_range);

	nst->tx_errors = (unsigned long) (est->tx_bd_errors + est->tx_errors);
	nst->tx_fifo_errors = (unsigned long) (est->tx_bd_underrun
			+ est->tx_underrun);
	nst->tx_carrier_errors = (unsigned long) est->tx_bd_carrier_loss;
	nst->collisions = (unsigned long) (est->tx_bd_excessive_deferral
			+ est->tx_bd_excessive_collisions + est->tx_bd_late_collision
			+ est->tx_bd_multple_collisions);
	spin_unlock_irqrestore(&dev->lock, flags);
	return nst;
}

static struct mal_commac_ops emac_commac_ops =
{ .poll_tx = &emac_poll_tx, .poll_rx = &emac_poll_rx, .peek_rx = &emac_peek_rx,
		.rxde = &emac_rxde, };

static struct mal_commac_ops emac_commac_sg_ops =
{ .poll_tx = &emac_poll_tx, .poll_rx = &emac_poll_rx,
		.peek_rx = &emac_peek_rx_sg, .rxde = &emac_rxde, };

/* Ethtool support */
static int emac_ethtool_get_settings(struct net_device *ndev,
		struct ethtool_cmd *cmd)
{
	struct emac_instance *dev = netdev_priv(ndev);

	cmd->supported = dev->phy.features;
	cmd->port = PORT_MII;
	cmd->phy_address = dev->phy.address;
	cmd->transceiver = dev->phy.address >= 0 ? XCVR_EXTERNAL : XCVR_INTERNAL;

	mutex_lock(&dev->link_lock);
	cmd->advertising = dev->phy.advertising;
	cmd->autoneg = dev->phy.autoneg;
	cmd->speed = dev->phy.speed;
	cmd->duplex = dev->phy.duplex;
	mutex_unlock(&dev->link_lock);

	return 0;
}

static int emac_ethtool_set_settings(struct net_device *ndev,
		struct ethtool_cmd *cmd)
{
	struct emac_instance *dev = netdev_priv(ndev);
	u32 f = dev->phy.features;

	DBG(dev, "set_settings(%d, %d, %d, 0x%08x)" NL,
			cmd->autoneg, cmd->speed, cmd->duplex, cmd->advertising);

	/* Basic sanity checks */
	if (dev->phy.address < 0)
		return -EOPNOTSUPP;
	if (cmd->autoneg != AUTONEG_ENABLE && cmd->autoneg != AUTONEG_DISABLE)
		return -EINVAL;
	if (cmd->autoneg == AUTONEG_ENABLE && cmd->advertising == 0)
		return -EINVAL;
	if (cmd->duplex != DUPLEX_HALF && cmd->duplex != DUPLEX_FULL)
		return -EINVAL;

	if (cmd->autoneg == AUTONEG_DISABLE)
	{
		switch (cmd->speed)
		{
		case SPEED_10:
			if (cmd->duplex == DUPLEX_HALF && !(f & SUPPORTED_10baseT_Half))
				return -EINVAL;
			if (cmd->duplex == DUPLEX_FULL && !(f & SUPPORTED_10baseT_Full))
				return -EINVAL;
			break;
		case SPEED_100:
			if (cmd->duplex == DUPLEX_HALF && !(f & SUPPORTED_100baseT_Half))
				return -EINVAL;
			if (cmd->duplex == DUPLEX_FULL && !(f & SUPPORTED_100baseT_Full))
				return -EINVAL;
			break;
		case SPEED_1000:
			if (cmd->duplex == DUPLEX_HALF && !(f & SUPPORTED_1000baseT_Half))
				return -EINVAL;
			if (cmd->duplex == DUPLEX_FULL && !(f & SUPPORTED_1000baseT_Full))
				return -EINVAL;
			break;
		default:
			return -EINVAL;
		}

		mutex_lock(&dev->link_lock);
		dev->phy.def->ops->setup_forced(&dev->phy, cmd->speed, cmd->duplex);
		mutex_unlock(&dev->link_lock);

	}
	else
	{
		if (!(f & SUPPORTED_Autoneg))
			return -EINVAL;

		mutex_lock(&dev->link_lock);
		dev->phy.def->ops->setup_aneg(&dev->phy, (cmd->advertising & f)
				| (dev->phy.advertising & (ADVERTISED_Pause
						| ADVERTISED_Asym_Pause)));
		mutex_unlock(&dev->link_lock);
	}
	emac_force_link_update(dev);

	return 0;
}

static void emac_ethtool_get_ringparam(struct net_device *ndev,
		struct ethtool_ringparam *rp)
{
	rp->rx_max_pending = rp->rx_pending = NUM_RX_BUFF;
	rp->tx_max_pending = rp->tx_pending = NUM_TX_BUFF;
}

static void emac_ethtool_get_pauseparam(struct net_device *ndev,
		struct ethtool_pauseparam *pp)
{
	struct emac_instance *dev = netdev_priv(ndev);

	mutex_lock(&dev->link_lock);
	if ((dev->phy.features & SUPPORTED_Autoneg) && (dev->phy.advertising
			& (ADVERTISED_Pause | ADVERTISED_Asym_Pause)))
		pp->autoneg = 1;

	if (dev->phy.duplex == DUPLEX_FULL)
	{
		if (dev->phy.pause)
			pp->rx_pause = pp->tx_pause = 1;
		else if (dev->phy.asym_pause)
			pp->tx_pause = 1;
	}
	mutex_unlock(&dev->link_lock);
}

static u32 emac_ethtool_get_rx_csum(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);

	return dev->tah_dev != NULL;
}

static int emac_get_regs_len(struct emac_instance *dev)
{
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
		return sizeof(struct emac_ethtool_regs_subhdr)
				+ EMAC4_ETHTOOL_REGS_SIZE(dev);
	else
		return sizeof(struct emac_ethtool_regs_subhdr)
				+ EMAC_ETHTOOL_REGS_SIZE(dev);
}

static int emac_ethtool_get_regs_len(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	int size;

	size = sizeof(struct emac_ethtool_regs_hdr) + emac_get_regs_len(dev)
			+ mal_get_regs_len(dev->mal);
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		size += zmii_get_regs_len(dev->zmii_dev);
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		size += rgmii_get_regs_len(dev->rgmii_dev);
	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH))
		size += tah_get_regs_len(dev->tah_dev);

	return size;
}

static void *emac_dump_regs(struct emac_instance *dev, void *buf)
{
	struct emac_ethtool_regs_subhdr *hdr = buf;

	hdr->index = dev->cell_index;
	if (emac_has_feature(dev, EMAC_FTR_EMAC4))
	{
		hdr->version = EMAC4_ETHTOOL_REGS_VER;
		memcpy_fromio(hdr + 1, dev->emacp, EMAC4_ETHTOOL_REGS_SIZE(dev));
		return ((void *) (hdr + 1) + EMAC4_ETHTOOL_REGS_SIZE(dev));
	}
	else
	{
		hdr->version = EMAC_ETHTOOL_REGS_VER;
		memcpy_fromio(hdr + 1, dev->emacp, EMAC_ETHTOOL_REGS_SIZE(dev));
		return ((void *) (hdr + 1) + EMAC_ETHTOOL_REGS_SIZE(dev));
	}
}

static void emac_ethtool_get_regs(struct net_device *ndev,
		struct ethtool_regs *regs, void *buf)
{
	struct emac_instance *dev = netdev_priv(ndev);
	struct emac_ethtool_regs_hdr *hdr = buf;

	hdr->components = 0;
	buf = hdr + 1;

	buf = mal_dump_regs(dev->mal, buf);
	buf = emac_dump_regs(dev, buf);
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
	{
		hdr->components |= EMAC_ETHTOOL_REGS_ZMII;
		buf = zmii_dump_regs(dev->zmii_dev, buf);
	}
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
	{
		hdr->components |= EMAC_ETHTOOL_REGS_RGMII;
		buf = rgmii_dump_regs(dev->rgmii_dev, buf);
	}
	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH))
	{
		hdr->components |= EMAC_ETHTOOL_REGS_TAH;
		buf = tah_dump_regs(dev->tah_dev, buf);
	}
}

static int emac_ethtool_nway_reset(struct net_device *ndev)
{
	struct emac_instance *dev = netdev_priv(ndev);
	int res = 0;

	DBG(dev, "nway_reset" NL);

	if (dev->phy.address < 0)
		return -EOPNOTSUPP;

	mutex_lock(&dev->link_lock);
	if (!dev->phy.autoneg)
	{
		res = -EINVAL;
		goto out;
	}

	dev->phy.def->ops->setup_aneg(&dev->phy, dev->phy.advertising);
out:
	mutex_unlock(&dev->link_lock);
	emac_force_link_update(dev);
	return res;
}

static int emac_ethtool_get_stats_count(struct net_device *ndev)
{
	return EMAC_ETHTOOL_STATS_COUNT;
}

static void emac_ethtool_get_strings(struct net_device *ndev, u32 stringset,
		u8 * buf)
{
	if (stringset == ETH_SS_STATS)
		memcpy(buf, &emac_stats_keys, sizeof(emac_stats_keys));
}

static void emac_ethtool_get_ethtool_stats(struct net_device *ndev,
		struct ethtool_stats *estats, u64 * tmp_stats)
{
	struct emac_instance *dev = netdev_priv(ndev);

	memcpy(tmp_stats, &dev->stats, sizeof(dev->stats));
	tmp_stats += sizeof(dev->stats) / sizeof(u64);
	memcpy(tmp_stats, &dev->estats, sizeof(dev->estats));
}

static void emac_ethtool_get_drvinfo(struct net_device *ndev,
		struct ethtool_drvinfo *info)
{
	struct emac_instance *dev = netdev_priv(ndev);

	strcpy(info->driver, "ibm_emac");
	strcpy(info->version, DRV_VERSION);
	info->fw_version[0] = '\0';
	sprintf(info->bus_info, "PPC 4xx EMAC-%d %s", dev->cell_index,
			dev->ofdev->node->full_name);
	info->n_stats = emac_ethtool_get_stats_count(ndev);
	info->regdump_len = emac_ethtool_get_regs_len(ndev);
}

static const struct ethtool_ops emac_ethtool_ops =
{ .get_settings = emac_ethtool_get_settings,
		.set_settings = emac_ethtool_set_settings,
		.get_drvinfo = emac_ethtool_get_drvinfo,

		.get_regs_len = emac_ethtool_get_regs_len,
		.get_regs = emac_ethtool_get_regs,

		.nway_reset = emac_ethtool_nway_reset,

		.get_ringparam = emac_ethtool_get_ringparam,
		.get_pauseparam = emac_ethtool_get_pauseparam,

		.get_rx_csum = emac_ethtool_get_rx_csum,

		.get_strings = emac_ethtool_get_strings,
		.get_stats_count = emac_ethtool_get_stats_count,
		.get_ethtool_stats = emac_ethtool_get_ethtool_stats,

		.get_link = ethtool_op_get_link, .get_tx_csum = ethtool_op_get_tx_csum,
		.get_sg = ethtool_op_get_sg, };

static int emac_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd)
{
	struct emac_instance *dev = netdev_priv(ndev);
	uint16_t *data = (uint16_t *) &rq->ifr_ifru;

	DBG(dev, "ioctl %08x" NL, cmd);

	if (dev->phy.address < 0)
		return -EOPNOTSUPP;

	switch (cmd)
	{
	case SIOCGMIIPHY:
	case SIOCDEVPRIVATE:
		data[0] = dev->phy.address;
		/* Fall through */
	case SIOCGMIIREG:
	case SIOCDEVPRIVATE + 1:
		data[3] = emac_mdio_read(ndev, dev->phy.address, data[1]);
		return 0;

	case SIOCSMIIREG:
	case SIOCDEVPRIVATE + 2:
		if (!capable(CAP_NET_ADMIN))
			return -EPERM;
		emac_mdio_write(ndev, dev->phy.address, data[1], data[2]);
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

struct emac_depentry
{
	u32 phandle;
	struct device_node *node;
	struct of_device *ofdev;
	void *drvdata;
};

#define	EMAC_DEP_MAL_IDX	0
#define	EMAC_DEP_ZMII_IDX	1
#define	EMAC_DEP_RGMII_IDX	2
#define	EMAC_DEP_TAH_IDX	3
#define	EMAC_DEP_MDIO_IDX	4
#define	EMAC_DEP_PREV_IDX	5
#define	EMAC_DEP_COUNT		6

static int __devinit emac_check_deps(struct emac_instance *dev,
		struct emac_depentry *deps)
{
	int i, there = 0;
	struct device_node *np;

	for (i = 0; i < EMAC_DEP_COUNT; i++)
	{
		/* no dependency on that item, all right */
		if (deps[i].phandle == 0)
		{
			there++;
			continue;
		}
		/* special case for blist as the dependency might go away */
		if (i == EMAC_DEP_PREV_IDX)
		{
			np = *(dev->blist - 1);
			if (np == NULL)
			{
				deps[i].phandle = 0;
				there++;
				continue;
			}
			if (deps[i].node == NULL)
				deps[i].node = of_node_get(np);
		}
		if (deps[i].node == NULL)
			deps[i].node = of_find_node_by_phandle(deps[i].phandle);
		if (deps[i].node == NULL)
			continue;
		if (deps[i].ofdev == NULL)
			deps[i].ofdev = of_find_device_by_node(deps[i].node);
		if (deps[i].ofdev == NULL)
			continue;
		if (deps[i].drvdata == NULL)
			deps[i].drvdata = dev_get_drvdata(&deps[i].ofdev->dev);
		if (deps[i].drvdata != NULL)
			there++;
	}
	return (there == EMAC_DEP_COUNT);
}

static void emac_put_deps(struct emac_instance *dev)
{
	if (dev->mal_dev)
		of_dev_put(dev->mal_dev);
	if (dev->zmii_dev)
		of_dev_put(dev->zmii_dev);
	if (dev->rgmii_dev)
		of_dev_put(dev->rgmii_dev);
	if (dev->mdio_dev)
		of_dev_put(dev->mdio_dev);
	if (dev->tah_dev)
		of_dev_put(dev->tah_dev);
}

static int __devinit emac_of_bus_notify(struct notifier_block *nb,
		unsigned long action, void *data)
{
	/* We are only interested in device addition */
	if (action == BUS_NOTIFY_BOUND_DRIVER)
		wake_up_all(&emac_probe_wait);
	return 0;
}

static struct notifier_block emac_of_bus_notifier __devinitdata =
{ .notifier_call = emac_of_bus_notify };

static int __devinit emac_wait_deps(struct emac_instance *dev)
{
	struct emac_depentry deps[EMAC_DEP_COUNT];
	int i, err;

	memset(&deps, 0, sizeof(deps));

	deps[EMAC_DEP_MAL_IDX].phandle = dev->mal_ph;
	deps[EMAC_DEP_ZMII_IDX].phandle = dev->zmii_ph;
	deps[EMAC_DEP_RGMII_IDX].phandle = dev->rgmii_ph;
	if (dev->tah_ph)
		deps[EMAC_DEP_TAH_IDX].phandle = dev->tah_ph;
	if (dev->mdio_ph)
		deps[EMAC_DEP_MDIO_IDX].phandle = dev->mdio_ph;
	if (dev->blist && dev->blist > emac_boot_list)
		deps[EMAC_DEP_PREV_IDX].phandle = 0xffffffffu;
	bus_register_notifier(&of_platform_bus_type, &emac_of_bus_notifier);
	wait_event_timeout(emac_probe_wait, emac_check_deps(dev, deps),
			EMAC_PROBE_DEP_TIMEOUT);
	bus_unregister_notifier(&of_platform_bus_type, &emac_of_bus_notifier);
	err = emac_check_deps(dev, deps) ? 0 : -ENODEV;
	for (i = 0; i < EMAC_DEP_COUNT; i++)
	{
		if (deps[i].node)
			of_node_put(deps[i].node);
		if (err && deps[i].ofdev)
			of_dev_put(deps[i].ofdev);
	}
	if (err == 0)
	{
		dev->mal_dev = deps[EMAC_DEP_MAL_IDX].ofdev;
		dev->zmii_dev = deps[EMAC_DEP_ZMII_IDX].ofdev;
		dev->rgmii_dev = deps[EMAC_DEP_RGMII_IDX].ofdev;
		dev->tah_dev = deps[EMAC_DEP_TAH_IDX].ofdev;
		dev->mdio_dev = deps[EMAC_DEP_MDIO_IDX].ofdev;
	}
	if (deps[EMAC_DEP_PREV_IDX].ofdev)
		of_dev_put(deps[EMAC_DEP_PREV_IDX].ofdev);
	return err;
}

static int __devinit emac_read_uint_prop(struct device_node *np,
		const char *name, u32 *val, int fatal)
{
	int len;
	const u32 *prop = of_get_property(np, name, &len);
	if (prop == NULL || len < sizeof(u32))
	{
		if (fatal)
			printk(KERN_ERR "%s: missing %s property\n",
					np->full_name, name);
		return -ENODEV;
	}
	*val = *prop;
	return 0;
}

static int __devinit emac_init_phy(struct emac_instance *dev)
{
	struct device_node *np = dev->ofdev->node;
	struct net_device *ndev = dev->ndev;
	u32 phy_map, adv;
	int i;

	dev->phy.dev = ndev;
	dev->phy.mode = dev->phy_mode;

	/* PHY-less configuration.
	 * XXX I probably should move these settings to the dev tree
	 */
	if (dev->phy_address == 0xffffffff && dev->phy_map == 0xffffffff)
	{
		emac_reset(dev);

		/* PHY-less configuration.
		 * XXX I probably should move these settings to the dev tree
		 */
		dev->phy.address = -1;
		dev->phy.features = SUPPORTED_MII;
		if (emac_phy_supports_gige(dev->phy_mode))
			dev->phy.features |= SUPPORTED_1000baseT_Full;
		else
			dev->phy.features |= SUPPORTED_100baseT_Full;
		dev->phy.pause = 1;

		return 0;
	}

	mutex_lock(&emac_phy_map_lock);
	phy_map = dev->phy_map | busy_phy_map;

	DBG(dev, "PHY maps %08x %08x" NL, dev->phy_map, busy_phy_map);

	dev->phy.mdio_read = emac_mdio_read;
	dev->phy.mdio_write = emac_mdio_write;

	/* Enable internal clock source */
#ifdef CONFIG_PPC_DCR_NATIVE
	if (emac_has_feature(dev, EMAC_FTR_440GX_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_MFR, 0, SDR0_MFR_ECS);
#endif
	/* PHY clock workaround */
	emac_rx_clk_tx(dev);

	/* Enable internal clock source on 440GX*/
#ifdef CONFIG_PPC_DCR_NATIVE
	if (emac_has_feature(dev, EMAC_FTR_440GX_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_MFR, 0, SDR0_MFR_ECS);
#endif
	/* Configure EMAC with defaults so we can at least use MDIO
	 * This is needed mostly for 440GX
	 */
	if (emac_phy_gpcs(dev->phy.mode))
	{
		/* XXX
		 * Make GPCS PHY address equal to EMAC index.
		 * We probably should take into account busy_phy_map
		 * and/or phy_map here.
		 *
		 * Note that the busy_phy_map is currently global
		 * while it should probably be per-ASIC...
		 */
		dev->phy.gpcs_address = dev->gpcs_address;
		if (dev->phy.gpcs_address == 0xffffffff)
			dev->phy.address = dev->cell_index;
	}

	emac_configure(dev);

	if (dev->phy_address != 0xffffffff)
		phy_map = ~(1 << dev->phy_address);

	for (i = 0; i < 0x20; phy_map >>= 1, ++i)
		if (!(phy_map & 1))
		{
			int r;
			busy_phy_map |= 1 << i;

			/* Quick check if there is a PHY at the address */
			r = emac_mdio_read(dev->ndev, i, MII_BMCR);
			if (r == 0xffff || r < 0)
				continue;
			if (!emac_mii_phy_probe(&dev->phy, i))
				break;
		}

	/* Enable external clock source */
#ifdef CONFIG_PPC_DCR_NATIVE
	if (emac_has_feature(dev, EMAC_FTR_440GX_PHY_CLK_FIX))
		dcri_clrset(SDR0, SDR0_MFR, SDR0_MFR_ECS, 0);
#endif
	mutex_unlock(&emac_phy_map_lock);
	if (i == 0x20)
	{
		printk(KERN_WARNING "%s: can't find PHY!\n", np->full_name);
		return -ENXIO;
	}

	/* Init PHY */
	if (dev->phy.def->ops->init)
		dev->phy.def->ops->init(&dev->phy);

	/* Disable any PHY features not supported by the platform */
	dev->phy.def->features &= ~dev->phy_feat_exc;

	/* Setup initial link parameters */
	if (dev->phy.features & SUPPORTED_Autoneg)
	{
		adv = dev->phy.features;
		if (!emac_has_feature(dev, EMAC_FTR_NO_FLOW_CONTROL_40x))
			adv |= ADVERTISED_Pause | ADVERTISED_Asym_Pause;
		/* Restart autonegotiation */
		dev->phy.def->ops->setup_aneg(&dev->phy, adv);
	}
	else
	{
		u32 f = dev->phy.def->features;
		int speed = SPEED_10, fd = DUPLEX_HALF;

		/* Select highest supported speed/duplex */
		if (f & SUPPORTED_1000baseT_Full)
		{
			speed = SPEED_1000;
			fd = DUPLEX_FULL;
		}
		else if (f & SUPPORTED_1000baseT_Half)
			speed = SPEED_1000;
		else if (f & SUPPORTED_100baseT_Full)
		{
			speed = SPEED_100;
			fd = DUPLEX_FULL;
		}
		else if (f & SUPPORTED_100baseT_Half)
			speed = SPEED_100;
		else if (f & SUPPORTED_10baseT_Full)
			fd = DUPLEX_FULL;

		/* Force link parameters */
		dev->phy.def->ops->setup_forced(&dev->phy, speed, fd);
	}
	return 0;
}

static int __devinit emac_init_config(struct emac_instance *dev)
{
	struct device_node *np = dev->ofdev->node;
	const void *p;
	unsigned int plen;
	const char *pm, *phy_modes[] =
	{ [PHY_MODE_NA] = "", [PHY_MODE_MII] = "mii", [PHY_MODE_RMII] = "rmii",
			[PHY_MODE_SMII] = "smii", [PHY_MODE_RGMII] = "rgmii",
			[PHY_MODE_TBI] = "tbi", [PHY_MODE_GMII] = "gmii",
			[PHY_MODE_RTBI] = "rtbi", [PHY_MODE_SGMII] = "sgmii", };

	/* Read config from device-tree */
	if (emac_read_uint_prop(np, "mal-device", &dev->mal_ph, 1))
		return -ENXIO;
	if (emac_read_uint_prop(np, "mal-tx-channel", &dev->mal_tx_chan, 1))
		return -ENXIO;
	if (emac_read_uint_prop(np, "mal-rx-channel", &dev->mal_rx_chan, 1))
		return -ENXIO;
	if (emac_read_uint_prop(np, "cell-index", &dev->cell_index, 1))
		return -ENXIO;
	if (emac_read_uint_prop(np, "max-frame-size", &dev->max_mtu, 0))
		dev->max_mtu = 1500;
	if (emac_read_uint_prop(np, "rx-fifo-size", &dev->rx_fifo_size, 0))
		dev->rx_fifo_size = 2048;
	if (emac_read_uint_prop(np, "tx-fifo-size", &dev->tx_fifo_size, 0))
		dev->tx_fifo_size = 2048;
	if (emac_read_uint_prop(np, "rx-fifo-size-gige", &dev->rx_fifo_size_gige, 0))
		dev->rx_fifo_size_gige = dev->rx_fifo_size;
	if (emac_read_uint_prop(np, "tx-fifo-size-gige", &dev->tx_fifo_size_gige, 0))
		dev->tx_fifo_size_gige = dev->tx_fifo_size;
	if (emac_read_uint_prop(np, "phy-address", &dev->phy_address, 0))
		dev->phy_address = 0xffffffff;
	if (emac_read_uint_prop(np, "phy-map", &dev->phy_map, 0))
		dev->phy_map = 0xffffffff;
	if (emac_read_uint_prop(np, "gpcs-address", &dev->gpcs_address, 0))
		dev->gpcs_address = 0xffffffff;
	if (emac_read_uint_prop(np->parent, "clock-frequency", &dev->opb_bus_freq,
			1))
		return -ENXIO;
	if (emac_read_uint_prop(np, "tah-device", &dev->tah_ph, 0))
		dev->tah_ph = 0;
	if (emac_read_uint_prop(np, "tah-channel", &dev->tah_port, 0))
		dev->tah_port = 0;
	if (emac_read_uint_prop(np, "mdio-device", &dev->mdio_ph, 0))
		dev->mdio_ph = 0;
	if (emac_read_uint_prop(np, "zmii-device", &dev->zmii_ph, 0))
		dev->zmii_ph = 0;
	if (emac_read_uint_prop(np, "zmii-channel", &dev->zmii_port, 0))
		dev->zmii_port = 0xffffffff;
	if (emac_read_uint_prop(np, "rgmii-device", &dev->rgmii_ph, 0))
		dev->rgmii_ph = 0;
	if (emac_read_uint_prop(np, "rgmii-channel", &dev->rgmii_port, 0))
		dev->rgmii_port = 0xffffffff;
	if (emac_read_uint_prop(np, "fifo-entry-size", &dev->fifo_entry_size, 0))
		dev->fifo_entry_size = 16;
	if (emac_read_uint_prop(np, "mal-burst-size", &dev->mal_burst_size, 0))
		dev->mal_burst_size = 256;

	/* PHY mode needs some decoding */
	dev->phy_mode = PHY_MODE_NA;
	pm = of_get_property(np, "phy-mode", &plen);
	if (pm != NULL)
	{
		int i;
		for (i = 0; i < ARRAY_SIZE(phy_modes); i++)
			if (!strcasecmp(pm, phy_modes[i]))
			{
				dev->phy_mode = i;
				break;
			}
	}

	/* Backward compat with non-final DT */
	if (dev->phy_mode == PHY_MODE_NA && pm != NULL && plen == 4)
	{
		u32 nmode = *(const u32 *) pm;
		if (nmode > PHY_MODE_NA && nmode <= PHY_MODE_SGMII)
			dev->phy_mode = nmode;
	}

	/* Check EMAC version */
	if (of_device_is_compatible(np, "ibm,emac4sync"))
	{
		dev->features |= (EMAC_FTR_EMAC4 | EMAC_FTR_EMAC4SYNC);
		if (of_device_is_compatible(np, "ibm,emac-460ex")
				|| of_device_is_compatible(np, "ibm,emac-460gt"))
			dev->features |= EMAC_FTR_460EX_PHY_CLK_FIX;
		if (of_device_is_compatible(np, "ibm,emac-405ex")
				|| of_device_is_compatible(np, "ibm,emac-405exr"))
			dev->features |= EMAC_FTR_440EP_PHY_CLK_FIX;
	}
	else if (of_device_is_compatible(np, "ibm,emac4"))
	{
		dev->features |= EMAC_FTR_EMAC4;
		if (of_device_is_compatible(np, "ibm,emac-440gx"))
			dev->features |= EMAC_FTR_440GX_PHY_CLK_FIX;
	}
	else
	{
		if (of_device_is_compatible(np, "ibm,emac-440ep")
				|| of_device_is_compatible(np, "ibm,emac-440gr"))
			dev->features |= EMAC_FTR_440EP_PHY_CLK_FIX;
		if (of_device_is_compatible(np, "ibm,emac-405ez"))
		{
#ifdef CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL
			dev->features |= EMAC_FTR_NO_FLOW_CONTROL_40x;
#else
			printk(KERN_ERR "%s: Flow control not disabled!\n",
					np->full_name);
			return -ENXIO;
#endif
		}

	}

	/* Fixup some feature bits based on the device tree */
	if (of_get_property(np, "has-inverted-stacr-oc", NULL))
		dev->features |= EMAC_FTR_STACR_OC_INVERT;
	if (of_get_property(np, "has-new-stacr-staopc", NULL))
		dev->features |= EMAC_FTR_HAS_NEW_STACR;

	/* CAB lacks the appropriate properties */
	if (of_device_is_compatible(np, "ibm,emac-axon"))
		dev->features |= EMAC_FTR_HAS_NEW_STACR | EMAC_FTR_STACR_OC_INVERT;

	/* Enable TAH/ZMII/RGMII features as found */
	if (dev->tah_ph != 0)
	{
#ifdef CONFIG_IBM_NEW_EMAC_TAH
		dev->features |= EMAC_FTR_HAS_TAH;
#else
		printk(KERN_ERR "%s: TAH support not enabled !\n",
				np->full_name);
		return -ENXIO;
#endif
	}

	if (dev->zmii_ph != 0)
	{
#ifdef CONFIG_IBM_NEW_EMAC_ZMII
		dev->features |= EMAC_FTR_HAS_ZMII;
#else
		printk(KERN_ERR "%s: ZMII support not enabled !\n",
				np->full_name);
		return -ENXIO;
#endif
	}

	if (dev->rgmii_ph != 0)
	{
#ifdef CONFIG_IBM_NEW_EMAC_RGMII
		dev->features |= EMAC_FTR_HAS_RGMII;
#else
		printk(KERN_ERR "%s: RGMII support not enabled !\n",
				np->full_name);
		return -ENXIO;
#endif
	}

	/* Read MAC-address */
	p = of_get_property(np, "local-mac-address", NULL);
	if (p == NULL)
	{
		printk(KERN_ERR "%s: Can't find local-mac-address property\n",
				np->full_name);
		return -ENXIO;
	}
	memcpy(dev->ndev->dev_addr, p, 6);

	/* IAHT and GAHT filter parameterization */
	if (emac_has_feature(dev, EMAC_FTR_EMAC4SYNC))
	{
		dev->xaht_slots_shift = EMAC4SYNC_XAHT_SLOTS_SHIFT;
		dev->xaht_width_shift = EMAC4SYNC_XAHT_WIDTH_SHIFT;
	}
	else
	{
		dev->xaht_slots_shift = EMAC4_XAHT_SLOTS_SHIFT;
		dev->xaht_width_shift = EMAC4_XAHT_WIDTH_SHIFT;
	}

	DBG(dev, "features     : 0x%08x / 0x%08x\n", dev->features, EMAC_FTRS_POSSIBLE);
	DBG(dev, "tx_fifo_size : %d (%d gige)\n", dev->tx_fifo_size, dev->tx_fifo_size_gige);
	DBG(dev, "rx_fifo_size : %d (%d gige)\n", dev->rx_fifo_size, dev->rx_fifo_size_gige);
	DBG(dev, "max_mtu      : %d\n", dev->max_mtu);
	DBG(dev, "OPB freq     : %d\n", dev->opb_bus_freq);

	return 0;
}

static const struct net_device_ops emac_netdev_ops =
{ .ndo_open = emac_open, .ndo_stop = emac_close, .ndo_get_stats = emac_stats,
		.ndo_set_multicast_list = emac_set_multicast_list,
		.ndo_do_ioctl = emac_ioctl, .ndo_tx_timeout = emac_tx_timeout,
		.ndo_validate_addr = eth_validate_addr,
		.ndo_set_mac_address = eth_mac_addr, .ndo_start_xmit = emac_start_xmit,
		.ndo_change_mtu = eth_change_mtu, };

static const struct net_device_ops
		emac_gige_netdev_ops =
		{ .ndo_open = emac_open, .ndo_stop = emac_close,
				.ndo_get_stats = emac_stats,
				.ndo_set_multicast_list = emac_set_multicast_list,
				.ndo_do_ioctl = emac_ioctl, .ndo_tx_timeout = emac_tx_timeout,
				.ndo_validate_addr = eth_validate_addr,
				.ndo_set_mac_address = eth_mac_addr,
				.ndo_start_xmit = emac_start_xmit_sg,
				.ndo_change_mtu = emac_change_mtu, };

static int __devinit emac_probe(struct of_device *ofdev,
		const struct of_device_id *match)
{
	struct net_device *ndev;
	struct emac_instance *dev;
	struct device_node *np = ofdev->node;
	struct device_node **blist = NULL;
	int err, i;

	/* Skip unused/unwired EMACS.  We leave the check for an unused
	 * property here for now, but new flat device trees should set a
	 * status property to "disabled" instead.
	 */
	if (of_get_property(np, "unused", NULL) || !of_device_is_available(np))
		return -ENODEV;

	/* Find ourselves in the bootlist if we are there */
	for (i = 0; i < EMAC_BOOT_LIST_SIZE; i++)
		if (emac_boot_list[i] == np)
			blist = &emac_boot_list[i];

	/* Allocate our net_device structure */
	err = -ENOMEM;
	ndev = alloc_etherdev(sizeof(struct emac_instance));
	if (!ndev)
	{
		printk(KERN_ERR "%s: could not allocate ethernet device!\n",
				np->full_name);
		goto err_gone;
	}
	dev = netdev_priv(ndev);
	dev->ndev = ndev;
	dev->ofdev = ofdev;
	dev->blist = blist;
	SET_NETDEV_DEV(ndev, &ofdev->dev);

	/* Initialize some embedded data structures */
	mutex_init(&dev->mdio_lock);
	mutex_init(&dev->link_lock);
	spin_lock_init(&dev->lock);
	INIT_WORK(&dev->reset_work, emac_reset_work);

	/* Init various config data based on device-tree */
	err = emac_init_config(dev);
	if (err != 0)
		goto err_free;

	/* Get interrupts. EMAC irq is mandatory, WOL irq is optional */
	dev->emac_irq = irq_of_parse_and_map(np, 0);
	dev->wol_irq = irq_of_parse_and_map(np, 1);
	if (dev->emac_irq == NO_IRQ)
	{
		printk(KERN_ERR "%s: Can't map main interrupt\n", np->full_name);
		/* err is still 0 from emac_init_config(); report the failure */
		err = -ENODEV;
		goto err_free;
	}
	ndev->irq = dev->emac_irq;

	/* Map EMAC regs */
	if (of_address_to_resource(np, 0, &dev->rsrc_regs))
	{
		printk(KERN_ERR "%s: Can't get registers address\n",
				np->full_name);
		/* err is still 0 here; make sure probe does not return success */
		err = -ENODEV;
		goto err_irq_unmap;
	}
	// TODO : request_mem_region
	dev->emacp = ioremap(dev->rsrc_regs.start, dev->rsrc_regs.end
			- dev->rsrc_regs.start + 1);
	if (dev->emacp == NULL)
	{
		printk(KERN_ERR "%s: Can't map device registers!\n",
				np->full_name);
		err = -ENOMEM;
		goto err_irq_unmap;
	}

	/* Wait for dependent devices */
	err = emac_wait_deps(dev);
	if (err)
	{
		printk(KERN_ERR
				"%s: Timeout waiting for dependent devices\n",
				np->full_name);
		/*  display more info about what's missing ? */
		goto err_reg_unmap;
	}
	dev->mal = dev_get_drvdata(&dev->mal_dev->dev);
	if (dev->mdio_dev != NULL)
		dev->mdio_instance = dev_get_drvdata(&dev->mdio_dev->dev);

	/* Register with MAL */
	dev->commac.ops = &emac_commac_ops;
	dev->commac.dev = dev;
	dev->commac.tx_chan_mask = MAL_CHAN_MASK(dev->mal_tx_chan);
	dev->commac.rx_chan_mask = MAL_CHAN_MASK(dev->mal_rx_chan);
	err = mal_register_commac(dev->mal, &dev->commac);
	if (err)
	{
		printk(KERN_ERR "%s: failed to register with mal %s!\n",
				np->full_name, dev->mal_dev->node->full_name);
		goto err_rel_deps;
	}
	dev->rx_skb_size = emac_rx_skb_size(ndev->mtu);
	dev->rx_sync_size = emac_rx_sync_size(ndev->mtu);

	if (dev->rx_skb_size > PAGE_SIZE)
	{
		dev->rx_skb_size = PAGE_SIZE;
	}

	/* Get pointers to BD rings */
	dev->tx_desc = dev->mal->bd_virt + mal_tx_bd_offset(dev->mal,
			dev->mal_tx_chan);
	dev->rx_desc = dev->mal->bd_virt + mal_rx_bd_offset(dev->mal,
			dev->mal_rx_chan);

	DBG(dev, "tx_desc %p" NL, dev->tx_desc);
	DBG(dev, "rx_desc %p" NL, dev->rx_desc);

	/* Clean rings */
	memset(dev->tx_desc, 0, NUM_TX_BUFF * sizeof(struct mal_descriptor));
	memset(dev->rx_desc, 0, NUM_RX_BUFF * sizeof(struct mal_descriptor));
	memset(dev->tx_skb, 0, NUM_TX_BUFF * sizeof(struct sk_buff *));
	memset(dev->rx_skb, 0, NUM_RX_BUFF * sizeof(struct sk_buff *));

	/* Attach to ZMII, if needed */
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII) && (err
			= zmii_attach(dev->zmii_dev, dev->zmii_port, &dev->phy_mode)) != 0)
		goto err_unreg_commac;

	/* Attach to RGMII, if needed */
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII) && (err
			= rgmii_attach(dev->rgmii_dev, dev->rgmii_port, dev->phy_mode))
			!= 0)
		goto err_detach_zmii;

	/* Attach to TAH, if needed */
	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH) && (err
			= tah_attach(dev->tah_dev, dev->tah_port)) != 0)
		goto err_detach_rgmii;

	/* Set some link defaults before we can find out real parameters */
	dev->phy.speed = SPEED_100;
	dev->phy.duplex = DUPLEX_FULL;
	dev->phy.autoneg = AUTONEG_DISABLE;
	dev->phy.pause = dev->phy.asym_pause = 0;
	dev->stop_timeout = STOP_TIMEOUT_100;
	INIT_DELAYED_WORK(&dev->link_work, emac_link_timer);

	/* Find PHY if any */
	err = emac_init_phy(dev);
	if (err != 0)
		goto err_detach_tah;

	printk(KERN_INFO "ENABLING HIGHDMA!\n");
	ndev->features |= NETIF_F_HIGHDMA;

	if (dev->tah_dev)
		ndev->features |= NETIF_F_IP_CSUM | NETIF_F_SG;
	ndev->watchdog_timeo = 5 * HZ;
	if (emac_phy_supports_gige(dev->phy_mode))
	{
		ndev->netdev_ops = &emac_gige_netdev_ops;
		dev->commac.ops = &emac_commac_sg_ops;
	}
	else
		ndev->netdev_ops = &emac_netdev_ops;
	SET_ETHTOOL_OPS(ndev, &emac_ethtool_ops);

	netif_carrier_off(ndev);
	netif_stop_queue(ndev);

	err = register_netdev(ndev);
	if (err)
	{
		printk(KERN_ERR "%s: failed to register net device (%d)!\n",
				np->full_name, err);
		goto err_detach_tah;
	}

	/* Set our drvdata last as we don't want them visible until we are
	 * fully initialized
	 */
	wmb();
	dev_set_drvdata(&ofdev->dev, dev);

	/* There's a new kid in town ! Let's tell everybody */
	wake_up_all(&emac_probe_wait);

	printk(KERN_INFO "%s: EMAC-%d %s, MAC %pM\n",
			ndev->name, dev->cell_index, np->full_name, ndev->dev_addr);

	if (dev->phy_mode == PHY_MODE_SGMII)
		printk(KERN_NOTICE "%s: in SGMII mode\n", ndev->name);

	if (dev->phy.address >= 0)
		printk("%s: found %s PHY (0x%02x)\n", ndev->name, dev->phy.def->name,
				dev->phy.address);

	emac_dbg_register(dev);

	/* Life is good */
	return 0;

	/* I have a bad feeling about this ... */

err_detach_tah:
	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH))
		tah_detach(dev->tah_dev, dev->tah_port);
err_detach_rgmii:
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_detach(dev->rgmii_dev, dev->rgmii_port);
err_detach_zmii:
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_detach(dev->zmii_dev, dev->zmii_port);
err_unreg_commac:
	mal_unregister_commac(dev->mal, &dev->commac);
err_rel_deps:
	emac_put_deps(dev);
err_reg_unmap:
	iounmap(dev->emacp);
err_irq_unmap:
	if (dev->wol_irq != NO_IRQ)
		irq_dispose_mapping(dev->wol_irq);
	if (dev->emac_irq != NO_IRQ)
		irq_dispose_mapping(dev->emac_irq);
err_free:
	kfree(ndev);
err_gone:
	/* if we were on the bootlist, remove us as we won't show up and
	 * wake up all waiters to notify them in case they were waiting
	 * on us
	 */
	if (blist)
	{
		*blist = NULL;
		wake_up_all(&emac_probe_wait);
	}
	return err;
}

static int __devexit emac_remove(struct of_device *ofdev)
{
	struct emac_instance *dev = dev_get_drvdata(&ofdev->dev);

	DBG(dev, "remove" NL);

	dev_set_drvdata(&ofdev->dev, NULL);

	unregister_netdev(dev->ndev);

	flush_scheduled_work();

	if (emac_has_feature(dev, EMAC_FTR_HAS_TAH))
		tah_detach(dev->tah_dev, dev->tah_port);
	if (emac_has_feature(dev, EMAC_FTR_HAS_RGMII))
		rgmii_detach(dev->rgmii_dev, dev->rgmii_port);
	if (emac_has_feature(dev, EMAC_FTR_HAS_ZMII))
		zmii_detach(dev->zmii_dev, dev->zmii_port);

	mal_unregister_commac(dev->mal, &dev->commac);
	emac_put_deps(dev);

	emac_dbg_unregister(dev);
	iounmap(dev->emacp);

	if (dev->wol_irq != NO_IRQ)
		irq_dispose_mapping(dev->wol_irq);
	if (dev->emac_irq != NO_IRQ)
		irq_dispose_mapping(dev->emac_irq);

	kfree(dev->ndev);

	return 0;
}

/* XXX Features in here should be replaced by properties... */
static struct of_device_id emac_match[] =
{
{ .type = "network", .compatible = "ibm,emac", },
{ .type = "network", .compatible = "ibm,emac4", },
{ .type = "network", .compatible = "ibm,emac4sync", },
{ }, };

static struct of_platform_driver emac_driver =
{ .name = "emac", .match_table = emac_match,

.probe = emac_probe, .remove = emac_remove, };

static void __init emac_make_bootlist(void)
{
	struct device_node *np = NULL;
	int j, max, i = 0, k;
	int cell_indices[EMAC_BOOT_LIST_SIZE];

	/* Collect EMACs */
	while ((np = of_find_all_nodes(np)) != NULL)
	{
		const u32 *idx;

		if (of_match_node(emac_match, np) == NULL)
			continue;
		if (of_get_property(np, "unused", NULL))
			continue;
		idx = of_get_property(np, "cell-index", NULL);
		if (idx == NULL)
			continue;
		cell_indices[i] = *idx;
		emac_boot_list[i++] = of_node_get(np);
		if (i >= EMAC_BOOT_LIST_SIZE)
		{
			of_node_put(np);
			break;
		}
	}
	max = i;

	/* Bubble sort them (doh, what a creative algorithm :-) */
	for (i = 0; max > 1 && (i < (max - 1)); i++)
		for (j = i; j < max; j++)
		{
			if (cell_indices[i] > cell_indices[j])
			{
				np = emac_boot_list[i];
				emac_boot_list[i] = emac_boot_list[j];
				emac_boot_list[j] = np;
				k = cell_indices[i];
				cell_indices[i] = cell_indices[j];
				cell_indices[j] = k;
			}
		}
}

static int __init emac_init(void)
{
	int rc;

	printk(KERN_INFO DRV_DESC ", version " DRV_VERSION "\n");

	/* Init debug stuff */
	emac_init_debug();

	/* Build EMAC boot list */
	emac_make_bootlist();

	/* Init submodules */
	rc = mal_init();
	if (rc)
		goto err;
	rc = zmii_init();
	if (rc)
		goto err_mal;
	rc = rgmii_init();
	if (rc)
		goto err_zmii;
	rc = tah_init();
	if (rc)
		goto err_rgmii;
	rc = of_register_platform_driver(&emac_driver);
	if (rc)
		goto err_tah;

	return 0;

err_tah:
	tah_exit();
err_rgmii:
	rgmii_exit();
err_zmii:
	zmii_exit();
err_mal:
	mal_exit();
err:
	return rc;
}

static void __exit emac_exit(void)
{
	int i;

	of_unregister_platform_driver(&emac_driver);

	tah_exit();
	rgmii_exit();
	zmii_exit();
	mal_exit();
	emac_fini_debug();

	/* Destroy EMAC boot list */
	for (i = 0; i < EMAC_BOOT_LIST_SIZE; i++)
		if (emac_boot_list[i])
			of_node_put(emac_boot_list[i]);
}

module_init(emac_init);
module_exit(emac_exit);

^ permalink raw reply

* RE: Memory allocation modifications in ibm_newemac driver
From: Jonathan Haws @ 2010-09-01 21:18 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5ABE8A5E96C0364CAF2B4DC1DEB7CEAC127A2F09@Mercury.usurf.usu.edu>

I found out what was causing the crash, but still am not there and could
use some direction:

What was happening was that I was not allocating a new SKB to replace the
one in the ring that was being passed up the stack.  I have remedied that
and am now having another issue:

Once the ring index rolls over (it does so at 64) I start to lose packets
because they are not being handled correctly (or do not contain the
correct headers or something of that sort).  Here is a simple ping test
showing what is happening:

64 bytes from 172.31.22.21: seq=29 ttl=128 time=10.826 ms
emac/plb/opb/ethernet@ef600900: PACKET: 54 0X1800 110 0XC04E6B80
emac/plb/opb/ethernet@ef600900: PACKET: 55 0X1800 110 0XC04E6EC0
emac/plb/opb/ethernet@ef600900: PACKET: 56 0X1800 98 0XC04E70C0
64 bytes from 172.31.22.21: seq=30 ttl=128 time=10.839 ms
emac/plb/opb/ethernet@ef600900: PACKET: 57 0X1800 110 0XC04E6580
emac/plb/opb/ethernet@ef600900: PACKET: 58 0X1800 98 0XC04E5B80
64 bytes from 172.31.22.21: seq=31 ttl=128 time=10.832 ms
emac/plb/opb/ethernet@ef600900: PACKET: 59 0X1800 219 0XC04E5740
emac/plb/opb/ethernet@ef600900: PACKET: 60 0X1800 249 0XC04E5000
emac/plb/opb/ethernet@ef600900: PACKET: 61 0X1800 92 0XC04E5340
emac/plb/opb/ethernet@ef600900: PACKET: 62 0X1800 98 0XC04E4A00
64 bytes from 172.31.22.21: seq=32 ttl=128 time=10.825 ms
emac/plb/opb/ethernet@ef600900: PACKET: 63 0X5800 92 0XC04E4E00
emac/plb/opb/ethernet@ef600900: PACKET: 0 0X1800 98 0XC04EF520
emac/plb/opb/ethernet@ef600900: PACKET: 1 0X1800 92 0XC04D8340
emac/plb/opb/ethernet@ef600900: PACKET: 2 0X1800 98 0XC04D8260
emac/plb/opb/ethernet@ef600900: PACKET: 3 0X1800 60 0XC04E7AA0
emac/plb/opb/ethernet@ef600900: PACKET: 4 0X1800 92 0XC04E74A0
emac/plb/opb/ethernet@ef600900: PACKET: 5 0X1800 92 0XC04E6BC0
emac/plb/opb/ethernet@ef600900: PACKET: 6 0X1800 98 0XC04E6F80
emac/plb/opb/ethernet@ef600900: PACKET: 7 0X1800 92 0XC04E64E0
emac/plb/opb/ethernet@ef600900: PACKET: 8 0X1800 98 0XC04E5BC0

The first number in my debug print statement is what the driver calls the
slot number (the ring index).  When it rolls over I start losing the ping
replies.  What have I done wrong to cause that?  The data is coming in and
is there.  Have any other network device developers seen similar behavior?

Thanks,

Jonathan

^ permalink raw reply
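
The 0X5800 status on slot 63 in the log above, against 0X1800 on every
other slot, is consistent with a wrap marker carried by the last
descriptor of a 64-entry ring. One thing worth checking when a slot is
handed back to the hardware is that this marker survives the refill. A
minimal sketch, assuming a 64-slot ring; RING_SIZE, CTRL_EMPTY, CTRL_WRAP
and both helpers are hypothetical names with made-up values, not the
driver's own definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the driver. */
#define RING_SIZE   64u
#define CTRL_EMPTY  0x8000u	/* slot is owned by the hardware again */
#define CTRL_WRAP   0x4000u	/* last slot: hardware returns to slot 0 */

/* Advance a ring index, rolling over from RING_SIZE - 1 to 0. */
static unsigned int ring_next(unsigned int slot)
{
	assert(slot < RING_SIZE);
	return (slot + 1) % RING_SIZE;
}

/* Control word to write when a slot is refilled with a fresh buffer. */
static uint16_t ring_refill_ctrl(unsigned int slot)
{
	uint16_t ctrl = CTRL_EMPTY;

	assert(slot < RING_SIZE);
	if (slot == RING_SIZE - 1)
		ctrl |= CTRL_WRAP;	/* keep the wrap marker on the last slot */
	return ctrl;
}
```

If the refill path writes the same control word to every slot, the last
slot loses its marker after the first pass, which could explain a ring
that works until the index rolls over.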

* RE: Memory allocation modifications in ibm_newemac driver
From: Jonathan Haws @ 2010-09-01 22:04 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5ABE8A5E96C0364CAF2B4DC1DEB7CEAC127A2FBE@Mercury.usurf.usu.edu>

Okay, I think I have all the issues worked out and can now send and
receive any size packet without a hiccup.  I have tested this in our
system setup as well with data being sent out to disk and did not see any
problems there either (since it only ever allocates a single page, never
more).

Is this something that may be wanted in the mainline?  I have not run full
benchmarks, but I anticipate that my modified driver is slightly slower
than the mainline driver because we keep track of an SKB ring, as well as
a ring of pages and allocate both on each packet received.

Thanks,

Jonathan

^ permalink raw reply

* RE: Memory allocation modifications in ibm_newemac driver
From: Jonathan Haws @ 2010-09-01 22:46 UTC (permalink / raw)
  To: Jonathan Haws, linux-kernel@vger.kernel.org,
	linux-net@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <5ABE8A5E96C0364CAF2B4DC1DEB7CEAC127A301B@Mercury.usurf.usu.edu>

Apparently I spoke too soon - sorry about that.  I am still getting the
error when I try to write to disk and receive on the network at the same
time.  Here is the output:

blastee: page allocation failure. order:1, mode:0x4020
Call Trace:
[ccea9a40] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[ccea9a80] [c006f9f0] __alloc_pages_nodemask+0x38c/0x4f8
[ccea9b00] [c0095008] __slab_alloc+0x594/0x5e0
[ccea9b40] [c0095a08] __kmalloc_track_caller+0xe8/0xf0
[ccea9b60] [c01c848c] __alloc_skb+0x60/0x140
[ccea9b80] [c01a7df8] emac_poll_rx+0x568/0x768
[ccea9bc0] [c01a28e4] mal_poll+0xa8/0x1ec
[ccea9bf0] [c01d3eec] net_rx_action+0x9c/0x1b4
[ccea9c20] [c003b3c0] __do_softirq+0xc4/0x148
[ccea9c60] [c0004d18] do_softirq+0x78/0x80
[ccea9c70] [c003b67c] local_bh_enable+0xc0/0xd8
[ccea9c80] [c01c29bc] lock_sock_nested+0xc0/0xdc
[ccea9cc0] [c0212cb4] udp_recvmsg+0x318/0x3a4
[ccea9d10] [c01c2334] sock_common_recvmsg+0x3c/0x60
[ccea9d30] [c01c06c4] sock_recvmsg+0xb8/0xf0
[ccea9e20] [c01c09b0] sys_recvfrom+0x8c/0xfc
[ccea9f00] [c01c18d4] sys_socketcall+0x128/0x1f8
[ccea9f40] [c000f434] ret_from_syscall+0x0/0x3c
Mem-Info:
DMA per-cpu:
CPU    0: hi:   90, btch:  15 usd:  48
Active_anon:28 active_file:807 inactive_anon:85
 inactive_file:171 unevictable:0 dirty:0 writeback:0 unstable:0
 free:506 slab:53530 mapped:362 pagetables:19 bounce:0
DMA free:2024kB min:2036kB low:2544kB high:3052kB active_anon:112kB inactive_anon:340kB active_file:3228kB inactive_file:684kB unevictable:0kB present:260096kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 410*4kB 28*8kB 4*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2024kB
978 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
65536 pages RAM
1400 pages reserved
1001 pages shared
62828 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
  cache: kmalloc-8192, object size: 8192, buffer size: 8192, default order: 3, min order: 1
  node 0: slabs: 7140, objs: 25809, free: 0

Can anyone explain to me why I would be getting this error in the first
place?  Why is it failing to allocate a page when there are pages
available?  That does not make any sense to me.

Thanks,

Jonathan

^ permalink raw reply

* Re: [v1 PATCH] ucc_geth: fix ethtool set ring param bug
From: Liang Li @ 2010-09-02  0:50 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, avorontsov, davem, linuxppc-dev
In-Reply-To: <1283348550.2268.7.camel@achroite.uk.solarflarecom.com>

On Wed, Sep 01, 2010 at 02:42:30PM +0100, Ben Hutchings wrote:
> On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> > It's common sense that we should change the driver ring
> > desc/buffer etc. only after the device has been stopped/shut down.
> > When we make the change while the device/driver is running, a kernel oops occurs:
> [...]
> > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > -
> >  	if (netif_running(netdev)) {
> > -		/* FIXME: restart automatically */
> > -		printk(KERN_INFO
> > -			"Please re-open the interface.\n");
> > +		u16 rx_t;
> > +		u16 tx_t;
> > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > +		ucc_geth_close(netdev);
> > +
> > +		rx_t = ug_info->bdRingLenRx[queue];
> > +		tx_t = ug_info->bdRingLenTx[queue];
> > +
> > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > +
> > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > +		ret = ucc_geth_open(netdev);
> > +		if (ret) {
> > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > +					" interface %s failed. Please try to make the interface "
> > +					" down, then try again.\n", netdev->name);
> > +			ug_info->bdRingLenRx[queue] = rx_t;
> > +			ug_info->bdRingLenTx[queue] = tx_t;
> > +		}
> [...]
> 
> Bringing the interface down will call ucc_geth_close(), which will try
> to free resources that have not been allocated!

Sorry, I did not understand you on this point. There is no
ucc_geth_close when the open fails. What do you mean here exactly?

> 
> If you cannot roll back a failed change then at least use dev_close()
> and dev_open() rather than calling ucc_geth_{close,open}() directly, so
> that the interface state is updated correctly.  Or just refuse to make
> the change if the interface is up.

That would make things simpler, but I do not think it is necessary,
since no such restriction exists in most NIC drivers. :)
Actually I have not seen the 'reopen fails' case, so I assume you
may have witnessed a similar 'open fail' in some rare cases. Could you
please give more input on this so that I can make the re-open safer here?

Thanks,
				-Liang Li

> 
> Ben.
> 
> -- 
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
From: Michael Neuling @ 2010-09-02  1:02 UTC (permalink / raw)
  To: Darren Hart
  Cc: Stephen Rothwell, Gautham R Shenoy, Josh Triplett, Steven Rostedt,
	linuxppc-dev, Will Schmidt, Paul Mackerras, Ankita Garg,
	Thomas Gleixner
In-Reply-To: <4C7EBAA8.7030601@us.ibm.com>



In message <4C7EBAA8.7030601@us.ibm.com> you wrote:
> On 09/01/2010 12:59 PM, Steven Rostedt wrote:
> > On Wed, 2010-09-01 at 11:47 -0700, Darren Hart wrote:
> > 
> >> from tip/rt/2.6.33 causes the preempt_count() to change across the cede
> >> call.  This patch appears to prevent the proxy preempt_count assignment
> >> from happening. This non-local-cpu assignment to 0 would cause an
> >> underrun of preempt_count() if the local-cpu had disabled preemption
> >> prior to the assignment and then later tried to enable it. This appears
> >> to be the case with the stack of __trace_hcall* calls preceding the
> >> return from extended_cede_processor() in the latency format trace-cmd
> >> report:
> >>
> >>   <idle>-0       1d....   201.252737: function:             .cpu_die
> > 
> > Note, the above 1d.... is a series of values. The first being the CPU,
> > the next if interrupts are disabled, the next if the NEED_RESCHED flag
> > is set, the next is softirqs enabled or disabled, next the
> > preempt_count, and finally the lockdepth count.
> > 
> > Here we only care about the preempt_count, which is zero when '.' and a
> > number if it is something else. It is the second to last field in that
> > list.
> > 
> > 
> >>   <idle>-0       1d....   201.252738: function:                .pseries_mach_cpu_die
> >>   <idle>-0       1d....   201.252740: function:                   .idle_task_exit
> >>   <idle>-0       1d....   201.252741: function:                      .switch_slb
> >>   <idle>-0       1d....   201.252742: function:                   .xics_teardown_cpu
> >>   <idle>-0       1d....   201.252743: function:                      .xics_set_cpu_priority
> >>   <idle>-0       1d....   201.252744: function:             .__trace_hcall_entry
> >>   <idle>-0       1d..1.   201.252745: function:                .probe_hcall_entry
> > 
> >                        ^
> >                 preempt_count set to 1
> > 
> >>   <idle>-0       1d..1.   201.252746: function:             .__trace_hcall_exit
> >>   <idle>-0       1d..2.   201.252747: function:                .probe_hcall_exit
> >>   <idle>-0       1d....   201.252748: function:             .__trace_hcall_entry
> >>   <idle>-0       1d..1.   201.252748: function:                .probe_hcall_entry
> >>   <idle>-0       1d..1.   201.252750: function:             .__trace_hcall_exit
> >>   <idle>-0       1d..2.   201.252751: function:                .probe_hcall_exit
> >>   <idle>-0       1d....   201.252752: function:             .__trace_hcall_entry
> >>   <idle>-0       1d..1.   201.252753: function:                .probe_hcall_entry
> >                    ^   ^
> >                   CPU  preempt_count
> > 
> > Entering the function probe_hcall_entry() the preempt_count is 1 (see
> > below). But probe_hcall_entry does:
> > 
> > 	h = &get_cpu_var(hcall_stats)[opcode / 4];
> > 
> > Without doing the put (which it does in probe_hcall_exit())
> > 
> > So exiting the probe_hcall_entry() the preempt_count is 2.
> > The trace_hcall_entry() will do a preempt_enable() making it leave as 1.
> > 
> > 
> >>   offon.sh-3684  6.....   201.466488: bprint:               .smp_pSeries_kick_cpu : resetting pcnt to 0 for cpu 1
> > 
> > This is CPU 6, changing the preempt count from 1 to 0.
> > 
> >>
> >> preempt_count() is reset from 1 to 0 by smp_startup_cpu() without the
> >> QCSS_NOT_STOPPED check from the patch above.
> >>
> >>   <idle>-0       1d....   201.466503: function:             .__trace_hcall_exit
> > 
> > Note: __trace_hcall_exit() and __trace_hcall_entry() basically do:
> > 
> >  preempt_disable();
> >  call probe();
> >  preempt_enable();
> > 
> > 
> >>   <idle>-0       1d..1.   201.466505: function:                .probe_hcall_exit
> > 
> > The preempt_count of 1 entering the probe_hcall_exit() is because of the
> > preempt_disable() shown above. It should have been entered as a 2.
> > 
> > But then it does:
> > 
> > 
> > 	put_cpu_var(hcall_stats);
> > 
> > making preempt_count 0.
> > 
> > But the preempt_enable() in the trace_hcall_exit() causes this to be -1.
> > 
> > 
> >>   <idle>-0       1d.Hff.   201.466507: bprint:               .pseries_mach_cpu_die : after cede: ffffffff
> >>
> >> With the preempt_count() being one less than it should be, the final
> >> preempt_enable() in the trace_hcall path drops preempt_count to
> >> 0xffffffff, which of course is an illegal value and leads to a crash.
> > 
> > I'm confused to how this works in mainline?
> 
> Turns out it didn't. 2.6.33.5 with CONFIG_PREEMPT=y sees this exact same
> behavior. The following, part of the 2.6.33.6 stable release, prevents
> this from happening:
> 
> aef40e87d866355ffd279ab21021de733242d0d5
> powerpc/pseries: Only call start-cpu when a CPU is stopped
> 
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -82,6 +82,12 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
> 
>         pcpu = get_hard_smp_processor_id(lcpu);
> 
> +       /* Check to see if the CPU out of FW already for kexec */
> +       if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> +               cpu_set(lcpu, of_spin_map);
> +               return 1;
> +       }
> +
>         /* Fixup atomic count: it exited inside IRQ handler. */
>         task_thread_info(paca[lcpu].__current)->preempt_count   = 0;
> 
> The question is now, Is this the right fix? If so, perhaps we can update
> the comment to be a bit more clear and not refer solely to kexec.
> 
> Michael Neuling, can you offer any thoughts here? We hit this EVERY
> TIME, which makes me wonder if the offline/online path could do this
> without calling smp_startup_cpu at all.

We need to call smp_startup_cpu on boot when the cpus are still in
FW.  smp_startup_cpu does this for us on boot.

I'm wondering if we just need to move the test down a bit to make sure
the preempt_count is set.  I've not been following this thread, but
maybe this might work?

Untested patch below...

Mikey

diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 0317cce..3afaba4 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -104,18 +104,18 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
 
 	pcpu = get_hard_smp_processor_id(lcpu);
 
-	/* Check to see if the CPU out of FW already for kexec */
-	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
-		cpumask_set_cpu(lcpu, of_spin_mask);
-		return 1;
-	}
-
 	/* Fixup atomic count: it exited inside IRQ handler. */
 	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
 
 	if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
 		goto out;
 
+	/* Check to see if the CPU out of FW already for kexec */
+	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
+		cpumask_set_cpu(lcpu, of_spin_mask);
+		return 1;
+	}
+
 	/* 
 	 * If the RTAS start-cpu token does not exist then presume the
 	 * cpu is already spinning.

^ permalink raw reply related

* Re: [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2010-09-02  1:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linuxppc-dev list, Andrew Morton, Linux Kernel list
In-Reply-To: <1283234198.2151.15.camel@pasglop>

On Tue, 2010-08-31 at 15:56 +1000, Benjamin Herrenschmidt wrote:
> Hi Linus !
> 
> Here's a few small fixes, one is an important fix for a nasty regression
> breaking pseries machine running under hypervisor... oops !

I updated this with some embedded fixes from Kumar and a fix for a new
deadlock in the pseries dlpar code.

New log below.

Cheers,
Ben.

The following changes since commit 2bfc96a127bc1cc94d26bfaa40159966064f9c8c:
  Linus Torvalds (1):
        Linux 2.6.36-rc3

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Alexander Graf (1):
      powerpc/85xx: Fix compilation of mpc85xx_mds.c

Anton Vorontsov (1):
      powerpc/85xx: Add P1021 PCI IDs and quirks

Julia Lawall (2):
      arch/powerpc/platforms/83xx/mpc837x_mds.c: Add missing iounmap
      arch/powerpc/sysdev/qe_lib/qe.c: Add of_node_put to avoid memory leak

Kumar Gala (1):
      powerpc/85xx: Fix compile issue with p1022_ds due to lmb rename to memblock

Li Yang (1):
      fsl_rio: fix compile errors

Matthew McClintock (1):
      powerpc/kexec: Adds correct calling convention for kexec purgatory

Michael Neuling (1):
      powerpc: Don't use kernel stack with translation off

Nathan Fontenot (1):
      powerpc/pseries: Correct rtas_data_buf locking in dlpar code

Paul Mackerras (1):
      powerpc/perf_event: Reduce latency of calling perf_event_do_pending

 arch/powerpc/kernel/head_64.S             |   12 ++++++--
 arch/powerpc/kernel/misc_32.S             |    3 ++
 arch/powerpc/kernel/time.c                |   23 +++++++--------
 arch/powerpc/platforms/83xx/mpc837x_mds.c |    9 ++++--
 arch/powerpc/platforms/85xx/mpc85xx_mds.c |    1 +
 arch/powerpc/platforms/85xx/p1022_ds.c    |    4 +-
 arch/powerpc/platforms/pseries/dlpar.c    |   42 ++++++++++++++++++++---------
 arch/powerpc/sysdev/fsl_pci.c             |    2 +
 arch/powerpc/sysdev/fsl_rio.c             |    6 +++-
 arch/powerpc/sysdev/qe_lib/qe.c           |    1 +
 include/linux/pci_ids.h                   |    2 +
 11 files changed, 71 insertions(+), 34 deletions(-)

^ permalink raw reply

* Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
From: Michael Neuling @ 2010-09-02  3:46 UTC (permalink / raw)
  To: Darren Hart
  Cc: Stephen Rothwell, Gautham R Shenoy, Josh Triplett, Steven Rostedt,
	linuxppc-dev, Will Schmidt, Paul Mackerras, Ankita Garg,
	Thomas Gleixner
In-Reply-To: <4C7E9FC1.60004@us.ibm.com>

> +       /* Check to see if the CPU out of FW already for kexec */

Wow, that comment is shit.  The checkin comment in
aef40e87d866355ffd279ab21021de733242d0d5 is much better.

> This comment is really confusing to me. I _think_ it is saying that this test
> determines if the CPU is done executing firmware code and has begun executing
> OS code.... Is that right?

Yeah.  

It means for a normal boot, the CPU will not have started yet (still in
firmware (FW)) so we have to call FW to bring it out.  In the kexec case
though, the CPU will have started already (it's spinning in the kernel)
so we don't have to call FW to bring it back out again.  To distinguish
between these two cases, we ask FW if the CPU has started or not (via
smp_query_cpu_stopped()) and if it's already started, don't restart it.

Originally, we could call FW to start a cpu that was already started,
but FW changed recently and stopped allowing us to do this.  Hence this
patch.

Mikey

^ permalink raw reply
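
The two cases described above (normal boot versus kexec) come down to one
decision: ask firmware whether the CPU has already started, and only call
firmware to start it if it has not. A toy model of that decision follows,
assuming nothing about the real RTAS interface; the enums and
startup_action_for() are hypothetical stand-ins, not kernel definitions:

```c
#include <assert.h>

/* Hypothetical stand-ins for what the firmware query reports. */
enum fw_cpu_state { FW_CPU_STOPPED, FW_CPU_NOT_STOPPED };

/* What the startup path should do next. */
enum startup_action { CALL_FW_START_CPU, SKIP_FW_ALREADY_RUNNING };

/*
 * Normal boot: the CPU is still held by firmware, so it must be started.
 * kexec: the CPU is already spinning in the kernel, so starting it again
 * through firmware is skipped.
 */
static enum startup_action startup_action_for(enum fw_cpu_state state)
{
	if (state == FW_CPU_NOT_STOPPED)
		return SKIP_FW_ALREADY_RUNNING;
	return CALL_FW_START_CPU;
}
```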

* Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
From: Steven Rostedt @ 2010-09-02  4:06 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Stephen Rothwell, Darren Hart, Gautham R Shenoy, Josh Triplett,
	linuxppc-dev, Will Schmidt, Paul Mackerras, Ankita Garg,
	Thomas Gleixner
In-Reply-To: <11579.1283389322@neuling.org>

On Thu, 2010-09-02 at 11:02 +1000, Michael Neuling wrote:

> We need to call smp_startup_cpu on boot when the cpus are still in
> FW.  smp_startup_cpu does this for us on boot.
> 
> I'm wondering if we just need to move the test down a bit to make sure
> the preempt_count is set.  I've not been following this thread, but
> maybe this might work?

Egad no! Setting the preempt_count to zero _is_ the bug. I think Darren
even said that adding the exit prevented the bug (although now he's
hitting a hard lockup someplace else). The original code he was using
did not have the condition to return for kexec. It was just a
coincidence that this code helped in bringing a CPU back online.

> 
> Untested patch below...
> 
> Mikey
> 
> diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
> index 0317cce..3afaba4 100644
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -104,18 +104,18 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
>  
>  	pcpu = get_hard_smp_processor_id(lcpu);
>  
> -	/* Check to see if the CPU out of FW already for kexec */
> -	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> -		cpumask_set_cpu(lcpu, of_spin_mask);
> -		return 1;
> -	}
> -
>  	/* Fixup atomic count: it exited inside IRQ handler. */
>  	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;

We DON'T want to do the above. It's nasty! This is one CPU's task
touching an intimate part of another CPU's task. It's the equivalent of me
putting my hand down your wife's blouse. It's offensive, and rude.

OK, if the CPU was never online, then you can do what you want. But what
we see is that this fails on CPU hotplug.  You stop a CPU, and it goes
into this cede_processor() call. When you wake it up, suddenly the task
on that woken CPU has its preempt count fscked up.  This was really
really hard to debug. We thought it was stack corruption or something.
But it ended up being that this code has one CPU touching the breasts of
another CPU. This code is a pervert!

What the trace clearly showed was that we take down a CPU, and in doing
so, the code on that CPU set the preempt count to 1, and it expected to
have it as 1 when it returned. But the code that kick-started this CPU
back to life (bringing the CPU back online) set the preempt count on the
task of that CPU to 0, and screwed everything up.

-- Steve

>  
>  	if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
>  		goto out;
>  
> +	/* Check to see if the CPU out of FW already for kexec */
> +	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> +		cpumask_set_cpu(lcpu, of_spin_mask);
> +		return 1;
> +	}
> +
>  	/* 
>  	 * If the RTAS start-cpu token does not exist then presume the
>  	 * cpu is already spinning.
> 

^ permalink raw reply
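
The counting argument in this thread can be replayed with a toy model. The
sketch below is not kernel code: a plain int stands in for one task's
preempt_count, and the toy_* helpers and simulate_cede() are made-up names
that only mirror the sequence seen in the trace (disable in
__trace_hcall_entry, get_cpu_var in probe_hcall_entry, the matching
put_cpu_var and enable on the exit path).

```c
#include <assert.h>

/* Toy model: a plain int stands in for one task's preempt_count. */
static int count;

static void toy_preempt_disable(void) { count++; }
static void toy_preempt_enable(void)  { count--; }

/* Replays the traced sequence around the cede call. */
static int simulate_cede(int remote_reset)
{
	count = 0;

	/* hcall entry path */
	toy_preempt_disable();	/* __trace_hcall_entry:            1 */
	toy_preempt_disable();	/* get_cpu_var, probe_hcall_entry: 2 */
	toy_preempt_enable();	/* __trace_hcall_entry returns:    1 */

	/* The CPU is ceded here; another CPU may write to the count. */
	if (remote_reset)
		count = 0;

	/* hcall exit path */
	toy_preempt_disable();	/* __trace_hcall_exit */
	toy_preempt_enable();	/* put_cpu_var, probe_hcall_exit */
	toy_preempt_enable();	/* __trace_hcall_exit returns */

	return count;
}
```

Without the remote reset the sequence balances and ends at 0. With it, the
count ends at -1, which is the ffffffff printed after the cede in the
trace.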

* RE: Memory allocation modifications in ibm_newemac driver
From: Benjamin Herrenschmidt @ 2010-09-02  4:18 UTC (permalink / raw)
  To: Jonathan Haws
  Cc: linux-net@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <5ABE8A5E96C0364CAF2B4DC1DEB7CEAC127A301B@Mercury.usurf.usu.edu>

On Wed, 2010-09-01 at 22:04 +0000, Jonathan Haws wrote:
> Okay, I think I have all the issues worked out and can now send and
> receive any size packet without a hiccup.  I have tested this in our
> system setup as well with data being sent out to disk and did not see
> any problems there either (since it only ever allocates a single page,
> never more).
> 
> Is this something that may be wanted in the mainline?  I have not run
> full benchmarks, but I anticipate that my modified driver is slightly
> slower than the mainline driver because we keep track of an SKB ring,
> as well as a ring of pages and allocate both on each packet received.

Well, it won't hurt to post the patch to review :-)

Make sure to CC me as I more or less maintain that driver.

Cheers,
Ben.

^ permalink raw reply

* RE: Memory allocation modifications in ibm_newemac driver
From: Benjamin Herrenschmidt @ 2010-09-02  4:18 UTC (permalink / raw)
  To: Jonathan Haws
  Cc: linux-net@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <5ABE8A5E96C0364CAF2B4DC1DEB7CEAC127A305B@Mercury.usurf.usu.edu>

On Wed, 2010-09-01 at 22:46 +0000, Jonathan Haws wrote:
> 
> Can anyone explain to me why I would be getting this error in the
> first place?  Why is it failing to allocate a page when there are
> pages available?  That does not make any sense to me.

order:1

It's failing to allocate -two- pages.

Cheers,
Ben.

^ permalink raw reply
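
The order in a page allocation failure message is a power-of-two exponent:
an order-n request asks for 2^n physically contiguous pages. A small
sketch of that arithmetic, assuming 4 kB pages as in the "410*4kB" entry
of the report; PAGE_SIZE_BYTES and both helpers are made-up names for
illustration:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* assumption: 4 kB pages, as in the log */

/* Number of contiguous pages requested by an allocation of this order. */
static unsigned int pages_for_order(unsigned int order)
{
	assert(order < 32);
	return 1u << order;
}

/* Size in bytes of an allocation of this order. */
static size_t bytes_for_order(unsigned int order)
{
	assert(order < 20);
	return (size_t)PAGE_SIZE_BYTES << order;
}
```

With these, order:1 works out to 2 pages, or 8192 bytes, which matches the
kmalloc-8192 cache named in the SLUB message.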

* Re: [PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries
From: Darren Hart @ 2010-09-02  6:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Stephen Rothwell, Michael Neuling, Gautham R Shenoy,
	Josh Triplett, linuxppc-dev, Will Schmidt, Paul Mackerras,
	Ankita Garg, Thomas Gleixner
In-Reply-To: <1283400367.2356.69.camel@gandalf.stny.rr.com>

On 09/01/2010 09:06 PM, Steven Rostedt wrote:
> On Thu, 2010-09-02 at 11:02 +1000, Michael Neuling wrote:
> 
>> We need to call smp_startup_cpu on boot when the cpus are still in
>> FW.  smp_startup_cpu does this for us on boot.
>>
>> I'm wondering if we just need to move the test down a bit to make sure
>> the preempt_count is set.  I've not been following this thread, but
>> maybe this might work?
> 
> Egad no! Setting the preempt_count to zero _is_ the bug. I think Darren
> even said that adding the exit prevented the bug (although now he's
> hitting a hard lockup someplace else). The original code he was using
> did not have the condition to return for kexec. It was just a
> coincidence that this code helped in bringing a CPU back online.
> 
>>
>> Untested patch below...
>>
>> Mikey
>>
>> diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
>> index 0317cce..3afaba4 100644
>> --- a/arch/powerpc/platforms/pseries/smp.c
>> +++ b/arch/powerpc/platforms/pseries/smp.c
>> @@ -104,18 +104,18 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
>>  
>>  	pcpu = get_hard_smp_processor_id(lcpu);
>>  
>> -	/* Check to see if the CPU out of FW already for kexec */
>> -	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
>> -		cpumask_set_cpu(lcpu, of_spin_mask);
>> -		return 1;
>> -	}
>> -
>>  	/* Fixup atomic count: it exited inside IRQ handler. */
>>  	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
> 
> We DON'T want to do the above. It's nasty! This is one CPU's task
> touching an intimate part of another CPU's task. It's the equivalent of me
> putting my hand down your wife's blouse. It's offensive, and rude.
> 
> OK, if the CPU was never online, then you can do what you want. But what
> we see is that this fails on CPU hotplug.  You stop a CPU, and it goes
> into this cede_processor() call. When you wake it up, suddenly the task
> on that woken CPU has its preempt count fscked up.  This was really
> really hard to debug. We thought it was stack corruption or something.
> But it ended up being that this code has one CPU touching the breasts of
> another CPU. This code is a pervert!
> 
> What the trace clearly showed was that we take down a CPU, and in doing
> so, the code on that CPU set the preempt count to 1, and it expected to
> have it as 1 when it returned. But the code that kick-started this CPU
> back to life (bringing the CPU back online) set the preempt count on the
> task of that CPU to 0, and screwed everything up.


Right, what Steve said.

This patch resolved the problem, but it did so inadvertently while
trying to fix kexec. I'm wondering if the offline/online code needs to
be updated to never call smp_startup_cpu(). Or perhaps this is the right
fix, and the comment needs to be cleaned up so that it isn't only for
kexec.

With this in place, we no longer see the preempt_count dropping below
zero. However, if I offline/online a CPU about 246 times I hit the
opposite problem, a preempt_count() overflow. There appears to be a
missing preempt_enable() somewhere in the offline/online paths.

On 2.6.33.7 with CONFIG_PREEMPT=y I run the following test:

# cat offon_loop.sh 
#!/bin/sh
for i in `seq 100`; do
        echo "Iteration $i"
        echo 0 > /sys/devices/system/cpu/cpu1/online
        echo 1 > /sys/devices/system/cpu/cpu1/online
done

And see the overflow:

Iteration 238
Iteration 239
Iteration 240

Message from syslogd@igoort1 at Sep  1 17:36:45 ...
 kernel:------------[ cut here ]------------
Iteration 241
Iteration 242
Iteration 243
Iteration 244
Iteration 245

The following appears on the console. It comes from this test in
add_preempt_count():
	/*
	 * Spinlock count overflowing soon?
	 */
	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
				PREEMPT_MASK - 10);
The preempt count field is 8 bits wide (PREEMPT_MASK is 0xff), so it tops
out at 255.

Somewhere we are now MISSING a sub_preempt_count() or preempt_enable().
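
As a rough illustration of why a leak of one count per offline/online
cycle would trip that check after a couple of hundred iterations, here is
a standalone sketch. It assumes an 8-bit preempt field; TOY_PREEMPT_MASK,
near_overflow() and first_warning_count() are made-up names, and only the
comparison is copied from the quoted condition:

```c
#include <assert.h>

#define TOY_PREEMPT_MASK 0xffu	/* assumption: 8-bit preempt field */

/* Same comparison as the quoted DEBUG_LOCKS_WARN_ON() condition. */
static int near_overflow(unsigned int preempt_count)
{
	return (preempt_count & TOY_PREEMPT_MASK) >= TOY_PREEMPT_MASK - 10;
}

/* First count at which the warning fires if one count leaks per cycle. */
static unsigned int first_warning_count(void)
{
	unsigned int count = 0;

	while (!near_overflow(count))
		count++;
	return count;
}
```

Under that assumption the threshold is 245, which is in the same range as
the roughly 246 cycles mentioned above.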

Badness at kernel/sched.c:5372
NIP: c000000000694f14 LR: c000000000694ef8 CTR: c00000000005b0a0
REGS: c00000008e776ae0 TRAP: 0700   Not tainted  (2.6.33.7-dirty)
MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 28000082  XER: 00000000
TASK = c00000010e6fc040[0] 'swapper' THREAD: c00000008e774000 CPU: 1
GPR00: 0000000000000000 c00000008e776d60 c000000000af2ab8 0000000000000001
GPR04: c00000008e776fb8 0000000000000004 0000000000000001 ffffffffff676980
GPR08: 0000000000000400 c000000000bcd930 0000000000000001 c000000000b2d360
GPR12: 0000000000000002 c000000000b0f680 0000000000000000 000000000f394acc
GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000008e777880
GPR20: c000000000e05160 c00000008e777870 7fffffffffffffff c000000000b0f480
GPR24: c0000000009da400 0000000000000002 0000000000000000 0000000000000001
GPR28: c00000008e776fb8 0000000000000001 c000000000a75bf0 c00000008e776d60
NIP [c000000000694f14] .add_preempt_count+0xc0/0xe0
LR [c000000000694ef8] .add_preempt_count+0xa4/0xe0
Call Trace:
[c00000008e776d60] [c00000008e776e00] 0xc00000008e776e00 (unreliable)
[c00000008e776df0] [c00000000005b0d8] .probe_hcall_entry+0x38/0x94
[c00000008e776e80] [c00000000004ef70] .__trace_hcall_entry+0x80/0xd4
[c00000008e776f10] [c00000000004f96c] .plpar_hcall_norets+0x50/0xd0
[c00000008e776f80] [c000000000055044] .smp_xics_message_pass+0x110/0x244
[c00000008e777030] [c000000000034824] .smp_send_reschedule+0x5c/0x78
[c00000008e7770c0] [c00000000006a6bc] .resched_task+0xb4/0xd8
[c00000008e777150] [c00000000006a840] .check_preempt_curr_idle+0x2c/0x44
[c00000008e7771e0] [c00000000007a868] .try_to_wake_up+0x460/0x548
[c00000008e7772b0] [c00000000007a98c] .default_wake_function+0x3c/0x54
[c00000008e777350] [c0000000000ab3b0] .autoremove_wake_function+0x44/0x84
[c00000008e777400] [c00000000006449c] .__wake_up_common+0x7c/0xf4
[c00000008e7774c0] [c00000000006a5d8] .__wake_up+0x60/0x90
[c00000008e777570] [c0000000000810dc] .printk_tick+0x84/0xa8
[c00000008e777600] [c000000000095c90] .update_process_times+0x64/0xa4
[c00000008e7776a0] [c0000000000bdcec] .tick_sched_timer+0xd0/0x120
[c00000008e777750] [c0000000000afe7c] .__run_hrtimer+0x1a0/0x29c
[c00000008e777800] [c0000000000b02a4] .hrtimer_interrupt+0x124/0x278
[c00000008e777900] [c00000000002ea90] .timer_interrupt+0x1dc/0x2e4
[c00000008e7779a0] [c000000000003700] decrementer_common+0x100/0x180
--- Exception: 901 at .raw_local_irq_restore+0x48/0x54
    LR = .cpu_idle+0x12c/0x208
[c00000008e777c90] [c000000000b21130] ppc_md+0x0/0x240 (unreliable)
[c00000008e777cd0] [c000000000016a60] .cpu_idle+0x120/0x208
[c00000008e777d70] [c00000000069dbec] .start_secondary+0x394/0x3d4
[c00000008e777e30] [c000000000008278] .start_secondary_resume+0x10/0x14
Instruction dump:
409e0034 78630620 2ba300f4 40fd0028 4bd0b711 60000000 2fa30000 419e0018
e93e8880 80090000 2f800000 409e0008 <0fe00000> 383f0090 e8010010 7c0803a6

BUG: scheduling while atomic: swapper/0/0x000000f0
Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
Call Trace:
[c00000008e7779a0] [c000000000014280] .show_stack+0xd8/0x218 (unreliable)
[c00000008e777a80] [c0000000006980b0] .dump_stack+0x28/0x3c
[c00000008e777b00] [c000000000071954] .__schedule_bug+0x94/0xb8
[c00000008e777b90] [c00000000068db40] .schedule+0xf8/0xbf4
[c00000008e777cd0] [c000000000016b34] .cpu_idle+0x1f4/0x208
[c00000008e777d70] [c00000000069dbec] .start_secondary+0x394/0x3d4
[c00000008e777e30] [c000000000008278] .start_secondary_resume+0x10/0x14

I'll spend tomorrow collecting traces and trying to see who's groping who this time...

--
Darren



> 
> -- Steve
> 
> 
>>  
>>  	if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
>>  		goto out;
>>  
>> +	/* Check to see if the CPU out of FW already for kexec */
>> +	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
>> +		cpumask_set_cpu(lcpu, of_spin_mask);
>> +		return 1;
>> +	}
>> +
>>  	/* 
>>  	 * If the RTAS start-cpu token does not exist then presume the
>>  	 * cpu is already spinning.
>>
> 
> 


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply

* Re:
From: Wolfgang Denk @ 2010-09-02  6:14 UTC (permalink / raw)
  To: Rupjyoti Sarmah, Prodyut Hazarika, Mark Miesfeld; +Cc: linuxppc-dev
In-Reply-To: <20100818205646.57783157D71@gemini.denx.de>

Dear Rupjyoti, Prodyut, Mark,

two weeks ago I wrote:

In message <20100818205646.57783157D71@gemini.denx.de> you wrote:
> 
> drivers/ata/sata_dwc_460ex.c fails to build in current mainline:
...
> Do you have any hints how to fix that?

Any comments or ideas how to fix this?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Time is an illusion perpetrated by the manufacturers of space.

^ permalink raw reply

* RE: P1021MDS QE Ethernet Ports
From: Ioannis Kokkoris @ 2010-09-02  8:26 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <COL120-W4274596E983E02BCBF4AD0A78B0@phx.gbl>


> From: johnkokko@hotmail.com
> To: linuxppc-dev@lists.ozlabs.org
> Subject: P1021MDS QE Ethernet Ports
> Date: Wed, 1 Sep 2010 15:11:56 +0300
>
>
> Hello,
>
> we are seeing a strange behavior when trying to use the QE Ethernet
> interfaces.
> ENET5 (UCC5 - RMII) interface on P1021MDS boards does not come up if
> there is no physical link on the ENET1 (UCC1 - MII) Port.
> It seems that interrupts from ENET5 are normally received but the link
> comes up and works properly only if we have physical connection on ENET1.
>
So far I found the following:

After adding traces, it seems that genphy_update_link() polls the correct
device, with the correct address (0x03), but although a physical link is
present in ENET5, the polling is not successful until there is a link in
ENET1 (address 0x02)!

genphy_update_link: Dev: Micrel KS8041 ADD: 3 Status read 0x7849  (without ENET1 Link)
genphy_update_link: Dev: Micrel KS8041 ADD: 3 Status read 0x786D  (with    ENET1 Link)

How does the MDIO of ENET1 affect the management of the physical interface
in a different HW address?
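
To make the two status values easier to compare, here is a small sketch
that decodes them. It assumes the traced value is the PHY's basic mode
status register (BMSR), which is the register genphy_update_link() reads;
the bit values are the standard ones named in <linux/mii.h>, and the helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard MII basic mode status register bits (see <linux/mii.h>). */
#define BMSR_LSTATUS		0x0004	/* link status */
#define BMSR_ANEGCOMPLETE	0x0020	/* auto-negotiation complete */

/* Hypothetical helper: does this BMSR value report link up? */
static bool bmsr_link_up(uint16_t bmsr)
{
	return (bmsr & BMSR_LSTATUS) != 0;
}

/* Hypothetical helper: does this BMSR value report autoneg complete? */
static bool bmsr_aneg_done(uint16_t bmsr)
{
	return (bmsr & BMSR_ANEGCOMPLETE) != 0;
}
```

Read this way, 0x7849 and 0x786D differ only in those two bits (0x0024):
without the ENET1 link the PHY at address 3 reports neither link nor
completed auto-negotiation, and with it the PHY reports both.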


> Can anyone think of a possible reason for this behavior, is there a way
> to trace this problem?
> I can provide any further information needed.
>
> Any help would be appreciated,
> Regards,
> John
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* RE: [PATCH v2 4/6] mtd: m25p80: add a read function to read page by page
From: Hu Mingkai-B21284 @ 2010-09-02  9:04 UTC (permalink / raw)
  To: Grant Likely, David Woodhouse
  Cc: linuxppc-dev, Gala Kumar-B11780, linux-mtd, Zang Roy-R61911,
	spi-devel-general
In-Reply-To: <AANLkTikM5je7Ckt=7bP6fj7OKCaroSn_AqgcS4T+Bv5d@mail.gmail.com>

[adding mtd list]

Hi guys,

Could you please take a look at this patch and give some suggestions?

Thanks,
Mingkai
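
The patch quoted below works around a controller whose single transaction
is capped (0xFFFF bytes for the eSPI SPCOM[TRANSLEN] field) by splitting a
long read into smaller pieces. As a rough userspace sketch of that idea
only, not the driver code: the callback type, chunked_read() and mem_xfer()
are all hypothetical names, and unlike the quoted patch this version stops
on the first transfer error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend: transfer `len` bytes starting at offset `from`. */
typedef int (*xfer_fn)(void *ctx, size_t from, size_t len, unsigned char *buf);

/* Read `len` bytes in pieces of at most `max_chunk` bytes each. */
static int chunked_read(xfer_fn xfer, void *ctx, size_t from, size_t len,
			size_t max_chunk, unsigned char *buf, size_t *retlen)
{
	size_t done = 0;

	if (retlen)
		*retlen = 0;
	if (max_chunk == 0)
		return -1;

	while (done < len) {
		size_t chunk = len - done;
		int err;

		if (chunk > max_chunk)
			chunk = max_chunk;

		err = xfer(ctx, from + done, chunk, buf + done);
		if (err)
			return err;

		done += chunk;
		if (retlen)
			*retlen = done;
	}
	return 0;
}

/* Demo backend: treats `ctx` as a plain memory buffer. */
static int mem_xfer(void *ctx, size_t from, size_t len, unsigned char *buf)
{
	memcpy(buf, (const unsigned char *)ctx + from, len);
	return 0;
}
```

For example, reading 7 bytes at offset 2 with a 3-byte limit issues three
transfers of 3, 3 and 1 bytes.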

> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant
> Likely
> Sent: Tuesday, August 10, 2010 3:00 PM
> To: Hu Mingkai-B21284; David Woodhouse
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net; Gala
> Kumar-B11780; Zang Roy-R61911
> Subject: Re: [PATCH v2 4/6] mtd: m25p80: add a read function to read page by
> page
>
> [adding David Woodhouse]
>
> (Btw, you should cc: David Woodhouse and the mtd list on the MTD patches).
>
> On Mon, Aug 2, 2010 at 1:52 AM, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
> > For Freescale's eSPI controller, the max transaction length one time
> > is limitted by the SPCOM[TRANSLEN] field which is 0xFFFF. When used
> > mkfs.ext2 command to create ext2 filesystem on the flash, the read
> > length will exceed the max value of the SPCOM[TRANSLEN] field, so
> > change the read function to read page by page.
> >
> > For other SPI flash driver, also needed to supply the read function if
> > used the eSPI controller.
> >
> > Signed-off-by: Mingkai Hu <Mingkai.hu@freescale.com>
>
> This one bothers me, but I can't put my finger on it.  The flag feels like a
> controller specific hack.  David, can you take a look at this patch?
>
> g.
>
> > ---
> >
> > v2:
> >  - Add SPI_MASTER_TRANS_LIMIT flag to indicate the master's trans length
> >    limitation, so the MTD driver can select the correct transfer behaviour
> >    at driver probe time
> >
> >  drivers/mtd/devices/m25p80.c |   78 ++++++++++++++++++++++++++++++++++++++++++
> >  drivers/spi/spi_fsl_espi.c   |    1 +
> >  include/linux/spi/spi.h      |    1 +
> >  3 files changed, 80 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
> > index 5f00075..30e4568 100644
> > --- a/drivers/mtd/devices/m25p80.c
> > +++ b/drivers/mtd/devices/m25p80.c
> > @@ -376,6 +376,81 @@ static int m25p80_read(struct mtd_info *mtd, loff_t from, size_t len,
> >  }
> >
> >  /*
> > + * Read an address range from the flash chip page by page.
> > + * Some controller has transaction length limitation such as the
> > + * Freescale's eSPI controller can only trasmit 0xFFFF bytes one
> > + * time, so we have to read page by page if the len is more than
> > + * the limitation.
> > + */
> > +static int m25p80_page_read(struct mtd_info *mtd, loff_t from, size_t len,
> > +       size_t *retlen, u_char *buf)
> > +{
> > +       struct m25p *flash = mtd_to_m25p(mtd);
> > +       struct spi_transfer t[2];
> > +       struct spi_message m;
> > +       u32 i, page_size = 0;
> > +
> > +       DEBUG(MTD_DEBUG_LEVEL2, "%s: %s %s 0x%08x, len %zd\n",
> > +                       dev_name(&flash->spi->dev), __func__, "from",
> > +                       (u32)from, len);
> > +
> > +       /* sanity checks */
> > +       if (!len)
> > +               return 0;
> > +
> > +       if (from + len > flash->mtd.size)
> > +               return -EINVAL;
> > +
> > +       spi_message_init(&m);
> > +       memset(t, 0, (sizeof t));
> > +
> > +       /* NOTE:
> > +        * OPCODE_FAST_READ (if available) is faster.
> > +        * Should add 1 byte DUMMY_BYTE.
> > +        */
> > +       t[0].tx_buf = flash->command;
> > +       t[0].len = m25p_cmdsz(flash) + FAST_READ_DUMMY_BYTE;
> > +       spi_message_add_tail(&t[0], &m);
> > +
> > +       t[1].rx_buf = buf;
> > +       spi_message_add_tail(&t[1], &m);
> > +
> > +       /* Byte count starts at zero. */
> > +       if (retlen)
> > +               *retlen = 0;
> > +
> > +       mutex_lock(&flash->lock);
> > +
> > +       /* Wait till previous write/erase is done. */
> > +       if (wait_till_ready(flash)) {
> > +               /* REVISIT status return?? */
> > +               mutex_unlock(&flash->lock);
> > +               return 1;
> > +       }
> > +
> > +       /* Set up the write data buffer. */
> > +       flash->command[0] = OPCODE_READ;
> > +
> > +       for (i = page_size; i < len; i += page_size) {
> > +               page_size = len - i;
> > +               if (page_size > flash->page_size)
> > +                       page_size = flash->page_size;
> > +               m25p_addr2cmd(flash, from + i, flash->command);
> > +               t[1].len = page_size;
> > +               t[1].rx_buf = buf + i;
> > +
> > +               spi_sync(flash->spi, &m);
> > +
> > +               *retlen += m.actual_length - m25p_cmdsz(flash)
> > +                       - FAST_READ_DUMMY_BYTE;
> > +       }
> > +
> > +       mutex_unlock(&flash->lock);
> > +
> > +       return 0;
> > +}
> > +
> > +/*
> >  * Write an address range to the flash chip.  Data must be written in
> >  * FLASH_PAGESIZE chunks.  The address range may be any size provided
> >  * it is within the physical boundaries.
> > @@ -877,6 +952,9 @@ static int __devinit m25p_probe(struct spi_device *spi)
> >        flash->mtd.erase = m25p80_erase;
> >        flash->mtd.read = m25p80_read;
> >
> > +       if (spi->master->flags & SPI_MASTER_TRANS_LIMIT)
> > +               flash->mtd.read = m25p80_page_read;
> > +
> >        /* sst flash chips use AAI word program */
> >        if (info->jedec_id >> 16 == 0xbf)
> >                flash->mtd.write = sst_write;
> > diff --git a/drivers/spi/spi_fsl_espi.c b/drivers/spi/spi_fsl_espi.c
> > index 61987cf..e15b7dc 100644
> > --- a/drivers/spi/spi_fsl_espi.c
> > +++ b/drivers/spi/spi_fsl_espi.c
> > @@ -470,6 +470,7 @@ static struct spi_master * __devinit fsl_espi_probe(struct device *dev,
> >                goto err_probe;
> >
> >        master->setup = fsl_espi_setup;
> > +       master->flags = SPI_MASTER_TRANS_LIMIT;
> >
> >        mpc8xxx_spi = spi_master_get_devdata(master);
> >        mpc8xxx_spi->spi_do_one_msg = fsl_espi_do_one_msg;
> > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> > index af56071..0729cbd 100644
> > --- a/include/linux/spi/spi.h
> > +++ b/include/linux/spi/spi.h
> > @@ -261,6 +261,7 @@ struct spi_master {
> >  #define SPI_MASTER_HALF_DUPLEX BIT(0)          /* can't do full duplex */
> >  #define SPI_MASTER_NO_RX       BIT(1)          /* can't do buffer read */
> >  #define SPI_MASTER_NO_TX       BIT(2)          /* can't do buffer write */
> > +#define SPI_MASTER_TRANS_LIMIT BIT(3)          /* have trans length limit */
> >
> >        /* Setup mode and clock, etc (spi driver may call many times).
> >         *
> > --
> > 1.6.4
> >
> >
> >
>
>
>
> --
> Grant Likely, B.Sc., P.Eng.
> Secret Lab Technologies Ltd.

^ permalink raw reply

* RE: [PATCH v2 1/6] spi/mpc8xxx: refactor the common code for SPI/eSPI controller
From: Hu Mingkai-B21284 @ 2010-09-02  8:27 UTC (permalink / raw)
  To: Grant Likely
  Cc: linuxppc-dev, Gala Kumar-B11780, Zang Roy-R61911,
	spi-devel-general
In-Reply-To: <AANLkTim=Qc-zA=9E0SfOB+8V48Y957MXkefD324xoq0w@mail.gmail.com>

Hi Grant and all,

Sorry for the delay, I was on a business trip for some weeks.

> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant
> Likely
> Sent: Tuesday, August 10, 2010 2:40 PM
> To: Hu Mingkai-B21284
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net; Gala
> Kumar-B11780; Zang Roy-R61911
> Subject: Re: [PATCH v2 1/6] spi/mpc8xxx: refactor the common code for SPI/eSPI
> controller
>
> On Mon, Aug 2, 2010 at 1:51 AM, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
> > Refactor the common code in file spi_mpc8xxx.c to spi_fsl_lib.c used
> > by SPI/eSPI controller driver as a library, move the SPI controller
> > driver to a new file spi_fsl_spi.c, and leave the QE/CPM SPI
> > controller code in this file.
> >
> > Because the register map of the SPI controller and eSPI controller is
> > so different, also leave the code operated the register to the driver
> > code, not the common code.
> >
> > Signed-off-by: Mingkai Hu <Mingkai.hu@freescale.com>
> > ---
> > v2:
> >  - Rename spi_mpc8xxx.c to spi_fsl_lib.c, also the config name
> >  - Rename fsl_spi.c to spi_fsl_spi.c, also the config name
> >  - Move register map definiton from spi_fsl_lib.c to spi_fsl_spi.c
> >  - Break some funcions line in the arguments instead of the declaration
> >  - Init bits_per_word to 0 to eliminate the else clause
> >  - Add brace for the else clause to match if clause
> >  - Drop the last entry's comma in the match table
> >  - move module_init() immediately after the init fsl_spi_init() function
> >
> >  drivers/spi/Kconfig       |   20 +-
> >  drivers/spi/Makefile      |    3 +-
> >  drivers/spi/spi_fsl_lib.c |  237 ++++++++
> >  drivers/spi/spi_fsl_lib.h |  119 ++++
> >  drivers/spi/spi_fsl_spi.c | 1173 +++++++++++++++++++++++++++++++++++++
> >  drivers/spi/spi_mpc8xxx.c | 1421 ---------------------------------------------
>
> This patch seems pretty good now.  However, the rename from spi_mpc8xxx.c to
> spi_fsl_lib.c should be done in a separate patch.  It is too difficult to track
> what has changed, when the file gets moved at the same time.  Move the file in
> one patch verbatim (with the required Makefile change), and then move out the
> fsl_spi specific bits in a second patch.
>

OK, I'll split this patch.

Thanks,
Mingkai

^ permalink raw reply

* RE: [PATCH v2 2/6] eSPI: add eSPI controller support
From: Hu Mingkai-B21284 @ 2010-09-02  8:28 UTC (permalink / raw)
  To: Grant Likely
  Cc: linuxppc-dev, Gala Kumar-B11780, Zang Roy-R61911,
	spi-devel-general
In-Reply-To: <AANLkTin3W4++cAFcksVBX3QxeV8QF3d5T2tGy_KbEkRs@mail.gmail.com>



> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant
> Likely
> Sent: Tuesday, August 10, 2010 2:45 PM
> To: Hu Mingkai-B21284
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net; Gala
> Kumar-B11780; Zang Roy-R61911
> Subject: Re: [PATCH v2 2/6] eSPI: add eSPI controller support
>
> On Mon, Aug 2, 2010 at 1:52 AM, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
> > Add eSPI controller support based on the library code spi_fsl_lib.c.
> >
> > The eSPI controller is newer controller 85xx/Pxxx devices supported.
> > There're some differences comparing to the SPI controller:
> >
> > 1. Has different register map and different bit definition
> >    So leave the code operated the register to the driver code, not
> >    the common code.
> >
> > 2. Support 4 dedicated chip selects
> >    The software can't controll the chip selects directly, The SPCOM[CS]
> >    field is used to select which chip selects is used, and the
> >    SPCOM[TRANLEN] field is set to tell the controller how long the CS
> >    signal need to be asserted. So the driver doesn't need the chipselect
> >    related function when transfering data, just set corresponding register
> >    fields to controll the chipseclect.
> >
> > 3. Different Transmit/Receive FIFO access register behavior
> >    For SPI controller, the Tx/Rx FIFO access register can hold only
> >    one character regardless of the character length, but for eSPI
> >    controller, the register can hold 4 or 2 characters according to
> >    the character lengths. Access the Tx/Rx FIFO access register of the
> >    eSPI controller will shift out/in 4/2 characters one time. For SPI
> >    subsystem, the command and data are put into different transfers, so
> >    we need to combine all the transfers to one transfer in order to pass
> >    the transfer to eSPI controller.
> >
> > Signed-off-by: Mingkai Hu <Mingkai.hu@freescale.com>
>
> I've not dug deep into this patch, but it seems pretty good.  I did notice one
> thing below...
> [...]
>
> > diff --git a/drivers/spi/spi_fsl_lib.h b/drivers/spi/spi_fsl_lib.h
> > index 774e1c8..0772c98 100644
> > --- a/drivers/spi/spi_fsl_lib.h
> > +++ b/drivers/spi/spi_fsl_lib.h
> > @@ -22,10 +22,12 @@
> >  struct mpc8xxx_spi {
> >        struct device *dev;
> >        struct fsl_spi_reg __iomem *base;
> > +       struct fsl_espi_reg __iomem *espi_base;
>
> There's got to be a cleaner way to do this.  Rather than putting both pointers
> into mpc8xxx_spi, I suggest each driver use its own fsl_spi_priv and
> fsl_espi_priv wrapper structures that contain the controller specific properties.
>

Makes sense, I'll change it.

Thanks,
Mingkai
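
One possible shape for the wrapper structures suggested in the quoted
review, as a standalone userspace sketch. Only the names mpc8xxx_spi,
fsl_spi_priv and fsl_espi_priv come from the thread; the fields, the opaque
register pointers and the accessor helpers are assumptions made for
illustration, and the driver may well end up structured differently.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* State shared by both controllers (fields are illustrative only). */
struct mpc8xxx_spi {
	unsigned int irq;
	unsigned int spibrg;
};

/* Hypothetical SPI wrapper: owns its own register pointer. */
struct fsl_spi_priv {
	void *reg_base;			/* would be struct fsl_spi_reg __iomem * */
	struct mpc8xxx_spi common;
};

/* Hypothetical eSPI wrapper: same idea, different register block. */
struct fsl_espi_priv {
	void *reg_base;			/* would be struct fsl_espi_reg __iomem * */
	struct mpc8xxx_spi common;
};

/* Recover the driver-specific wrapper from the shared part. */
static struct fsl_spi_priv *to_spi_priv(struct mpc8xxx_spi *c)
{
	return container_of(c, struct fsl_spi_priv, common);
}

static struct fsl_espi_priv *to_espi_priv(struct mpc8xxx_spi *c)
{
	return container_of(c, struct fsl_espi_priv, common);
}
```

With this layout the library code only ever sees struct mpc8xxx_spi, and
neither driver carries a pointer that belongs to the other controller.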

^ permalink raw reply

* RE: [PATCH v2 6/6] DTS: add SPI flash(s25fl128p01) support on p4080ds and mpc8536ds board
From: Hu Mingkai-B21284 @ 2010-09-02  8:30 UTC (permalink / raw)
  To: Grant Likely
  Cc: linuxppc-dev, Gala Kumar-B11780, Zang Roy-R61911,
	spi-devel-general
In-Reply-To: <AANLkTinzea7-k3Wd6EsAy+ipThHMhkpMF3pCP7g_6O6J@mail.gmail.com>



> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Grant
> Likely
> Sent: Tuesday, August 10, 2010 3:11 PM
> To: Hu Mingkai-B21284
> Cc: linuxppc-dev@ozlabs.org; spi-devel-general@lists.sourceforge.net; Gala
> Kumar-B11780; Zang Roy-R61911
> Subject: Re: [PATCH v2 6/6] DTS: add SPI flash(s25fl128p01) support on p4080ds
> and mpc8536ds board
>
> Hi Mingkai,
>
> one comment below.  Otherwise this patch looks good, and so does patch 5.
>
> g.
>
> On Mon, Aug 2, 2010 at 1:52 AM, Mingkai Hu <Mingkai.hu@freescale.com> wrote:
> > Signed-off-by: Mingkai Hu <Mingkai.hu@freescale.com>
> > ---
> >
> > v2:
> >  - Remove the whitespace inconsitencies
> >
> >  arch/powerpc/boot/dts/mpc8536ds.dts |   52 +++++++++++++++++++++++++++++++++++
> >  arch/powerpc/boot/dts/p4080ds.dts   |   11 +++-----
> >  2 files changed, 56 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/boot/dts/mpc8536ds.dts b/arch/powerpc/boot/dts/mpc8536ds.dts
> > index 815cebb..a75c10e 100644
> > --- a/arch/powerpc/boot/dts/mpc8536ds.dts
> > +++ b/arch/powerpc/boot/dts/mpc8536ds.dts
> > @@ -108,6 +108,58 @@
> >                        };
> >                };
> >
> > +               spi@7000 {
> > +                       #address-cells = <1>;
> > +                       #size-cells = <0>;
> > +                       compatible = "fsl,mpc8536-espi";
> > +                       reg = <0x7000 0x1000>;
> > +                       interrupts = <59 0x2>;
> > +                       interrupt-parent = <&mpic>;
> > +                       fsl,espi-num-chipselects = <4>;
> > +
> > +                       flash@0 {
> > +                               #address-cells = <1>;
> > +                               #size-cells = <1>;
> > +                               compatible = "spansion,s25sl12801";
> > +                               reg = <0>;
> > +                               spi-max-frequency = <40000000>;
> > +                               partition@u-boot {
> > +                                       label = "u-boot";
> > +                                       reg = <0x00000000 0x00100000>;
> > +                                       read-only;
> > +                               };
> > +                               partition@kernel {
> > +                                       label = "kernel";
> > +                                       reg = <0x00100000 0x00500000>;
> > +                                       read-only;
> > +                               };
> > +                               partition@dtb {
> > +                                       label = "dtb";
> > +                                       reg = <0x00600000 0x00100000>;
> > +                                       read-only;
> > +                               };
> > +                               partition@fs {
> > +                                       label = "file system";
> > +                                       reg = <0x00700000 0x00900000>;
> > +                               };
> > +                       };
> > +                       flash@1 {
> > +                               compatible = "spansion,s25sl12801";
> > +                               reg = <1>;
> > +                               spi-max-frequency = <40000000>;
> > +                       };
> > +                       flash@2 {
> > +                               compatible = "spansion,s25sl12801";
> > +                               reg = <2>;
> > +                               spi-max-frequency = <40000000>;
> > +                       };
> > +                       flash@3 {
> > +                               compatible = "spansion,s25sl12801";
> > +                               reg = <3>;
> > +                               spi-max-frequency = <40000000>;
> > +                       };
> > +               };
> > +
> >                dma@21300 {
> >                        #address-cells = <1>;
> >                        #size-cells = <1>;
> > diff --git a/arch/powerpc/boot/dts/p4080ds.dts b/arch/powerpc/boot/dts/p4080ds.dts
> > index 6b29eab..48437ad 100644
> > --- a/arch/powerpc/boot/dts/p4080ds.dts
> > +++ b/arch/powerpc/boot/dts/p4080ds.dts
> > @@ -236,22 +236,19 @@
> >                };
> >
> >                spi@110000 {
> > -                       cell-index = <0>;
> >                        #address-cells = <1>;
> >                        #size-cells = <0>;
> > -                       compatible = "fsl,espi";
> > +                       compatible = "fsl,mpc8536-espi";
>
> Should be more specific here by specifying the exact device; plus a list of what
> it is compatible with.  For example:
>
> compatible = "fsl,p4080-espi", "fsl,mpc8536-espi";
>
> the reason for this is that the driver for the existing part is still able to
> bind against the node, but if it ever needs it, then information about the
> specific device is available which can be used to (for example) figure out when
> to enable silicon bug workarounds.
>

Makes sense, I'll add it.

Thanks,
Mingkai
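
To illustrate why the review asks for a most-specific-first compatible
list, here is a small standalone sketch of the matching idea. It is not
the kernel's OF matching code; first_known_compatible() is a hypothetical
helper and the string tables in the usage note are examples only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Walk the node's compatible strings in order (most specific first) and
 * return the index of the first one the driver lists, or -1 if the driver
 * knows none of them.
 */
static int first_known_compatible(const char *const *node_compat, size_t n_node,
				  const char *const *drv_table, size_t n_drv)
{
	for (size_t i = 0; i < n_node; i++)
		for (size_t j = 0; j < n_drv; j++)
			if (strcmp(node_compat[i], drv_table[j]) == 0)
				return (int)i;
	return -1;
}
```

A driver that only lists "fsl,mpc8536-espi" still binds to a node declared
as "fsl,p4080-espi", "fsl,mpc8536-espi" through the second entry, while a
later driver that also lists "fsl,p4080-espi" matches the first entry and
can key chip-specific workarounds off it.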

^ permalink raw reply

* [PATCH 2/2] [PPC] Motion-PRO: Changed the default blinking rate for the status LED to 200 ms, as per the customer's request.
From: sposelenov @ 2010-09-02 10:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Sergei Poselenov, wd, dzu
In-Reply-To: <1283422832-9620-1-git-send-email-sposelenov@emcraft.com>

From: Sergei Poselenov <sposelenov@emcraft.com>

Signed-off-by: Sergei Poselenov <sposelenov@emcraft.com>
---
 arch/powerpc/boot/dts/motionpro.dts |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/boot/dts/motionpro.dts b/arch/powerpc/boot/dts/motionpro.dts
index 6ca4fc1..b469fc6 100644
--- a/arch/powerpc/boot/dts/motionpro.dts
+++ b/arch/powerpc/boot/dts/motionpro.dts
@@ -105,7 +105,7 @@
 			label = "motionpro-statusled";
 			reg = <0x660 0x10>;
 			interrupts = <1 15 0>;
-			blink-delay = <100>; // 100 msec
+			blink-delay = <200>; // 200 msec
 		};

 		motionpro-led@670 {	// Motion-PRO ready LED
-- 
1.6.2.5

^ permalink raw reply related

* [PATCH 1/2] [PPC] Motion-PRO: Added LED support for the Promess Motion-Pro board. The driver is based on the original version(http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg06694.html), adapted for the current kernel structures.
From: sposelenov @ 2010-09-02 10:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Sergei Poselenov, wd, dzu

From: Sergei Poselenov <sposelenov@emcraft.com>

Signed-off-by: Sergei Poselenov <sposelenov@emcraft.com>
---
 arch/powerpc/configs/52xx/motionpro_defconfig |    1 +
 arch/powerpc/include/asm/mpc52xx.h            |    5 +
 drivers/leds/Kconfig                          |    7 +
 drivers/leds/Makefile                         |    1 +
 drivers/leds/leds-motionpro.c                 |  255 +++++++++++++++++++++++++
 5 files changed, 269 insertions(+), 0 deletions(-)
 create mode 100644 drivers/leds/leds-motionpro.c

diff --git a/arch/powerpc/configs/52xx/motionpro_defconfig b/arch/powerpc/configs/52xx/motionpro_defconfig
index 20d53a1..cad1f44 100644
--- a/arch/powerpc/configs/52xx/motionpro_defconfig
+++ b/arch/powerpc/configs/52xx/motionpro_defconfig
@@ -77,6 +77,7 @@ CONFIG_I2C_CHARDEV=y
 CONFIG_I2C_MPC=y
 CONFIG_WATCHDOG=y
 # CONFIG_USB_SUPPORT is not set
+CONFIG_LEDS_MOTIONPRO=y
 CONFIG_NEW_LEDS=y
 CONFIG_LEDS_CLASS=y
 CONFIG_LEDS_TRIGGERS=y
diff --git a/arch/powerpc/include/asm/mpc52xx.h b/arch/powerpc/include/asm/mpc52xx.h
index 1f41382..b206e47 100644
--- a/arch/powerpc/include/asm/mpc52xx.h
+++ b/arch/powerpc/include/asm/mpc52xx.h
@@ -148,6 +148,11 @@ struct mpc52xx_gpio {
 #define MPC52xx_GPIO_PSC_CONFIG_UART_WITH_CD	5
 #define MPC52xx_GPIO_PCI_DIS			(1<<15)
 
+/* Enables GPT register to operate as simple GPIO output register */
+#define MPC52xx_GPT_ENABLE_OUTPUT	0x00000024
+/* Puts 1 on GPT output pin */
+#define MPC52xx_GPT_OUTPUT_1		0x00000010
+
 /* GPIO with WakeUp*/
 struct mpc52xx_gpio_wkup {
 	u8 wkup_gpioe;		/* GPIO_WKUP + 0x00 */
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index e411262..f5c3e6b 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -311,6 +311,13 @@ config LEDS_NS2
 	  Network Space v2 board (and parents). This include Internet Space v2,
 	  Network Space (Max) v2 and d2 Network v2 boards.
 
+config LEDS_MOTIONPRO
+	tristate "Motionpro LEDs Support"
+	depends on LEDS_CLASS
+	help
+	  This option enables support for status and ready LEDs connected
+	  to GPIO lines on Motionpro board.
+
 config LEDS_TRIGGERS
 	bool "LED Trigger support"
 	help
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index 7d6b958..738e227 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_LEDS_ADP5520)		+= leds-adp5520.o
 obj-$(CONFIG_LEDS_DELL_NETBOOKS)	+= dell-led.o
 obj-$(CONFIG_LEDS_MC13783)		+= leds-mc13783.o
 obj-$(CONFIG_LEDS_NS2)			+= leds-ns2.o
+obj-$(CONFIG_LEDS_MOTIONPRO)		+= leds-motionpro.o
 
 # LED SPI Drivers
 obj-$(CONFIG_LEDS_DAC124S085)		+= leds-dac124s085.o
diff --git a/drivers/leds/leds-motionpro.c b/drivers/leds/leds-motionpro.c
new file mode 100644
index 0000000..94f6cf8
--- /dev/null
+++ b/drivers/leds/leds-motionpro.c
@@ -0,0 +1,255 @@
+/*
+ * LEDs driver for the Motion-PRO board.
+ *
+ * Copyright (C) 2007 Semihalf
+ * Jan Wrobel <wrr@semihalf.com>
+ * Marian Balakowicz <m8@semihalf.com>
+ *
+ * Porting to the mainline Linux tree.
+ * Copyright (C) 2010 Emcraft Systems
+ * Sergei Poselenov <sposelenov@emcraft.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc., 51
+ * Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ *
+ *
+ * Decription:
+ * This driver enables control over Motion-PRO status and ready LEDs through
+ * sysfs. LEDs can be controlled by writing to sysfs files:
+ * class/leds/<led-name>/(brightness|delay_off|delay_on).
+ * See Documentation/leds-class.txt for more details.
+ * <led-name> is the set to the value of 'label' property of the
+ * corresponding GPT node.
+ *
+ * Before the user issues the first control command via sysfs, LED blinking
+ * is controlled by the kernel ('blink-delay' property of the GPT node
+ * in the device tree blob).
+ *
+ */
+
+#define DEBUG
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/leds.h>
+#include <linux/of_platform.h>
+#include <asm/mpc52xx.h>
+
+/* LED control bits */
+#define LED_ON	MPC52xx_GPT_OUTPUT_1
+
+/* LED mode */
+#define LED_MODE_KERNEL		1
+#define LED_MODE_USER		2
+
+struct motionpro_led {
+	spinlock_t led_lock;		/* Protects the LED data */
+	struct mpc52xx_gpt __iomem *gpt;/* LED registers */
+	struct timer_list blink_timer;	/* Used if blink_delay is nonzero */
+	unsigned int blink_delay;	/* [ms], if set to 0 blinking is off */
+	unsigned int mode;		/* kernel/user */
+	struct led_classdev mpled_cdev;	/* LED class */
+};
+
+/*
+ * Timer event - blinks LED before user takes control over it
+ * with the first access via sysfs.
+ */
+static void mpled_timer_toggle(unsigned long data)
+{
+	struct motionpro_led *mpled = (struct motionpro_led *)data;
+
+	spin_lock_bh(&mpled->led_lock);
+	if (mpled->mode == LED_MODE_KERNEL) {
+		u32 val = in_be32(&mpled->gpt->mode);
+		val ^= LED_ON;
+		out_be32(&mpled->gpt->mode, val);
+
+		mod_timer(&mpled->blink_timer,
+			jiffies + msecs_to_jiffies(mpled->blink_delay));
+	}
+	spin_unlock_bh(&mpled->led_lock);
+}
+
+/*
+ * Turn on/off led according to user settings in sysfs.
+ * First call to this function disables kernel blinking.
+ */
+static void mpled_set(struct led_classdev *led_cdev,
+		      enum led_brightness brightness)
+{
+	struct motionpro_led *mpled;
+	int old_mode;
+	u32 val;
+
+	mpled = container_of(led_cdev, struct motionpro_led, mpled_cdev);
+
+	spin_lock_bh(&mpled->led_lock);
+	/* disable kernel control */
+	old_mode = mpled->mode;
+	if (old_mode == LED_MODE_KERNEL)
+		mpled->mode = LED_MODE_USER;
+
+	val = in_be32(&mpled->gpt->mode);
+	if (brightness)
+		val |= LED_ON;
+	else
+		val &= ~LED_ON;
+	out_be32(&mpled->gpt->mode, val);
+	spin_unlock_bh(&mpled->led_lock);
+
+	/* delete kernel mode blink timer, not needed anymore */
+	if ((old_mode == LED_MODE_KERNEL) && mpled->blink_delay)
+		del_timer(&mpled->blink_timer);
+}
+
+static void mpled_init_led(void __iomem *gpt_mode)
+{
+	u32 val = in_be32(gpt_mode);
+	val |= MPC52xx_GPT_ENABLE_OUTPUT;
+	val &= ~LED_ON;
+	out_be32(gpt_mode, val);
+}
+
+static int __devinit mpled_probe(struct platform_device *op,
+				 const struct of_device_id *match)
+{
+	struct motionpro_led *mpled;
+	const unsigned int *of_blink_delay;
+	const char *label;
+	int err;
+
+	dev_dbg(&op->dev, "mpled_probe: node=%s (op=%p, match=%p)\n",
+		op->name, op, match);
+
+	mpled = kzalloc(sizeof(*mpled), GFP_KERNEL);
+	if (!mpled)
+		return -ENOMEM;
+
+	mpled->gpt = of_iomap(op->dev.of_node, 0);
+	if (!mpled->gpt) {
+		printk(KERN_ERR __FILE__ ": "
+			"Error mapping GPT registers for LED %s\n",
+			op->dev.of_node->full_name);
+		err = -EIO;
+		goto err_free;
+	}
+
+	/* initialize GPT for LED use */
+	mpled_init_led(&mpled->gpt->mode);
+
+	spin_lock_init(&mpled->led_lock);
+	mpled->mode = LED_MODE_KERNEL;
+
+	/* get LED label, used to register led classdev */
+	label = of_get_property(op->dev.of_node, "label", NULL);
+	if (label == NULL) {
+		printk(KERN_ERR __FILE__ ": "
+			"No label property provided for LED %s\n",
+			op->dev.of_node->full_name);
+		err = -EINVAL;
+		goto err;
+	}
+	dev_dbg(&op->dev, "mpled_probe: label = '%s'\n", label);
+
+	/* get 'blink-delay' property if present */
+	of_blink_delay = of_get_property(op->dev.of_node, "blink-delay", NULL);
+	mpled->blink_delay =  of_blink_delay ? *of_blink_delay : 0;
+	dev_dbg(&op->dev, "mpled_probe: blink_delay = %d msec\n",
+		mpled->blink_delay);
+
+	/* initialize kernel blink_timer if blink_delay was provided */
+	if (mpled->blink_delay) {
+		init_timer(&mpled->blink_timer);
+		mpled->blink_timer.function = mpled_timer_toggle;
+		mpled->blink_timer.data = (unsigned long)mpled;
+
+		mod_timer(&mpled->blink_timer,
+			jiffies + msecs_to_jiffies(mpled->blink_delay));
+	}
+
+	/* register LED classdev */
+	mpled->mpled_cdev.name = label;
+	mpled->mpled_cdev.brightness_set = mpled_set;
+	mpled->mpled_cdev.default_trigger = "timer";
+
+	err = led_classdev_register(NULL, &mpled->mpled_cdev);
+	if (err) {
+		printk(KERN_ERR __FILE__ ": "
+			"Error registering class device for LED %s\n",
+			op->dev.of_node->full_name);
+		goto err;
+	}
+
+	dev_set_drvdata(&op->dev, mpled);
+	return 0;
+
+err:
+	if (mpled->blink_delay)
+		del_timer(&mpled->blink_timer);
+	iounmap(mpled->gpt);
+err_free:
+	kfree(mpled);
+
+	return err;
+}
+
+static int mpled_remove(struct platform_device *op)
+{
+	struct motionpro_led *mpled = dev_get_drvdata(&op->dev);
+
+	dev_dbg(&op->dev, "mpled_remove: (%p)\n", op);
+
+	if (mpled->blink_delay && (mpled->mode == LED_MODE_KERNEL))
+		del_timer(&mpled->blink_timer);
+
+	led_classdev_unregister(&mpled->mpled_cdev);
+
+	iounmap(mpled->gpt);
+	kfree(mpled);
+
+	return 0;
+}
+
+static const struct of_device_id mpled_match[] = {
+	{ .compatible = "promess,motionpro-led", },
+	{},
+};
+
+static struct of_platform_driver mpled_driver = {
+	.probe		= mpled_probe,
+	.remove		= mpled_remove,
+	.driver		= {
+		.owner		= THIS_MODULE,
+		.name		= "leds-motionpro",
+		.of_match_table	= mpled_match,
+	},
+};
+
+static int __init mpled_init(void)
+{
+	return of_register_platform_driver(&mpled_driver);
+}
+
+static void __exit mpled_exit(void)
+{
+	of_unregister_platform_driver(&mpled_driver);
+}
+
+module_init(mpled_init);
+module_exit(mpled_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Motion-PRO LED driver");
+MODULE_AUTHOR("Jan Wrobel <wrr@semihalf.com>");
+MODULE_AUTHOR("Marian Balakowicz <m8@semihalf.com>");
-- 
1.6.2.5

* Re: [v1 PATCH] ucc_geth: fix ethtool set ring param bug
From: Ben Hutchings @ 2010-09-02 11:11 UTC (permalink / raw)
  To: Liang Li; +Cc: netdev, avorontsov, davem, linuxppc-dev
In-Reply-To: <20100902005034.GA9901@localhost>

On Thu, 2010-09-02 at 08:50 +0800, Liang Li wrote:
> On Wed, Sep 01, 2010 at 02:42:30PM +0100, Ben Hutchings wrote:
> > On Wed, 2010-09-01 at 09:43 +0800, Liang Li wrote:
> > > It's common sense that we should change the driver ring
> > > desc/buffer etc. only after stopping/shutting down the device. When we
> > > make the change while the device/driver is running, a kernel oops occurs:
> > [...]
> > > -	ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > -	ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > -
> > >  	if (netif_running(netdev)) {
> > > -		/* FIXME: restart automatically */
> > > -		printk(KERN_INFO
> > > -			"Please re-open the interface.\n");
> > > +		u16 rx_t;
> > > +		u16 tx_t;
> > > +		printk(KERN_INFO "Stopping interface %s.\n", netdev->name);
> > > +		ucc_geth_close(netdev);
> > > +
> > > +		rx_t = ug_info->bdRingLenRx[queue];
> > > +		tx_t = ug_info->bdRingLenTx[queue];
> > > +
> > > +		ug_info->bdRingLenRx[queue] = ring->rx_pending;
> > > +		ug_info->bdRingLenTx[queue] = ring->tx_pending;
> > > +
> > > +		printk(KERN_INFO "Reactivating interface %s.\n", netdev->name);
> > > +		ret = ucc_geth_open(netdev);
> > > +		if (ret) {
> > > +			printk(KERN_WARNING "uec_set_ringparam: set ring param for running"
> > > +					" interface %s failed. Please try to make the interface "
> > > +					" down, then try again.\n", netdev->name);
> > > +			ug_info->bdRingLenRx[queue] = rx_t;
> > > +			ug_info->bdRingLenTx[queue] = tx_t;
> > > +		}
> > [...]
> > 
> > Bringing the interface down will call ucc_geth_close(), which will try
> > to free resources that have not been allocated!
> 
> Sorry, I did not understand you on this point. There is no
> ucc_geth_close when 'open fail'. What do you mean here exactly?
[...]

Read your own warning.

Ben.
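
The objection above can be sketched as a toy userspace model. It is only one reading of the point, not the ucc_geth driver: struct toy_dev and every toy_* function are hypothetical names. The 'running' flag stands in for netif_running(), and 'allocated' for the ring resources that ucc_geth_open() would set up. If the reopen inside set_ringparam fails, the device is still marked running, so a later "interface down" calls close with nothing allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a network device; not the ucc_geth data structures. */
struct toy_dev {
	bool running;    /* what the stack believes, like netif_running() */
	bool allocated;  /* whether ring resources are really allocated */
	int bad_frees;   /* close() calls that found nothing to free */
};

/* Allocate ring resources; 'fail' simulates an allocation failure. */
static int toy_open(struct toy_dev *d, bool fail)
{
	assert(d != NULL);
	if (fail)
		return -1;
	d->allocated = true;
	return 0;
}

/* Free ring resources, counting frees of resources that do not exist. */
static void toy_close(struct toy_dev *d)
{
	assert(d != NULL);
	if (!d->allocated)
		d->bad_frees++;
	d->allocated = false;
}

/* Bringing the interface down in this model: close only if running. */
static void toy_ifdown(struct toy_dev *d)
{
	assert(d != NULL);
	if (d->running) {
		toy_close(d);
		d->running = false;
	}
}

/* Close/reopen sequence modeled on the quoted hunk. */
static int toy_set_ringparam(struct toy_dev *d, bool reopen_fails)
{
	assert(d != NULL);
	if (!d->running)
		return 0;
	toy_close(d);
	/* On failure the device stays marked running with nothing allocated. */
	return toy_open(d, reopen_fails);
}
```

In this model, a failed toy_set_ringparam() followed by toy_ifdown() leaves bad_frees at 1, which is the double teardown the warning message would invite.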

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH 1/2] [PPC] Motion-PRO: Added LED support for the Promess Motion-Pro board. The driver is based on the original version(http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg06694.html), adapted for the current kernel structures.
From: Anton Vorontsov @ 2010-09-02 12:46 UTC (permalink / raw)
  To: sposelenov; +Cc: linuxppc-dev, wd, dzu
In-Reply-To: <1283422832-9620-1-git-send-email-sposelenov@emcraft.com>

On Thu, Sep 02, 2010 at 12:20:31PM +0200, sposelenov@emcraft.com wrote:
[...]
> +config LEDS_MOTIONPRO
> +	tristate "Motionpro LEDs Support"
> +	depends on LEDS_CLASS
> +	help
> +	  This option enables support for status and ready LEDs connected
> +	  to GPIO lines on Motionpro board.

Why not expose these GPIOs via GPIOLIB[1] and use generic GPIO
LEDs[2] driver, along with the timer LED trigger[3] for blinking?

Thanks,

[1] Documentation/gpio.txt
    Documentation/powerpc/dts-bindings/gpio/gpio.txt
[2] drivers/leds/leds-gpio.c
    Documentation/powerpc/dts-bindings/gpio/led.txt
[3] drivers/leds/ledtrig-timer.c

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

* RE: P1021MDS QE Ethernet Ports
From: Haiying Wang @ 2010-09-02 13:32 UTC (permalink / raw)
  To: Ioannis Kokkoris; +Cc: linuxppc-dev
In-Reply-To: <COL120-W10FFCD057D88A833B4B46EA78C0@phx.gbl>

On Thu, 2010-02-09 at 11:26 +0300, Ioannis Kokkoris wrote:
> > From: johnkokko@hotmail.com
> > To: linuxppc-dev@lists.ozlabs.org
> > Subject: P1021MDS QE Ethernet Ports
> > Date: Wed, 1 Sep 2010 15:11:56 +0300
> >
> >
> > Hello,
> >
> > we are seeing a strange behavior when trying to use the QE Ethernet interfaces.
> > ENET5 (UCC5 - RMII) interface on P1021MDS boards does not come up if there is no physical link on the ENET1 (UCC1 - MII) Port.
> > It seems that interrupts from ENET5 are normally received but the link comes up and works properly only if we have physical connection on ENET1.
> >
> So far I found the following:
> 
> After adding traces, it seems that genphy_update_link() polls the correct device, with the correct address (0x03), but although a physical link is present in ENET5, the polling is not successful until there is a link in ENET1 (address 0x02)!
> 
> genphy_update_link: Dev: Micrel KS8041 ADD: 3 Status read 0x7849  (without ENET1 Link)
> genphy_update_link: Dev: Micrel KS8041 ADD: 3 Status read 0x786D  (with    ENET1 Link)
> 
> How does the MDIO of ENET1 affect the management of the physical interface in a different HW address?
Which board version are you using? This problem is fixed in the new board version, but that version is not available right now. For your current development, you can connect both UECs.

> 
> > Can anyone think of a possible reason for this behavior, is there a way to trace this problem?
> > I can provide any further information needed.
> >
> > Any help would be appreciated,
> > Regards,
> > John
> >
> >
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
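
For reference, the two status values quoted in this thread can be decoded with a short C sketch. It assumes the traced value is the MII Basic Mode Status Register (register 1), which is what genphy_update_link() reads; the trace itself was added by the reporter, so that is an assumption. The two BMSR_* values match <linux/mii.h>; the bmsr_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values of the MII Basic Mode Status Register, as in <linux/mii.h>. */
#define BMSR_LSTATUS      0x0004  /* link status */
#define BMSR_ANEGCOMPLETE 0x0020  /* auto-negotiation complete */

/* Return the bits that differ between two status reads. */
static uint16_t bmsr_changed_bits(uint16_t before, uint16_t after)
{
	return before ^ after;
}

/* Non-zero if the link-status bit is set. */
static int bmsr_link_up(uint16_t bmsr)
{
	return (bmsr & BMSR_LSTATUS) != 0;
}

/* Format a one-line summary of the two bits of interest into buf. */
static int bmsr_describe(uint16_t bmsr, char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "0x%04X: link %s, autoneg %s",
			(unsigned int)bmsr,
			bmsr_link_up(bmsr) ? "up" : "down",
			(bmsr & BMSR_ANEGCOMPLETE) ? "complete" : "incomplete");
}
```

Under that assumption, 0x7849 and 0x786D differ only in the link-status and autoneg-complete bits, so the PHY at address 3 itself reports no link until ENET1 is connected.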

* Re: [PATCH 1/2] [PPC] Motion-PRO: Added LED support for the Promess Motion-Pro board. The driver is based on the original version(http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg06694.html), adapted for the current kernel structures.
From: Anton Vorontsov @ 2010-09-02 14:01 UTC (permalink / raw)
  To: Sergei Poselenov; +Cc: linuxppc-dev, wd, dzu
In-Reply-To: <20100902173456.3fdef6b8@emcraft.com>

On Thu, Sep 02, 2010 at 05:34:56PM +0400, Sergei Poselenov wrote:
[...]
> > > +	tristate "Motionpro LEDs Support"
> > > +	depends on LEDS_CLASS
> > > +	help
> > > +	  This option enables support for status and ready LEDs
> > > connected
> > > +	  to GPIO lines on Motionpro board.
> > 
> > Why not expose these GPIOs via GPIOLIB[1] and use generic GPIO
> > LEDs[2] driver, along with the timer LED trigger[3] for blinking?
> > 
> > Thanks,
> > 
> > [1] Documentation/gpio.txt
> >     Documentation/powerpc/dts-bindings/gpio/gpio.txt
> > [2] drivers/leds/leds-gpio.c
> >     Documentation/powerpc/dts-bindings/gpio/led.txt
> > [3] drivers/leds/ledtrig-timer.c
> > 
> 
> Yes, this seems possible to implement (and thanks for pointing this
> out); however, the driver already exists (actually, since 2007),
> so why not add it to save effort?

- Faking PWM in the LEDs driver is just the wrong thing to do.
  I don't see any other drivers doing this, and even if they
  were, they would need to be fixed;

- This duplicates timer trigger functionality;

- By writing (if there isn't any already) a generic GPIOLIB
  driver for the GPIO controller that you have, you could use
  these GPIOs not only for LEDs, but also for SPI, MDIO, I2C,
  MMC, and even raw NAND chips.

  I.e., by choosing the right methodology you save much more
  effort in the long run.
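
As a rough illustration of the timer-trigger route, here is a minimal userspace C sketch that selects the "timer" trigger for an LED class device and sets its on/off periods through sysfs. The attribute names (trigger, delay_on, delay_off) are the ones the LED class and ledtrig-timer expose; the helper names led_write_attr() and led_blink() and any LED directory passed in are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write 'value' to <led_dir>/<attr>; returns 0 on success, -1 on error. */
static int led_write_attr(const char *led_dir, const char *attr,
			  const char *value)
{
	char path[256];
	FILE *f;
	int n;

	assert(led_dir && attr && value);
	n = snprintf(path, sizeof(path), "%s/%s", led_dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Select the timer trigger, then set the on/off periods in milliseconds. */
static int led_blink(const char *led_dir, unsigned int on_ms,
		     unsigned int off_ms)
{
	char buf[16];

	if (led_write_attr(led_dir, "trigger", "timer"))
		return -1;
	snprintf(buf, sizeof(buf), "%u", on_ms);
	if (led_write_attr(led_dir, "delay_on", buf))
		return -1;
	snprintf(buf, sizeof(buf), "%u", off_ms);
	return led_write_attr(led_dir, "delay_off", buf);
}
```

A call such as led_blink("/sys/class/leds/<led-name>", 500, 500) would then ask the kernel's timer trigger to do the blinking, so the LED driver needs no blink timer of its own.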

Thanks,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2
