Netdev List
* Re: [PATCH] TCP congestion module: add TCP-LP supporting for 2.6.16.14
From: Wong Edison @ 2006-05-08 17:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: pavel, netdev, linux-kernel
In-Reply-To: <20060508.104322.58430929.davem@davemloft.net>

> Or, just include it, and select it with the TCP_CONGESTION socket
> option when you want it.  Sorry, this does require app modifications.

I would like more information about this.
So within the app, after creating the socket,
I call setsockopt() to set TCP_CONGESTION to "lp" (in my case)?

Does that mean the socket's congestion algorithm will then be what I set,
for this socket within this app only?

What about a socket created by accept()?
Will it still use the default, or the same algorithm as the listening socket?

Thanks


* Re: [RFC][SECMARK 08/08] Add selinux_relabel_packet_permission() check to xt_SECMARK
From: Karl MacMillan @ 2006-05-08 17:54 UTC (permalink / raw)
  To: James Morris
  Cc: selinux, netdev, netfilter-devel, Stephen Smalley, Daniel J Walsh
In-Reply-To: <Pine.LNX.4.64.0605071139330.8588@d.namei>

On Sun, 2006-05-07 at 11:40 -0400, James Morris wrote:
> This patch adds the selinux_relabel_packet_permission() check to the 
> SECMARK target, so that SELinux policy is consulted to ensure that the 
> labeling operation is permitted by the current task.
> 
> 
> Signed-off-by: James Morris <jmorris@namei.org>
> 
> ---
> 
>  net/netfilter/xt_SECMARK.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff -purN -X dontdiff linux-2.6.17-rc3-git7.p/net/netfilter/xt_SECMARK.c linux-2.6.17-rc3-git7.w/net/netfilter/xt_SECMARK.c
> --- linux-2.6.17-rc3-git7.p/net/netfilter/xt_SECMARK.c	2006-05-03 11:34:12.000000000 -0400
> +++ linux-2.6.17-rc3-git7.w/net/netfilter/xt_SECMARK.c	2006-05-07 00:35:44.000000000 -0400
> @@ -72,6 +72,12 @@ static int checkentry_selinux(struct xt_
>  		return 0;
>  	}
>  
> +	err = selinux_relabel_packet_permission(sel->selsid);
> +	if (err) {
> +		printk(KERN_INFO PFX "unable to obtain relabeling permission\n");
> +		return 0;
> +	}
> +
>  	return 1;
>  }
>  
> 

Glad that you added this. This only checks on the addition of rules,
correct? Obviously changes that don't include an addition (e.g.,
removal) could change the labeling behavior. Is it possible / needed to
try to provide anything like the relabelto/relabelfrom pairing that is
present for files?

Karl

-- 
Karl MacMillan
Tresys Technology
www.tresys.com




* Re: [PATCH] core: linkwatch should use jiffies64
From: Stefan Rompf @ 2006-05-08 18:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20060507.230727.67882043.davem@davemloft.net>

Am Montag 08 Mai 2006 08:07 schrieb David S. Miller:

> What is so special about what linkwatch is doing such
> that it needs this kind of treatment and other similar
> pieces of code do not?
>
> We have all sorts of interfaces such as time_after() et al.
> in order to deal with wrapping issues.

time_after() and friends can handle jiffies wrapping, but they require the 
difference between the compared times to be less than 0x80000000 jiffies (about 
24 days at HZ=1000) to work reliably on 32-bit architectures. So if the 
network is stable for 24 days, events generated within days 25-49 will suffer 
a *huge* false delay.

> And furthermore 
> using 64-bit jiffies here might not be appropriate because
> they are not guaranteed to be accessed atomically,

get_jiffies_64() handles this transparently.

Stefan
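
The comparison window described above can be checked with a small userspace model. This is only a sketch, not kernel code: time_after32() is a hypothetical stand-in that mirrors the signed-difference comparison behind time_after(), with uint32_t standing in for a 32-bit unsigned long, and window_days() is an illustrative helper for the "about 24 days" figure.

```c
#include <stdint.h>

/* Userspace model (assumption): same signed-difference trick as the
 * kernel's time_after(), applied to a 32-bit jiffies counter.
 * Returns nonzero if a is considered to be after b. */
static int time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Days until the 2^31-jiffy comparison window is used up at a given HZ. */
static double window_days(unsigned hz)
{
	return 2147483648.0 / hz / 86400.0;
}
```

With these definitions, a timestamp taken just before the counter wraps still compares correctly against one taken just after, but once the two are more than 0x80000000 jiffies apart the later time is reported as earlier.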


* Re: [PATCH] netdev: hotplug napi race cleanup
From: David S. Miller @ 2006-05-08 18:37 UTC (permalink / raw)
  To: shemminger; +Cc: herbert, patrakov, netdev, akpm
In-Reply-To: <20060508095458.0debd022@localhost.localdomain>

From: Stephen Hemminger <shemminger@osdl.org>
Date: Mon, 8 May 2006 09:54:58 -0700

> The issue is are there network devices that can't sleep during
> register_netdevice?

Oh right, I forgot about that.


* Re: [PATCH] TCP congestion module: add TCP-LP supporting for 2.6.16.14
From: David S. Miller @ 2006-05-08 18:38 UTC (permalink / raw)
  To: hswong3i; +Cc: pavel, netdev, linux-kernel
In-Reply-To: <3feffd230605081050x104461fcj76f2821cfc311a6e@mail.gmail.com>

From: "Wong Edison" <hswong3i@gmail.com>
Date: Tue, 9 May 2006 01:50:36 +0800

> > Or, just include it, and select it with the TCP_CONGESTION socket
> > option when you want it.  Sorry, this does require app modifications.
> 
> I would like more information about this.
> So within the app, after creating the socket,
> I call setsockopt() to set TCP_CONGESTION to "lp" (in my case)?
> 
> Does that mean the socket's congestion algorithm will then be what I set,
> for this socket within this app only?

Yes, it applies to the socket.
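
For reference, the per-socket selection discussed here comes down to a single setsockopt() call. A minimal sketch, assuming a Linux system whose <netinet/tcp.h> defines TCP_CONGESTION and a kernel in which the named algorithm is available; the helper name set_tcp_congestion() is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Select a congestion control algorithm by name for one TCP socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
static int set_tcp_congestion(int fd, const char *name)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}
```

An application would call set_tcp_congestion(fd, "lp") on a socket returned by socket(); other sockets in the same process are left on the system default. The call is expected to fail if no algorithm of that name is known to the kernel, so the return value should be checked.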


* Re: IPv6 connect() from site-local to global IPv6 address.
From: David Woodhouse @ 2006-05-08 18:48 UTC (permalink / raw)
  To: Rick Jones; +Cc: YOSHIFUJI Hideaki / 吉藤英明, netdev
In-Reply-To: <445F7576.5080102@hp.com>

On Mon, 2006-05-08 at 09:44 -0700, Rick Jones wrote:
> Or get the applications fixed no?  Kludging around application bugs 
> sounds a bit like the "Fram Oil Filter" commercial where the mechanic is 
> grinning while he says "You can pay me now, or you can pay me later." As 
> in pay for the slightly more expensive oil filter now, or engine repair 
> later.

Well, obviously. That's _why_ I want to deploy IPv6 and get it tested.
But I used to be able to do this without actually breaking the network,
and without being told to _stop_ running radvd because it breaks things.

> Other than fixing the applications that only take the first response 
> (isn't that a generic application bug going back nearly decades now? 
> amazing how things stay the same isn't it) Can you run a caching-only 
> name server at the edge that filters-out the IPv6 responses so your 
> systems never see Global IPV6 responses?

I don't think that kind of answer is going to be sufficient to persuade
Uli to switch back from favouring IPv4 over IPv6. That's done the trick,
admittedly -- by ensuring that we get _no_ testing of IPv6 unless we run
with IPv6-only networking :)

-- 
dwmw2



* Re: [PATCH] netdev: hotplug napi race cleanup
From: Stephen Hemminger @ 2006-05-08 19:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: herbert, patrakov, netdev, akpm
In-Reply-To: <20060508.113731.97080992.davem@davemloft.net>

On Mon, 08 May 2006 11:37:31 -0700 (PDT)
"David S. Miller" <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@osdl.org>
> Date: Mon, 8 May 2006 09:54:58 -0700
> 
> > The issue is are there network devices that can't sleep during
> > register_netdevice?
> 
> Oh right, I forgot about that.

We could do something like this in register_netdevice()

	if (in_atomic() || irqs_disabled())
		net_set_todo(dev);
	else {
		dev->reg_state = NETREG_REGISTERED;
		ret = netdev_register_sysfs(dev);
		if (ret) {
			... 
	}

It seems a bit grotty, and might cause pain later.


* Re: Hardware flow control on RTL8169
From: Francois Romieu @ 2006-05-08 19:46 UTC (permalink / raw)
  To: s.munaut; +Cc: netdev
In-Reply-To: <1147099047.445f57a7874d4@ssl0.ovh.net>

s.munaut@intopix.com <s.munaut@intopix.com> :
[...]
> I'm using a kurobox (www.kurobox.com) with a 2.6.15 kernel and I'd like to use
> hardware flow control with it. However it seems the driver doesn't support it,
> is that correct ?
> 
> At least I see the device continue sending even when the other device is
> sending PAUSE frames with 0xffff pausetime continuously.
> 
> I've tried adding RTL_W16(TBI_ANAR, 0x00a0); to the hw_start function but
> that doesn't seem to have any effect ...
> 
> Any clue ?

- use netdev@vger.kernel.org instead of netdev@oss.sgi.com
- don't trim the Cc: (just a reminder...);
- try the hack below with and without the first hunk. Use 'ethtool ethX'
  to see what the device reports. It may not settle immediately.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 0ad3310..bc702be 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -769,6 +769,8 @@ static int rtl8169_set_speed_xmii(struct
 			auto_nego &= ~(PHY_Cap_10_Half | PHY_Cap_100_Half);
 	}
 
+	auto_nego |= ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
+
 	tp->phy_auto_nego_reg = auto_nego;
 	tp->phy_1000_ctrl_reg = giga_ctrl;
 
@@ -960,6 +962,11 @@ static void rtl8169_gset_xmii(struct net
 	else if (status & _10bps)
 		cmd->speed = SPEED_10;
 
+	if (status & TxFlowCtrl)
+		cmd->advertising |= ADVERTISED_Asym_Pause;
+	if (status & RxFlowCtrl)
+		cmd->advertising |= ADVERTISED_Pause;
+
 	cmd->duplex = ((status & _1000bpsF) || (status & FullDup)) ?
 		      DUPLEX_FULL : DUPLEX_HALF;
 }


* Re: Initial benchmarks of some VJ ideas [mmap memcpy vs copy_to_user].
From: Evgeniy Polyakov @ 2006-05-08 19:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, caitlinb, kelly, rusty
In-Reply-To: <20060508122418.GA22554@2ka.mipt.ru>

On Mon, May 08, 2006 at 04:24:22PM +0400, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> Luckily TCP processing is much more costly, e1000 interrupt handler
> is too big, there are a lot of context switches and other
> cache-unfriendly and locking stuff, but I still
> wonder where the 6 (!) times performance gain lives.

Since nocopy is effectively DMA into a mapped buffer,
we get something close to 6 times less CPU usage, and if that can be
linearly translated into a performance gain, we have found where the most
significant part of VJ channels lives. Unfortunately it is not backward
compatible with the recv() system call, and requires major changes in
the application to use this advantage.

-- 
	Evgeniy Polyakov


* Re: Initial benchmarks of some VJ ideas [mmap memcpy vs copy_to_user].
From: David S. Miller @ 2006-05-08 20:15 UTC (permalink / raw)
  To: johnpol; +Cc: netdev, caitlinb, kelly, rusty
In-Reply-To: <20060508195132.GB19091@2ka.mipt.ru>

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Mon, 8 May 2006 23:51:32 +0400

> Since nocopy is effectively DMA into a mapped buffer,
> we get something close to 6 times less CPU usage, and if that can be
> linearly translated into a performance gain, we have found where the most
> significant part of VJ channels lives.

Van's machines were cpu limited.  And once cpu limit was removed,
they became bus bandwidth limited.

> Unfortunately it is not backward compatible with the recv() system call,
> and requires major changes in the application to use this advantage.

I stopped believing a very long time ago that a compatible API for getting
top performance in networking receive is possible.

An ABI change is an absolute requirement.


* [PATCH] Fix RTL8019AS init for Toshiba RBTX49xx boards
From: Sergei Shtylyov @ 2006-05-08 20:58 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <4443BE71.6090908@ru.mvista.com>

[-- Attachment #1: Type: text/plain, Size: 335 bytes --]

    Ensure that 8-bit mode is selected for the on-board Realtek RTL8019AS chip
on Toshiba RBHMA4x00, get rid of the duplicate #ifdef's when setting
ei_status.word16.
    The chip's datasheet says that the PSTOP register shouldn't exceed 0x60 in
8-bit mode -- ensure this too.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>


[-- Attachment #2: RBTX49xx-RTL8019AS-init-fix.patch --]
[-- Type: text/plain, Size: 1928 bytes --]

Index: linus/drivers/net/ne.c
===================================================================
--- linus.orig/drivers/net/ne.c
+++ linus/drivers/net/ne.c
@@ -139,8 +139,9 @@ bad_clone_list[] __initdata = {
 
 #if defined(CONFIG_PLAT_MAPPI)
 #  define DCR_VAL 0x4b
-#elif defined(CONFIG_PLAT_OAKS32R)
-#  define DCR_VAL 0x48
+#elif defined(CONFIG_PLAT_OAKS32R)  || \
+   defined(CONFIG_TOSHIBA_RBTX4927) || defined(CONFIG_TOSHIBA_RBTX4938)
+#  define DCR_VAL 0x48		/* 8-bit mode */
 #else
 #  define DCR_VAL 0x49
 #endif
@@ -396,10 +397,22 @@ static int __init ne_probe1(struct net_d
 		/* We must set the 8390 for word mode. */
 		outb_p(DCR_VAL, ioaddr + EN0_DCFG);
 		start_page = NESM_START_PG;
-		stop_page = NESM_STOP_PG;
+
+		/*
+		 * Realtek RTL8019AS datasheet says that the PSTOP register
+		 * shouldn't exceed 0x60 in 8-bit mode.
+		 * This chip can be identified by reading the signature from
+		 * the  remote byte count registers (otherwise write-only)...
+		 */
+		if ((DCR_VAL & 0x01) == 0 &&		/* 8-bit mode */
+		    inb(ioaddr + EN0_RCNTLO) == 0x50 &&
+		    inb(ioaddr + EN0_RCNTHI) == 0x70)
+			stop_page = 0x60;
+		else
+			stop_page = NESM_STOP_PG;
 	} else {
 		start_page = NE1SM_START_PG;
-		stop_page = NE1SM_STOP_PG;
+		stop_page  = NE1SM_STOP_PG;
 	}
 
 #if  defined(CONFIG_PLAT_MAPPI) || defined(CONFIG_PLAT_OAKS32R)
@@ -509,15 +522,9 @@ static int __init ne_probe1(struct net_d
 	ei_status.name = name;
 	ei_status.tx_start_page = start_page;
 	ei_status.stop_page = stop_page;
-#if defined(CONFIG_TOSHIBA_RBTX4927) || defined(CONFIG_TOSHIBA_RBTX4938)
-	wordlength = 1;
-#endif
 
-#ifdef CONFIG_PLAT_OAKS32R
-	ei_status.word16 = 0;
-#else
-	ei_status.word16 = (wordlength == 2);
-#endif
+	/* Use 16-bit mode only if this wasn't overridden by DCR_VAL */
+	ei_status.word16 = (wordlength == 2 && (DCR_VAL & 0x01));
 
 	ei_status.rx_start_page = start_page + TX_PAGES;
 #ifdef PACKETBUF_MEMSIZE





* Re: [RFC][SECMARK 08/08] Add selinux_relabel_packet_permission() check to xt_SECMARK
From: James Morris @ 2006-05-08 21:19 UTC (permalink / raw)
  To: Karl MacMillan
  Cc: selinux, netdev, netfilter-devel, Stephen Smalley, Daniel J Walsh
In-Reply-To: <1147110876.32719.71.camel@jackjack.columbia.tresys.com>

On Mon, 8 May 2006, Karl MacMillan wrote:

> Glad that you added this. This only checks on the addition of rules,
> correct? Obviously changes that don't include an addition (e.g.,
> removal) could change the labeling behavior. Is it possible / needed to
> try to provide anything like the relabelto/relabelfrom pairing that is
> present for files?

The xtables target knows nothing of rule deletion, so we can't detect 
anything there.  All operations require cap_net_admin, though, so we do at 
least have that.

There's also no way to do relabelfrom, as a single rule update actually 
causes the entire 'table' to be replaced, and we have no linkage between 
old and new rules, or in fact, any way to look at the previous state.  It 
turns out that we don't need a relabelfrom anyway, as packets which enter 
the system are inherently unlabeled, and all that SECMARK does is add a 
label, so we know implicitly that setting a label on a packet is always a 
'relabelfrom unlabeled'.


- James
-- 
James Morris
<jmorris@namei.org>


* Re: [RFC] SECMARK 1.0
From: James Morris @ 2006-05-08 21:29 UTC (permalink / raw)
  To: Karl MacMillan
  Cc: Joshua Brindle, selinux, netdev, netfilter-devel, Stephen Smalley,
	Daniel J Walsh
In-Reply-To: <1147110088.32719.56.camel@jackjack.columbia.tresys.com>

On Mon, 8 May 2006, Karl MacMillan wrote:

> Something like CONNMARK seems preferable to me (perhaps even allowing
> type_transition rules to give the related packets a unique type). This
> makes the labeling reflect the real security property of the packets.

That's arguable.  The real security property afaict is that the packets 
are of some state (established or related to an existing connection).  It 
is implicit in the mechanism that they're tracked as part of an authorized 
connection.

> Yes, we are trusting the conntrack to mark the packets accurately, but
> it makes the policy match the intent. Otherwise it is not possible to
> reason about information flow using just the policy.

Why not?  You just state that all established and related packets reaching 
vsftpd are valid, and that no invalid packets can deliver data to the 
application.  You can play tricks and stick a label on a packet but that 
doesn't change what's actually happening or your ability to reason about 
it.  You assume conntrack works correctly (and if it doesn't, then 
labeling connections will break, too).

> Are there serious downsides to this approach?

Yes, it's an ugly hack which is not needed.

> > You can always not use conntrack and emulate the existing controls, as 
> > well.
> 
> Yes, but gaining connection tracking is a major advantage of this
> approach over the existing controls.

The point is to show that this scheme provides much stronger security 
assurances, and that if you wished, you could easily revert to stateless 
filtering and have the "correct" labels on the packets; which would be 
worse.



- James
-- 
James Morris
<jmorris@namei.org>


* [PATCH 1/4] New IrDA maintainer
From: Samuel Ortiz @ 2006-05-08 21:23 UTC (permalink / raw)
  To: ext David S. Miller; +Cc: Jean Tourrilhes, IrDA users, netdev

As agreed with Jean Tourrilhes, I am taking over IrDA
maintainership.

Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>

---

 MAINTAINERS |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

e6a1573bed4cefc91fafcd56eb67dd3d1e92dddc
diff --git a/MAINTAINERS b/MAINTAINERS
index 61060e8..502dadc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1472,10 +1472,11 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 
 IRDA SUBSYSTEM
-P:	Jean Tourrilhes
+P:	Samuel Ortiz
+M:	samuel@sortiz.org
 L:	irda-users@lists.sourceforge.net (subscribers-only)
 W:	http://irda.sourceforge.net/
-S:	Odd Fixes
+S:	Maintained
 
 ISAPNP
 P:	Jaroslav Kysela
-- 
1.2.4



* [PATCH 2/4] IrDA: Removing unused EXPORT_SYMBOLs
From: Samuel Ortiz @ 2006-05-08 21:23 UTC (permalink / raw)
  To: ext David S. Miller; +Cc: Jean Tourrilhes, IrDA users, netdev

This patch removes the following unused EXPORT_SYMBOL's:
- irias_find_attrib
- irias_new_string_value
- irias_new_octseq_value

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>

---

 net/irda/irias_object.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

49586f340ee5f31ff40d142b32d8a15ca001bf4b
diff --git a/net/irda/irias_object.c b/net/irda/irias_object.c
index c6d169f..82e665c 100644
--- a/net/irda/irias_object.c
+++ b/net/irda/irias_object.c
@@ -257,7 +257,6 @@ struct ias_attrib *irias_find_attrib(str
 	/* Unsafe (locking), attrib might change */
 	return attrib;
 }
-EXPORT_SYMBOL(irias_find_attrib);
 
 /*
  * Function irias_add_attribute (obj, attrib)
@@ -484,7 +483,6 @@ struct ias_value *irias_new_string_value
 
 	return value;
 }
-EXPORT_SYMBOL(irias_new_string_value);
 
 /*
  * Function irias_new_octseq_value (octets, len)
@@ -519,7 +517,6 @@ struct ias_value *irias_new_octseq_value
 	memcpy(value->t.oct_seq, octseq , len);
 	return value;
 }
-EXPORT_SYMBOL(irias_new_octseq_value);
 
 struct ias_value *irias_new_missing_value(void)
 {
-- 
1.2.4



* [PATCH 3/4] IrDA: smsc-ircc: Minimal hotplug support.
From: Samuel Ortiz @ 2006-05-08 21:23 UTC (permalink / raw)
  To: ext David S. Miller
  Cc: Jean Tourrilhes, IrDA users, netdev, David Brownell,
	Andrew Morton

Minimal PNP hotplug support for the smsc-ircc2 driver.  A modular driver
will be modprobed via hotplug, but still bypasses driver model probing.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>


---

 drivers/net/irda/smsc-ircc2.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

863409d3d18ed6194ef84d813bb316248a9ea0a3
diff --git a/drivers/net/irda/smsc-ircc2.c b/drivers/net/irda/smsc-ircc2.c
index 58f76ce..a467404 100644
--- a/drivers/net/irda/smsc-ircc2.c
+++ b/drivers/net/irda/smsc-ircc2.c
@@ -54,6 +54,7 @@ #include <linux/init.h>
 #include <linux/rtnetlink.h>
 #include <linux/serial_reg.h>
 #include <linux/dma-mapping.h>
+#include <linux/pnp.h>
 #include <linux/platform_device.h>
 
 #include <asm/io.h>
@@ -358,6 +359,16 @@ static inline void register_bank(int iob
                iobase + IRCC_MASTER);
 }
 
+#ifdef	CONFIG_PNP
+/* PNP hotplug support */
+static const struct pnp_device_id smsc_ircc_pnp_table[] = {
+	{ .id = "SMCf010", .driver_data = 0 },
+	/* and presumably others */
+	{ }
+};
+MODULE_DEVICE_TABLE(pnp, smsc_ircc_pnp_table);
+#endif
+
 
 /*******************************************************************************
  *
@@ -2072,7 +2083,8 @@ static void smsc_ircc_sir_wait_hw_transm
 
 /* PROBING
  *
- *
+ * REVISIT we can be told about the device by PNP, and should use that info
+ * instead of probing hardware and creating a platform_device ...
  */
 
 static int __init smsc_ircc_look_for_chips(void)
-- 
1.3.1



* [PATCH 4/4] IrDA: Switching to a workqueue for the SIR work
From: Samuel Ortiz @ 2006-05-08 21:24 UTC (permalink / raw)
  To: ext David S. Miller
  Cc: Jean Tourrilhes, IrDA users, netdev, Christoph Hellwig

Since sir_kthread.c pretty much duplicates the workqueue functionality,
we'd better switch.
The SIR fsm has been merged into sir_dev.c and thus sir_kthread.c is
deleted.
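
The state machine moved into sir_dev.c by this patch estimates how long the remaining transmit bytes need to drain before it reconfigures the port. The arithmetic can be modelled in userspace as below. This is a sketch under assumptions: the function names are hypothetical, the guard for speeds below 100 is added here to avoid a division by zero, and the reading of "10 bits per byte" as start bit plus 8 data bits plus stop bit is an interpretation, not something the patch states.

```c
/* Estimated microseconds until bytes_left bytes have been sent at the
 * given line speed (bits per second). Models the arithmetic in
 * sirdev_tx_complete_fsm(); names here are hypothetical. */
static unsigned drain_usec(unsigned bytes_left, unsigned speed)
{
	unsigned bits_per_byte;

	if (speed < 100)	/* guard added for this sketch */
		return 0;
	/* above 115200: 8 bits per byte; otherwise 10 (assumed async framing) */
	bits_per_byte = (speed > 115200) ? 8 : 10;
	return (bytes_left * bits_per_byte * 10000) / (speed / 100);
}

/* Round a microsecond delay up to whole milliseconds. */
static unsigned usec_to_msec_ceil(unsigned usec)
{
	return (usec + 999) / 1000;
}
```

In the patch, estimates under 100 usec are busy-waited with udelay(), while longer ones are rounded up to milliseconds and returned so that the caller can reschedule the work item after that delay.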

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>


---

 drivers/net/irda/Makefile      |    2 
 drivers/net/irda/sir-dev.h     |   13 -
 drivers/net/irda/sir_dev.c     |  315 ++++++++++++++++++++++++-
 drivers/net/irda/sir_kthread.c |  508 ----------------------------------------
 4 files changed, 314 insertions(+), 524 deletions(-)
 delete mode 100644 drivers/net/irda/sir_kthread.c

f13ce8e0a66e2b576110a0d9b8b1b1906350feef
diff --git a/drivers/net/irda/Makefile b/drivers/net/irda/Makefile
index 27ab75f..c1ce239 100644
--- a/drivers/net/irda/Makefile
+++ b/drivers/net/irda/Makefile
@@ -46,4 +46,4 @@ obj-$(CONFIG_MA600_DONGLE)	+= ma600-sir.
 obj-$(CONFIG_TOIM3232_DONGLE)	+= toim3232-sir.o
 
 # The SIR helper module
-sir-dev-objs := sir_dev.o sir_dongle.o sir_kthread.o
+sir-dev-objs := sir_dev.o sir_dongle.o
diff --git a/drivers/net/irda/sir-dev.h b/drivers/net/irda/sir-dev.h
index f69fb4c..9fa294a 100644
--- a/drivers/net/irda/sir-dev.h
+++ b/drivers/net/irda/sir-dev.h
@@ -15,23 +15,14 @@ #ifndef IRDA_SIR_H
 #define IRDA_SIR_H
 
 #include <linux/netdevice.h>
+#include <linux/workqueue.h>
 
 #include <net/irda/irda.h>
 #include <net/irda/irda_device.h>		// iobuff_t
 
-/* FIXME: unify irda_request with sir_fsm! */
-
-struct irda_request {
-	struct list_head lh_request;
-	unsigned long pending;
-	void (*func)(void *);
-	void *data;
-	struct timer_list timer;
-};
-
 struct sir_fsm {
 	struct semaphore	sem;
-	struct irda_request	rq;
+	struct work_struct      work;
 	unsigned		state, substate;
 	int			param;
 	int			result;
diff --git a/drivers/net/irda/sir_dev.c b/drivers/net/irda/sir_dev.c
index ea7c946..3b5854d 100644
--- a/drivers/net/irda/sir_dev.c
+++ b/drivers/net/irda/sir_dev.c
@@ -23,6 +23,298 @@ #include <net/irda/irda_device.h>
 
 #include "sir-dev.h"
 
+
+static struct workqueue_struct *irda_sir_wq;
+
+/* STATE MACHINE */
+
+/* substate handler of the config-fsm to handle the cases where we want
+ * to wait for transmit completion before changing the port configuration
+ */
+
+static int sirdev_tx_complete_fsm(struct sir_dev *dev)
+{
+	struct sir_fsm *fsm = &dev->fsm;
+	unsigned next_state, delay;
+	unsigned bytes_left;
+
+	do {
+		next_state = fsm->substate;	/* default: stay in current substate */
+		delay = 0;
+
+		switch(fsm->substate) {
+
+		case SIRDEV_STATE_WAIT_XMIT:
+			if (dev->drv->chars_in_buffer)
+				bytes_left = dev->drv->chars_in_buffer(dev);
+			else
+				bytes_left = 0;
+			if (!bytes_left) {
+				next_state = SIRDEV_STATE_WAIT_UNTIL_SENT;
+				break;
+			}
+
+			if (dev->speed > 115200)
+				delay = (bytes_left*8*10000) / (dev->speed/100);
+			else if (dev->speed > 0)
+				delay = (bytes_left*10*10000) / (dev->speed/100);
+			else
+				delay = 0;
+			/* expected delay (usec) until remaining bytes are sent */
+			if (delay < 100) {
+				udelay(delay);
+				delay = 0;
+				break;
+			}
+			/* sleep some longer delay (msec) */
+			delay = (delay+999) / 1000;
+			break;
+
+		case SIRDEV_STATE_WAIT_UNTIL_SENT:
+			/* block until underlying hardware buffers are empty */
+			if (dev->drv->wait_until_sent)
+				dev->drv->wait_until_sent(dev);
+			next_state = SIRDEV_STATE_TX_DONE;
+			break;
+
+		case SIRDEV_STATE_TX_DONE:
+			return 0;
+
+		default:
+			IRDA_ERROR("%s - undefined state\n", __FUNCTION__);
+			return -EINVAL;
+		}
+		fsm->substate = next_state;
+	} while (delay == 0);
+	return delay;
+}
+
+/*
+ * Function sirdev_config_fsm
+ *
+ * State machine to handle the configuration of the device (and attached dongle, if any).
+ * This handler is scheduled for execution in kIrDAd context, so we can sleep.
+ * however, kIrDAd is shared by all sir_dev devices so we better don't sleep there too
+ * long. Instead, for longer delays we start a timer to reschedule us later.
+ * On entry, fsm->sem is always locked and the netdev xmit queue stopped.
+ * Both must be unlocked/restarted on completion - but only on final exit.
+ */
+
+static void sirdev_config_fsm(void *data)
+{
+	struct sir_dev *dev = data;
+	struct sir_fsm *fsm = &dev->fsm;
+	int next_state;
+	int ret = -1;
+	unsigned delay;
+
+	IRDA_DEBUG(2, "%s(), <%ld>\n", __FUNCTION__, jiffies);
+
+	do {
+		IRDA_DEBUG(3, "%s - state=0x%04x / substate=0x%04x\n",
+			__FUNCTION__, fsm->state, fsm->substate);
+
+		next_state = fsm->state;
+		delay = 0;
+
+		switch(fsm->state) {
+
+		case SIRDEV_STATE_DONGLE_OPEN:
+			if (dev->dongle_drv != NULL) {
+				ret = sirdev_put_dongle(dev);
+				if (ret) {
+					fsm->result = -EINVAL;
+					next_state = SIRDEV_STATE_ERROR;
+					break;
+				}
+			}
+
+			/* Initialize dongle */
+			ret = sirdev_get_dongle(dev, fsm->param);
+			if (ret) {
+				fsm->result = ret;
+				next_state = SIRDEV_STATE_ERROR;
+				break;
+			}
+
+			/* Dongles are powered through the modem control lines which
+			 * were just set during open. Before resetting, let's wait for
+			 * the power to stabilize. This is what some dongle drivers did
+			 * in open before, while others didn't - should be safe anyway.
+			 */
+
+			delay = 50;
+			fsm->substate = SIRDEV_STATE_DONGLE_RESET;
+			next_state = SIRDEV_STATE_DONGLE_RESET;
+
+			fsm->param = 9600;
+
+			break;
+
+		case SIRDEV_STATE_DONGLE_CLOSE:
+			/* shouldn't we just treat this as success=? */
+			if (dev->dongle_drv == NULL) {
+				fsm->result = -EINVAL;
+				next_state = SIRDEV_STATE_ERROR;
+				break;
+			}
+
+			ret = sirdev_put_dongle(dev);
+			if (ret) {
+				fsm->result = ret;
+				next_state = SIRDEV_STATE_ERROR;
+				break;
+			}
+			next_state = SIRDEV_STATE_DONE;
+			break;
+
+		case SIRDEV_STATE_SET_DTR_RTS:
+			ret = sirdev_set_dtr_rts(dev,
+				(fsm->param&0x02) ? TRUE : FALSE,
+				(fsm->param&0x01) ? TRUE : FALSE);
+			next_state = SIRDEV_STATE_DONE;
+			break;
+
+		case SIRDEV_STATE_SET_SPEED:
+			fsm->substate = SIRDEV_STATE_WAIT_XMIT;
+			next_state = SIRDEV_STATE_DONGLE_CHECK;
+			break;
+
+		case SIRDEV_STATE_DONGLE_CHECK:
+			ret = sirdev_tx_complete_fsm(dev);
+			if (ret < 0) {
+				fsm->result = ret;
+				next_state = SIRDEV_STATE_ERROR;
+				break;
+			}
+			if ((delay=ret) != 0)
+				break;
+
+			if (dev->dongle_drv) {
+				fsm->substate = SIRDEV_STATE_DONGLE_RESET;
+				next_state = SIRDEV_STATE_DONGLE_RESET;
+			}
+			else {
+				dev->speed = fsm->param;
+				next_state = SIRDEV_STATE_PORT_SPEED;
+			}
+			break;
+
+		case SIRDEV_STATE_DONGLE_RESET:
+			if (dev->dongle_drv->reset) {
+				ret = dev->dongle_drv->reset(dev);
+				if (ret < 0) {
+					fsm->result = ret;
+					next_state = SIRDEV_STATE_ERROR;
+					break;
+				}
+			}
+			else
+				ret = 0;
+			if ((delay=ret) == 0) {
+				/* set serial port according to dongle default speed */
+				if (dev->drv->set_speed)
+					dev->drv->set_speed(dev, dev->speed);
+				fsm->substate = SIRDEV_STATE_DONGLE_SPEED;
+				next_state = SIRDEV_STATE_DONGLE_SPEED;
+			}
+			break;
+
+		case SIRDEV_STATE_DONGLE_SPEED:
+			if (dev->dongle_drv->set_speed) {
+				ret = dev->dongle_drv->set_speed(dev, fsm->param);
+				if (ret < 0) {
+					fsm->result = ret;
+					next_state = SIRDEV_STATE_ERROR;
+					break;
+				}
+			}
+			else
+				ret = 0;
+			if ((delay=ret) == 0)
+				next_state = SIRDEV_STATE_PORT_SPEED;
+			break;
+
+		case SIRDEV_STATE_PORT_SPEED:
+			/* Finally we are ready to change the serial port speed */
+			if (dev->drv->set_speed)
+				dev->drv->set_speed(dev, dev->speed);
+			dev->new_speed = 0;
+			next_state = SIRDEV_STATE_DONE;
+			break;
+
+		case SIRDEV_STATE_DONE:
+			/* Signal network layer so it can send more frames */
+			netif_wake_queue(dev->netdev);
+			next_state = SIRDEV_STATE_COMPLETE;
+			break;
+
+		default:
+			IRDA_ERROR("%s - undefined state\n", __FUNCTION__);
+			fsm->result = -EINVAL;
+			/* fall thru */
+
+		case SIRDEV_STATE_ERROR:
+			IRDA_ERROR("%s - error: %d\n", __FUNCTION__, fsm->result);
+
+#if 0	/* don't enable this before we have netdev->tx_timeout to recover */
+			netif_stop_queue(dev->netdev);
+#else
+			netif_wake_queue(dev->netdev);
+#endif
+			/* fall thru */
+
+		case SIRDEV_STATE_COMPLETE:
+			/* config change finished, so we are not busy any longer */
+			sirdev_enable_rx(dev);
+			up(&fsm->sem);
+			return;
+		}
+		fsm->state = next_state;
+	} while(!delay);
+
+	queue_delayed_work(irda_sir_wq, &fsm->work, msecs_to_jiffies(delay));
+}
+
+/* schedule some device configuration task for execution by kIrDAd
+ * on behalf of the above state machine.
+ * can be called from process or interrupt/tasklet context.
+ */
+
+int sirdev_schedule_request(struct sir_dev *dev, int initial_state, unsigned param)
+{
+	struct sir_fsm *fsm = &dev->fsm;
+
+	IRDA_DEBUG(2, "%s - state=0x%04x / param=%u\n", __FUNCTION__, initial_state, param);
+
+	if (down_trylock(&fsm->sem)) {
+		if (in_interrupt()  ||  in_atomic()  ||  irqs_disabled()) {
+			IRDA_DEBUG(1, "%s(), state machine busy!\n", __FUNCTION__);
+			return -EWOULDBLOCK;
+		} else
+			down(&fsm->sem);
+	}
+
+	if (fsm->state == SIRDEV_STATE_DEAD) {
+		/* race with sirdev_close should never happen */
+		IRDA_ERROR("%s(), instance staled!\n", __FUNCTION__);
+		up(&fsm->sem);
+		return -ESTALE;		/* or better EPIPE? */
+	}
+
+	netif_stop_queue(dev->netdev);
+	atomic_set(&dev->enable_rx, 0);
+
+	fsm->state = initial_state;
+	fsm->param = param;
+	fsm->result = 0;
+
+	INIT_WORK(&fsm->work, sirdev_config_fsm, dev);
+	queue_work(irda_sir_wq, &fsm->work);
+	return 0;
+}
+
+
 /***************************************************************************/
 
 void sirdev_enable_rx(struct sir_dev *dev)
@@ -619,10 +911,6 @@ struct sir_dev * sirdev_get_instance(con
 	spin_lock_init(&dev->tx_lock);
 	init_MUTEX(&dev->fsm.sem);
 
-	INIT_LIST_HEAD(&dev->fsm.rq.lh_request);
-	dev->fsm.rq.pending = 0;
-	init_timer(&dev->fsm.rq.timer);
-
 	dev->drv = drv;
 	dev->netdev = ndev;
 
@@ -682,3 +970,22 @@ int sirdev_put_instance(struct sir_dev *
 }
 EXPORT_SYMBOL(sirdev_put_instance);
 
+static int __init sir_wq_init(void)
+{
+	irda_sir_wq = create_singlethread_workqueue("irda_sir_wq");
+	if (!irda_sir_wq)
+		return -ENOMEM;
+	return 0;
+}
+
+static void __exit sir_wq_exit(void)
+{
+	destroy_workqueue(irda_sir_wq);
+}
+
+module_init(sir_wq_init);
+module_exit(sir_wq_exit);
+
+MODULE_AUTHOR("Martin Diehl <info@mdiehl.de>");
+MODULE_DESCRIPTION("IrDA SIR core");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/irda/sir_kthread.c b/drivers/net/irda/sir_kthread.c
deleted file mode 100644
index e3904d6..0000000
--- a/drivers/net/irda/sir_kthread.c
+++ /dev/null
@@ -1,508 +0,0 @@
-/*********************************************************************
- *
- *	sir_kthread.c:		dedicated thread to process scheduled
- *				sir device setup requests
- *
- *	Copyright (c) 2002 Martin Diehl
- *
- *	This program is free software; you can redistribute it and/or 
- *	modify it under the terms of the GNU General Public License as 
- *	published by the Free Software Foundation; either version 2 of 
- *	the License, or (at your option) any later version.
- *
- ********************************************************************/    
-
-#include <linux/module.h>
-#include <linux/kernel.h>
-#include <linux/version.h>
-#include <linux/init.h>
-#include <linux/smp_lock.h>
-#include <linux/completion.h>
-#include <linux/delay.h>
-
-#include <net/irda/irda.h>
-
-#include "sir-dev.h"
-
-/**************************************************************************
- *
- * kIrDAd kernel thread and config state machine
- *
- */
-
-struct irda_request_queue {
-	struct list_head request_list;
-	spinlock_t lock;
-	task_t *thread;
-	struct completion exit;
-	wait_queue_head_t kick, done;
-	atomic_t num_pending;
-};
-
-static struct irda_request_queue irda_rq_queue;
-
-static int irda_queue_request(struct irda_request *rq)
-{
-	int ret = 0;
-	unsigned long flags;
-
-	if (!test_and_set_bit(0, &rq->pending)) {
-		spin_lock_irqsave(&irda_rq_queue.lock, flags);
-		list_add_tail(&rq->lh_request, &irda_rq_queue.request_list);
-		wake_up(&irda_rq_queue.kick);
-		atomic_inc(&irda_rq_queue.num_pending);
-		spin_unlock_irqrestore(&irda_rq_queue.lock, flags);
-		ret = 1;
-	}
-	return ret;
-}
-
-static void irda_request_timer(unsigned long data)
-{
-	struct irda_request *rq = (struct irda_request *)data;
-	unsigned long flags;
-	
-	spin_lock_irqsave(&irda_rq_queue.lock, flags);
-	list_add_tail(&rq->lh_request, &irda_rq_queue.request_list);
-	wake_up(&irda_rq_queue.kick);
-	spin_unlock_irqrestore(&irda_rq_queue.lock, flags);
-}
-
-static int irda_queue_delayed_request(struct irda_request *rq, unsigned long delay)
-{
-	int ret = 0;
-	struct timer_list *timer = &rq->timer;
-
-	if (!test_and_set_bit(0, &rq->pending)) {
-		timer->expires = jiffies + delay;
-		timer->function = irda_request_timer;
-		timer->data = (unsigned long)rq;
-		atomic_inc(&irda_rq_queue.num_pending);
-		add_timer(timer);
-		ret = 1;
-	}
-	return ret;
-}
-
-static void run_irda_queue(void)
-{
-	unsigned long flags;
-	struct list_head *entry, *tmp;
-	struct irda_request *rq;
-
-	spin_lock_irqsave(&irda_rq_queue.lock, flags);
-	list_for_each_safe(entry, tmp, &irda_rq_queue.request_list) {
-		rq = list_entry(entry, struct irda_request, lh_request);
-		list_del_init(entry);
-		spin_unlock_irqrestore(&irda_rq_queue.lock, flags);
-
-		clear_bit(0, &rq->pending);
-		rq->func(rq->data);
-
-		if (atomic_dec_and_test(&irda_rq_queue.num_pending))
-			wake_up(&irda_rq_queue.done);
-
-		spin_lock_irqsave(&irda_rq_queue.lock, flags);
-	}
-	spin_unlock_irqrestore(&irda_rq_queue.lock, flags);
-}		
-
-static int irda_thread(void *startup)
-{
-	DECLARE_WAITQUEUE(wait, current);
-
-	daemonize("kIrDAd");
-
-	irda_rq_queue.thread = current;
-
-	complete((struct completion *)startup);
-
-	while (irda_rq_queue.thread != NULL) {
-
-		/* We use TASK_INTERRUPTIBLE, rather than
-		 * TASK_UNINTERRUPTIBLE.  Andrew Morton made this
-		 * change ; he told me that it is safe, because "signal
-		 * blocking is now handled in daemonize()", he added
-		 * that the problem is that "uninterruptible sleep
-		 * contributes to load average", making user worry.
-		 * Jean II */
-		set_task_state(current, TASK_INTERRUPTIBLE);
-		add_wait_queue(&irda_rq_queue.kick, &wait);
-		if (list_empty(&irda_rq_queue.request_list))
-			schedule();
-		else
-			__set_task_state(current, TASK_RUNNING);
-		remove_wait_queue(&irda_rq_queue.kick, &wait);
-
-		/* make swsusp happy with our thread */
-		try_to_freeze();
-
-		run_irda_queue();
-	}
-
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,5,35)
-	reparent_to_init();
-#endif
-	complete_and_exit(&irda_rq_queue.exit, 0);
-	/* never reached */
-	return 0;
-}
-
-
-static void flush_irda_queue(void)
-{
-	if (atomic_read(&irda_rq_queue.num_pending)) {
-
-		DECLARE_WAITQUEUE(wait, current);
-
-		if (!list_empty(&irda_rq_queue.request_list))
-			run_irda_queue();
-
-		set_task_state(current, TASK_UNINTERRUPTIBLE);
-		add_wait_queue(&irda_rq_queue.done, &wait);
-		if (atomic_read(&irda_rq_queue.num_pending))
-			schedule();
-		else
-			__set_task_state(current, TASK_RUNNING);
-		remove_wait_queue(&irda_rq_queue.done, &wait);
-	}
-}
-
-/* substate handler of the config-fsm to handle the cases where we want
- * to wait for transmit completion before changing the port configuration
- */
-
-static int irda_tx_complete_fsm(struct sir_dev *dev)
-{
-	struct sir_fsm *fsm = &dev->fsm;
-	unsigned next_state, delay;
-	unsigned bytes_left;
-
-	do {
-		next_state = fsm->substate;	/* default: stay in current substate */
-		delay = 0;
-
-		switch(fsm->substate) {
-
-		case SIRDEV_STATE_WAIT_XMIT:
-			if (dev->drv->chars_in_buffer)
-				bytes_left = dev->drv->chars_in_buffer(dev);
-			else
-				bytes_left = 0;
-			if (!bytes_left) {
-				next_state = SIRDEV_STATE_WAIT_UNTIL_SENT;
-				break;
-			}
-
-			if (dev->speed > 115200)
-				delay = (bytes_left*8*10000) / (dev->speed/100);
-			else if (dev->speed > 0)
-				delay = (bytes_left*10*10000) / (dev->speed/100);
-			else
-				delay = 0;
-			/* expected delay (usec) until remaining bytes are sent */
-			if (delay < 100) {
-				udelay(delay);
-				delay = 0;
-				break;
-			}
-			/* sleep some longer delay (msec) */
-			delay = (delay+999) / 1000;
-			break;
-
-		case SIRDEV_STATE_WAIT_UNTIL_SENT:
-			/* block until underlaying hardware buffer are empty */
-			if (dev->drv->wait_until_sent)
-				dev->drv->wait_until_sent(dev);
-			next_state = SIRDEV_STATE_TX_DONE;
-			break;
-
-		case SIRDEV_STATE_TX_DONE:
-			return 0;
-
-		default:
-			IRDA_ERROR("%s - undefined state\n", __FUNCTION__);
-			return -EINVAL;
-		}
-		fsm->substate = next_state;
-	} while (delay == 0);
-	return delay;
-}
-
-/*
- * Function irda_config_fsm
- *
- * State machine to handle the configuration of the device (and attached dongle, if any).
- * This handler is scheduled for execution in kIrDAd context, so we can sleep.
- * however, kIrDAd is shared by all sir_dev devices so we better don't sleep there too
- * long. Instead, for longer delays we start a timer to reschedule us later.
- * On entry, fsm->sem is always locked and the netdev xmit queue stopped.
- * Both must be unlocked/restarted on completion - but only on final exit.
- */
-
-static void irda_config_fsm(void *data)
-{
-	struct sir_dev *dev = data;
-	struct sir_fsm *fsm = &dev->fsm;
-	int next_state;
-	int ret = -1;
-	unsigned delay;
-
-	IRDA_DEBUG(2, "%s(), <%ld>\n", __FUNCTION__, jiffies); 
-
-	do {
-		IRDA_DEBUG(3, "%s - state=0x%04x / substate=0x%04x\n",
-			__FUNCTION__, fsm->state, fsm->substate);
-
-		next_state = fsm->state;
-		delay = 0;
-
-		switch(fsm->state) {
-
-		case SIRDEV_STATE_DONGLE_OPEN:
-			if (dev->dongle_drv != NULL) {
-				ret = sirdev_put_dongle(dev);
-				if (ret) {
-					fsm->result = -EINVAL;
-					next_state = SIRDEV_STATE_ERROR;
-					break;
-				}
-			}
-
-			/* Initialize dongle */
-			ret = sirdev_get_dongle(dev, fsm->param);
-			if (ret) {
-				fsm->result = ret;
-				next_state = SIRDEV_STATE_ERROR;
-				break;
-			}
-
-			/* Dongles are powered through the modem control lines which
-			 * were just set during open. Before resetting, let's wait for
-			 * the power to stabilize. This is what some dongle drivers did
-			 * in open before, while others didn't - should be safe anyway.
-			 */
-
-			delay = 50;
-			fsm->substate = SIRDEV_STATE_DONGLE_RESET;
-			next_state = SIRDEV_STATE_DONGLE_RESET;
-
-			fsm->param = 9600;
-
-			break;
-
-		case SIRDEV_STATE_DONGLE_CLOSE:
-			/* shouldn't we just treat this as success=? */
-			if (dev->dongle_drv == NULL) {
-				fsm->result = -EINVAL;
-				next_state = SIRDEV_STATE_ERROR;
-				break;
-			}
-
-			ret = sirdev_put_dongle(dev);
-			if (ret) {
-				fsm->result = ret;
-				next_state = SIRDEV_STATE_ERROR;
-				break;
-			}
-			next_state = SIRDEV_STATE_DONE;
-			break;
-
-		case SIRDEV_STATE_SET_DTR_RTS:
-			ret = sirdev_set_dtr_rts(dev,
-				(fsm->param&0x02) ? TRUE : FALSE,
-				(fsm->param&0x01) ? TRUE : FALSE);
-			next_state = SIRDEV_STATE_DONE;
-			break;
-
-		case SIRDEV_STATE_SET_SPEED:
-			fsm->substate = SIRDEV_STATE_WAIT_XMIT;
-			next_state = SIRDEV_STATE_DONGLE_CHECK;
-			break;
-
-		case SIRDEV_STATE_DONGLE_CHECK:
-			ret = irda_tx_complete_fsm(dev);
-			if (ret < 0) {
-				fsm->result = ret;
-				next_state = SIRDEV_STATE_ERROR;
-				break;
-			}
-			if ((delay=ret) != 0)
-				break;
-
-			if (dev->dongle_drv) {
-				fsm->substate = SIRDEV_STATE_DONGLE_RESET;
-				next_state = SIRDEV_STATE_DONGLE_RESET;
-			}
-			else {
-				dev->speed = fsm->param;
-				next_state = SIRDEV_STATE_PORT_SPEED;
-			}
-			break;
-
-		case SIRDEV_STATE_DONGLE_RESET:
-			if (dev->dongle_drv->reset) {
-				ret = dev->dongle_drv->reset(dev);	
-				if (ret < 0) {
-					fsm->result = ret;
-					next_state = SIRDEV_STATE_ERROR;
-					break;
-				}
-			}
-			else
-				ret = 0;
-			if ((delay=ret) == 0) {
-				/* set serial port according to dongle default speed */
-				if (dev->drv->set_speed)
-					dev->drv->set_speed(dev, dev->speed);
-				fsm->substate = SIRDEV_STATE_DONGLE_SPEED;
-				next_state = SIRDEV_STATE_DONGLE_SPEED;
-			}
-			break;
-
-		case SIRDEV_STATE_DONGLE_SPEED:				
-			if (dev->dongle_drv->reset) {
-				ret = dev->dongle_drv->set_speed(dev, fsm->param);
-				if (ret < 0) {
-					fsm->result = ret;
-					next_state = SIRDEV_STATE_ERROR;
-					break;
-				}
-			}
-			else
-				ret = 0;
-			if ((delay=ret) == 0)
-				next_state = SIRDEV_STATE_PORT_SPEED;
-			break;
-
-		case SIRDEV_STATE_PORT_SPEED:
-			/* Finally we are ready to change the serial port speed */
-			if (dev->drv->set_speed)
-				dev->drv->set_speed(dev, dev->speed);
-			dev->new_speed = 0;
-			next_state = SIRDEV_STATE_DONE;
-			break;
-
-		case SIRDEV_STATE_DONE:
-			/* Signal network layer so it can send more frames */
-			netif_wake_queue(dev->netdev);
-			next_state = SIRDEV_STATE_COMPLETE;
-			break;
-
-		default:
-			IRDA_ERROR("%s - undefined state\n", __FUNCTION__);
-			fsm->result = -EINVAL;
-			/* fall thru */
-
-		case SIRDEV_STATE_ERROR:
-			IRDA_ERROR("%s - error: %d\n", __FUNCTION__, fsm->result);
-
-#if 0	/* don't enable this before we have netdev->tx_timeout to recover */
-			netif_stop_queue(dev->netdev);
-#else
-			netif_wake_queue(dev->netdev);
-#endif
-			/* fall thru */
-
-		case SIRDEV_STATE_COMPLETE:
-			/* config change finished, so we are not busy any longer */
-			sirdev_enable_rx(dev);
-			up(&fsm->sem);
-			return;
-		}
-		fsm->state = next_state;
-	} while(!delay);
-
-	irda_queue_delayed_request(&fsm->rq, msecs_to_jiffies(delay));
-}
-
-/* schedule some device configuration task for execution by kIrDAd
- * on behalf of the above state machine.
- * can be called from process or interrupt/tasklet context.
- */
-
-int sirdev_schedule_request(struct sir_dev *dev, int initial_state, unsigned param)
-{
-	struct sir_fsm *fsm = &dev->fsm;
-	int xmit_was_down;
-
-	IRDA_DEBUG(2, "%s - state=0x%04x / param=%u\n", __FUNCTION__, initial_state, param);
-
-	if (down_trylock(&fsm->sem)) {
-		if (in_interrupt()  ||  in_atomic()  ||  irqs_disabled()) {
-			IRDA_DEBUG(1, "%s(), state machine busy!\n", __FUNCTION__);
-			return -EWOULDBLOCK;
-		} else
-			down(&fsm->sem);
-	}
-
-	if (fsm->state == SIRDEV_STATE_DEAD) {
-		/* race with sirdev_close should never happen */
-		IRDA_ERROR("%s(), instance staled!\n", __FUNCTION__);
-		up(&fsm->sem);
-		return -ESTALE;		/* or better EPIPE? */
-	}
-
-	xmit_was_down = netif_queue_stopped(dev->netdev);
-	netif_stop_queue(dev->netdev);
-	atomic_set(&dev->enable_rx, 0);
-
-	fsm->state = initial_state;
-	fsm->param = param;
-	fsm->result = 0;
-
-	INIT_LIST_HEAD(&fsm->rq.lh_request);
-	fsm->rq.pending = 0;
-	fsm->rq.func = irda_config_fsm;
-	fsm->rq.data = dev;
-
-	if (!irda_queue_request(&fsm->rq)) {	/* returns 0 on error! */
-		atomic_set(&dev->enable_rx, 1);
-		if (!xmit_was_down)
-			netif_wake_queue(dev->netdev);		
-		up(&fsm->sem);
-		return -EAGAIN;
-	}
-	return 0;
-}
-
-static int __init irda_thread_create(void)
-{
-	struct completion startup;
-	int pid;
-
-	spin_lock_init(&irda_rq_queue.lock);
-	irda_rq_queue.thread = NULL;
-	INIT_LIST_HEAD(&irda_rq_queue.request_list);
-	init_waitqueue_head(&irda_rq_queue.kick);
-	init_waitqueue_head(&irda_rq_queue.done);
-	atomic_set(&irda_rq_queue.num_pending, 0);
-
-	init_completion(&startup);
-	pid = kernel_thread(irda_thread, &startup, CLONE_FS|CLONE_FILES);
-	if (pid <= 0)
-		return -EAGAIN;
-	else
-		wait_for_completion(&startup);
-
-	return 0;
-}
-
-static void __exit irda_thread_join(void)
-{
-	if (irda_rq_queue.thread) {
-		flush_irda_queue();
-		init_completion(&irda_rq_queue.exit);
-		irda_rq_queue.thread = NULL;
-		wake_up(&irda_rq_queue.kick);		
-		wait_for_completion(&irda_rq_queue.exit);
-	}
-}
-
-module_init(irda_thread_create);
-module_exit(irda_thread_join);
-
-MODULE_AUTHOR("Martin Diehl <info@mdiehl.de>");
-MODULE_DESCRIPTION("IrDA SIR core");
-MODULE_LICENSE("GPL");
-
-- 
1.3.1


^ permalink raw reply related

* [PATCH 0/9] I/OAT network recv copy offload
From: Chris Leech @ 2006-05-08 22:16 UTC (permalink / raw)
  To: linux-kernel, netdev

A few changes after going over all the memory allocations, but mostly just
keeping the patches up to date.

This patch series is a full release of the Intel(R) I/O
Acceleration Technology (I/OAT) for Linux.  It includes an in-kernel API
for offloading memory copies to hardware, a driver for the I/OAT DMA memcpy
engine, and changes to the TCP stack to offload copies of received
networking data to application space.

Changes from last posting:
	Fixed a struct ioat_dma_chan memory leak on driver unload.
	Changed a lock that was never held in atomic contexts to a mutex
	as part of avoiding unneeded GFP_ATOMIC allocations.

These changes apply to Linus' tree as of commit
	6810b548b25114607e0814612d84125abccc0a4f
	[PATCH] x86_64: Move ondemand timer into own work queue

They are available to pull from
	git://63.64.152.142/~cleech/linux-2.6 ioat-2.6.17

There are 9 patches in the series:
	1) The memcpy offload APIs and class code
	2) The Intel I/OAT DMA driver (ioatdma)
	3) Core networking code to setup networking as a DMA memcpy client
	4) Utility functions for sk_buff to iovec offloaded copy
	5) Structure changes needed for TCP receive offload
	6) Rename cleanup_rbuf to tcp_cleanup_rbuf
	7) Make sk_eat_skb aware of early copied packets
	8) Add a sysctl to tune the minimum offloaded I/O size for TCP
	9) The main TCP receive offload changes

--
Chris Leech <christopher.leech@intel.com>
I/O Acceleration Technology Software Development
LAN Access Division / Digital Enterprise Group 

^ permalink raw reply

* [PATCH 3/9] [I/OAT] Setup the networking subsystem as a DMA client
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Registers the networking subsystem as a DMA client and attempts to
allocate one DMA channel per CPU.
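
When there are fewer channels than CPUs, the channels have to be shared.
The following is only a userspace sketch of the balancing arithmetic used
by net_dma_rebalance() in the diff below; cpus_for_channel() and
assign_channels() are hypothetical names chosen for illustration and do
not exist in the patch, and a plain array stands in for the per-CPU
softnet_data.net_dma pointers.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in the patch): how many CPUs channel index i
 * serves when ncpus CPUs share nchans channels.  The first
 * (ncpus % nchans) channels each take one extra CPU.
 */
static unsigned int cpus_for_channel(unsigned int ncpus, unsigned int nchans,
				     unsigned int i)
{
	assert(nchans > 0);	/* the patch handles the zero case separately */
	return ncpus / nchans + (i < ncpus % nchans ? 1 : 0);
}

/*
 * Hypothetical helper (not in the patch): fill cpu_to_chan[] so that
 * consecutive CPUs are handed to consecutive channels.  The array must
 * have room for ncpus entries.
 */
static void assign_channels(unsigned int *cpu_to_chan, unsigned int ncpus,
			    unsigned int nchans)
{
	unsigned int cpu = 0, i, n;

	for (i = 0; i < nchans; i++) {
		for (n = cpus_for_channel(ncpus, nchans, i);
		     n > 0 && cpu < ncpus; n--)
			cpu_to_chan[cpu++] = i;
	}
}
```

With 8 CPUs and 3 channels this hands CPUs 0-2 to channel 0, CPUs 3-5 to
channel 1, and CPUs 6-7 to channel 2.  If there are more channels than
CPUs, the surplus channels simply get no CPU.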

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 drivers/dma/Kconfig       |   12 +++++
 include/linux/netdevice.h |    4 ++
 include/net/netdma.h      |   38 ++++++++++++++++
 net/core/dev.c            |  104 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 158 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 0f15e76..30d021d 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -10,6 +10,18 @@ config DMA_ENGINE
 	  DMA engines offload copy operations from the CPU to dedicated
 	  hardware, allowing the copies to happen asynchronously.
 
+comment "DMA Clients"
+
+config NET_DMA
+	bool "Network: TCP receive copy offload"
+	depends on DMA_ENGINE && NET
+	default y
+	---help---
+	  This enables the use of DMA engines in the network stack to
+	  offload receive copy-to-user operations, freeing CPU cycles.
+	  Since this is the main user of the DMA engine, it should be enabled;
+	  say Y here.
+
 comment "DMA Devices"
 
 config INTEL_IOATDMA
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 309f919..06bcabc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -37,6 +37,7 @@
 #include <linux/config.h>
 #include <linux/device.h>
 #include <linux/percpu.h>
+#include <linux/dmaengine.h>
 
 struct divert_blk;
 struct vlan_group;
@@ -594,6 +595,9 @@ struct softnet_data
 	struct sk_buff		*completion_queue;
 
 	struct net_device	backlog_dev;	/* Sorry. 8) */
+#ifdef CONFIG_NET_DMA
+	struct dma_chan		*net_dma;
+#endif
 };
 
 DECLARE_PER_CPU(struct softnet_data,softnet_data);
diff --git a/include/net/netdma.h b/include/net/netdma.h
new file mode 100644
index 0000000..cbfe89d
--- /dev/null
+++ b/include/net/netdma.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef NETDMA_H
+#define NETDMA_H
+#include <linux/config.h>
+#ifdef CONFIG_NET_DMA
+#include <linux/dmaengine.h>
+
+static inline struct dma_chan *get_softnet_dma(void)
+{
+	struct dma_chan *chan;
+	rcu_read_lock();
+	chan = rcu_dereference(__get_cpu_var(softnet_data.net_dma));
+	if (chan)
+		dma_chan_get(chan);
+	rcu_read_unlock();
+	return chan;
+}
+#endif /* CONFIG_NET_DMA */
+#endif /* NETDMA_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 9ab3cfa..ab34006 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -115,6 +115,7 @@
 #include <net/iw_handler.h>
 #include <asm/current.h>
 #include <linux/audit.h>
+#include <linux/dmaengine.h>
 
 /*
  *	The list of packet types we will receive (as opposed to discard)
@@ -148,6 +149,12 @@ static DEFINE_SPINLOCK(ptype_lock);
 static struct list_head ptype_base[16];	/* 16 way hashed list */
 static struct list_head ptype_all;		/* Taps */
 
+#ifdef CONFIG_NET_DMA
+static struct dma_client *net_dma_client;
+static unsigned int net_dma_count;
+static spinlock_t net_dma_event_lock;
+#endif
+
 /*
  * The @dev_base list is protected by @dev_base_lock and the rtln
  * semaphore.
@@ -1844,6 +1851,19 @@ static void net_rx_action(struct softirq
 		}
 	}
 out:
+#ifdef CONFIG_NET_DMA
+	/*
+	 * There may not be any more sk_buffs coming right now, so push
+	 * any pending DMA copies to hardware
+	 */
+	if (net_dma_client) {
+		struct dma_chan *chan;
+		rcu_read_lock();
+		list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
+			dma_async_memcpy_issue_pending(chan);
+		rcu_read_unlock();
+	}
+#endif
 	local_irq_enable();
 	return;
 
@@ -3307,6 +3327,88 @@ static int dev_cpu_callback(struct notif
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
+#ifdef CONFIG_NET_DMA
+/**
+ * net_dma_rebalance - redistribute DMA channels across the online CPUs
+ * This is called when the number of channels allocated to the net_dma_client
+ * changes.  The net_dma_client tries to have one DMA channel per CPU.
+ */
+static void net_dma_rebalance(void)
+{
+	unsigned int cpu, i, n;
+	struct dma_chan *chan;
+
+	lock_cpu_hotplug();
+
+	if (net_dma_count == 0) {
+		for_each_online_cpu(cpu)
+			rcu_assign_pointer(per_cpu(softnet_data.net_dma, cpu), NULL);
+		unlock_cpu_hotplug();
+		return;
+	}
+
+	i = 0;
+	cpu = first_cpu(cpu_online_map);
+
+	rcu_read_lock();
+	list_for_each_entry(chan, &net_dma_client->channels, client_node) {
+		n = ((num_online_cpus() / net_dma_count)
+		   + (i < (num_online_cpus() % net_dma_count) ? 1 : 0));
+
+		while(n) {
+			per_cpu(softnet_data.net_dma, cpu) = chan;
+			cpu = next_cpu(cpu, cpu_online_map);
+			n--;
+		}
+		i++;
+	}
+	rcu_read_unlock();
+
+	unlock_cpu_hotplug();
+}
+
+/**
+ * netdev_dma_event - event callback for the net_dma_client
+ * @client: should always be net_dma_client
+ * @chan: the DMA channel that was added or removed
+ * @event: DMA_RESOURCE_ADDED or DMA_RESOURCE_REMOVED
+ */
+static void netdev_dma_event(struct dma_client *client, struct dma_chan *chan,
+	enum dma_event event)
+{
+	spin_lock(&net_dma_event_lock);
+	switch (event) {
+	case DMA_RESOURCE_ADDED:
+		net_dma_count++;
+		net_dma_rebalance();
+		break;
+	case DMA_RESOURCE_REMOVED:
+		net_dma_count--;
+		net_dma_rebalance();
+		break;
+	default:
+		break;
+	}
+	spin_unlock(&net_dma_event_lock);
+}
+
+/**
+ * netdev_dma_register - register the networking subsystem as a DMA client
+ */
+static int __init netdev_dma_register(void)
+{
+	spin_lock_init(&net_dma_event_lock);
+	net_dma_client = dma_async_client_register(netdev_dma_event);
+	if (net_dma_client == NULL)
+		return -ENOMEM;
+
+	dma_async_client_chan_request(net_dma_client, num_online_cpus());
+	return 0;
+}
+
+#else
+static int __init netdev_dma_register(void) { return -ENODEV; }
+#endif /* CONFIG_NET_DMA */
 
 /*
  *	Initialize the DEV module. At boot time this walks the device list and
@@ -3360,6 +3462,8 @@ static int __init net_dev_init(void)
 		atomic_set(&queue->backlog_dev.refcnt, 1);
 	}
 
+	netdev_dma_register();
+
 	dev_boot_phase = 0;
 
 	open_softirq(NET_TX_SOFTIRQ, net_tx_action, NULL);


^ permalink raw reply related

* [PATCH 4/9] [I/OAT] Utility functions for offloading sk_buff to iovec copies
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Provides for pinning user space pages in memory, copying to iovecs,
and copying from sk_buffs including fragmented and chained sk_buffs.
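
Two pieces of page arithmetic recur in the code below: counting how many
pages a user buffer touches, and limiting each copy so it does not cross a
page boundary.  The following is a userspace sketch of both, assuming a
4 KiB page size; the SKETCH_* macros, pages_spanned() and
chunk_within_page() are hypothetical names for illustration and are not
part of the patch, which uses PAGE_ALIGN, PAGE_MASK and PAGE_SHIFT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page geometry for the sketch (4 KiB pages). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/*
 * Number of pages touched by the byte range [base, base + len):
 * round the end up to a page boundary, round the start down, and
 * count the pages in between.
 */
static size_t pages_spanned(uintptr_t base, size_t len)
{
	uintptr_t end = (base + len + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
	uintptr_t start = base & SKETCH_PAGE_MASK;

	return (size_t)((end - start) >> SKETCH_PAGE_SHIFT);
}

/*
 * Largest chunk that can be copied to addr without crossing a page
 * boundary, limited by the bytes left in the current iovec (iov_left)
 * and by the bytes left in the whole request (len).
 */
static size_t chunk_within_page(uintptr_t addr, size_t iov_left, size_t len)
{
	size_t room = (size_t)(SKETCH_PAGE_SIZE - (addr & ~SKETCH_PAGE_MASK));
	size_t copy = room < len ? room : len;

	return copy < iov_left ? copy : iov_left;
}
```

For example, a 32-byte buffer starting 16 bytes before a page boundary
spans two pages, and the first chunk copied into it is limited to 16 bytes.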

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 drivers/dma/Makefile      |    3 
 drivers/dma/iovlock.c     |  301 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h |   22 +++
 include/net/netdma.h      |    6 +
 net/core/Makefile         |    1 
 net/core/user_dma.c       |  127 +++++++++++++++++++
 6 files changed, 459 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index c8a5f56..bdcfdbd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,2 +1,3 @@
-obj-y += dmaengine.o
+obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
+obj-$(CONFIG_NET_DMA) += iovlock.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
new file mode 100644
index 0000000..5ed327e
--- /dev/null
+++ b/drivers/dma/iovlock.c
@@ -0,0 +1,301 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ * Portions based on net/core/datagram.c and copyrighted by their authors.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This code allows the net stack to make use of a DMA engine for
+ * skb to iovec copies.
+ */
+
+#include <linux/dmaengine.h>
+#include <linux/pagemap.h>
+#include <net/tcp.h> /* for memcpy_toiovec */
+#include <asm/io.h>
+#include <asm/uaccess.h>
+
+int num_pages_spanned(struct iovec *iov)
+{
+	return
+	((PAGE_ALIGN((unsigned long)iov->iov_base + iov->iov_len) -
+	((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT);
+}
+
+/*
+ * Pin down all the iovec pages needed for len bytes.
+ * Return a struct dma_pinned_list to keep track of pages pinned down.
+ *
+ * We are allocating a single chunk of memory, and then carving it up into
+ * 3 sections, the latter 2 whose size depends on the number of iovecs and the
+ * total number of pages, respectively.
+ */
+struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
+{
+	struct dma_pinned_list *local_list;
+	struct page **pages;
+	int i;
+	int ret;
+	int nr_iovecs = 0;
+	int iovec_len_used = 0;
+	int iovec_pages_used = 0;
+	long err;
+
+	/* don't pin down non-user-based iovecs */
+	if (segment_eq(get_fs(), KERNEL_DS))
+		return NULL;
+
+	/* determine how many iovecs/pages there are, up front */
+	do {
+		iovec_len_used += iov[nr_iovecs].iov_len;
+		iovec_pages_used += num_pages_spanned(&iov[nr_iovecs]);
+		nr_iovecs++;
+	} while (iovec_len_used < len);
+
+	/* single kmalloc for pinned list, page_list[], and the page arrays */
+	local_list = kmalloc(sizeof(*local_list)
+		+ (nr_iovecs * sizeof (struct dma_page_list))
+		+ (iovec_pages_used * sizeof (struct page*)), GFP_KERNEL);
+	if (!local_list) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* list of pages starts right after the page list array */
+	pages = (struct page **) &local_list->page_list[nr_iovecs];
+
+	for (i = 0; i < nr_iovecs; i++) {
+		struct dma_page_list *page_list = &local_list->page_list[i];
+
+		len -= iov[i].iov_len;
+
+		if (!access_ok(VERIFY_WRITE, iov[i].iov_base, iov[i].iov_len)) {
+			err = -EFAULT;
+			goto unpin;
+		}
+
+		page_list->nr_pages = num_pages_spanned(&iov[i]);
+		page_list->base_address = iov[i].iov_base;
+
+		page_list->pages = pages;
+		pages += page_list->nr_pages;
+
+		/* pin pages down */
+		down_read(&current->mm->mmap_sem);
+		ret = get_user_pages(
+			current,
+			current->mm,
+			(unsigned long) iov[i].iov_base,
+			page_list->nr_pages,
+			1,	/* write */
+			0,	/* force */
+			page_list->pages,
+			NULL);
+		up_read(&current->mm->mmap_sem);
+
+		if (ret != page_list->nr_pages) {
+			err = -ENOMEM;
+			goto unpin;
+		}
+
+		local_list->nr_iovecs = i + 1;
+	}
+
+	return local_list;
+
+unpin:
+	dma_unpin_iovec_pages(local_list);
+out:
+	return ERR_PTR(err);
+}
+
+void dma_unpin_iovec_pages(struct dma_pinned_list *pinned_list)
+{
+	int i, j;
+
+	if (!pinned_list)
+		return;
+
+	for (i = 0; i < pinned_list->nr_iovecs; i++) {
+		struct dma_page_list *page_list = &pinned_list->page_list[i];
+		for (j = 0; j < page_list->nr_pages; j++) {
+			set_page_dirty_lock(page_list->pages[j]);
+			page_cache_release(page_list->pages[j]);
+		}
+	}
+
+	kfree(pinned_list);
+}
+
+static dma_cookie_t dma_memcpy_to_kernel_iovec(struct dma_chan *chan, struct
+	iovec *iov, unsigned char *kdata, size_t len)
+{
+	dma_cookie_t dma_cookie = 0;
+
+	while (len > 0) {
+		if (iov->iov_len) {
+			int copy = min_t(unsigned int, iov->iov_len, len);
+			dma_cookie = dma_async_memcpy_buf_to_buf(
+					chan,
+					iov->iov_base,
+					kdata,
+					copy);
+			kdata += copy;
+			len -= copy;
+			iov->iov_len -= copy;
+			iov->iov_base += copy;
+		}
+		iov++;
+	}
+
+	return dma_cookie;
+}
+
+/*
+ * We have already pinned down the pages we will be using in the iovecs.
+ * Each entry in iov array has corresponding entry in pinned_list->page_list.
+ * Using array indexing to keep iov[] and page_list[] in sync.
+ * Initial elements in iov array's iov->iov_len will be 0 if already copied into
+ *   by another call.
+ * iov array length remaining guaranteed to be bigger than len.
+ */
+dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, struct iovec *iov,
+	struct dma_pinned_list *pinned_list, unsigned char *kdata, size_t len)
+{
+	int iov_byte_offset;
+	int copy;
+	dma_cookie_t dma_cookie = 0;
+	int iovec_idx;
+	int page_idx;
+
+	if (!chan)
+		return memcpy_toiovec(iov, kdata, len);
+
+	/* -> kernel copies (e.g. smbfs) */
+	if (!pinned_list)
+		return dma_memcpy_to_kernel_iovec(chan, iov, kdata, len);
+
+	iovec_idx = 0;
+	while (iovec_idx < pinned_list->nr_iovecs) {
+		struct dma_page_list *page_list;
+
+		/* skip already used-up iovecs */
+		while (!iov[iovec_idx].iov_len)
+			iovec_idx++;
+
+		page_list = &pinned_list->page_list[iovec_idx];
+
+		iov_byte_offset = ((unsigned long)iov[iovec_idx].iov_base & ~PAGE_MASK);
+		page_idx = (((unsigned long)iov[iovec_idx].iov_base & PAGE_MASK)
+			 - ((unsigned long)page_list->base_address & PAGE_MASK)) >> PAGE_SHIFT;
+
+		/* break up copies to not cross page boundary */
+		while (iov[iovec_idx].iov_len) {
+			copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
+			copy = min_t(int, copy, iov[iovec_idx].iov_len);
+
+			dma_cookie = dma_async_memcpy_buf_to_pg(chan,
+					page_list->pages[page_idx],
+					iov_byte_offset,
+					kdata,
+					copy);
+
+			len -= copy;
+			iov[iovec_idx].iov_len -= copy;
+			iov[iovec_idx].iov_base += copy;
+
+			if (!len)
+				return dma_cookie;
+
+			kdata += copy;
+			iov_byte_offset = 0;
+			page_idx++;
+		}
+		iovec_idx++;
+	}
+
+	/* really bad if we ever run out of iovecs */
+	BUG();
+	return -EFAULT;
+}
+
+dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
+	struct dma_pinned_list *pinned_list, struct page *page,
+	unsigned int offset, size_t len)
+{
+	int iov_byte_offset;
+	int copy;
+	dma_cookie_t dma_cookie = 0;
+	int iovec_idx;
+	int page_idx;
+	int err;
+
+	/* this needs as-yet-unimplemented buf-to-buf, so punt. */
+	/* TODO: use dma for this */
+	if (!chan || !pinned_list) {
+		u8 *vaddr = kmap(page);
+		err = memcpy_toiovec(iov, vaddr + offset, len);
+		kunmap(page);
+		return err;
+	}
+
+	iovec_idx = 0;
+	while (iovec_idx < pinned_list->nr_iovecs) {
+		struct dma_page_list *page_list;
+
+		/* skip already used-up iovecs */
+		while (!iov[iovec_idx].iov_len)
+			iovec_idx++;
+
+		page_list = &pinned_list->page_list[iovec_idx];
+
+		iov_byte_offset = ((unsigned long)iov[iovec_idx].iov_base & ~PAGE_MASK);
+		page_idx = (((unsigned long)iov[iovec_idx].iov_base & PAGE_MASK)
+			 - ((unsigned long)page_list->base_address & PAGE_MASK)) >> PAGE_SHIFT;
+
+		/* break up copies to not cross page boundary */
+		while (iov[iovec_idx].iov_len) {
+			copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
+			copy = min_t(int, copy, iov[iovec_idx].iov_len);
+
+			dma_cookie = dma_async_memcpy_pg_to_pg(chan,
+					page_list->pages[page_idx],
+					iov_byte_offset,
+					page,
+					offset,
+					copy);
+
+			len -= copy;
+			iov[iovec_idx].iov_len -= copy;
+			iov[iovec_idx].iov_base += copy;
+
+			if (!len)
+				return dma_cookie;
+
+			offset += copy;
+			iov_byte_offset = 0;
+			page_idx++;
+		}
+		iovec_idx++;
+	}
+
+	/* really bad if we ever run out of iovecs */
+	BUG();
+	return -EFAULT;
+}
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 3078154..78b236c 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -333,5 +333,27 @@ static inline enum dma_status dma_async_
 int dma_async_device_register(struct dma_device *device);
 void dma_async_device_unregister(struct dma_device *device);
 
+/* --- Helper iov-locking functions --- */
+
+struct dma_page_list {
+	char *base_address;
+	int nr_pages;
+	struct page **pages;
+};
+
+struct dma_pinned_list {
+	int nr_iovecs;
+	struct dma_page_list page_list[0];
+};
+
+struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len);
+void dma_unpin_iovec_pages(struct dma_pinned_list* pinned_list);
+
+dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, struct iovec *iov,
+	struct dma_pinned_list *pinned_list, unsigned char *kdata, size_t len);
+dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
+	struct dma_pinned_list *pinned_list, struct page *page,
+	unsigned int offset, size_t len);
+
 #endif /* CONFIG_DMA_ENGINE */
 #endif /* DMAENGINE_H */
diff --git a/include/net/netdma.h b/include/net/netdma.h
index cbfe89d..19760eb 100644
--- a/include/net/netdma.h
+++ b/include/net/netdma.h
@@ -23,6 +23,7 @@
 #include <linux/config.h>
 #ifdef CONFIG_NET_DMA
 #include <linux/dmaengine.h>
+#include <linux/skbuff.h>
 
 static inline struct dma_chan *get_softnet_dma(void)
 {
@@ -34,5 +35,10 @@ static inline struct dma_chan *get_softn
 	rcu_read_unlock();
 	return chan;
 }
+
+int dma_skb_copy_datagram_iovec(struct dma_chan* chan,
+		struct sk_buff *skb, int offset, struct iovec *to,
+		size_t len, struct dma_pinned_list *pinned_list);
+
 #endif /* CONFIG_NET_DMA */
 #endif /* NETDMA_H */
diff --git a/net/core/Makefile b/net/core/Makefile
index 79fe12c..e9bd246 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_NET_DIVERT) += dv.o
 obj-$(CONFIG_NET_PKTGEN) += pktgen.o
 obj-$(CONFIG_WIRELESS_EXT) += wireless.o
 obj-$(CONFIG_NETPOLL) += netpoll.o
+obj-$(CONFIG_NET_DMA) += user_dma.o
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
new file mode 100644
index 0000000..9eee91b
--- /dev/null
+++ b/net/core/user_dma.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ * Portions based on net/core/datagram.c and copyrighted by their authors.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This code allows the net stack to make use of a DMA engine for
+ * skb to iovec copies.
+ */
+
+#include <linux/dmaengine.h>
+#include <linux/socket.h>
+#include <linux/rtnetlink.h> /* for BUG_TRAP */
+#include <net/tcp.h>
+
+/**
+ *	dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
+ *	@skb - buffer to copy
+ *	@offset - offset in the buffer to start copying from
+ *	@to - io vector to copy to
+ *	@len - amount of data to copy from buffer to iovec
+ *	@pinned_list - locked iovec buffer data
+ *
+ *	Note: the iovec is modified during the copy.
+ */
+int dma_skb_copy_datagram_iovec(struct dma_chan *chan,
+			struct sk_buff *skb, int offset, struct iovec *to,
+			size_t len, struct dma_pinned_list *pinned_list)
+{
+	int start = skb_headlen(skb);
+	int i, copy = start - offset;
+	dma_cookie_t cookie = 0;
+
+	/* Copy header. */
+	if (copy > 0) {
+		if (copy > len)
+			copy = len;
+		cookie = dma_memcpy_to_iovec(chan, to, pinned_list,
+		                            skb->data + offset, copy);
+		if (cookie < 0)
+			goto fault;
+		len -= copy;
+		if (len == 0)
+			goto end;
+		offset += copy;
+	}
+
+	/* Copy paged appendix. Hmm... why does this look so complicated? */
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		int end;
+
+		BUG_TRAP(start <= offset + len);
+
+		end = start + skb_shinfo(skb)->frags[i].size;
+		copy = end - offset;
+		if (copy > 0) {
+			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+			struct page *page = frag->page;
+
+			if (copy > len)
+				copy = len;
+
+			cookie = dma_memcpy_pg_to_iovec(chan, to, pinned_list, page,
+					frag->page_offset + offset - start, copy);
+			if (cookie < 0)
+				goto fault;
+			len -= copy;
+			if (len == 0)
+				goto end;
+			offset += copy;
+		}
+		start = end;
+	}
+
+	if (skb_shinfo(skb)->frag_list) {
+		struct sk_buff *list = skb_shinfo(skb)->frag_list;
+
+		for (; list; list = list->next) {
+			int end;
+
+			BUG_TRAP(start <= offset + len);
+
+			end = start + list->len;
+			copy = end - offset;
+			if (copy > 0) {
+				if (copy > len)
+					copy = len;
+				cookie = dma_skb_copy_datagram_iovec(chan, list,
+				                offset - start, to, copy,
+				                pinned_list);
+				if (cookie < 0)
+					goto fault;
+				len -= copy;
+				if (len == 0)
+					goto end;
+				offset += copy;
+			}
+			start = end;
+		}
+	}
+
+end:
+	if (!len) {
+		skb->dma_cookie = cookie;
+		return cookie;
+	}
+
+fault:
+ 	return -EFAULT;
+}

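For readers following the copy loop in dma_memcpy_pg_to_iovec() above, here
is a minimal userspace sketch of the "break up copies to not cross page
boundary" step. It is not the kernel code: SKETCH_PAGE_SIZE,
chunk_within_page() and split_by_page() are hypothetical names introduced
for illustration, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch only; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: how many bytes can be copied starting at addr
 * without crossing into the next page, capped at len.
 */
static size_t chunk_within_page(unsigned long addr, size_t len)
{
	/* offset of addr inside its page, like (addr & ~PAGE_MASK) */
	size_t room = SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1));

	return len < room ? len : room;
}

/*
 * Hypothetical helper: walk [addr, addr + len) and record the size of
 * each per-page chunk in chunks[]. Returns the number of chunks written
 * (at most max).
 */
static int split_by_page(unsigned long addr, size_t len,
			 size_t *chunks, int max)
{
	int n = 0;

	assert(chunks != NULL);

	while (len && n < max) {
		size_t copy = chunk_within_page(addr, len);

		chunks[n++] = copy;
		addr += copy;	/* the next chunk starts page aligned */
		len -= copy;
	}
	return n;
}
```

With a start address of 0x1F00 and a length of 5000 bytes, the walk yields
three chunks: 256 bytes up to the end of the first page, one full 4096-byte
page, and the remaining 648 bytes.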

^ permalink raw reply related

* [PATCH 5/9] [I/OAT] Structure changes for TCP recv offload to I/OAT
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Adds an async_wait_queue and some additional fields to tcp_sock, and a
dma_cookie_t to sk_buff.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 include/linux/skbuff.h |    4 ++++
 include/linux/tcp.h    |    8 ++++++++
 include/net/sock.h     |    2 ++
 include/net/tcp.h      |    7 +++++++
 net/core/sock.c        |    6 ++++++
 5 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f8f2347..23bad3b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -29,6 +29,7 @@
 #include <linux/net.h>
 #include <linux/textsearch.h>
 #include <net/checksum.h>
+#include <linux/dmaengine.h>
 
 #define HAVE_ALLOC_SKB		/* For the drivers to know */
 #define HAVE_ALIGNABLE_SKB	/* Ditto 8)		   */
@@ -285,6 +286,9 @@ struct sk_buff {
 	__u16			tc_verd;	/* traffic control verdict */
 #endif
 #endif
+#ifdef CONFIG_NET_DMA
+	dma_cookie_t		dma_cookie;
+#endif
 
 
 	/* These elements must be at the end, see alloc_skb() for details.  */
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 542d395..c90daa5 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -18,6 +18,7 @@
 #define _LINUX_TCP_H
 
 #include <linux/types.h>
+#include <linux/dmaengine.h>
 #include <asm/byteorder.h>
 
 struct tcphdr {
@@ -233,6 +234,13 @@ struct tcp_sock {
 		struct iovec		*iov;
 		int			memory;
 		int			len;
+#ifdef CONFIG_NET_DMA
+		/* members for async copy */
+		struct dma_chan		*dma_chan;
+		int			wakeup;
+		struct dma_pinned_list	*pinned_list;
+		dma_cookie_t		dma_cookie;
+#endif
 	} ucopy;
 
 	__u32	snd_wl1;	/* Sequence for window update		*/
diff --git a/include/net/sock.h b/include/net/sock.h
index c9fad6f..90c65cb 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -132,6 +132,7 @@ struct sock_common {
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
   *	@sk_write_queue: Packet sending queue
+  *	@sk_async_wait_queue: DMA copied packets
   *	@sk_omem_alloc: "o" is "option" or "other"
   *	@sk_wmem_queued: persistent queue size
   *	@sk_forward_alloc: space allocated forward
@@ -205,6 +206,7 @@ struct sock {
 	atomic_t		sk_omem_alloc;
 	struct sk_buff_head	sk_receive_queue;
 	struct sk_buff_head	sk_write_queue;
+	struct sk_buff_head	sk_async_wait_queue;
 	int			sk_wmem_queued;
 	int			sk_forward_alloc;
 	gfp_t			sk_allocation;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3c989db..d0c2c2f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -28,6 +28,7 @@
 #include <linux/cache.h>
 #include <linux/percpu.h>
 #include <linux/skbuff.h>
+#include <linux/dmaengine.h>
 
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -817,6 +818,12 @@ static inline void tcp_prequeue_init(str
 	tp->ucopy.len = 0;
 	tp->ucopy.memory = 0;
 	skb_queue_head_init(&tp->ucopy.prequeue);
+#ifdef CONFIG_NET_DMA
+	tp->ucopy.dma_chan = NULL;
+	tp->ucopy.wakeup = 0;
+	tp->ucopy.pinned_list = NULL;
+	tp->ucopy.dma_cookie = 0;
+#endif
 }
 
 /* Packet is added to VJ-style prequeue for processing in process
diff --git a/net/core/sock.c b/net/core/sock.c
index ed2afdb..5d820c3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -832,6 +832,9 @@ struct sock *sk_clone(const struct sock 
 		atomic_set(&newsk->sk_omem_alloc, 0);
 		skb_queue_head_init(&newsk->sk_receive_queue);
 		skb_queue_head_init(&newsk->sk_write_queue);
+#ifdef CONFIG_NET_DMA
+		skb_queue_head_init(&newsk->sk_async_wait_queue);
+#endif
 
 		rwlock_init(&newsk->sk_dst_lock);
 		rwlock_init(&newsk->sk_callback_lock);
@@ -1383,6 +1386,9 @@ void sock_init_data(struct socket *sock,
 	skb_queue_head_init(&sk->sk_receive_queue);
 	skb_queue_head_init(&sk->sk_write_queue);
 	skb_queue_head_init(&sk->sk_error_queue);
+#ifdef CONFIG_NET_DMA
+	skb_queue_head_init(&sk->sk_async_wait_queue);
+#endif
 
 	sk->sk_send_head	=	NULL;
 


^ permalink raw reply related

* [PATCH 6/9] [I/OAT] Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 include/net/tcp.h |    2 ++
 net/ipv4/tcp.c    |   10 +++++-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d0c2c2f..578cccf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -294,6 +294,8 @@ extern int			tcp_rcv_established(struct 
 
 extern void			tcp_rcv_space_adjust(struct sock *sk);
 
+extern void			tcp_cleanup_rbuf(struct sock *sk, int copied);
+
 extern int			tcp_twsk_unique(struct sock *sk,
 						struct sock *sktw, void *twp);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e2b7b80..1c0cfd7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -937,7 +937,7 @@ static int tcp_recv_urg(struct sock *sk,
  * calculation of whether or not we must ACK for the sake of
  * a window update.
  */
-static void cleanup_rbuf(struct sock *sk, int copied)
+void tcp_cleanup_rbuf(struct sock *sk, int copied)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int time_to_ack = 0;
@@ -1086,7 +1086,7 @@ int tcp_read_sock(struct sock *sk, read_
 
 	/* Clean up data we have read: This will do ACK frames. */
 	if (copied)
-		cleanup_rbuf(sk, copied);
+		tcp_cleanup_rbuf(sk, copied);
 	return copied;
 }
 
@@ -1220,7 +1220,7 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 			}
 		}
 
-		cleanup_rbuf(sk, copied);
+		tcp_cleanup_rbuf(sk, copied);
 
 		if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
 			/* Install new reader */
@@ -1391,7 +1391,7 @@ skip_copy:
 	 */
 
 	/* Clean up data we have read: This will do ACK frames. */
-	cleanup_rbuf(sk, copied);
+	tcp_cleanup_rbuf(sk, copied);
 
 	TCP_CHECK_TIMER(sk);
 	release_sock(sk);
@@ -1858,7 +1858,7 @@ static int do_tcp_setsockopt(struct sock
 			    (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT) &&
 			    inet_csk_ack_scheduled(sk)) {
 				icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
-				cleanup_rbuf(sk, 1);
+				tcp_cleanup_rbuf(sk, 1);
 				if (!(val & 1))
 					icsk->icsk_ack.pingpong = 1;
 			}


^ permalink raw reply related

* [PATCH 8/9] [I/OAT] Add a sysctl for tuning the I/OAT offloaded I/O threshold
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Any socket recv of less than this amount will not be offloaded

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 include/linux/sysctl.h     |    1 +
 include/net/tcp.h          |    1 +
 net/core/user_dma.c        |    4 ++++
 net/ipv4/sysctl_net_ipv4.c |   10 ++++++++++
 4 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 76eaeff..cd9e7c0 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -403,6 +403,7 @@ enum
  	NET_TCP_MTU_PROBING=113,
 	NET_TCP_BASE_MSS=114,
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
+	NET_TCP_DMA_COPYBREAK=116,
 };
 
 enum {
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 578cccf..f1f4727 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -219,6 +219,7 @@ extern int sysctl_tcp_adv_win_scale;
 extern int sysctl_tcp_tw_reuse;
 extern int sysctl_tcp_frto;
 extern int sysctl_tcp_low_latency;
+extern int sysctl_tcp_dma_copybreak;
 extern int sysctl_tcp_nometrics_save;
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
index 9eee91b..b7c98db 100644
--- a/net/core/user_dma.c
+++ b/net/core/user_dma.c
@@ -30,6 +30,10 @@
 #include <linux/rtnetlink.h> /* for BUG_TRAP */
 #include <net/tcp.h>
 
+#define NET_DMA_DEFAULT_COPYBREAK 4096
+
+int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
+
 /**
  *	dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
  *	@skb - buffer to copy
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6b6c3ad..6a6aa53 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -688,6 +688,16 @@ ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+#ifdef CONFIG_NET_DMA
+	{
+		.ctl_name	= NET_TCP_DMA_COPYBREAK,
+		.procname	= "tcp_dma_copybreak",
+		.data		= &sysctl_tcp_dma_copybreak,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+#endif
 	{ .ctl_name = 0 }
 };
 

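To make the intent of the threshold concrete, here is a small userspace
sketch of the decision described in the changelog. The comparison itself is
not part of this patch, so should_offload_recv() is a hypothetical name and
the ">=" test is an assumption read off the sentence "any socket recv of
less than this amount will not be offloaded". Only the default of 4096 comes
from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Default taken from the patch above. */
#define NET_DMA_DEFAULT_COPYBREAK 4096

/*
 * Hypothetical predicate: returns 1 when a receive of len bytes is large
 * enough to be worth offloading, 0 when it should be copied by the CPU.
 * copybreak stands in for sysctl_tcp_dma_copybreak and is assumed to be
 * non-negative in this sketch.
 */
static int should_offload_recv(size_t len, int copybreak)
{
	assert(copybreak >= 0);

	return len >= (size_t)copybreak;
}
```

Since the entry is added to ipv4_table with procname "tcp_dma_copybreak", the
knob should show up as /proc/sys/net/ipv4/tcp_dma_copybreak on kernels built
with CONFIG_NET_DMA.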

^ permalink raw reply related

* [PATCH 2/9] I/OAT network recv copy offload
From: Chris Leech @ 2006-05-08 22:15 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

[I/OAT] Driver for the Intel(R) I/OAT DMA engine

From: Chris Leech <christopher.leech@intel.com>

Adds a new ioatdma driver

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 drivers/dma/Kconfig             |    9
 drivers/dma/Makefile            |    1
 drivers/dma/ioatdma.c           |  839 +++++++++++++++++++++++++++++++++++++++
 drivers/dma/ioatdma.h           |  126 ++++++
 drivers/dma/ioatdma_hw.h        |   52 ++
 drivers/dma/ioatdma_io.h        |  118 +++++
 drivers/dma/ioatdma_registers.h |  128 ++++++
 7 files changed, 1273 insertions(+), 0 deletions(-)

[-- Attachment #2: ioatdma_driver.gz --]
[-- Type: application/x-gzip, Size: 9366 bytes --]

^ permalink raw reply

* [PATCH 1/9] [I/OAT] DMA memcpy subsystem
From: Chris Leech @ 2006-05-08 22:17 UTC (permalink / raw)
  To: linux-kernel, netdev
In-Reply-To: <20060508221632.15181.50046.stgit@gitlost.site>

Provides an API for offloading memory copies to DMA devices

Signed-off-by: Chris Leech <christopher.leech@intel.com>
---

 drivers/Kconfig           |    2 
 drivers/Makefile          |    1 
 drivers/dma/Kconfig       |   13 +
 drivers/dma/Makefile      |    1 
 drivers/dma/dmaengine.c   |  408 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h |  337 +++++++++++++++++++++++++++++++++++++
 6 files changed, 762 insertions(+), 0 deletions(-)

diff --git a/drivers/Kconfig b/drivers/Kconfig
index aeb5ab2..8b11ceb 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -72,4 +72,6 @@ source "drivers/edac/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 447d8e6..3c51703 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -74,3 +74,4 @@ obj-$(CONFIG_SGI_SN)		+= sn/
 obj-y				+= firmware/
 obj-$(CONFIG_CRYPTO)		+= crypto/
 obj-$(CONFIG_SUPERH)		+= sh/
+obj-$(CONFIG_DMA_ENGINE)	+= dma/
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
new file mode 100644
index 0000000..f9ac4bc
--- /dev/null
+++ b/drivers/dma/Kconfig
@@ -0,0 +1,13 @@
+#
+# DMA engine configuration
+#
+
+menu "DMA Engine support"
+
+config DMA_ENGINE
+	bool "Support for DMA engines"
+	---help---
+	  DMA engines offload copy operations from the CPU to dedicated
+	  hardware, allowing the copies to happen asynchronously.
+
+endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
new file mode 100644
index 0000000..10b7391
--- /dev/null
+++ b/drivers/dma/Makefile
@@ -0,0 +1 @@
+obj-y += dmaengine.o
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
new file mode 100644
index 0000000..473c47b
--- /dev/null
+++ b/drivers/dma/dmaengine.c
@@ -0,0 +1,408 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This code implements the DMA subsystem. It provides a HW-neutral interface
+ * for other kernel code to use asynchronous memory copy capabilities,
+ * if present, and allows different HW DMA drivers to register as providing
+ * this capability.
+ *
+ * Due to the fact we are accelerating what is already a relatively fast
+ * operation, the code goes to great lengths to avoid additional overhead,
+ * such as locking.
+ *
+ * LOCKING:
+ *
+ * The subsystem keeps two global lists, dma_device_list and dma_client_list.
+ * Both of these are protected by a mutex, dma_list_mutex.
+ *
+ * Each device has a channels list, which runs unlocked but is never modified
+ * once the device is registered; it's just set up by the driver.
+ *
+ * Each client has a channels list; it's only modified under the client->lock
+ * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ *
+ * Each device has a kref, which is initialized to 1 when the device is
+ * registered. A kref_get is done for each class_device registered.  When the
+ * class_device is released, the corresponding kref_put is done in the release
+ * method. Every time one of the device's channels is allocated to a client,
+ * a kref_get occurs.  When the channel is freed, the corresponding kref_put
+ * happens. The device's release function does a completion, so
+ * unregister_device does a remove event, class_device_unregister, a kref_put
+ * for the first reference, then waits on the completion for all other
+ * references to finish.
+ *
+ * Each channel has an open-coded implementation of Rusty Russell's "bigref,"
+ * with a kref and a per_cpu local_t.  A single reference is set on an
+ * ADDED event, and removed with a REMOVED event.  Net DMA client takes an
+ * extra reference per outstanding transaction.  The release function does a
+ * kref_put on the device. -ChrisL
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/dmaengine.h>
+#include <linux/hardirq.h>
+#include <linux/spinlock.h>
+#include <linux/percpu.h>
+#include <linux/rcupdate.h>
+#include <linux/mutex.h>
+
+static DEFINE_MUTEX(dma_list_mutex);
+static LIST_HEAD(dma_device_list);
+static LIST_HEAD(dma_client_list);
+
+/* --- sysfs implementation --- */
+
+static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+	unsigned long count = 0;
+	int i;
+
+	for_each_cpu(i)
+		count += per_cpu_ptr(chan->local, i)->memcpy_count;
+
+	return sprintf(buf, "%lu\n", count);
+}
+
+static ssize_t show_bytes_transferred(struct class_device *cd, char *buf)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+	unsigned long count = 0;
+	int i;
+
+	for_each_cpu(i)
+		count += per_cpu_ptr(chan->local, i)->bytes_transferred;
+
+	return sprintf(buf, "%lu\n", count);
+}
+
+static ssize_t show_in_use(struct class_device *cd, char *buf)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+
+	return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
+}
+
+static struct class_device_attribute dma_class_attrs[] = {
+	__ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+	__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
+	__ATTR(in_use, S_IRUGO, show_in_use, NULL),
+	__ATTR_NULL
+};
+
+static void dma_async_device_cleanup(struct kref *kref);
+
+static void dma_class_dev_release(struct class_device *cd)
+{
+	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+	kref_put(&chan->device->refcount, dma_async_device_cleanup);
+}
+
+static struct class dma_devclass = {
+	.name            = "dma",
+	.class_dev_attrs = dma_class_attrs,
+	.release = dma_class_dev_release,
+};
+
+/* --- client and device registration --- */
+
+/**
+ * dma_client_chan_alloc - try to allocate a channel to a client
+ * @client: &dma_client
+ *
+ * Called with dma_list_mutex held.
+ */
+static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
+{
+	struct dma_device *device;
+	struct dma_chan *chan;
+	unsigned long flags;
+	int desc;	/* allocated descriptor count */
+
+	/* Find a channel, any DMA engine will do */
+	list_for_each_entry(device, &dma_device_list, global_node) {
+		list_for_each_entry(chan, &device->channels, device_node) {
+			if (chan->client)
+				continue;
+
+			desc = chan->device->device_alloc_chan_resources(chan);
+			if (desc >= 0) {
+				kref_get(&device->refcount);
+				kref_init(&chan->refcount);
+				chan->slow_ref = 0;
+				INIT_RCU_HEAD(&chan->rcu);
+				chan->client = client;
+				spin_lock_irqsave(&client->lock, flags);
+				list_add_tail_rcu(&chan->client_node,
+				                  &client->channels);
+				spin_unlock_irqrestore(&client->lock, flags);
+				return chan;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+/**
+ * dma_chan_cleanup - release a DMA channel's resources
+ * @kref: kref embedded in the &dma_chan being released
+ */
+void dma_chan_cleanup(struct kref *kref)
+{
+	struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
+	chan->device->device_free_chan_resources(chan);
+	chan->client = NULL;
+	kref_put(&chan->device->refcount, dma_async_device_cleanup);
+}
+
+static void dma_chan_free_rcu(struct rcu_head *rcu)
+{
+	struct dma_chan *chan = container_of(rcu, struct dma_chan, rcu);
+	int bias = 0x7FFFFFFF;
+	int i;
+	for_each_cpu(i)
+		bias -= local_read(&per_cpu_ptr(chan->local, i)->refcount);
+	atomic_sub(bias, &chan->refcount.refcount);
+	kref_put(&chan->refcount, dma_chan_cleanup);
+}
+
+static void dma_client_chan_free(struct dma_chan *chan)
+{
+	atomic_add(0x7FFFFFFF, &chan->refcount.refcount);
+	chan->slow_ref = 1;
+	call_rcu(&chan->rcu, dma_chan_free_rcu);
+}
+
+/**
+ * dma_chans_rebalance - reallocate channels to clients
+ *
+ * When the number of DMA channels in the system changes,
+ * channels need to be rebalanced among clients
+ */
+static void dma_chans_rebalance(void)
+{
+	struct dma_client *client;
+	struct dma_chan *chan;
+	unsigned long flags;
+
+	mutex_lock(&dma_list_mutex);
+
+	list_for_each_entry(client, &dma_client_list, global_node) {
+		while (client->chans_desired > client->chan_count) {
+			chan = dma_client_chan_alloc(client);
+			if (!chan)
+				break;
+			client->chan_count++;
+			client->event_callback(client,
+	                                       chan,
+	                                       DMA_RESOURCE_ADDED);
+		}
+		while (client->chans_desired < client->chan_count) {
+			spin_lock_irqsave(&client->lock, flags);
+			chan = list_entry(client->channels.next,
+			                  struct dma_chan,
+			                  client_node);
+			list_del_rcu(&chan->client_node);
+			spin_unlock_irqrestore(&client->lock, flags);
+			client->chan_count--;
+			client->event_callback(client,
+			                       chan,
+			                       DMA_RESOURCE_REMOVED);
+			dma_client_chan_free(chan);
+		}
+	}
+
+	mutex_unlock(&dma_list_mutex);
+}
+
+/**
+ * dma_async_client_register - allocate and register a &dma_client
+ * @event_callback: callback for notification of channel addition/removal
+ */
+struct dma_client *dma_async_client_register(dma_event_callback event_callback)
+{
+	struct dma_client *client;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return NULL;
+
+	INIT_LIST_HEAD(&client->channels);
+	spin_lock_init(&client->lock);
+	client->chans_desired = 0;
+	client->chan_count = 0;
+	client->event_callback = event_callback;
+
+	mutex_lock(&dma_list_mutex);
+	list_add_tail(&client->global_node, &dma_client_list);
+	mutex_unlock(&dma_list_mutex);
+
+	return client;
+}
+
+/**
+ * dma_async_client_unregister - unregister a client and free the &dma_client
+ * @client:
+ *
+ * Force frees any allocated DMA channels, frees the &dma_client memory
+ */
+void dma_async_client_unregister(struct dma_client *client)
+{
+	struct dma_chan *chan;
+
+	if (!client)
+		return;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(chan, &client->channels, client_node)
+		dma_client_chan_free(chan);
+	rcu_read_unlock();
+
+	mutex_lock(&dma_list_mutex);
+	list_del(&client->global_node);
+	mutex_unlock(&dma_list_mutex);
+
+	kfree(client);
+	dma_chans_rebalance();
+}
+
+/**
+ * dma_async_client_chan_request - request DMA channels
+ * @client: &dma_client
+ * @number: count of DMA channels requested
+ *
+ * Clients call dma_async_client_chan_request() to specify how many
+ * DMA channels they need, 0 to free all currently allocated.
+ * The resulting allocations/frees are indicated to the client via the
+ * event callback.
+ */
+void dma_async_client_chan_request(struct dma_client *client,
+			unsigned int number)
+{
+	client->chans_desired = number;
+	dma_chans_rebalance();
+}
+
+/**
+ * dma_async_device_register - register a DMA device and its channels
+ * @device: &dma_device
+ */
+int dma_async_device_register(struct dma_device *device)
+{
+	static int id;
+	int chancnt = 0;
+	struct dma_chan* chan;
+
+	if (!device)
+		return -ENODEV;
+
+	init_completion(&device->done);
+	kref_init(&device->refcount);
+	device->dev_id = id++;
+
+	/* represent channels in sysfs. Probably want devs too */
+	list_for_each_entry(chan, &device->channels, device_node) {
+		chan->local = alloc_percpu(typeof(*chan->local));
+		if (chan->local == NULL)
+			continue;
+
+		chan->chan_id = chancnt++;
+		chan->class_dev.class = &dma_devclass;
+		chan->class_dev.dev = NULL;
+		snprintf(chan->class_dev.class_id, BUS_ID_SIZE, "dma%dchan%d",
+		         device->dev_id, chan->chan_id);
+
+		kref_get(&device->refcount);
+		class_device_register(&chan->class_dev);
+	}
+
+	mutex_lock(&dma_list_mutex);
+	list_add_tail(&device->global_node, &dma_device_list);
+	mutex_unlock(&dma_list_mutex);
+
+	dma_chans_rebalance();
+
+	return 0;
+}
+
+/**
+ * dma_async_device_cleanup - called when all device references are released
+ * @kref: kref embedded in the &dma_device
+ */
+static void dma_async_device_cleanup(struct kref *kref)
+{
+	struct dma_device *device;
+
+	device = container_of(kref, struct dma_device, refcount);
+	complete(&device->done);
+}
+
+void dma_async_device_unregister(struct dma_device* device)
+{
+	struct dma_chan *chan;
+	unsigned long flags;
+
+	mutex_lock(&dma_list_mutex);
+	list_del(&device->global_node);
+	mutex_unlock(&dma_list_mutex);
+
+	list_for_each_entry(chan, &device->channels, device_node) {
+		if (chan->client) {
+			spin_lock_irqsave(&chan->client->lock, flags);
+			list_del(&chan->client_node);
+			chan->client->chan_count--;
+			spin_unlock_irqrestore(&chan->client->lock, flags);
+			chan->client->event_callback(chan->client,
+			                             chan,
+			                             DMA_RESOURCE_REMOVED);
+			dma_client_chan_free(chan);
+		}
+		class_device_unregister(&chan->class_dev);
+	}
+	dma_chans_rebalance();
+
+	kref_put(&device->refcount, dma_async_device_cleanup);
+	wait_for_completion(&device->done);
+}
+
+static int __init dma_bus_init(void)
+{
+	mutex_init(&dma_list_mutex);
+	return class_register(&dma_devclass);
+}
+
+subsys_initcall(dma_bus_init);
+
+EXPORT_SYMBOL(dma_async_client_register);
+EXPORT_SYMBOL(dma_async_client_unregister);
+EXPORT_SYMBOL(dma_async_client_chan_request);
+EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
+EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
+EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
+EXPORT_SYMBOL(dma_async_memcpy_complete);
+EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_device_register);
+EXPORT_SYMBOL(dma_async_device_unregister);
+EXPORT_SYMBOL(dma_chan_cleanup);
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
new file mode 100644
index 0000000..3078154
--- /dev/null
+++ b/include/linux/dmaengine.h
@@ -0,0 +1,337 @@
+/*
+ * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef DMAENGINE_H
+#define DMAENGINE_H
+#include <linux/config.h>
+#ifdef CONFIG_DMA_ENGINE
+
+#include <linux/device.h>
+#include <linux/uio.h>
+#include <linux/kref.h>
+#include <linux/completion.h>
+#include <linux/rcupdate.h>
+
+/**
+ * enum dma_event - resource PNP/power management events
+ * @DMA_RESOURCE_SUSPEND: DMA device going into low power state
+ * @DMA_RESOURCE_RESUME: DMA device returning to full power
+ * @DMA_RESOURCE_ADDED: DMA device added to the system
+ * @DMA_RESOURCE_REMOVED: DMA device removed from the system
+ */
+enum dma_event {
+	DMA_RESOURCE_SUSPEND,
+	DMA_RESOURCE_RESUME,
+	DMA_RESOURCE_ADDED,
+	DMA_RESOURCE_REMOVED,
+};
+
+/**
+ * typedef dma_cookie_t
+ *
+ * if dma_cookie_t is >0 it's a DMA request cookie, <0 it's an error code
+ */
+typedef s32 dma_cookie_t;
+
+#define dma_submit_error(cookie) ((cookie) < 0 ? 1 : 0)
+
+/**
+ * enum dma_status - DMA transaction status
+ * @DMA_SUCCESS: transaction completed successfully
+ * @DMA_IN_PROGRESS: transaction not yet processed
+ * @DMA_ERROR: transaction failed
+ */
+enum dma_status {
+	DMA_SUCCESS,
+	DMA_IN_PROGRESS,
+	DMA_ERROR,
+};
+
+/**
+ * struct dma_chan_percpu - the per-CPU part of struct dma_chan
+ * @refcount: local_t used for open-coded "bigref" counting
+ * @memcpy_count: transaction counter
+ * @bytes_transferred: byte counter
+ */
+
+struct dma_chan_percpu {
+	local_t refcount;
+	/* stats */
+	unsigned long memcpy_count;
+	unsigned long bytes_transferred;
+};
+
+/**
+ * struct dma_chan - devices supply DMA channels, clients use them
+ * @client: ptr to the client user of this chan, will be NULL when unused
+ * @device: ptr to the dma device who supplies this channel, always !NULL
+ * @cookie: last cookie value returned to client
+ * @chan_id:
+ * @class_dev:
+ * @refcount: kref, used in "bigref" slow-mode
+ * @slow_ref:
+ * @rcu:
+ * @client_node: used to add this to the client chan list
+ * @device_node: used to add this to the device chan list
+ * @local: per-cpu pointer to a struct dma_chan_percpu
+ */
+struct dma_chan {
+	struct dma_client *client;
+	struct dma_device *device;
+	dma_cookie_t cookie;
+
+	/* sysfs */
+	int chan_id;
+	struct class_device class_dev;
+
+	struct kref refcount;
+	int slow_ref;
+	struct rcu_head rcu;
+
+	struct list_head client_node;
+	struct list_head device_node;
+	struct dma_chan_percpu *local;
+};
+
+void dma_chan_cleanup(struct kref *kref);
+
+static inline void dma_chan_get(struct dma_chan *chan)
+{
+	if (unlikely(chan->slow_ref))
+		kref_get(&chan->refcount);
+	else {
+		local_inc(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+		put_cpu();
+	}
+}
+
+static inline void dma_chan_put(struct dma_chan *chan)
+{
+	if (unlikely(chan->slow_ref))
+		kref_put(&chan->refcount, dma_chan_cleanup);
+	else {
+		local_dec(&(per_cpu_ptr(chan->local, get_cpu())->refcount));
+		put_cpu();
+	}
+}
+
+/*
+ * typedef dma_event_callback - function pointer to a DMA event callback
+ */
+typedef void (*dma_event_callback) (struct dma_client *client,
+		struct dma_chan *chan, enum dma_event event);
+
+/**
+ * struct dma_client - info on the entity making use of DMA services
+ * @event_callback: func ptr to call when something happens
+ * @chan_count: number of chans allocated
+ * @chans_desired: number of chans requested. Can be +/- chan_count
+ * @lock: protects access to the channels list
+ * @channels: the list of DMA channels allocated
+ * @global_node: list_head for global dma_client_list
+ */
+struct dma_client {
+	dma_event_callback	event_callback;
+	unsigned int		chan_count;
+	unsigned int		chans_desired;
+
+	spinlock_t		lock;
+	struct list_head	channels;
+	struct list_head	global_node;
+};
+
+/**
+ * struct dma_device - info on the entity supplying DMA services
+ * @chancnt: how many DMA channels are supported
+ * @channels: the list of struct dma_chan
+ * @global_node: list_head for global dma_device_list
+ * @refcount:
+ * @done:
+ * @dev_id:
+ * Other func ptrs: used to make use of this device's capabilities
+ */
+struct dma_device {
+
+	unsigned int chancnt;
+	struct list_head channels;
+	struct list_head global_node;
+
+	struct kref refcount;
+	struct completion done;
+
+	int dev_id;
+
+	int (*device_alloc_chan_resources)(struct dma_chan *chan);
+	void (*device_free_chan_resources)(struct dma_chan *chan);
+	dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
+			void *dest, void *src, size_t len);
+	dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
+			struct page *page, unsigned int offset, void *kdata,
+			size_t len);
+	dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
+			struct page *dest_pg, unsigned int dest_off,
+			struct page *src_pg, unsigned int src_off, size_t len);
+	enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
+			dma_cookie_t cookie, dma_cookie_t *last,
+			dma_cookie_t *used);
+	void (*device_memcpy_issue_pending)(struct dma_chan *chan);
+};
+
+/* --- public DMA engine API --- */
+
+struct dma_client *dma_async_client_register(dma_event_callback event_callback);
+void dma_async_client_unregister(struct dma_client *client);
+void dma_async_client_chan_request(struct dma_client *client,
+		unsigned int number);
+
+/**
+ * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * @chan: DMA channel to offload copy to
+ * @dest: destination address (virtual)
+ * @src: source address (virtual)
+ * @len: length
+ *
+ * Both @dest and @src must be mappable to a bus address according to the
+ * DMA mapping API rules for streaming mappings.
+ * Both @dest and @src must stay memory resident (kernel memory or locked
+ * user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
+	void *dest, void *src, size_t len)
+{
+	int cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+}
+
+/**
+ * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * @chan: DMA channel to offload copy to
+ * @page: destination page
+ * @offset: offset in page to copy to
+ * @kdata: source address (virtual)
+ * @len: length
+ *
+ * Both @page/@offset and @kdata must be mappable to a bus address according
+ * to the DMA mapping API rules for streaming mappings.
+ * Both @page/@offset and @kdata must stay memory resident (kernel memory or
+ * locked user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
+	struct page *page, unsigned int offset, void *kdata, size_t len)
+{
+	int cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
+	                                             kdata, len);
+}
+
+/**
+ * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * @chan: DMA channel to offload copy to
+ * @dest_pg: destination page
+ * @dest_off: offset in page to copy to
+ * @src_pg: source page
+ * @src_off: offset in page to copy from
+ * @len: length
+ *
+ * Both @dest_pg/@dest_off and @src_pg/@src_off must be mappable to a bus
+ * address according to the DMA mapping API rules for streaming mappings.
+ * Both @dest_pg/@dest_off and @src_pg/@src_off must stay memory resident
+ * (kernel memory or locked user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
+	struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
+	unsigned int src_off, size_t len)
+{
+	int cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
+	                                            src_pg, src_off, len);
+}
+
+/**
+ * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * @chan: target DMA channel
+ *
+ * This allows drivers to push copies to HW in batches,
+ * reducing MMIO writes where possible.
+ */
+static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+{
+	chan->device->device_memcpy_issue_pending(chan);
+}
+
+/**
+ * dma_async_memcpy_complete - poll for transaction completion
+ * @chan: DMA channel
+ * @cookie: transaction identifier to check status of
+ * @last: returns last completed cookie, can be NULL
+ * @used: returns last issued cookie, can be NULL
+ *
+ * If @last and @used are passed in, upon return they reflect the driver
+ * internal state and can be used with dma_async_is_complete() to check
+ * the status of multiple cookies without re-checking hardware state.
+ */
+static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
+	dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+{
+	return chan->device->device_memcpy_complete(chan, cookie, last, used);
+}
+
+/**
+ * dma_async_is_complete - test a cookie against chan state
+ * @cookie: transaction identifier to test status of
+ * @last_complete: last known completed transaction
+ * @last_used: last cookie value handed out
+ *
+ * dma_async_is_complete() is used in dma_async_memcpy_complete();
+ * the test logic is separated out for lightweight testing of multiple cookies.
+ */
+static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
+			dma_cookie_t last_complete, dma_cookie_t last_used)
+{
+	if (last_complete <= last_used) {
+		if ((cookie <= last_complete) || (cookie > last_used))
+			return DMA_SUCCESS;
+	} else {
+		if ((cookie <= last_complete) && (cookie > last_used))
+			return DMA_SUCCESS;
+	}
+	return DMA_IN_PROGRESS;
+}
+
+
+/* --- DMA device --- */
+
+int dma_async_device_register(struct dma_device *device);
+void dma_async_device_unregister(struct dma_device *device);
+
+#endif /* CONFIG_DMA_ENGINE */
+#endif /* DMAENGINE_H */
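
For anyone reading the cookie test in dma_async_is_complete() above, here is
a minimal userspace sketch of the same comparison, not part of the patch.
The name cookie_is_complete() and the sample values are made up for
illustration, and it assumes a driver that restarts its cookie counter at a
small positive value once the s32 would overflow; the header itself does not
say how a driver handles that.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types in the patch. */
typedef int32_t dma_cookie_t;

enum dma_status { DMA_SUCCESS, DMA_IN_PROGRESS, DMA_ERROR };

/*
 * Hypothetical copy of the dma_async_is_complete() logic: cookies in the
 * window (last_complete, last_used] are still in flight, everything else
 * is treated as already completed.
 */
enum dma_status cookie_is_complete(dma_cookie_t cookie,
				   dma_cookie_t last_complete,
				   dma_cookie_t last_used)
{
	if (last_complete <= last_used) {
		/* Normal case: the in-flight window has not wrapped. */
		if (cookie <= last_complete || cookie > last_used)
			return DMA_SUCCESS;
	} else {
		/* Assumed wrap: last_used restarted below last_complete. */
		if (cookie <= last_complete && cookie > last_used)
			return DMA_SUCCESS;
	}
	return DMA_IN_PROGRESS;
}

/* A few sample checks; the numbers are arbitrary. */
void cookie_examples(void)
{
	/* No wrap: cookies 11..15 are in flight. */
	assert(cookie_is_complete(8, 10, 15) == DMA_SUCCESS);
	assert(cookie_is_complete(12, 10, 15) == DMA_IN_PROGRESS);
	assert(cookie_is_complete(15, 10, 15) == DMA_IN_PROGRESS);
	/* Above last_used: taken to be an old cookie from before a wrap. */
	assert(cookie_is_complete(20, 10, 15) == DMA_SUCCESS);

	/* Wrapped: the window runs from INT32_MAX - 14 up through 5. */
	assert(cookie_is_complete(INT32_MAX - 7, INT32_MAX - 15, 5)
	       == DMA_IN_PROGRESS);
	assert(cookie_is_complete(3, INT32_MAX - 15, 5) == DMA_IN_PROGRESS);
	assert(cookie_is_complete(100, INT32_MAX - 15, 5) == DMA_SUCCESS);
}
```

The point of splitting the test out is that a caller can fetch last and used
once through dma_async_memcpy_complete() and then run this comparison on many
cookies without touching the hardware again.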

^ permalink raw reply related

