* [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
@ 2007-11-29 20:08 Nelson, Shannon
2007-11-29 20:17 ` Shannon Nelson
2007-11-29 23:43 ` jamal
0 siblings, 2 replies; 4+ messages in thread
From: Nelson, Shannon @ 2007-11-29 20:08 UTC (permalink / raw)
To: netdev
[RFC] I/OAT: Handle incoming udp through ioatdma
From: Shannon Nelson <shannon.nelson@intel.com>
If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try
pushing it through the ioatdma asynchronous memcpy. This is very much the
same as the tcp copy offload. This is an RFC because we know there are
stability problems under high traffic.
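As a rough illustration of the copybreak test described above, the gating
condition can be modelled in user space. This is only a sketch, not the kernel
code: the sketch_ names and the have_dma_chan flag are hypothetical stand-ins
for sysctl_udp_dma_copybreak, MSG_PEEK and the per-CPU net_dma channel that
the patch checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins. Only the 4096-byte default and the
 * shape of the condition are taken from the patch. */
#define SKETCH_MSG_PEEK          0x2
#define SKETCH_DEFAULT_COPYBREAK 4096

static int sketch_udp_dma_copybreak = SKETCH_DEFAULT_COPYBREAK;

/* Offload only when the requested length exceeds the copybreak, the
 * caller is not peeking, and a DMA channel is available. */
static bool sketch_should_offload(size_t len, int flags, bool have_dma_chan)
{
    return len > (size_t)sketch_udp_dma_copybreak &&
           !(flags & SKETCH_MSG_PEEK) &&
           have_dma_chan;
}
```

With the default threshold, a 4096-byte request stays on the CPU copy path
because the comparison is strict; only larger requests are candidates for
the offload.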
This code was originally proposed by the Capstone students at Portland
State University: Aaron Armstrong, Greg Nishikawa, Sean Gayner, Toai Nguyen,
Stephen Bekefi, and Derek Chiles.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
---
include/net/udp.h | 5 +++
net/core/user_dma.c | 1 +
net/ipv4/udp.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 81 insertions(+), 4 deletions(-)
diff --git a/include/net/udp.h b/include/net/udp.h
index 98755eb..d5e05d8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -173,4 +173,9 @@ extern void udp_proc_unregister(struct udp_seq_afinfo *afinfo);
extern int udp4_proc_init(void);
extern void udp4_proc_exit(void);
#endif
+
+#ifdef CONFIG_NET_DMA
+extern int sysctl_udp_dma_copybreak;
+#endif
+
#endif /* _UDP_H */
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
index 0ad1cd5..e876ca4 100644
--- a/net/core/user_dma.c
+++ b/net/core/user_dma.c
@@ -34,6 +34,7 @@
#define NET_DMA_DEFAULT_COPYBREAK 4096
int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
+int sysctl_udp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
/**
* dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 69d4bd1..3b6d91c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -102,6 +102,8 @@
#include <net/route.h>
#include <net/checksum.h>
#include <net/xfrm.h>
+#include <net/netdma.h>
+#include <linux/dmaengine.h>
#include "udp_impl.h"
/*
@@ -819,6 +821,11 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
unsigned int ulen, copied;
int err;
int is_udplite = IS_UDPLITE(sk);
+#ifdef CONFIG_NET_DMA
+ struct dma_chan *dma_chan = NULL;
+ struct dma_pinned_list *pinned_list = NULL;
+ dma_cookie_t dma_cookie = 0;
+#endif
/*
* Check any passed addresses
@@ -829,6 +836,18 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
if (flags & MSG_ERRQUEUE)
return ip_recv_error(sk, msg, len);
+#ifdef CONFIG_NET_DMA
+ preempt_disable();
+ if ((len > sysctl_udp_dma_copybreak) &&
+ !(flags & MSG_PEEK) &&
+ __get_cpu_var(softnet_data).net_dma) {
+
+ preempt_enable_no_resched();
+ pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
+ } else
+ preempt_enable_no_resched();
+#endif
+
try_again:
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (!skb)
@@ -852,10 +871,30 @@ try_again:
goto csum_copy_err;
}
- if (skb_csum_unnecessary(skb))
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
- msg->msg_iov, copied );
- else {
+ if (skb_csum_unnecessary(skb)) {
+#ifdef CONFIG_NET_DMA
+ if (pinned_list && !dma_chan)
+ dma_chan = get_softnet_dma();
+ if (dma_chan) {
+ dma_cookie = dma_skb_copy_datagram_iovec(
+ dma_chan, skb, sizeof(struct udphdr),
+ msg->msg_iov, copied, pinned_list);
+ if (dma_cookie < 0) {
+ printk(KERN_ALERT "dma_cookie < 0\n");
+
+ /* Exception. Bailout! */
+ if (!copied)
+ copied = -EFAULT;
+ goto out_free;
+ }
+ err = 0;
+ }
+ else
+#endif
+ err = skb_copy_datagram_iovec(skb,
+ sizeof(struct udphdr),
+ msg->msg_iov, copied);
+ } else {
err = skb_copy_and_csum_datagram_iovec(skb,
sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
@@ -882,6 +921,35 @@ try_again:
if (flags & MSG_TRUNC)
err = ulen;
+#ifdef CONFIG_NET_DMA
+ if (dma_chan) {
+ struct sk_buff *skb;
+ dma_cookie_t done, used;
+
+ dma_async_memcpy_issue_pending(dma_chan);
+
+ while (dma_async_memcpy_complete(dma_chan, dma_cookie, &done,
+ &used) == DMA_IN_PROGRESS) {
+ /* do partial cleanup of sk_async_wait_queue */
+ while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
+ (dma_async_is_complete(skb->dma_cookie,
+ done, used) == DMA_SUCCESS)) {
+ __skb_dequeue(&sk->sk_async_wait_queue);
+ kfree_skb(skb);
+ }
+ }
+
+ /* Safe to free early-copied skbs now */
+ __skb_queue_purge(&sk->sk_async_wait_queue);
+ dma_chan_put(dma_chan);
+ dma_chan = NULL;
+ }
+ if (pinned_list) {
+ dma_unpin_iovec_pages(pinned_list);
+ pinned_list = NULL;
+ }
+#endif
+
out_free:
skb_free_datagram(sk, skb);
out:
@@ -906,6 +974,9 @@ int udp_disconnect(struct sock *sk, int flags)
*/
sk->sk_state = TCP_CLOSE;
+#ifdef CONFIG_NET_DMA
+ __skb_queue_purge(&sk->sk_async_wait_queue);
+#endif
inet->daddr = 0;
inet->dport = 0;
sk->sk_bound_dev_if = 0;
--
======================================================================
Mr. Shannon Nelson LAN Access Division, Intel Corp.
Shannon.Nelson@intel.com I don't speak for Intel
(503) 712-7659 Parents can't afford to be squeamish.
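
The wait loop in the udp_recvmsg() hunk above frees queued skbs once
dma_async_is_complete() reports that their cookie has completed. The sketch
below models that comparison in user space, including the case where the
cookie counter has wrapped. It is an assumption about how such a check can be
written, not a copy of the dmaengine source; sketch_cookie_t and
sketch_cookie_complete() are hypothetical names.

```c
#include <stdbool.h>

typedef int sketch_cookie_t;

/* Cookies in the half-open window (done, used] are still in flight.
 * Returns true when `cookie` lies outside that window. */
static bool sketch_cookie_complete(sketch_cookie_t cookie,
                                   sketch_cookie_t done,
                                   sketch_cookie_t used)
{
    if (done <= used)
        return cookie <= done || cookie > used;
    /* The counter wrapped: `used` restarted below `done`. */
    return cookie <= done && cookie > used;
}
```

For example, with done = 5 and used = 8, cookie 3 counts as complete while
cookie 6 is still in progress.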
* Re: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
2007-11-29 20:08 [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma Nelson, Shannon
@ 2007-11-29 20:17 ` Shannon Nelson
2007-11-29 23:43 ` jamal
1 sibling, 0 replies; 4+ messages in thread
From: Shannon Nelson @ 2007-11-29 20:17 UTC (permalink / raw)
To: netdev
Argh - mind the line breaks...
sln
--
==============================================
Mr. Shannon Nelson Parents can't afford to be squeamish.
* Re: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
2007-11-29 20:08 [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma Nelson, Shannon
2007-11-29 20:17 ` Shannon Nelson
@ 2007-11-29 23:43 ` jamal
2007-11-30 1:27 ` Nelson, Shannon
1 sibling, 1 reply; 4+ messages in thread
From: jamal @ 2007-11-29 23:43 UTC (permalink / raw)
To: Nelson, Shannon; +Cc: netdev
On Thu, 2007-29-11 at 12:08 -0800, Nelson, Shannon wrote:
> [RFC] I/OAT: Handle incoming udp through ioatdma
>
> From: Shannon Nelson <shannon.nelson@intel.com>
>
> If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try
> pushing it through the ioatdma asynchronous memcpy. This is very much the
> same as the tcp copy offload. This is an RFC because we know there are
> stability problems under high traffic.
What stability problems?
Is there some magic sysctl_udp_dma_copybreak threshold value where you
start seeing the benefit of IOAT-ing? Since you mentioned
"students"<evil grin here>, it would be interesting to see data where
udp starts benefitting.
cheers,
jamal
* RE: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
2007-11-29 23:43 ` jamal
@ 2007-11-30 1:27 ` Nelson, Shannon
0 siblings, 0 replies; 4+ messages in thread
From: Nelson, Shannon @ 2007-11-30 1:27 UTC (permalink / raw)
To: hadi; +Cc: netdev
>From: J Hadi Salim [mailto:j.hadi123@gmail.com] On Behalf Of jamal
>
>On Thu, 2007-29-11 at 12:08 -0800, Nelson, Shannon wrote:
>> [RFC] I/OAT: Handle incoming udp through ioatdma
>>
>> From: Shannon Nelson <shannon.nelson@intel.com>
>>
>> If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try
>> pushing it through the ioatdma asynchronous memcpy. This is very much the
>> same as the tcp copy offload. This is an RFC because we know there are
>> stability problems under high traffic.
>
>
>What stability problems?
Under a heavy stress test combining TCP and UDP traffic we would get a
kernel panic from a NULL dereference in dma_unpin_iovec_pages(). Remove
this patch and the panic goes away. Unfortunately, this problem is
below our priority line so it has received little attention since then.
We know of interest in this patch, however, so decided to release it
into the wild and see if it garners any other attention.
Part of the panic message:
Unable to handle kernel NULL pointer dereference at 0000000000000000
RIP:
[<ffffffff8025b406>] set_page_dirty_lock+0xe/0x3a
PGD 2b91f067 PUD 2a04b067 PMD 0
Oops: 0002 [1] SMP
CPU 5
Modules linked in: ioatdma dca igb i2c_dev i2c_core e1000
Pid: 10998, comm: netserver Not tainted 2.6.22.9_CB-2.05_patched #1
RIP: 0010:[<ffffffff8025b406>] [<ffffffff8025b406>] set_page_dirty_lock+0xe/0x3a
RSP: 0018:ffff810028fedb68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81003afea648 RCX: ffff81002a382b88
RDX: ffff810028fedfd8 RSI: 0000000000000282 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff806c13e0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000000 R15: ffff81003afea660
FS: 00002b23ba4177c0(0000) GS:ffff810001164e40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000029831000 CR4: 00000000000006e0
Process netserver (pid: 10998, threadinfo ffff810028fec000, task ffff81003a881590)
Stack: ffff810039887c80 ffff81003afea648 ffff81003afea640 ffffffff8048545a
fffffffffffffff4 ffff810028fedf38 ffff81003afea658 ffff81003afea670
ffff81003afea640 ffffffff80485657 ffff81003afea660 0000000000000000
Call Trace:
[<ffffffff8048545a>] dma_unpin_iovec_pages+0x31/0x6e
[<ffffffff80485657>] dma_pin_iovec_pages+0x1c0/0x1d9
[<ffffffff804ce479>] udp_recvmsg+0x94/0x43e
[<ffffffff8049268e>] sock_common_recvmsg+0x30/0x45
[<ffffffff80491013>] sock_recvmsg+0xd5/0xed
[<ffffffff80518d48>] mutex_lock+0xd/0x1e
[<ffffffff802425ff>] autoremove_wake_function+0x0/0x2e
[<ffffffff802564ca>] find_get_page+0x21/0x50
[<ffffffff80258572>] filemap_nopage+0x180/0x2b0
[<ffffffff80262b59>] __handle_mm_fault+0x404/0x9fc
[<ffffffff80245b35>] getnstimeofday+0x32/0x8d
[<ffffffff80245b35>] getnstimeofday+0x32/0x8d
[<ffffffff80491dc8>] sys_recvfrom+0xe2/0x130
[<ffffffff802445ca>] enqueue_hrtimer+0x64/0x6b
[<ffffffff80244b18>] hrtimer_start+0xf2/0x104
[<ffffffff80234d27>] do_setitimer+0x15e/0x329
[<ffffffff80234fb9>] alarm_setitimer+0x35/0x65
[<ffffffff8020935e>] system_call+0x7e/0x83
Code: f0 0f ba 6d 00 00 19 c0 85 c0 74 08 48 89 ef e8 89 ce ff ff
RIP [<ffffffff8025b406>] set_page_dirty_lock+0xe/0x3a
RSP <ffff810028fedb68>
CR2: 0000000000000000
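
In the trace above, set_page_dirty_lock() faults at address 0 with RDI = 0,
and it is reached from dma_unpin_iovec_pages(), which was itself called from
inside dma_pin_iovec_pages(). One possible reading, which nothing in this
thread confirms, is that an error path in the pinning code tried to unpin a
page slot that had never been filled. The user-space sketch below shows a
cleanup loop that tolerates such empty slots. All sketch_ types and names are
hypothetical and are not the kernel's structures.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pinned page. */
struct sketch_page {
    int dirty;
    int pinned;
};

struct sketch_pinned_list {
    size_t nr_pages;
    struct sketch_page **pages; /* a slot is NULL if pinning stopped early */
};

/* Release every slot that was actually filled and return how many
 * pages were released. NULL lists and NULL slots are skipped. */
static size_t sketch_unpin(struct sketch_pinned_list *list)
{
    size_t released = 0;

    if (!list)
        return 0;
    for (size_t i = 0; i < list->nr_pages; i++) {
        struct sketch_page *page = list->pages[i];

        if (!page)
            continue;        /* never pinned: nothing to dirty or release */
        page->dirty = 1;     /* stands in for set_page_dirty_lock() */
        page->pinned = 0;    /* stands in for dropping the page reference */
        list->pages[i] = NULL;
        released++;
    }
    return released;
}
```

Clearing each slot after release also makes a second call on the same list
harmless, which matters if more than one exit path can reach the cleanup.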
>
>Is there some magic sysctl_udp_dma_copybreak threshold value where you
>start seeing the benefit of IOAT-ing? Since you mentioned
>"students"<evil grin here>, it would be interesting to see data where
>udp starts benefitting.
As I said, this is low on our priority list, so this data has not been
gathered.
>cheers,
>jamal
Thanks for your interest.
sln
--
======================================================================
Mr. Shannon Nelson LAN Access Division, Intel Corp.
Shannon.Nelson@intel.com I don't speak for Intel
(503) 712-7659 Parents can't afford to be squeamish.