* [PATCH v2 0/4] net_dma removal, and dma debug extension
@ 2014-01-09 20:12 Dan Williams
  2014-01-09 20:16 ` [PATCH v2 1/4] net_dma: simple removal Dan Williams
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Dan Williams @ 2014-01-09 20:12 UTC (permalink / raw)
  To: dmaengine; +Cc: netdev, linux-kernel

Follow-up patches to commit 77873803363c ("net_dma: mark broken") that
remove the net_dma bits and provide debug infrastructure to flag other
get_user_pages()-vs-DMA cases that might violate the DMA API.
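
As a reference point, here is a minimal sketch of the pattern the new
debug check targets (the function and variable names below are
hypothetical, not taken from this series): user pages are pinned with
get_user_pages() and handed to a DMA engine, and the cpu may then touch
the same page while the transfer is still in flight.

	/* hypothetical driver fragment, for illustration only */
	static int pin_and_start_dma(struct device *dev, unsigned long uaddr,
				     struct page **page, dma_addr_t *addr)
	{
		int ret;

		down_read(&current->mm->mmap_sem);
		ret = get_user_pages(current, current->mm, uaddr, 1,
				     1 /* write */, 0 /* force */, page, NULL);
		up_read(&current->mm->mmap_sem);
		if (ret != 1)
			return -EFAULT;

		*addr = dma_map_page(dev, *page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
		/*
		 * The device now owns the page per the DMA API.  If the cpu
		 * touches it before dma_unmap_page() -- for example
		 * cow_user_page() copying it after a fork() -- it can see
		 * stale data.  Patch 4/4 adds debug_dma_assert_idle() to
		 * flag exactly this case.
		 */
		return 0;
	}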

Will take this through the dmaengine tree once acked.

Changes since v1 [1]:

1/ The net_dma removal patch has been expanded to revert other
   net_dma-induced changes.

2/ Updated the debug_dma_assert_idle() API to be gated on
   CONFIG_DMA_VS_CPU_DEBUG.

---

Dan Williams (4):
      net_dma: simple removal
      net_dma: revert 'copied_early'
      net: make tcp_cleanup_rbuf private
      dma debug: introduce debug_dma_assert_idle()


 Documentation/ABI/removed/net_dma      |    8 +
 Documentation/networking/ip-sysctl.txt |    6 -
 drivers/dma/Kconfig                    |   12 -
 drivers/dma/Makefile                   |    1 
 drivers/dma/dmaengine.c                |  104 ------------
 drivers/dma/ioat/dma.c                 |    1 
 drivers/dma/ioat/dma.h                 |    7 -
 drivers/dma/ioat/dma_v2.c              |    1 
 drivers/dma/ioat/dma_v3.c              |    1 
 drivers/dma/iovlock.c                  |  280 --------------------------------
 include/linux/dma-debug.h              |    6 +
 include/linux/dmaengine.h              |   22 ---
 include/linux/skbuff.h                 |    8 -
 include/linux/tcp.h                    |    8 -
 include/net/netdma.h                   |   32 ----
 include/net/sock.h                     |   19 --
 include/net/tcp.h                      |    9 -
 kernel/sysctl_binary.c                 |    1 
 lib/Kconfig.debug                      |   13 +
 lib/dma-debug.c                        |  130 +++++++++++++--
 mm/memory.c                            |    3 
 net/core/Makefile                      |    1 
 net/core/dev.c                         |   10 -
 net/core/sock.c                        |    6 -
 net/core/user_dma.c                    |  131 ---------------
 net/dccp/proto.c                       |    4 
 net/ipv4/sysctl_net_ipv4.c             |    9 -
 net/ipv4/tcp.c                         |  157 ++----------------
 net/ipv4/tcp_input.c                   |   83 +--------
 net/ipv4/tcp_ipv4.c                    |   18 --
 net/ipv6/tcp_ipv6.c                    |   13 -
 net/llc/af_llc.c                       |   10 +
 32 files changed, 186 insertions(+), 928 deletions(-)
 create mode 100644 Documentation/ABI/removed/net_dma
 delete mode 100644 drivers/dma/iovlock.c
 delete mode 100644 include/net/netdma.h
 delete mode 100644 net/core/user_dma.c


* [PATCH v2 1/4] net_dma: simple removal
  2014-01-09 20:12 [PATCH v2 0/4] net_dma removal, and dma debug extension Dan Williams
@ 2014-01-09 20:16 ` Dan Williams
  2014-01-09 20:16 ` [PATCH v2 2/4] net_dma: revert 'copied_early' Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2014-01-09 20:16 UTC (permalink / raw)
  To: dmaengine
  Cc: Alexander Duyck, Dave Jiang, Vinod Koul, netdev, David Whipple,
	linux-kernel, David S. Miller

Per commit 77873803363c ("net_dma: mark broken"), net_dma is no longer
used and there is no plan to fix it.

This is a mechanical removal of the code guarded by CONFIG_NET_DMA
ifdefs.  Reverting the remainder of the net_dma-induced changes is
deferred to subsequent patches.

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Updated the changelog to foreshadow the other reverts; otherwise this is
the same straightforward removal as before.

 Documentation/ABI/removed/net_dma      |    8 +
 Documentation/networking/ip-sysctl.txt |    6 -
 drivers/dma/Kconfig                    |   12 -
 drivers/dma/Makefile                   |    1 
 drivers/dma/dmaengine.c                |  104 ------------
 drivers/dma/ioat/dma.c                 |    1 
 drivers/dma/ioat/dma.h                 |    7 -
 drivers/dma/ioat/dma_v2.c              |    1 
 drivers/dma/ioat/dma_v3.c              |    1 
 drivers/dma/iovlock.c                  |  280 --------------------------------
 include/linux/dmaengine.h              |   22 ---
 include/linux/skbuff.h                 |    8 -
 include/linux/tcp.h                    |    8 -
 include/net/netdma.h                   |   32 ----
 include/net/sock.h                     |   19 --
 include/net/tcp.h                      |    8 -
 kernel/sysctl_binary.c                 |    1 
 net/core/Makefile                      |    1 
 net/core/dev.c                         |   10 -
 net/core/sock.c                        |    6 -
 net/core/user_dma.c                    |  131 ---------------
 net/dccp/proto.c                       |    4 
 net/ipv4/sysctl_net_ipv4.c             |    9 -
 net/ipv4/tcp.c                         |  147 ++---------------
 net/ipv4/tcp_input.c                   |   61 -------
 net/ipv4/tcp_ipv4.c                    |   18 --
 net/ipv6/tcp_ipv6.c                    |   13 -
 net/llc/af_llc.c                       |   10 +
 28 files changed, 35 insertions(+), 894 deletions(-)
 create mode 100644 Documentation/ABI/removed/net_dma
 delete mode 100644 drivers/dma/iovlock.c
 delete mode 100644 include/net/netdma.h
 delete mode 100644 net/core/user_dma.c

diff --git a/Documentation/ABI/removed/net_dma b/Documentation/ABI/removed/net_dma
new file mode 100644
index 000000000000..a173aecc2f18
--- /dev/null
+++ b/Documentation/ABI/removed/net_dma
@@ -0,0 +1,8 @@
+What:		tcp_dma_copybreak sysctl
+Date:		Removed in kernel v3.13
+Contact:	Dan Williams <dan.j.williams@intel.com>
+Description:
+	Formerly the lower limit, in bytes, of the size of socket reads
+	that will be offloaded to a DMA copy engine.  Removed due to
+	coherency issues of the cpu potentially touching the buffers
+	while dma is in flight.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 3c12d9a7ed00..bdd8a67f0be2 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -538,12 +538,6 @@ tcp_workaround_signed_windows - BOOLEAN
 	not receive a window scaling option from them.
 	Default: 0
 
-tcp_dma_copybreak - INTEGER
-	Lower limit, in bytes, of the size of socket reads that will be
-	offloaded to a DMA copy engine, if one is present in the system
-	and CONFIG_NET_DMA is enabled.
-	Default: 4096
-
 tcp_thin_linear_timeouts - BOOLEAN
 	Enable dynamic triggering of linear timeouts for thin streams.
 	If set, a check is performed upon retransmission by timeout to
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index c823daaf9043..b24f13195272 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -351,18 +351,6 @@ config DMA_OF
 comment "DMA Clients"
 	depends on DMA_ENGINE
 
-config NET_DMA
-	bool "Network: TCP receive copy offload"
-	depends on DMA_ENGINE && NET
-	default (INTEL_IOATDMA || FSL_DMA)
-	depends on BROKEN
-	help
-	  This enables the use of DMA engines in the network stack to
-	  offload receive copy-to-user operations, freeing CPU cycles.
-
-	  Say Y here if you enabled INTEL_IOATDMA or FSL_DMA, otherwise
-	  say N.
-
 config ASYNC_TX_DMA
 	bool "Async_tx: Offload support for the async_tx api"
 	depends on DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 0ce2da97e429..024b008a25de 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -6,7 +6,6 @@ obj-$(CONFIG_DMA_VIRTUAL_CHANNELS) += virt-dma.o
 obj-$(CONFIG_DMA_ACPI) += acpi-dma.o
 obj-$(CONFIG_DMA_OF) += of-dma.o
 
-obj-$(CONFIG_NET_DMA) += iovlock.o
 obj-$(CONFIG_INTEL_MID_DMAC) += intel_mid_dma.o
 obj-$(CONFIG_DMATEST) += dmatest.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioat/
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index ef63b9058f3c..d7f4f4e0d71f 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1029,110 +1029,6 @@ dmaengine_get_unmap_data(struct device *dev, int nr, gfp_t flags)
 }
 EXPORT_SYMBOL(dmaengine_get_unmap_data);
 
-/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
- * @chan: DMA channel to offload copy to
- * @dest_pg: destination page
- * @dest_off: offset in page to copy to
- * @src_pg: source page
- * @src_off: offset in page to copy from
- * @len: length
- *
- * Both @dest_page/@dest_off and @src_page/@src_off must be mappable to a bus
- * address according to the DMA mapping API rules for streaming mappings.
- * Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
- * (kernel memory or locked user space pages).
- */
-dma_cookie_t
-dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg,
-	unsigned int dest_off, struct page *src_pg, unsigned int src_off,
-	size_t len)
-{
-	struct dma_device *dev = chan->device;
-	struct dma_async_tx_descriptor *tx;
-	struct dmaengine_unmap_data *unmap;
-	dma_cookie_t cookie;
-	unsigned long flags;
-
-	unmap = dmaengine_get_unmap_data(dev->dev, 2, GFP_NOWAIT);
-	if (!unmap)
-		return -ENOMEM;
-
-	unmap->to_cnt = 1;
-	unmap->from_cnt = 1;
-	unmap->addr[0] = dma_map_page(dev->dev, src_pg, src_off, len,
-				      DMA_TO_DEVICE);
-	unmap->addr[1] = dma_map_page(dev->dev, dest_pg, dest_off, len,
-				      DMA_FROM_DEVICE);
-	unmap->len = len;
-	flags = DMA_CTRL_ACK;
-	tx = dev->device_prep_dma_memcpy(chan, unmap->addr[1], unmap->addr[0],
-					 len, flags);
-
-	if (!tx) {
-		dmaengine_unmap_put(unmap);
-		return -ENOMEM;
-	}
-
-	dma_set_unmap(tx, unmap);
-	cookie = tx->tx_submit(tx);
-	dmaengine_unmap_put(unmap);
-
-	preempt_disable();
-	__this_cpu_add(chan->local->bytes_transferred, len);
-	__this_cpu_inc(chan->local->memcpy_count);
-	preempt_enable();
-
-	return cookie;
-}
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-
-/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
- * @chan: DMA channel to offload copy to
- * @dest: destination address (virtual)
- * @src: source address (virtual)
- * @len: length
- *
- * Both @dest and @src must be mappable to a bus address according to the
- * DMA mapping API rules for streaming mappings.
- * Both @dest and @src must stay memory resident (kernel memory or locked
- * user space pages).
- */
-dma_cookie_t
-dma_async_memcpy_buf_to_buf(struct dma_chan *chan, void *dest,
-			    void *src, size_t len)
-{
-	return dma_async_memcpy_pg_to_pg(chan, virt_to_page(dest),
-					 (unsigned long) dest & ~PAGE_MASK,
-					 virt_to_page(src),
-					 (unsigned long) src & ~PAGE_MASK, len);
-}
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-
-/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
- * @chan: DMA channel to offload copy to
- * @page: destination page
- * @offset: offset in page to copy to
- * @kdata: source address (virtual)
- * @len: length
- *
- * Both @page/@offset and @kdata must be mappable to a bus address according
- * to the DMA mapping API rules for streaming mappings.
- * Both @page/@offset and @kdata must stay memory resident (kernel memory or
- * locked user space pages)
- */
-dma_cookie_t
-dma_async_memcpy_buf_to_pg(struct dma_chan *chan, struct page *page,
-			   unsigned int offset, void *kdata, size_t len)
-{
-	return dma_async_memcpy_pg_to_pg(chan, page, offset,
-					 virt_to_page(kdata),
-					 (unsigned long) kdata & ~PAGE_MASK, len);
-}
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-
 void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
 	struct dma_chan *chan)
 {
diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index 1a49c777607c..97fa394ca855 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -1175,7 +1175,6 @@ int ioat1_dma_probe(struct ioatdma_device *device, int dca)
 	err = ioat_probe(device);
 	if (err)
 		return err;
-	ioat_set_tcp_copy_break(4096);
 	err = ioat_register(device);
 	if (err)
 		return err;
diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 11fb877ddca9..664ec9cbd651 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -214,13 +214,6 @@ __dump_desc_dbg(struct ioat_chan_common *chan, struct ioat_dma_descriptor *hw,
 #define dump_desc_dbg(c, d) \
 	({ if (d) __dump_desc_dbg(&c->base, d->hw, &d->txd, desc_id(d)); 0; })
 
-static inline void ioat_set_tcp_copy_break(unsigned long copybreak)
-{
-	#ifdef CONFIG_NET_DMA
-	sysctl_tcp_dma_copybreak = copybreak;
-	#endif
-}
-
 static inline struct ioat_chan_common *
 ioat_chan_by_index(struct ioatdma_device *device, int index)
 {
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5d3affe7e976..31e8098e444f 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -900,7 +900,6 @@ int ioat2_dma_probe(struct ioatdma_device *device, int dca)
 	err = ioat_probe(device);
 	if (err)
 		return err;
-	ioat_set_tcp_copy_break(2048);
 
 	list_for_each_entry(c, &dma->channels, device_node) {
 		chan = to_chan_common(c);
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index 820817e97e62..4bb81346bee2 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -1652,7 +1652,6 @@ int ioat3_dma_probe(struct ioatdma_device *device, int dca)
 	err = ioat_probe(device);
 	if (err)
 		return err;
-	ioat_set_tcp_copy_break(262144);
 
 	list_for_each_entry(c, &dma->channels, device_node) {
 		chan = to_chan_common(c);
diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
deleted file mode 100644
index bb48a57c2fc1..000000000000
--- a/drivers/dma/iovlock.c
+++ /dev/null
@@ -1,280 +0,0 @@
-/*
- * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
- * Portions based on net/core/datagram.c and copyrighted by their authors.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
- *
- * The full GNU General Public License is included in this distribution in the
- * file called COPYING.
- */
-
-/*
- * This code allows the net stack to make use of a DMA engine for
- * skb to iovec copies.
- */
-
-#include <linux/dmaengine.h>
-#include <linux/pagemap.h>
-#include <linux/slab.h>
-#include <net/tcp.h> /* for memcpy_toiovec */
-#include <asm/io.h>
-#include <asm/uaccess.h>
-
-static int num_pages_spanned(struct iovec *iov)
-{
-	return
-	((PAGE_ALIGN((unsigned long)iov->iov_base + iov->iov_len) -
-	((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT);
-}
-
-/*
- * Pin down all the iovec pages needed for len bytes.
- * Return a struct dma_pinned_list to keep track of pages pinned down.
- *
- * We are allocating a single chunk of memory, and then carving it up into
- * 3 sections, the latter 2 whose size depends on the number of iovecs and the
- * total number of pages, respectively.
- */
-struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
-{
-	struct dma_pinned_list *local_list;
-	struct page **pages;
-	int i;
-	int ret;
-	int nr_iovecs = 0;
-	int iovec_len_used = 0;
-	int iovec_pages_used = 0;
-
-	/* don't pin down non-user-based iovecs */
-	if (segment_eq(get_fs(), KERNEL_DS))
-		return NULL;
-
-	/* determine how many iovecs/pages there are, up front */
-	do {
-		iovec_len_used += iov[nr_iovecs].iov_len;
-		iovec_pages_used += num_pages_spanned(&iov[nr_iovecs]);
-		nr_iovecs++;
-	} while (iovec_len_used < len);
-
-	/* single kmalloc for pinned list, page_list[], and the page arrays */
-	local_list = kmalloc(sizeof(*local_list)
-		+ (nr_iovecs * sizeof (struct dma_page_list))
-		+ (iovec_pages_used * sizeof (struct page*)), GFP_KERNEL);
-	if (!local_list)
-		goto out;
-
-	/* list of pages starts right after the page list array */
-	pages = (struct page **) &local_list->page_list[nr_iovecs];
-
-	local_list->nr_iovecs = 0;
-
-	for (i = 0; i < nr_iovecs; i++) {
-		struct dma_page_list *page_list = &local_list->page_list[i];
-
-		len -= iov[i].iov_len;
-
-		if (!access_ok(VERIFY_WRITE, iov[i].iov_base, iov[i].iov_len))
-			goto unpin;
-
-		page_list->nr_pages = num_pages_spanned(&iov[i]);
-		page_list->base_address = iov[i].iov_base;
-
-		page_list->pages = pages;
-		pages += page_list->nr_pages;
-
-		/* pin pages down */
-		down_read(&current->mm->mmap_sem);
-		ret = get_user_pages(
-			current,
-			current->mm,
-			(unsigned long) iov[i].iov_base,
-			page_list->nr_pages,
-			1,	/* write */
-			0,	/* force */
-			page_list->pages,
-			NULL);
-		up_read(&current->mm->mmap_sem);
-
-		if (ret != page_list->nr_pages)
-			goto unpin;
-
-		local_list->nr_iovecs = i + 1;
-	}
-
-	return local_list;
-
-unpin:
-	dma_unpin_iovec_pages(local_list);
-out:
-	return NULL;
-}
-
-void dma_unpin_iovec_pages(struct dma_pinned_list *pinned_list)
-{
-	int i, j;
-
-	if (!pinned_list)
-		return;
-
-	for (i = 0; i < pinned_list->nr_iovecs; i++) {
-		struct dma_page_list *page_list = &pinned_list->page_list[i];
-		for (j = 0; j < page_list->nr_pages; j++) {
-			set_page_dirty_lock(page_list->pages[j]);
-			page_cache_release(page_list->pages[j]);
-		}
-	}
-
-	kfree(pinned_list);
-}
-
-
-/*
- * We have already pinned down the pages we will be using in the iovecs.
- * Each entry in iov array has corresponding entry in pinned_list->page_list.
- * Using array indexing to keep iov[] and page_list[] in sync.
- * Initial elements in iov array's iov->iov_len will be 0 if already copied into
- *   by another call.
- * iov array length remaining guaranteed to be bigger than len.
- */
-dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, struct iovec *iov,
-	struct dma_pinned_list *pinned_list, unsigned char *kdata, size_t len)
-{
-	int iov_byte_offset;
-	int copy;
-	dma_cookie_t dma_cookie = 0;
-	int iovec_idx;
-	int page_idx;
-
-	if (!chan)
-		return memcpy_toiovec(iov, kdata, len);
-
-	iovec_idx = 0;
-	while (iovec_idx < pinned_list->nr_iovecs) {
-		struct dma_page_list *page_list;
-
-		/* skip already used-up iovecs */
-		while (!iov[iovec_idx].iov_len)
-			iovec_idx++;
-
-		page_list = &pinned_list->page_list[iovec_idx];
-
-		iov_byte_offset = ((unsigned long)iov[iovec_idx].iov_base & ~PAGE_MASK);
-		page_idx = (((unsigned long)iov[iovec_idx].iov_base & PAGE_MASK)
-			 - ((unsigned long)page_list->base_address & PAGE_MASK)) >> PAGE_SHIFT;
-
-		/* break up copies to not cross page boundary */
-		while (iov[iovec_idx].iov_len) {
-			copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
-			copy = min_t(int, copy, iov[iovec_idx].iov_len);
-
-			dma_cookie = dma_async_memcpy_buf_to_pg(chan,
-					page_list->pages[page_idx],
-					iov_byte_offset,
-					kdata,
-					copy);
-			/* poll for a descriptor slot */
-			if (unlikely(dma_cookie < 0)) {
-				dma_async_issue_pending(chan);
-				continue;
-			}
-
-			len -= copy;
-			iov[iovec_idx].iov_len -= copy;
-			iov[iovec_idx].iov_base += copy;
-
-			if (!len)
-				return dma_cookie;
-
-			kdata += copy;
-			iov_byte_offset = 0;
-			page_idx++;
-		}
-		iovec_idx++;
-	}
-
-	/* really bad if we ever run out of iovecs */
-	BUG();
-	return -EFAULT;
-}
-
-dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
-	struct dma_pinned_list *pinned_list, struct page *page,
-	unsigned int offset, size_t len)
-{
-	int iov_byte_offset;
-	int copy;
-	dma_cookie_t dma_cookie = 0;
-	int iovec_idx;
-	int page_idx;
-	int err;
-
-	/* this needs as-yet-unimplemented buf-to-buff, so punt. */
-	/* TODO: use dma for this */
-	if (!chan || !pinned_list) {
-		u8 *vaddr = kmap(page);
-		err = memcpy_toiovec(iov, vaddr + offset, len);
-		kunmap(page);
-		return err;
-	}
-
-	iovec_idx = 0;
-	while (iovec_idx < pinned_list->nr_iovecs) {
-		struct dma_page_list *page_list;
-
-		/* skip already used-up iovecs */
-		while (!iov[iovec_idx].iov_len)
-			iovec_idx++;
-
-		page_list = &pinned_list->page_list[iovec_idx];
-
-		iov_byte_offset = ((unsigned long)iov[iovec_idx].iov_base & ~PAGE_MASK);
-		page_idx = (((unsigned long)iov[iovec_idx].iov_base & PAGE_MASK)
-			 - ((unsigned long)page_list->base_address & PAGE_MASK)) >> PAGE_SHIFT;
-
-		/* break up copies to not cross page boundary */
-		while (iov[iovec_idx].iov_len) {
-			copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
-			copy = min_t(int, copy, iov[iovec_idx].iov_len);
-
-			dma_cookie = dma_async_memcpy_pg_to_pg(chan,
-					page_list->pages[page_idx],
-					iov_byte_offset,
-					page,
-					offset,
-					copy);
-			/* poll for a descriptor slot */
-			if (unlikely(dma_cookie < 0)) {
-				dma_async_issue_pending(chan);
-				continue;
-			}
-
-			len -= copy;
-			iov[iovec_idx].iov_len -= copy;
-			iov[iovec_idx].iov_base += copy;
-
-			if (!len)
-				return dma_cookie;
-
-			offset += copy;
-			iov_byte_offset = 0;
-			page_idx++;
-		}
-		iovec_idx++;
-	}
-
-	/* really bad if we ever run out of iovecs */
-	BUG();
-	return -EFAULT;
-}
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 41cf0c399288..890545871af0 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -875,18 +875,6 @@ static inline void dmaengine_put(void)
 }
 #endif
 
-#ifdef CONFIG_NET_DMA
-#define net_dmaengine_get()	dmaengine_get()
-#define net_dmaengine_put()	dmaengine_put()
-#else
-static inline void net_dmaengine_get(void)
-{
-}
-static inline void net_dmaengine_put(void)
-{
-}
-#endif
-
 #ifdef CONFIG_ASYNC_TX_DMA
 #define async_dmaengine_get()	dmaengine_get()
 #define async_dmaengine_put()	dmaengine_put()
@@ -908,16 +896,8 @@ async_dma_find_channel(enum dma_transaction_type type)
 	return NULL;
 }
 #endif /* CONFIG_ASYNC_TX_DMA */
-
-dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
-	void *dest, void *src, size_t len);
-dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
-	struct page *page, unsigned int offset, void *kdata, size_t len);
-dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
-	struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
-	unsigned int src_off, size_t len);
 void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
-	struct dma_chan *chan);
+				  struct dma_chan *chan);
 
 static inline void async_tx_ack(struct dma_async_tx_descriptor *tx)
 {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bec1cc7d5e3c..ac4f84dfa84b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -28,7 +28,6 @@
 #include <linux/textsearch.h>
 #include <net/checksum.h>
 #include <linux/rcupdate.h>
-#include <linux/dmaengine.h>
 #include <linux/hrtimer.h>
 #include <linux/dma-mapping.h>
 #include <linux/netdev_features.h>
@@ -496,11 +495,8 @@ struct sk_buff {
 	/* 6/8 bit hole (depending on ndisc_nodetype presence) */
 	kmemcheck_bitfield_end(flags2);
 
-#if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL
-	union {
-		unsigned int	napi_id;
-		dma_cookie_t	dma_cookie;
-	};
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	unsigned int	napi_id;
 #endif
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index d68633452d9b..26f16021ce1d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -19,7 +19,6 @@
 
 
 #include <linux/skbuff.h>
-#include <linux/dmaengine.h>
 #include <net/sock.h>
 #include <net/inet_connection_sock.h>
 #include <net/inet_timewait_sock.h>
@@ -169,13 +168,6 @@ struct tcp_sock {
 		struct iovec		*iov;
 		int			memory;
 		int			len;
-#ifdef CONFIG_NET_DMA
-		/* members for async copy */
-		struct dma_chan		*dma_chan;
-		int			wakeup;
-		struct dma_pinned_list	*pinned_list;
-		dma_cookie_t		dma_cookie;
-#endif
 	} ucopy;
 
 	u32	snd_wl1;	/* Sequence for window update		*/
diff --git a/include/net/netdma.h b/include/net/netdma.h
deleted file mode 100644
index 8ba8ce284eeb..000000000000
--- a/include/net/netdma.h
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
- *
- * The full GNU General Public License is included in this distribution in the
- * file called COPYING.
- */
-#ifndef NETDMA_H
-#define NETDMA_H
-#ifdef CONFIG_NET_DMA
-#include <linux/dmaengine.h>
-#include <linux/skbuff.h>
-
-int dma_skb_copy_datagram_iovec(struct dma_chan* chan,
-		struct sk_buff *skb, int offset, struct iovec *to,
-		size_t len, struct dma_pinned_list *pinned_list);
-
-#endif /* CONFIG_NET_DMA */
-#endif /* NETDMA_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index e3a18ff0c38b..9d5f716e921e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -231,7 +231,6 @@ struct cg_proto;
   *	@sk_receive_queue: incoming packets
   *	@sk_wmem_alloc: transmit queue bytes committed
   *	@sk_write_queue: Packet sending queue
-  *	@sk_async_wait_queue: DMA copied packets
   *	@sk_omem_alloc: "o" is "option" or "other"
   *	@sk_wmem_queued: persistent queue size
   *	@sk_forward_alloc: space allocated forward
@@ -354,10 +353,6 @@ struct sock {
 	struct sk_filter __rcu	*sk_filter;
 	struct socket_wq __rcu	*sk_wq;
 
-#ifdef CONFIG_NET_DMA
-	struct sk_buff_head	sk_async_wait_queue;
-#endif
-
 #ifdef CONFIG_XFRM
 	struct xfrm_policy	*sk_policy[2];
 #endif
@@ -2200,27 +2195,15 @@ void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags);
  * sk_eat_skb - Release a skb if it is no longer needed
  * @sk: socket to eat this skb from
  * @skb: socket buffer to eat
- * @copied_early: flag indicating whether DMA operations copied this data early
  *
  * This routine must be called with interrupts disabled or with the socket
  * locked so that the sk_buff queue operation is ok.
 */
-#ifdef CONFIG_NET_DMA
-static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, bool copied_early)
-{
-	__skb_unlink(skb, &sk->sk_receive_queue);
-	if (!copied_early)
-		__kfree_skb(skb);
-	else
-		__skb_queue_tail(&sk->sk_async_wait_queue, skb);
-}
-#else
-static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, bool copied_early)
+static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
 {
 	__skb_unlink(skb, &sk->sk_receive_queue);
 	__kfree_skb(skb);
 }
-#endif
 
 static inline
 struct net *sock_net(const struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70e55d200610..084c163e9d40 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -27,7 +27,6 @@
 #include <linux/cache.h>
 #include <linux/percpu.h>
 #include <linux/skbuff.h>
-#include <linux/dmaengine.h>
 #include <linux/crypto.h>
 #include <linux/cryptohash.h>
 #include <linux/kref.h>
@@ -267,7 +266,6 @@ extern int sysctl_tcp_adv_win_scale;
 extern int sysctl_tcp_tw_reuse;
 extern int sysctl_tcp_frto;
 extern int sysctl_tcp_low_latency;
-extern int sysctl_tcp_dma_copybreak;
 extern int sysctl_tcp_nometrics_save;
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
@@ -1032,12 +1030,6 @@ static inline void tcp_prequeue_init(struct tcp_sock *tp)
 	tp->ucopy.len = 0;
 	tp->ucopy.memory = 0;
 	skb_queue_head_init(&tp->ucopy.prequeue);
-#ifdef CONFIG_NET_DMA
-	tp->ucopy.dma_chan = NULL;
-	tp->ucopy.wakeup = 0;
-	tp->ucopy.pinned_list = NULL;
-	tp->ucopy.dma_cookie = 0;
-#endif
 }
 
 bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 653cbbd9e7ad..d457005acedf 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -390,7 +390,6 @@ static const struct bin_table bin_net_ipv4_table[] = {
 	{ CTL_INT,	NET_TCP_MTU_PROBING,			"tcp_mtu_probing" },
 	{ CTL_INT,	NET_TCP_BASE_MSS,			"tcp_base_mss" },
 	{ CTL_INT,	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS,	"tcp_workaround_signed_windows" },
-	{ CTL_INT,	NET_TCP_DMA_COPYBREAK,			"tcp_dma_copybreak" },
 	{ CTL_INT,	NET_TCP_SLOW_START_AFTER_IDLE,		"tcp_slow_start_after_idle" },
 	{ CTL_INT,	NET_CIPSOV4_CACHE_ENABLE,		"cipso_cache_enable" },
 	{ CTL_INT,	NET_CIPSOV4_CACHE_BUCKET_SIZE,		"cipso_cache_bucket_size" },
diff --git a/net/core/Makefile b/net/core/Makefile
index b33b996f5dd6..5f98e5983bd3 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -16,7 +16,6 @@ obj-y += net-sysfs.o
 obj-$(CONFIG_PROC_FS) += net-procfs.o
 obj-$(CONFIG_NET_PKTGEN) += pktgen.o
 obj-$(CONFIG_NETPOLL) += netpoll.o
-obj-$(CONFIG_NET_DMA) += user_dma.o
 obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
diff --git a/net/core/dev.c b/net/core/dev.c
index ba3b7ea5ebb3..677a5a4dcca7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1262,7 +1262,6 @@ static int __dev_open(struct net_device *dev)
 		clear_bit(__LINK_STATE_START, &dev->state);
 	else {
 		dev->flags |= IFF_UP;
-		net_dmaengine_get();
 		dev_set_rx_mode(dev);
 		dev_activate(dev);
 		add_device_randomness(dev->dev_addr, dev->addr_len);
@@ -1338,7 +1337,6 @@ static int __dev_close_many(struct list_head *head)
 			ops->ndo_stop(dev);
 
 		dev->flags &= ~IFF_UP;
-		net_dmaengine_put();
 	}
 
 	return 0;
@@ -4362,14 +4360,6 @@ static void net_rx_action(struct softirq_action *h)
 out:
 	net_rps_action_and_irq_enable(sd);
 
-#ifdef CONFIG_NET_DMA
-	/*
-	 * There may not be any more sk_buffs coming right now, so push
-	 * any pending DMA copies to hardware
-	 */
-	dma_issue_pending_all();
-#endif
-
 	return;
 
 softnet_break:
diff --git a/net/core/sock.c b/net/core/sock.c
index ab20ed9b0f31..411dab3a5726 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1461,9 +1461,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		atomic_set(&newsk->sk_omem_alloc, 0);
 		skb_queue_head_init(&newsk->sk_receive_queue);
 		skb_queue_head_init(&newsk->sk_write_queue);
-#ifdef CONFIG_NET_DMA
-		skb_queue_head_init(&newsk->sk_async_wait_queue);
-#endif
 
 		spin_lock_init(&newsk->sk_dst_lock);
 		rwlock_init(&newsk->sk_callback_lock);
@@ -2290,9 +2287,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	skb_queue_head_init(&sk->sk_receive_queue);
 	skb_queue_head_init(&sk->sk_write_queue);
 	skb_queue_head_init(&sk->sk_error_queue);
-#ifdef CONFIG_NET_DMA
-	skb_queue_head_init(&sk->sk_async_wait_queue);
-#endif
 
 	sk->sk_send_head	=	NULL;
 
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
deleted file mode 100644
index 1b5fefdb8198..000000000000
--- a/net/core/user_dma.c
+++ /dev/null
@@ -1,131 +0,0 @@
-/*
- * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
- * Portions based on net/core/datagram.c and copyrighted by their authors.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
- *
- * The full GNU General Public License is included in this distribution in the
- * file called COPYING.
- */
-
-/*
- * This code allows the net stack to make use of a DMA engine for
- * skb to iovec copies.
- */
-
-#include <linux/dmaengine.h>
-#include <linux/socket.h>
-#include <linux/export.h>
-#include <net/tcp.h>
-#include <net/netdma.h>
-
-#define NET_DMA_DEFAULT_COPYBREAK 4096
-
-int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
-EXPORT_SYMBOL(sysctl_tcp_dma_copybreak);
-
-/**
- *	dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
- *	@skb - buffer to copy
- *	@offset - offset in the buffer to start copying from
- *	@iovec - io vector to copy to
- *	@len - amount of data to copy from buffer to iovec
- *	@pinned_list - locked iovec buffer data
- *
- *	Note: the iovec is modified during the copy.
- */
-int dma_skb_copy_datagram_iovec(struct dma_chan *chan,
-			struct sk_buff *skb, int offset, struct iovec *to,
-			size_t len, struct dma_pinned_list *pinned_list)
-{
-	int start = skb_headlen(skb);
-	int i, copy = start - offset;
-	struct sk_buff *frag_iter;
-	dma_cookie_t cookie = 0;
-
-	/* Copy header. */
-	if (copy > 0) {
-		if (copy > len)
-			copy = len;
-		cookie = dma_memcpy_to_iovec(chan, to, pinned_list,
-					    skb->data + offset, copy);
-		if (cookie < 0)
-			goto fault;
-		len -= copy;
-		if (len == 0)
-			goto end;
-		offset += copy;
-	}
-
-	/* Copy paged appendix. Hmm... why does this look so complicated? */
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		int end;
-		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
-		WARN_ON(start > offset + len);
-
-		end = start + skb_frag_size(frag);
-		copy = end - offset;
-		if (copy > 0) {
-			struct page *page = skb_frag_page(frag);
-
-			if (copy > len)
-				copy = len;
-
-			cookie = dma_memcpy_pg_to_iovec(chan, to, pinned_list, page,
-					frag->page_offset + offset - start, copy);
-			if (cookie < 0)
-				goto fault;
-			len -= copy;
-			if (len == 0)
-				goto end;
-			offset += copy;
-		}
-		start = end;
-	}
-
-	skb_walk_frags(skb, frag_iter) {
-		int end;
-
-		WARN_ON(start > offset + len);
-
-		end = start + frag_iter->len;
-		copy = end - offset;
-		if (copy > 0) {
-			if (copy > len)
-				copy = len;
-			cookie = dma_skb_copy_datagram_iovec(chan, frag_iter,
-							     offset - start,
-							     to, copy,
-							     pinned_list);
-			if (cookie < 0)
-				goto fault;
-			len -= copy;
-			if (len == 0)
-				goto end;
-			offset += copy;
-		}
-		start = end;
-	}
-
-end:
-	if (!len) {
-		skb->dma_cookie = cookie;
-		return cookie;
-	}
-
-fault:
-	return -EFAULT;
-}
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index eb892b4f4814..f9076f295b13 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -848,7 +848,7 @@ int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		default:
 			dccp_pr_debug("packet_type=%s\n",
 				      dccp_packet_name(dh->dccph_type));
-			sk_eat_skb(sk, skb, false);
+			sk_eat_skb(sk, skb);
 		}
 verify_sock_status:
 		if (sock_flag(sk, SOCK_DONE)) {
@@ -905,7 +905,7 @@ verify_sock_status:
 			len = skb->len;
 	found_fin_ok:
 		if (!(flags & MSG_PEEK))
-			sk_eat_skb(sk, skb, false);
+			sk_eat_skb(sk, skb);
 		break;
 	} while (1);
 out:
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 3d69ec8dac57..79a90b92e12d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -642,15 +642,6 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
-#ifdef CONFIG_NET_DMA
-	{
-		.procname	= "tcp_dma_copybreak",
-		.data		= &sysctl_tcp_dma_copybreak,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-#endif
 	{
 		.procname	= "tcp_slow_start_after_idle",
 		.data		= &sysctl_tcp_slow_start_after_idle,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c4638e6f0238..8dc913dfbaef 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -274,7 +274,6 @@
 #include <net/tcp.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
-#include <net/netdma.h>
 #include <net/sock.h>
 
 #include <asm/uaccess.h>
@@ -1409,39 +1408,6 @@ static void tcp_prequeue_process(struct sock *sk)
 	tp->ucopy.memory = 0;
 }
 
-#ifdef CONFIG_NET_DMA
-static void tcp_service_net_dma(struct sock *sk, bool wait)
-{
-	dma_cookie_t done, used;
-	dma_cookie_t last_issued;
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (!tp->ucopy.dma_chan)
-		return;
-
-	last_issued = tp->ucopy.dma_cookie;
-	dma_async_issue_pending(tp->ucopy.dma_chan);
-
-	do {
-		if (dma_async_is_tx_complete(tp->ucopy.dma_chan,
-					      last_issued, &done,
-					      &used) == DMA_COMPLETE) {
-			/* Safe to free early-copied skbs now */
-			__skb_queue_purge(&sk->sk_async_wait_queue);
-			break;
-		} else {
-			struct sk_buff *skb;
-			while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
-			       (dma_async_is_complete(skb->dma_cookie, done,
-						      used) == DMA_COMPLETE)) {
-				__skb_dequeue(&sk->sk_async_wait_queue);
-				kfree_skb(skb);
-			}
-		}
-	} while (wait);
-}
-#endif
-
 static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 {
 	struct sk_buff *skb;
@@ -1459,7 +1425,7 @@ static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 		 * splitted a fat GRO packet, while we released socket lock
 		 * in skb_splice_bits()
 		 */
-		sk_eat_skb(sk, skb, false);
+		sk_eat_skb(sk, skb);
 	}
 	return NULL;
 }
@@ -1525,11 +1491,11 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 				continue;
 		}
 		if (tcp_hdr(skb)->fin) {
-			sk_eat_skb(sk, skb, false);
+			sk_eat_skb(sk, skb);
 			++seq;
 			break;
 		}
-		sk_eat_skb(sk, skb, false);
+		sk_eat_skb(sk, skb);
 		if (!desc->count)
 			break;
 		tp->copied_seq = seq;
@@ -1567,7 +1533,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int target;		/* Read at least this many bytes */
 	long timeo;
 	struct task_struct *user_recv = NULL;
-	bool copied_early = false;
 	struct sk_buff *skb;
 	u32 urg_hole = 0;
 
@@ -1610,28 +1575,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 
-#ifdef CONFIG_NET_DMA
-	tp->ucopy.dma_chan = NULL;
-	preempt_disable();
-	skb = skb_peek_tail(&sk->sk_receive_queue);
-	{
-		int available = 0;
-
-		if (skb)
-			available = TCP_SKB_CB(skb)->seq + skb->len - (*seq);
-		if ((available < target) &&
-		    (len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
-		    !sysctl_tcp_low_latency &&
-		    net_dma_find_channel()) {
-			preempt_enable_no_resched();
-			tp->ucopy.pinned_list =
-					dma_pin_iovec_pages(msg->msg_iov, len);
-		} else {
-			preempt_enable_no_resched();
-		}
-	}
-#endif
-
 	do {
 		u32 offset;
 
@@ -1762,16 +1705,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			/* __ Set realtime policy in scheduler __ */
 		}
 
-#ifdef CONFIG_NET_DMA
-		if (tp->ucopy.dma_chan) {
-			if (tp->rcv_wnd == 0 &&
-			    !skb_queue_empty(&sk->sk_async_wait_queue)) {
-				tcp_service_net_dma(sk, true);
-				tcp_cleanup_rbuf(sk, copied);
-			} else
-				dma_async_issue_pending(tp->ucopy.dma_chan);
-		}
-#endif
 		if (copied >= target) {
 			/* Do not sleep, just process backlog. */
 			release_sock(sk);
@@ -1779,11 +1712,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		} else
 			sk_wait_data(sk, &timeo);
 
-#ifdef CONFIG_NET_DMA
-		tcp_service_net_dma(sk, false);  /* Don't block */
-		tp->ucopy.wakeup = 0;
-#endif
-
 		if (user_recv) {
 			int chunk;
 
@@ -1841,43 +1769,13 @@ do_prequeue:
 		}
 
 		if (!(flags & MSG_TRUNC)) {
-#ifdef CONFIG_NET_DMA
-			if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
-				tp->ucopy.dma_chan = net_dma_find_channel();
-
-			if (tp->ucopy.dma_chan) {
-				tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
-					tp->ucopy.dma_chan, skb, offset,
-					msg->msg_iov, used,
-					tp->ucopy.pinned_list);
-
-				if (tp->ucopy.dma_cookie < 0) {
-
-					pr_alert("%s: dma_cookie < 0\n",
-						 __func__);
-
-					/* Exception. Bailout! */
-					if (!copied)
-						copied = -EFAULT;
-					break;
-				}
-
-				dma_async_issue_pending(tp->ucopy.dma_chan);
-
-				if ((offset + used) == skb->len)
-					copied_early = true;
-
-			} else
-#endif
-			{
-				err = skb_copy_datagram_iovec(skb, offset,
-						msg->msg_iov, used);
-				if (err) {
-					/* Exception. Bailout! */
-					if (!copied)
-						copied = -EFAULT;
-					break;
-				}
+			err = skb_copy_datagram_iovec(skb, offset,
+						      msg->msg_iov, used);
+			if (err) {
+				/* Exception. Bailout! */
+				if (!copied)
+					copied = -EFAULT;
+				break;
 			}
 		}
 
@@ -1897,19 +1795,15 @@ skip_copy:
 
 		if (tcp_hdr(skb)->fin)
 			goto found_fin_ok;
-		if (!(flags & MSG_PEEK)) {
-			sk_eat_skb(sk, skb, copied_early);
-			copied_early = false;
-		}
+		if (!(flags & MSG_PEEK))
+			sk_eat_skb(sk, skb);
 		continue;
 
 	found_fin_ok:
 		/* Process the FIN. */
 		++*seq;
-		if (!(flags & MSG_PEEK)) {
-			sk_eat_skb(sk, skb, copied_early);
-			copied_early = false;
-		}
+		if (!(flags & MSG_PEEK))
+			sk_eat_skb(sk, skb);
 		break;
 	} while (len > 0);
 
@@ -1932,16 +1826,6 @@ skip_copy:
 		tp->ucopy.len = 0;
 	}
 
-#ifdef CONFIG_NET_DMA
-	tcp_service_net_dma(sk, true);  /* Wait for queue to drain */
-	tp->ucopy.dma_chan = NULL;
-
-	if (tp->ucopy.pinned_list) {
-		dma_unpin_iovec_pages(tp->ucopy.pinned_list);
-		tp->ucopy.pinned_list = NULL;
-	}
-#endif
-
 	/* According to UNIX98, msg_name/msg_namelen are ignored
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
@@ -2285,9 +2169,6 @@ int tcp_disconnect(struct sock *sk, int flags)
 	__skb_queue_purge(&sk->sk_receive_queue);
 	tcp_write_queue_purge(sk);
 	__skb_queue_purge(&tp->out_of_order_queue);
-#ifdef CONFIG_NET_DMA
-	__skb_queue_purge(&sk->sk_async_wait_queue);
-#endif
 
 	inet->inet_dport = 0;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c53b7f35c51d..33ef18e550c5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -73,7 +73,6 @@
 #include <net/inet_common.h>
 #include <linux/ipsec.h>
 #include <asm/unaligned.h>
-#include <net/netdma.h>
 
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
@@ -4967,53 +4966,6 @@ static inline bool tcp_checksum_complete_user(struct sock *sk,
 	       __tcp_checksum_complete_user(sk, skb);
 }
 
-#ifdef CONFIG_NET_DMA
-static bool tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb,
-				  int hlen)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-	int chunk = skb->len - hlen;
-	int dma_cookie;
-	bool copied_early = false;
-
-	if (tp->ucopy.wakeup)
-		return false;
-
-	if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
-		tp->ucopy.dma_chan = net_dma_find_channel();
-
-	if (tp->ucopy.dma_chan && skb_csum_unnecessary(skb)) {
-
-		dma_cookie = dma_skb_copy_datagram_iovec(tp->ucopy.dma_chan,
-							 skb, hlen,
-							 tp->ucopy.iov, chunk,
-							 tp->ucopy.pinned_list);
-
-		if (dma_cookie < 0)
-			goto out;
-
-		tp->ucopy.dma_cookie = dma_cookie;
-		copied_early = true;
-
-		tp->ucopy.len -= chunk;
-		tp->copied_seq += chunk;
-		tcp_rcv_space_adjust(sk);
-
-		if ((tp->ucopy.len == 0) ||
-		    (tcp_flag_word(tcp_hdr(skb)) & TCP_FLAG_PSH) ||
-		    (atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1))) {
-			tp->ucopy.wakeup = 1;
-			sk->sk_data_ready(sk, 0);
-		}
-	} else if (chunk > 0) {
-		tp->ucopy.wakeup = 1;
-		sk->sk_data_ready(sk, 0);
-	}
-out:
-	return copied_early;
-}
-#endif /* CONFIG_NET_DMA */
-
 /* Does PAWS and seqno based validation of an incoming segment, flags will
  * play significant role here.
  */
@@ -5198,14 +5150,6 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 
 			if (tp->copied_seq == tp->rcv_nxt &&
 			    len - tcp_header_len <= tp->ucopy.len) {
-#ifdef CONFIG_NET_DMA
-				if (tp->ucopy.task == current &&
-				    sock_owned_by_user(sk) &&
-				    tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
-					copied_early = 1;
-					eaten = 1;
-				}
-#endif
 				if (tp->ucopy.task == current &&
 				    sock_owned_by_user(sk) && !copied_early) {
 					__set_current_state(TASK_RUNNING);
@@ -5271,11 +5215,6 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
 				__tcp_ack_snd_check(sk, 0);
 no_ack:
-#ifdef CONFIG_NET_DMA
-			if (copied_early)
-				__skb_queue_tail(&sk->sk_async_wait_queue, skb);
-			else
-#endif
 			if (eaten)
 				kfree_skb_partial(skb, fragstolen);
 			sk->sk_data_ready(sk, 0);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 59a6f8b90cd9..dc92ba9d0350 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -72,7 +72,6 @@
 #include <net/inet_common.h>
 #include <net/timewait_sock.h>
 #include <net/xfrm.h>
-#include <net/netdma.h>
 #include <net/secure_seq.h>
 #include <net/tcp_memcontrol.h>
 #include <net/busy_poll.h>
@@ -2000,18 +1999,8 @@ process:
 	bh_lock_sock_nested(sk);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
-#ifdef CONFIG_NET_DMA
-		struct tcp_sock *tp = tcp_sk(sk);
-		if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
-			tp->ucopy.dma_chan = net_dma_find_channel();
-		if (tp->ucopy.dma_chan)
+		if (!tcp_prequeue(sk, skb))
 			ret = tcp_v4_do_rcv(sk, skb);
-		else
-#endif
-		{
-			if (!tcp_prequeue(sk, skb))
-				ret = tcp_v4_do_rcv(sk, skb);
-		}
 	} else if (unlikely(sk_add_backlog(sk, skb,
 					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
 		bh_unlock_sock(sk);
@@ -2170,11 +2159,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
 	}
 #endif
 
-#ifdef CONFIG_NET_DMA
-	/* Cleans up our sk_async_wait_queue */
-	__skb_queue_purge(&sk->sk_async_wait_queue);
-#endif
-
 	/* Clean prequeue, it must be empty really */
 	__skb_queue_purge(&tp->ucopy.prequeue);
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0740f93a114a..e27972590379 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -59,7 +59,6 @@
 #include <net/snmp.h>
 #include <net/dsfield.h>
 #include <net/timewait_sock.h>
-#include <net/netdma.h>
 #include <net/inet_common.h>
 #include <net/secure_seq.h>
 #include <net/tcp_memcontrol.h>
@@ -1504,18 +1503,8 @@ process:
 	bh_lock_sock_nested(sk);
 	ret = 0;
 	if (!sock_owned_by_user(sk)) {
-#ifdef CONFIG_NET_DMA
-		struct tcp_sock *tp = tcp_sk(sk);
-		if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
-			tp->ucopy.dma_chan = net_dma_find_channel();
-		if (tp->ucopy.dma_chan)
+		if (!tcp_prequeue(sk, skb))
 			ret = tcp_v6_do_rcv(sk, skb);
-		else
-#endif
-		{
-			if (!tcp_prequeue(sk, skb))
-				ret = tcp_v6_do_rcv(sk, skb);
-		}
 	} else if (unlikely(sk_add_backlog(sk, skb,
 					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
 		bh_unlock_sock(sk);
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 7b01b9f5846c..e1b46709f8d6 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -838,7 +838,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
 
 		if (!(flags & MSG_PEEK)) {
 			spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
-			sk_eat_skb(sk, skb, false);
+			sk_eat_skb(sk, skb);
 			spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 			*seq = 0;
 		}
@@ -860,10 +860,10 @@ copy_uaddr:
 		llc_cmsg_rcv(msg, skb);
 
 	if (!(flags & MSG_PEEK)) {
-			spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
-			sk_eat_skb(sk, skb, false);
-			spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
-			*seq = 0;
+		spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
+		sk_eat_skb(sk, skb);
+		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
+		*seq = 0;
 	}
 
 	goto out;


* [PATCH v2 2/4] net_dma: revert 'copied_early'
  2014-01-09 20:12 [PATCH v2 0/4] net_dma removal, and dma debug extension Dan Williams
  2014-01-09 20:16 ` [PATCH v2 1/4] net_dma: simple removal Dan Williams
@ 2014-01-09 20:16 ` Dan Williams
  2014-01-09 20:16 ` [PATCH v2 3/4] net: make tcp_cleanup_rbuf private Dan Williams
  2014-01-09 20:17 ` [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle() Dan Williams
  3 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2014-01-09 20:16 UTC (permalink / raw)
  To: dmaengine
  Cc: Hideaki YOSHIFUJI, netdev, Ali Saidi, James Morris,
	David S. Miller, Eric Dumazet, Alexey Kuznetsov, Dave Jones,
	Patrick McHardy, linux-kernel

Now that tcp_dma_try_early_copy() is gone, nothing ever sets
copied_early.

This also reverts commit 53240c208776 ("tcp: Fix possible double-ack w/
user dma"), since it is no longer necessary.

Cc: Ali Saidi <saidi@engin.umich.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
New in v2.

 net/ipv4/tcp_input.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 33ef18e550c5..15911a280485 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5145,19 +5145,15 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			}
 		} else {
 			int eaten = 0;
-			int copied_early = 0;
 			bool fragstolen = false;
 
-			if (tp->copied_seq == tp->rcv_nxt &&
-			    len - tcp_header_len <= tp->ucopy.len) {
-				if (tp->ucopy.task == current &&
-				    sock_owned_by_user(sk) && !copied_early) {
-					__set_current_state(TASK_RUNNING);
+			if (tp->ucopy.task == current &&
+			    tp->copied_seq == tp->rcv_nxt &&
+			    len - tcp_header_len <= tp->ucopy.len &&
+			    sock_owned_by_user(sk)) {
+				__set_current_state(TASK_RUNNING);
 
-					if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
-						eaten = 1;
-				}
-				if (eaten) {
+				if (!tcp_copy_to_iovec(sk, skb, tcp_header_len)) {
 					/* Predicted packet is in window by definition.
 					 * seq == rcv_nxt and rcv_wup <= rcv_nxt.
 					 * Hence, check seq<=rcv_wup reduces to:
@@ -5173,9 +5169,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 					__skb_pull(skb, tcp_header_len);
 					tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
 					NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
+					eaten = 1;
 				}
-				if (copied_early)
-					tcp_cleanup_rbuf(sk, skb->len);
 			}
 			if (!eaten) {
 				if (tcp_checksum_complete_user(sk, skb))
@@ -5212,8 +5207,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 					goto no_ack;
 			}
 
-			if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
-				__tcp_ack_snd_check(sk, 0);
+			__tcp_ack_snd_check(sk, 0);
 no_ack:
 			if (eaten)
 				kfree_skb_partial(skb, fragstolen);


* [PATCH v2 3/4] net: make tcp_cleanup_rbuf private
  2014-01-09 20:12 [PATCH v2 0/4] net_dma removal, and dma debug extension Dan Williams
  2014-01-09 20:16 ` [PATCH v2 1/4] net_dma: simple removal Dan Williams
  2014-01-09 20:16 ` [PATCH v2 2/4] net_dma: revert 'copied_early' Dan Williams
@ 2014-01-09 20:16 ` Dan Williams
  2014-01-09 20:26   ` Neal Cardwell
  2014-01-09 20:17 ` [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle() Dan Williams
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2014-01-09 20:16 UTC (permalink / raw)
  To: dmaengine
  Cc: Hideaki YOSHIFUJI, netdev, James Morris, David S. Miller,
	Alexey Kuznetsov, Patrick McHardy, linux-kernel

net_dma was the only external user, so tcp_cleanup_rbuf() can become
local to tcp.c again.

Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
New in v2.

 include/net/tcp.h |    1 -
 net/ipv4/tcp.c    |   10 +++++-----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 084c163e9d40..571036b3bead 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -370,7 +370,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
 			 const struct tcphdr *th, unsigned int len);
 void tcp_rcv_space_adjust(struct sock *sk);
-void tcp_cleanup_rbuf(struct sock *sk, int copied);
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
 void tcp_twsk_destructor(struct sock *sk);
 ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8dc913dfbaef..70c905a7d2b8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1332,7 +1332,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
  * calculation of whether or not we must ACK for the sake of
  * a window update.
  */
-void tcp_cleanup_rbuf(struct sock *sk, int copied)
+static void cleanup_rbuf(struct sock *sk, int copied)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	bool time_to_ack = false;
@@ -1507,7 +1507,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 	/* Clean up data we have read: This will do ACK frames. */
 	if (copied > 0) {
 		tcp_recv_skb(sk, seq, &offset);
-		tcp_cleanup_rbuf(sk, copied);
+		cleanup_rbuf(sk, copied);
 	}
 	return copied;
 }
@@ -1658,7 +1658,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			}
 		}
 
-		tcp_cleanup_rbuf(sk, copied);
+		cleanup_rbuf(sk, copied);
 
 		if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
 			/* Install new reader */
@@ -1831,7 +1831,7 @@ skip_copy:
 	 */
 
 	/* Clean up data we have read: This will do ACK frames. */
-	tcp_cleanup_rbuf(sk, copied);
+	cleanup_rbuf(sk, copied);
 
 	release_sock(sk);
 	return copied;
@@ -2494,7 +2494,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			    (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT) &&
 			    inet_csk_ack_scheduled(sk)) {
 				icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
-				tcp_cleanup_rbuf(sk, 1);
+				cleanup_rbuf(sk, 1);
 				if (!(val & 1))
 					icsk->icsk_ack.pingpong = 1;
 			}


* [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle()
  2014-01-09 20:12 [PATCH v2 0/4] net_dma removal, and dma debug extension Dan Williams
                   ` (2 preceding siblings ...)
  2014-01-09 20:16 ` [PATCH v2 3/4] net: make tcp_cleanup_rbuf private Dan Williams
@ 2014-01-09 20:17 ` Dan Williams
  2014-01-10  0:38   ` Andrew Morton
  3 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2014-01-09 20:17 UTC (permalink / raw)
  To: dmaengine
  Cc: Vinod Koul, netdev, Joerg Roedel, linux-kernel, James Bottomley,
	Russell King, Andrew Morton

Record actively mapped pages and provide an API for asserting that a
given page is DMA-inactive before execution proceeds.  Placing
debug_dma_assert_idle() in cow_user_page() flagged the violation of the
DMA API in the NET_DMA implementation (see commit 77873803363c
("net_dma: mark broken")).
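
As an illustration (a hedged sketch, not a hunk from this patch), any
path that is about to let the cpu read or copy a page can assert that no
DMA mapping is still active against it.  The mm/memory.c change in this
patch adds exactly this check to cow_user_page(); a caller would look
roughly like:

	static void copy_page_checked(struct page *dst, struct page *src,
				      unsigned long vaddr,
				      struct vm_area_struct *vma)
	{
		/* no-op unless CONFIG_DMA_VS_CPU_DEBUG is enabled */
		debug_dma_assert_idle(src);
		copy_user_highpage(dst, src, vaddr, vma);
	}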

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Added CONFIG_DMA_VS_CPU_DEBUG

 include/linux/dma-debug.h |    6 ++
 lib/Kconfig.debug         |   13 ++++-
 lib/dma-debug.c           |  130 +++++++++++++++++++++++++++++++++++++++++----
 mm/memory.c               |    3 +
 4 files changed, 138 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index fc0e34ce038f..27a029fb2ead 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -185,4 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
 
 #endif /* CONFIG_DMA_API_DEBUG */
 
+#ifdef CONFIG_DMA_VS_CPU_DEBUG
+extern void debug_dma_assert_idle(struct page *page);
+#else
+static inline void debug_dma_assert_idle(struct page *page) { }
+#endif /* CONFIG_DMA_VS_CPU_DEBUG  */
+
 #endif /* __DMA_DEBUG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index db25707aa41b..99751c47fa87 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1575,9 +1575,20 @@ config DMA_API_DEBUG
 	  With this option you will be able to detect common bugs in device
 	  drivers like double-freeing of DMA mappings or freeing mappings that
 	  were never allocated.
-	  This option causes a performance degredation.  Use only if you want
+	  This option causes a performance degradation.  Use only if you want
 	  to debug device drivers. If unsure, say N.
 
+config DMA_VS_CPU_DEBUG
+	bool "Enable debugging CPU vs DMA ownership of pages"
+	depends on DMA_API_DEBUG
+	help
+	  Enable this option to attempt to catch cases where a page owned by
+	  DMA is being touched by the cpu in a way that could cause data
+	  corruption.  For example, this enables cow_user_page() to check
+	  that the source page is not undergoing DMA.  This option
+	  further increases the overhead of DMA_API_DEBUG.  Use only if
+	  debugging device drivers.  If unsure, say N.
+
 source "samples/Kconfig"
 
 source "lib/Kconfig.kgdb"
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index d87a17a819d0..f67ae111cd2f 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -57,7 +57,8 @@ struct dma_debug_entry {
 	struct list_head list;
 	struct device    *dev;
 	int              type;
-	phys_addr_t      paddr;
+	unsigned long	 pfn;
+	size_t		 offset;
 	u64              dev_addr;
 	u64              size;
 	int              direction;
@@ -372,6 +373,11 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
 	list_del(&entry->list);
 }
 
+static unsigned long long phys_addr(struct dma_debug_entry *entry)
+{
+	return page_to_phys(pfn_to_page(entry->pfn)) + entry->offset;
+}
+
 /*
  * Dump mapping entries for debugging purposes
  */
@@ -389,9 +395,9 @@ void debug_dma_dump_mappings(struct device *dev)
 		list_for_each_entry(entry, &bucket->list, list) {
 			if (!dev || dev == entry->dev) {
 				dev_info(entry->dev,
-					 "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
+					 "%s idx %d P=%Lx N=%lx D=%Lx L=%Lx %s %s\n",
 					 type2name[entry->type], idx,
-					 (unsigned long long)entry->paddr,
+					 phys_addr(entry), entry->pfn,
 					 entry->dev_addr, entry->size,
 					 dir2name[entry->direction],
 					 maperr2str[entry->map_err_type]);
@@ -403,6 +409,83 @@ void debug_dma_dump_mappings(struct device *dev)
 }
 EXPORT_SYMBOL(debug_dma_dump_mappings);
 
+/* memory usage is constrained by the maximum number of available
+ * dma-debug entries
+ */
+static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
+static DEFINE_SPINLOCK(radix_lock);
+
+static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
+{
+	unsigned long pfn = entry->pfn;
+	int i;
+
+	for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
+		if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
+			radix_tree_tag_set(&dma_active_pfn, pfn, i);
+			return;
+		}
+	pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
+		 RADIX_TREE_MAX_TAGS, pfn);
+}
+
+static void __active_pfn_dec_overlap(struct dma_debug_entry *entry)
+{
+	unsigned long pfn = entry->pfn;
+	int i;
+
+	for (i = RADIX_TREE_MAX_TAGS - 1; i >= 0; i--)
+		if (radix_tree_tag_get(&dma_active_pfn, pfn, i)) {
+			radix_tree_tag_clear(&dma_active_pfn, pfn, i);
+			return;
+		}
+	radix_tree_delete(&dma_active_pfn, pfn);
+}
+
+static int active_pfn_insert(struct dma_debug_entry *entry)
+{
+	unsigned long flags;
+	int rc;
+
+	spin_lock_irqsave(&radix_lock, flags);
+	rc = radix_tree_insert(&dma_active_pfn, entry->pfn, entry);
+	if (rc == -EEXIST)
+		__active_pfn_inc_overlap(entry);
+	spin_unlock_irqrestore(&radix_lock, flags);
+
+	return rc;
+}
+
+static void active_pfn_remove(struct dma_debug_entry *entry)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&radix_lock, flags);
+	__active_pfn_dec_overlap(entry);
+	spin_unlock_irqrestore(&radix_lock, flags);
+}
+
+void debug_dma_assert_idle(struct page *page)
+{
+	unsigned long flags;
+	struct dma_debug_entry *entry;
+
+	if (!page)
+		return;
+
+	spin_lock_irqsave(&radix_lock, flags);
+	entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
+	spin_unlock_irqrestore(&radix_lock, flags);
+
+	if (!entry)
+		return;
+
+	err_printk(entry->dev, entry,
+		   "DMA-API: cpu touching an active dma mapped page "
+		   "[pfn=0x%lx]\n", entry->pfn);
+}
+EXPORT_SYMBOL_GPL(debug_dma_assert_idle);
+
 /*
  * Wrapper function for adding an entry to the hash.
  * This function takes care of locking itself.
@@ -411,10 +494,22 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 {
 	struct hash_bucket *bucket;
 	unsigned long flags;
+	int rc;
 
 	bucket = get_hash_bucket(entry, &flags);
 	hash_bucket_add(bucket, entry);
 	put_hash_bucket(bucket, &flags);
+
+	rc = active_pfn_insert(entry);
+	if (rc == -ENOMEM) {
+		pr_err("DMA-API: pfn tracking out of memory - "
+		       "disabling dma-debug\n");
+		global_disable = true;
+	}
+
+	/* TODO: report -EEXIST errors as overlapping mappings are not
+	 * supported by the DMA API
+	 */
 }
 
 static struct dma_debug_entry *__dma_entry_alloc(void)
@@ -469,6 +564,8 @@ static void dma_entry_free(struct dma_debug_entry *entry)
 {
 	unsigned long flags;
 
+	active_pfn_remove(entry);
+
 	/*
 	 * add to beginning of the list - this way the entries are
 	 * more likely cache hot when they are reallocated.
@@ -895,15 +992,15 @@ static void check_unmap(struct dma_debug_entry *ref)
 			   ref->dev_addr, ref->size,
 			   type2name[entry->type], type2name[ref->type]);
 	} else if ((entry->type == dma_debug_coherent) &&
-		   (ref->paddr != entry->paddr)) {
+		   (phys_addr(ref) != phys_addr(entry))) {
 		err_printk(ref->dev, entry, "DMA-API: device driver frees "
 			   "DMA memory with different CPU address "
 			   "[device address=0x%016llx] [size=%llu bytes] "
 			   "[cpu alloc address=0x%016llx] "
 			   "[cpu free address=0x%016llx]",
 			   ref->dev_addr, ref->size,
-			   (unsigned long long)entry->paddr,
-			   (unsigned long long)ref->paddr);
+			   phys_addr(entry),
+			   phys_addr(ref));
 	}
 
 	if (ref->sg_call_ents && ref->type == dma_debug_sg &&
@@ -1052,7 +1149,8 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
 
 	entry->dev       = dev;
 	entry->type      = dma_debug_page;
-	entry->paddr     = page_to_phys(page) + offset;
+	entry->pfn	 = page_to_pfn(page);
+	entry->offset	 = offset,
 	entry->dev_addr  = dma_addr;
 	entry->size      = size;
 	entry->direction = direction;
@@ -1148,7 +1246,8 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 
 		entry->type           = dma_debug_sg;
 		entry->dev            = dev;
-		entry->paddr          = sg_phys(s);
+		entry->pfn	      = page_to_pfn(sg_page(s));
+		entry->offset	      = s->offset,
 		entry->size           = sg_dma_len(s);
 		entry->dev_addr       = sg_dma_address(s);
 		entry->direction      = direction;
@@ -1198,7 +1297,8 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
 		struct dma_debug_entry ref = {
 			.type           = dma_debug_sg,
 			.dev            = dev,
-			.paddr          = sg_phys(s),
+			.pfn		= page_to_pfn(sg_page(s)),
+			.offset		= s->offset,
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = dir,
@@ -1233,7 +1333,8 @@ void debug_dma_alloc_coherent(struct device *dev, size_t size,
 
 	entry->type      = dma_debug_coherent;
 	entry->dev       = dev;
-	entry->paddr     = virt_to_phys(virt);
+	entry->pfn	 = page_to_pfn(virt_to_page(virt));
+	entry->offset	 = (size_t) virt & ~PAGE_MASK;
 	entry->size      = size;
 	entry->dev_addr  = dma_addr;
 	entry->direction = DMA_BIDIRECTIONAL;
@@ -1248,7 +1349,8 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
 	struct dma_debug_entry ref = {
 		.type           = dma_debug_coherent,
 		.dev            = dev,
-		.paddr          = virt_to_phys(virt),
+		.pfn		= page_to_pfn(virt_to_page(virt)),
+		.offset		= (size_t) virt & ~PAGE_MASK,
 		.dev_addr       = addr,
 		.size           = size,
 		.direction      = DMA_BIDIRECTIONAL,
@@ -1356,7 +1458,8 @@ void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 		struct dma_debug_entry ref = {
 			.type           = dma_debug_sg,
 			.dev            = dev,
-			.paddr          = sg_phys(s),
+			.pfn		= page_to_pfn(sg_page(s)),
+			.offset		= s->offset,
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = direction,
@@ -1388,7 +1491,8 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 		struct dma_debug_entry ref = {
 			.type           = dma_debug_sg,
 			.dev            = dev,
-			.paddr          = sg_phys(s),
+			.pfn		= page_to_pfn(sg_page(s)),
+			.offset		= s->offset,
 			.dev_addr       = sg_dma_address(s),
 			.size           = sg_dma_len(s),
 			.direction      = direction,
diff --git a/mm/memory.c b/mm/memory.c
index 5d9025f3b3e1..c89788436f81 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -59,6 +59,7 @@
 #include <linux/gfp.h>
 #include <linux/migrate.h>
 #include <linux/string.h>
+#include <linux/dma-debug.h>
 
 #include <asm/io.h>
 #include <asm/pgalloc.h>
@@ -2559,6 +2560,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
 
 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
 {
+	debug_dma_assert_idle(src);
+
 	/*
 	 * If the source page was a PFN mapping, we don't have
 	 * a "struct page" for it. We do a best-effort copy by

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private
  2014-01-09 20:16 ` [PATCH v2 3/4] net: make tcp_cleanup_rbuf private Dan Williams
@ 2014-01-09 20:26   ` Neal Cardwell
  2014-01-09 20:33     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Neal Cardwell @ 2014-01-09 20:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: dmaengine, Hideaki YOSHIFUJI, Netdev, James Morris,
	David S. Miller, Alexey Kuznetsov, Patrick McHardy, LKML

On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> net_dma was the only external user so this can become local to tcp.c
> again.
...
> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
> +static void cleanup_rbuf(struct sock *sk, int copied)

I would vote to keep the tcp_ prefix. In the TCP code base that is the
more common idiom, even for internal/static TCP functions, and
personally I find it easier to read and work with in stack traces,
etc. My 2 cents.

neal

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private
  2014-01-09 20:26   ` Neal Cardwell
@ 2014-01-09 20:33     ` Dan Williams
  2014-01-09 20:42       ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2014-01-09 20:33 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: dmaengine@vger.kernel.org, Hideaki YOSHIFUJI, Netdev,
	James Morris, David S. Miller, Alexey Kuznetsov, Patrick McHardy,
	LKML

On Thu, Jan 9, 2014 at 12:26 PM, Neal Cardwell <ncardwell@google.com> wrote:
> On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> net_dma was the only external user so this can become local to tcp.c
>> again.
> ...
>> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
>> +static void cleanup_rbuf(struct sock *sk, int copied)
>
> I would vote to keep the tcp_ prefix. In the TCP code base that is the
> more common idiom, even for internal/static TCP functions, and
> personally I find it easier to read and work with in stack traces,
> etc. My 2 cents.
>

Ok.  It was cleanup_rbuf() in a former life, but one vote for leaving
the name as-is is enough for me.

--
Dan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private
  2014-01-09 20:33     ` Dan Williams
@ 2014-01-09 20:42       ` David Miller
  2014-01-10 10:38         ` David Laight
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2014-01-09 20:42 UTC (permalink / raw)
  To: dan.j.williams
  Cc: ncardwell, dmaengine, yoshfuji, netdev, jmorris, kuznet, kaber,
	linux-kernel

From: Dan Williams <dan.j.williams@intel.com>
Date: Thu, 9 Jan 2014 12:33:29 -0800

> On Thu, Jan 9, 2014 at 12:26 PM, Neal Cardwell <ncardwell@google.com> wrote:
>> On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> net_dma was the only external user so this can become local to tcp.c
>>> again.
>> ...
>>> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
>>> +static void cleanup_rbuf(struct sock *sk, int copied)
>>
>> I would vote to keep the tcp_ prefix. In the TCP code base that is the
>> more common idiom, even for internal/static TCP functions, and
>> personally I find it easier to read and work with in stack traces,
>> etc. My 2 cents.
>>
> 
> Ok.  It was cleanup_rbuf() in a former life, but one vote for leaving
> the name as-is is enough for me.

You can make that two votes :)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle()
  2014-01-09 20:17 ` [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle() Dan Williams
@ 2014-01-10  0:38   ` Andrew Morton
  2014-01-11  2:35     ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2014-01-10  0:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: dmaengine, Vinod Koul, netdev, Joerg Roedel, linux-kernel,
	James Bottomley, Russell King

On Thu, 09 Jan 2014 12:17:26 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> Record actively mapped pages and provide an api for asserting a given
> page is dma inactive before execution proceeds.  Placing
> debug_dma_assert_idle() in cow_user_page() flagged the violation of the
> dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma:
> mark broken").
> 
> --- a/include/linux/dma-debug.h
> +++ b/include/linux/dma-debug.h
> @@ -185,4 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
>  
>  #endif /* CONFIG_DMA_API_DEBUG */
>  
> +#ifdef CONFIG_DMA_VS_CPU_DEBUG
> +extern void debug_dma_assert_idle(struct page *page);
> +#else
> +static inline void debug_dma_assert_idle(struct page *page) { }

Surely this breaks the build when CONFIG_DMA_VS_CPU_DEBUG=n? 
lib/dma-debug.c is missing the necessary "#ifdef
CONFIG_DMA_VS_CPU_DEBUG"s.

Do we really need this config setting anyway?  What goes bad if we
permanently enable this subfeature when dma debugging is enabled?

>
> ...
>
> index d87a17a819d0..f67ae111cd2f 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -57,7 +57,8 @@ struct dma_debug_entry {
>  	struct list_head list;
>  	struct device    *dev;
>  	int              type;
> -	phys_addr_t      paddr;
> +	unsigned long	 pfn;
> +	size_t		 offset;

Some documentation for the fields would be nice.  offset of what
relative to what, in what units?

>  	u64              dev_addr;
>  	u64              size;
>  	int              direction;
> @@ -372,6 +373,11 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
>  	list_del(&entry->list);
>  }
>  
>
> ...
>
>  
> +/* memory usage is constrained by the maximum number of available
> + * dma-debug entries
> + */

A brief design overview would be useful.  What goes in tree, how is it
indexed, when and why do we add/remove/test items, etc.

> +static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
> +static DEFINE_SPINLOCK(radix_lock);
> +
> +static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
> +{
> +	unsigned long pfn = entry->pfn;
> +	int i;
> +
> +	for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
> +		if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
> +			radix_tree_tag_set(&dma_active_pfn, pfn, i);
> +			return;
> +		}
> +	pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
> +		 RADIX_TREE_MAX_TAGS, pfn);
> +}
> +
>
> ...
>
> +void debug_dma_assert_idle(struct page *page)
> +{
> +	unsigned long flags;
> +	struct dma_debug_entry *entry;
> +
> +	if (!page)
> +		return;
> +
> +	spin_lock_irqsave(&radix_lock, flags);
> +	entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
> +	spin_unlock_irqrestore(&radix_lock, flags);
> +
> +	if (!entry)
> +		return;
> +
> +	err_printk(entry->dev, entry,
> +		   "DMA-API: cpu touching an active dma mapped page "
> +		   "[pfn=0x%lx]\n", entry->pfn);
> +}
> +EXPORT_SYMBOL_GPL(debug_dma_assert_idle);

The export isn't needed for mm/memory.c

>
> ...
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private
  2014-01-09 20:42       ` David Miller
@ 2014-01-10 10:38         ` David Laight
  0 siblings, 0 replies; 11+ messages in thread
From: David Laight @ 2014-01-10 10:38 UTC (permalink / raw)
  To: 'David Miller', dan.j.williams@intel.com
  Cc: ncardwell@google.com, dmaengine@vger.kernel.org,
	yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
	jmorris@namei.org, kuznet@ms2.inr.ac.ru, kaber@trash.net,
	linux-kernel@vger.kernel.org

From: David Miller
...
> > On Thu, Jan 9, 2014 at 12:26 PM, Neal Cardwell <ncardwell@google.com> wrote:
> >> On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >>> net_dma was the only external user so this can become local to tcp.c
> >>> again.
> >> ...
> >>> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
> >>> +static void cleanup_rbuf(struct sock *sk, int copied)
> >>
> >> I would vote to keep the tcp_ prefix. In the TCP code base that is the
> >> more common idiom, even for internal/static TCP functions, and
> >> personally I find it easier to read and work with in stack traces,
> >> etc. My 2 cents.
> >>
> >
> > Ok.  It was cleanup_rbuf() in a former life, but one vote for leaving
> > the name as is is enough for me.
> 
> You can make that two votes :)

I suspect DM adds 2000 votes :-)

Keeping the prefix makes it easier to grep for, and makes it more
obvious that it isn't a generic function (when being called).

	David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle()
  2014-01-10  0:38   ` Andrew Morton
@ 2014-01-11  2:35     ` Dan Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Dan Williams @ 2014-01-11  2:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: dmaengine@vger.kernel.org, Vinod Koul, Netdev, Joerg Roedel,
	linux-kernel@vger.kernel.org, James Bottomley, Russell King

On Thu, Jan 9, 2014 at 4:38 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 09 Jan 2014 12:17:26 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
>> Record actively mapped pages and provide an api for asserting a given
>> page is dma inactive before execution proceeds.  Placing
>> debug_dma_assert_idle() in cow_user_page() flagged the violation of the
>> dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma:
>> mark broken").
>>
>> --- a/include/linux/dma-debug.h
>> +++ b/include/linux/dma-debug.h
>> @@ -185,4 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
>>
>>  #endif /* CONFIG_DMA_API_DEBUG */
>>
>> +#ifdef CONFIG_DMA_VS_CPU_DEBUG
>> +extern void debug_dma_assert_idle(struct page *page);
>> +#else
>> +static inline void debug_dma_assert_idle(struct page *page) { }
>
> Surely this breaks the build when CONFIG_DMA_VS_CPU_DEBUG=n?
> lib/dma-debug.c is missing the necessary "#ifdef
> CONFIG_DMA_VS_CPU_DEBUG"s.

facepalm

> Do we really need this config setting anyway?  What goes bad if we
> permanently enable this subfeature when dma debugging is enabled?

I did want to provide notification/description of this extra check,
but I'll go ahead and fold it into the DMA_API_DEBUG description.

The only thing that potentially goes bad is no longer having a hard
expectation of memory consumption.  Before the patch it is simply
sizeof(struct dma_debug_entry) * PREALLOC_DMA_DEBUG_ENTRIES; after the
patch the radix tree adds a variable amount on top of that, depending
on the sparseness of the active pfns and on the number of pages
included in each dma_map_sg call.  The testing with NET_DMA did not
involve dma_map_sg calls.
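
For a rough sense of scale (illustrative numbers only, assuming the
default PREALLOC_DMA_DEBUG_ENTRIES of 1 << 16 and an entry size on the
order of 100 bytes):

	65536 entries * ~100 bytes/entry  ~=  6-7 MB fixed pool

The radix tree then adds nodes on top of that (typically 64 slots per
node), so the extra cost scales with how sparse the set of active pfns
is rather than being a fixed constant.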

>> ...
>>
>> index d87a17a819d0..f67ae111cd2f 100644
>> --- a/lib/dma-debug.c
>> +++ b/lib/dma-debug.c
>> @@ -57,7 +57,8 @@ struct dma_debug_entry {
>>       struct list_head list;
>>       struct device    *dev;
>>       int              type;
>> -     phys_addr_t      paddr;
>> +     unsigned long    pfn;
>> +     size_t           offset;
>
> Some documentation for the fields would be nice.  offset of what
> relative to what, in what units?

This is the same 'offset' passed to dma_map_page(), will document.
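
Something along these lines, as a sketch (exact wording to be settled in
the next revision):

	/* pfn of the page backing this mapping */
	unsigned long	 pfn;
	/* byte offset into that page: the 'offset' argument given to
	 * dma_map_page() (or sg->offset / the offset of the virtual
	 * address within its page for the other entry types)
	 */
	size_t		 offset;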

>
>>       u64              dev_addr;
>>       u64              size;
>>       int              direction;
>> @@ -372,6 +373,11 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
>>       list_del(&entry->list);
>>  }
>>
>>
>> ...
>>
>>
>> +/* memory usage is constrained by the maximum number of available
>> + * dma-debug entries
>> + */
>
> A brief design overview would be useful.  What goes in tree, how is it
> indexed, when and why do we add/remove/test items, etc.
>

...added this documentation to dma_active_pfn for the next revision.

/*
 * For each page mapped (initial page in the case of
 * dma_alloc_coherent/dma_map_{single|page}, or each page in a
 * scatterlist) insert into this tree using the pfn as the key. At
 * dma_unmap_{single|sg|page} or dma_free_coherent delete the entry.  If
 * the pfn already exists at insertion time add a tag as a reference
 * count for the overlapping mappings.  For now the overlap tracking
 * just ensures that 'unmaps' balance 'maps' before marking the pfn
 * idle, but we should also be flagging overlaps as an API violation.
 *
 * Memory usage is mostly constrained by the maximum number of available
 * dma-debug entries.  In the case of dma_map_{single|page} and
 * dma_alloc_coherent there is only one dma_debug_entry and one pfn to
 * track per each of these calls.  dma_map_sg(), on the other hand,
 * consumes a single dma_debug_entry, but inserts 'nents' entries into
 * the tree.
 *
 * At any time debug_dma_assert_idle() can be called to trigger a
 * warning if the given page is in the active set.
 */
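
As a concrete illustration of those rules (hypothetical sequence; the
dev/page/d1/d2 variables are assumed to exist at the call site):

	d1 = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE); /* pfn inserted */
	d2 = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE); /* -EEXIST, tag set */
	debug_dma_assert_idle(page);	/* warns: pfn is in the active set */
	dma_unmap_page(dev, d2, PAGE_SIZE, DMA_TO_DEVICE);	/* tag cleared */
	dma_unmap_page(dev, d1, PAGE_SIZE, DMA_TO_DEVICE);	/* pfn deleted */
	debug_dma_assert_idle(page);	/* silent: page is idle again */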


>> +static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
>> +static DEFINE_SPINLOCK(radix_lock);
>> +
>> +static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
>> +{
>> +     unsigned long pfn = entry->pfn;
>> +     int i;
>> +
>> +     for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
>> +             if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
>> +                     radix_tree_tag_set(&dma_active_pfn, pfn, i);
>> +                     return;
>> +             }
>> +     pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
>> +              RADIX_TREE_MAX_TAGS, pfn);
>> +}
>> +
>>
>> ...
>>
>> +void debug_dma_assert_idle(struct page *page)
>> +{
>> +     unsigned long flags;
>> +     struct dma_debug_entry *entry;
>> +
>> +     if (!page)
>> +             return;
>> +
>> +     spin_lock_irqsave(&radix_lock, flags);
>> +     entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
>> +     spin_unlock_irqrestore(&radix_lock, flags);
>> +
>> +     if (!entry)
>> +             return;
>> +
>> +     err_printk(entry->dev, entry,
>> +                "DMA-API: cpu touching an active dma mapped page "
>> +                "[pfn=0x%lx]\n", entry->pfn);
>> +}
>> +EXPORT_SYMBOL_GPL(debug_dma_assert_idle);
>
> The export isn't needed for mm/memory.c

True, it can wait until other call sites arise.

Thanks Andrew.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-01-11  2:35 UTC | newest]

Thread overview: 11+ messages
2014-01-09 20:12 [PATCH v2 0/4] net_dma removal, and dma debug extension Dan Williams
2014-01-09 20:16 ` [PATCH v2 1/4] net_dma: simple removal Dan Williams
2014-01-09 20:16 ` [PATCH v2 2/4] net_dma: revert 'copied_early' Dan Williams
2014-01-09 20:16 ` [PATCH v2 3/4] net: make tcp_cleanup_rbuf private Dan Williams
2014-01-09 20:26   ` Neal Cardwell
2014-01-09 20:33     ` Dan Williams
2014-01-09 20:42       ` David Miller
2014-01-10 10:38         ` David Laight
2014-01-09 20:17 ` [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle() Dan Williams
2014-01-10  0:38   ` Andrew Morton
2014-01-11  2:35     ` Dan Williams
