* [PATCH 00/12] Swap-over-NFS without deadlocking V6
@ 2012-06-20 9:36 Mel Gorman
From: Mel Gorman @ 2012-06-20 9:36 UTC
To: Andrew Morton
Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
Mike Christie, Eric B Munson, Mel Gorman
Changelog since V5
o Rebase to v3.5-rc3
Changelog since V4
o Catch if SOCK_MEMALLOC flag is cleared with rmem tokens (davem)
Changelog since V3
o Rebase to 3.4-rc5
o kmap pages for writing to swap (akpm)
o Move forward declaration to reduce chance of duplication (akpm)
Changelog since V2
o Nothing significant, just rebases. The biggest rebase artifact is that
a radix tree lookup is replaced with a linear search
This patch series is based on top of "Swap-over-NBD without deadlocking v12"
as it depends on the same reservation of PF_MEMALLOC reserves logic.
When a user or administrator requires swap for their application, they
create a swap partition or file, format it with mkswap and activate it with
swapon. In diskless systems this is not an option, so if swap is required
then swapping over the network is considered. The two likely scenarios
are blade servers used as part of a cluster, where the form factor or
maintenance costs do not allow the use of disks, and thin clients.
The Linux Terminal Server Project recommends the use of the Network
Block Device (NBD) for swap but this is not always an option. There is
no guarantee that the network attached storage (NAS) device is running
Linux or supports NBD. However, it is likely that it supports NFS so there
are users that want support for swapping over NFS despite any performance
concern. Some distributions currently carry patches that support swapping
over NFS but it would be preferable to support it in the mainline kernel.
Patch 1 avoids a stream-specific deadlock that potentially affects TCP.
Patch 2 is a small modification to SELinux to avoid using PF_MEMALLOC
reserves.
Patch 3 adds three helpers for filesystems to handle swap cache pages.
For example, page_file_mapping() returns page->mapping for
file-backed pages and the address_space of the underlying
swap file for swap cache pages.
Patch 4 adds two address_space_operations to allow a filesystem
to pin all metadata relevant to a swapfile in memory. Upon
successful activation, the swapfile is marked SWP_FILE and
the address space operation ->direct_IO is used for writing
and ->readpage for reading in swap pages.
Patch 5 notes that patch 4 is bolting
filesystem-specific-swapfile-support onto the side and that
the default handlers have different information to what
is available to the filesystem. This patch refactors the
code so that there are generic handlers for each of the new
address_space operations.
Patch 6 adds an API to allow a vector of kernel addresses to be
translated to struct pages and pinned for IO.
Patch 7 adds support for using highmem pages for swap by kmapping
the pages before calling the direct_IO handler.
Patch 8 updates NFS to use the helpers from patch 3 where necessary.
Patch 9 avoids setting PG_private on PG_swapcache pages within NFS.
Patch 10 implements the new swapfile-related address_space operations
for NFS and teaches the direct IO handler how to manage
kernel addresses.
Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
where appropriate.
Patch 12 fixes a NULL pointer dereference that occurs when using
swap-over-NFS.
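As a rough illustration of the activation flow described for patch 4, the
following userspace model may help. It is only a sketch under stated
assumptions: every mock_ name is hypothetical, the two hook names are
invented because the text above does not name them, and failing swapon when
activation fails is an assumption rather than something this cover letter
states. It only models the decision of which I/O path is used once the
swapfile is marked SWP_FILE.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SWP_FILE 0x1u	/* stand-in for the SWP_FILE flag named above */

enum mock_io_path {
	MOCK_IO_BLOCK,		/* block-based swap I/O */
	MOCK_IO_DIRECT_IO,	/* writes go through ->direct_IO */
	MOCK_IO_READPAGE	/* reads go through ->readpage */
};

/* Hypothetical hook names; the two operations are not named in the text. */
struct mock_aops {
	int (*activate)(void);		/* pin swapfile metadata, 0 on success */
	void (*deactivate)(void);	/* release what activate pinned */
};

struct mock_swap_info {
	unsigned int flags;
	const struct mock_aops *aops;	/* NULL models "no filesystem hooks" */
};

static int mock_pinned;	/* outstanding activations in this model */

static int mock_activate_ok(void) { mock_pinned++; return 0; }
static void mock_deactivate(void) { mock_pinned--; }

static int mock_swapon(struct mock_swap_info *sis)
{
	int err;

	if (!sis->aops || !sis->aops->activate)
		return 0;	/* no hook: keep the block-based path */

	err = sis->aops->activate();
	if (err)
		return err;	/* assumption: failed activation fails swapon */

	sis->flags |= MOCK_SWP_FILE;
	return 0;
}

static void mock_swapoff(struct mock_swap_info *sis)
{
	if (!(sis->flags & MOCK_SWP_FILE))
		return;

	assert(sis->aops != NULL);	/* the flag is only set via a hook */
	if (sis->aops->deactivate)
		sis->aops->deactivate();
	sis->flags &= ~MOCK_SWP_FILE;
}

static enum mock_io_path mock_write_path(const struct mock_swap_info *sis)
{
	return (sis->flags & MOCK_SWP_FILE) ? MOCK_IO_DIRECT_IO : MOCK_IO_BLOCK;
}

static enum mock_io_path mock_read_path(const struct mock_swap_info *sis)
{
	return (sis->flags & MOCK_SWP_FILE) ? MOCK_IO_READPAGE : MOCK_IO_BLOCK;
}
```

A swap area whose filesystem supplies the hooks ends up on the direct_IO and
readpage paths after mock_swapon(); one without hooks stays on the block path.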
With the patches applied, it is possible to activate a swapfile that is on an
NFS filesystem. Swap performance is not great: a swap stress test takes
roughly twice as long to complete as when the swap device is backed by NBD.
Documentation/filesystems/Locking | 13 ++++
Documentation/filesystems/vfs.txt | 12 +++
fs/nfs/Kconfig | 8 ++
fs/nfs/direct.c | 82 ++++++++++++++-------
fs/nfs/file.c | 28 +++++--
fs/nfs/inode.c | 4 +
fs/nfs/internal.h | 7 +-
fs/nfs/pagelist.c | 4 +-
fs/nfs/read.c | 6 +-
fs/nfs/write.c | 91 ++++++++++++++---------
include/linux/blk_types.h | 2 +
include/linux/fs.h | 8 ++
include/linux/highmem.h | 7 ++
include/linux/mm.h | 29 ++++++++
include/linux/nfs_fs.h | 4 +-
include/linux/pagemap.h | 5 ++
include/linux/sunrpc/xprt.h | 3 +
include/linux/swap.h | 8 ++
mm/highmem.c | 12 +++
mm/memory.c | 52 +++++++++++++
mm/page_io.c | 145 +++++++++++++++++++++++++++++++++++++
mm/swap_state.c | 2 +-
mm/swapfile.c | 141 ++++++++++++++----------------------
net/core/sock.c | 1 +
net/sunrpc/Kconfig | 5 ++
net/sunrpc/clnt.c | 2 +
net/sunrpc/sched.c | 7 +-
net/sunrpc/xprtsock.c | 53 ++++++++++++++
security/selinux/avc.c | 2 +-
29 files changed, 573 insertions(+), 170 deletions(-)
--
1.7.9.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 01/12] selinux: tag avc cache alloc as non-critical
@ 2012-06-20 9:36 Mel Gorman
From: Mel Gorman @ 2012-06-20 9:36 UTC
To: Andrew Morton
Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
Mike Christie, Eric B Munson, Mel Gorman
Failing to allocate a cache entry will only harm performance, not
correctness. Do not consume valuable reserve pages for something
like that.
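The reasoning above can be modelled in userspace as follows. This is a
simplified sketch, not the kernel allocator: mock_pool, mock_alloc(),
mock_check() and MOCK_NOMEMALLOC are hypothetical names, and the model
assumes the caller would otherwise be entitled to dip into the reserve. It
shows that opting out of the reserve only costs a cache entry, while the
decision returned to the caller is unchanged.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NOMEMALLOC 0x1u	/* hypothetical stand-in for __GFP_NOMEMALLOC */

struct mock_pool {
	int free_normal;	/* objects available to any caller */
	int free_reserve;	/* emergency objects kept for critical paths */
};

/* Returns 1 if an object was handed out, 0 if the allocation failed. */
static int mock_alloc(struct mock_pool *pool, unsigned int flags)
{
	assert(pool != NULL);

	if (pool->free_normal > 0) {
		pool->free_normal--;
		return 1;
	}
	/* Only callers that did not opt out may touch the reserve. */
	if (!(flags & MOCK_NOMEMALLOC) && pool->free_reserve > 0) {
		pool->free_reserve--;
		return 1;
	}
	return 0;
}

/*
 * Model of a cached permission check. The decision always comes from the
 * slow path here; the cache entry is purely an optimisation for later calls.
 */
static int mock_check(struct mock_pool *pool, int key, int *cached)
{
	int decision = key % 2;	/* stand-in for the slow-path lookup */

	/* Non-critical allocation: never consume the reserve for it. */
	*cached = mock_alloc(pool, MOCK_NOMEMALLOC);

	return decision;	/* same answer whether or not it was cached */
}
```

With an exhausted normal pool, mock_check() still returns the right decision
and leaves free_reserve untouched; only the caching is skipped.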
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Eric Paris <eparis@redhat.com>
---
security/selinux/avc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 68d82da..4d3fab4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -274,7 +274,7 @@ static struct avc_node *avc_alloc_node(void)
 {
 	struct avc_node *node;
 
-	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
+	node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
 	if (!node)
 		goto out;
 
--
1.7.9.2
* [PATCH 02/12] mm: Methods for teaching filesystems about PG_swapcache pages
@ 2012-06-20 9:36 Mel Gorman
From: Mel Gorman @ 2012-06-20 9:36 UTC
To: Andrew Morton
Cc: Linux-MM, Linux-Netdev, Linux-NFS, LKML, David Miller,
Trond Myklebust, Neil Brown, Christoph Hellwig, Peter Zijlstra,
Mike Christie, Eric B Munson, Mel Gorman
In order to teach filesystems to handle swap cache pages, three new
page functions are introduced:
pgoff_t page_file_index(struct page *);
loff_t page_file_offset(struct page *);
struct address_space *page_file_mapping(struct page *);
page_file_index() - gives the offset of this page in the file in
PAGE_CACHE_SIZE blocks. As page->index does for mapped pages, this
function also gives the correct index for PG_swapcache pages.
page_file_offset() - uses page_file_index(), so that it gives
the expected result even for PG_swapcache pages.
page_file_mapping() - gives the mapping backing the actual page;
that is, for swap cache pages it gives swap_file->f_mapping.
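The behaviour of the three helpers can be tried out with a small userspace
model such as the one below. It is a sketch under assumptions, not kernel
code: struct mock_page and every mock_ name are hypothetical stand-ins, a
4 KiB page size is assumed, and the swap entry encoding (type above bit 24,
offset below) is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT		12	/* assumes 4 KiB pages */
#define MOCK_SWP_TYPE_SHIFT	24	/* invented swap entry layout */

struct mock_address_space {
	const char *owner;
};

struct mock_page {
	bool swapcache;				/* models PageSwapCache(page) */
	uint64_t index;				/* models page->index */
	uint64_t private_val;			/* models page_private(page) */
	struct mock_address_space *mapping;	/* models page->mapping */
	struct mock_address_space *swap_mapping; /* models swap_file->f_mapping */
};

/* Models swp_offset(): strip the swap type, keep the offset. */
static uint64_t mock_swp_offset(uint64_t entry_val)
{
	return entry_val & ((UINT64_C(1) << MOCK_SWP_TYPE_SHIFT) - 1);
}

/* Out-of-line half, only valid for swap cache pages. */
static uint64_t mock_swapcache_index(const struct mock_page *page)
{
	assert(page->swapcache);	/* mirrors the VM_BUG_ON() in the patch */
	return mock_swp_offset(page->private_val);
}

static uint64_t mock_page_file_index(const struct mock_page *page)
{
	if (page->swapcache)
		return mock_swapcache_index(page);

	return page->index;
}

static int64_t mock_page_file_offset(const struct mock_page *page)
{
	return (int64_t)(mock_page_file_index(page) << MOCK_PAGE_SHIFT);
}

static struct mock_address_space *
mock_page_file_mapping(const struct mock_page *page)
{
	if (page->swapcache)
		return page->swap_mapping;

	return page->mapping;
}
```

For a regular page cache page the helpers fall through to index and mapping;
for a swap cache page they return the offset within the swapfile and the
swapfile's own mapping, which is what a filesystem needs in order to issue
I/O against the right file position.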
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mm.h | 25 +++++++++++++++++++++++++
include/linux/pagemap.h | 5 +++++
mm/swapfile.c | 19 +++++++++++++++++++
3 files changed, 49 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..0c0301c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -805,6 +805,17 @@ static inline void *page_rmapping(struct page *page)
 	return (void *)((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
 }
 
+extern struct address_space *__page_file_mapping(struct page *);
+
+static inline
+struct address_space *page_file_mapping(struct page *page)
+{
+	if (unlikely(PageSwapCache(page)))
+		return __page_file_mapping(page);
+
+	return page->mapping;
+}
+
 static inline int PageAnon(struct page *page)
 {
 	return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
@@ -821,6 +832,20 @@ static inline pgoff_t page_index(struct page *page)
 	return page->index;
 }
 
+extern pgoff_t __page_file_index(struct page *page);
+
+/*
+ * Return the file index of the page. Regular pagecache pages use ->index
+ * whereas swapcache pages use swp_offset(->private)
+ */
+static inline pgoff_t page_file_index(struct page *page)
+{
+	if (unlikely(PageSwapCache(page)))
+		return __page_file_index(page);
+
+	return page->index;
+}
+
 /*
  * Return true if this page is mapped into pagetables.
  */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7cfad3b..e42c762 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -286,6 +286,11 @@ static inline loff_t page_offset(struct page *page)
 	return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
 }
 
+static inline loff_t page_file_offset(struct page *page)
+{
+	return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
+}
+
 extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma,
 				     unsigned long address);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index de5bc51..f03a560 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -33,6 +33,7 @@
 #include <linux/oom.h>
 #include <linux/frontswap.h>
 #include <linux/swapfile.h>
+#include <linux/export.h>
 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
@@ -2290,6 +2291,24 @@ int swapcache_prepare(swp_entry_t entry)
 }
 
 /*
+ * out-of-line __page_file_ methods to avoid include hell.
+ */
+struct address_space *__page_file_mapping(struct page *page)
+{
+	VM_BUG_ON(!PageSwapCache(page));
+	return page_swap_info(page)->swap_file->f_mapping;
+}
+EXPORT_SYMBOL_GPL(__page_file_mapping);
+
+pgoff_t __page_file_index(struct page *page)
+{
+	swp_entry_t swap = { .val = page_private(page) };
+	VM_BUG_ON(!PageSwapCache(page));
+	return swp_offset(swap);
+}
+EXPORT_SYMBOL_GPL(__page_file_index);
+
+/*
  * add_swap_count_continuation - called when a swap count is duplicated
  * beyond SWAP_MAP_MAX, it allocates a new page and links that to the entry's
  * page of the original vmalloc'ed swap_map, to hold the continuation count
--
1.7.9.2