From: Neil Horman <nhorman@tuxdriver.com>
To: linux-nfs@vger.kernel.org
Cc: Neil Horman <nhorman@tuxdriver.com>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
security@kernel.org, Jeff Layton <jlayton@redhat.com>
Subject: [PATCH] nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)
Date: Fri, 4 Mar 2011 19:26:03 -0500 [thread overview]
Message-ID: <1299284763-18186-1-git-send-email-nhorman@tuxdriver.com> (raw)
In-Reply-To: <1299273900.2901.49.camel@heimdal.trondhjem.org>
This oops was reported recently:
<IRQ> [<ffffffff800ca161>] bad_page+0x69/0x91
[<ffffffff8000b8ae>] free_hot_cold_page+0x81/0x144
[<ffffffff8022f28a>] skb_release_data+0x5f/0x98
[<ffffffff80028b88>] __kfree_skb+0x11/0x1a
[<ffffffff800243b8>] tcp_ack+0x6a3/0x1868
[<ffffffff8001bff4>] tcp_rcv_established+0x7a6/0x8b9
[<ffffffff8003b684>] tcp_v4_do_rcv+0x2a/0x2fa
[<ffffffff8002730e>] tcp_v4_rcv+0x9a2/0x9f6
[<ffffffff80099b45>] do_timer+0x2df/0x52c
[<ffffffff8003495c>] ip_local_deliver+0x19d/0x263
[<ffffffff80035ab8>] ip_rcv+0x539/0x57c
[<ffffffff80020ab6>] netif_receive_skb+0x470/0x49f
[<ffffffff883190b5>] :virtio_net:virtnet_poll+0x46b/0x5c5
[<ffffffff8000c979>] net_rx_action+0xac/0x1b3
[<ffffffff80012464>] __do_softirq+0x89/0x133
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006d5f5>] do_softirq+0x2c/0x7d
[<ffffffff8006d485>] do_IRQ+0xec/0xf5
[<ffffffff8006bdb5>] default_idle+0x0/0x50
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff8006bdde>] default_idle+0x29/0x50
[<ffffffff800492fd>] cpu_idle+0x95/0xb8
[<ffffffff80461807>] start_kernel+0x220/0x225
[<ffffffff8046122f>] _sinittext+0x22f/0x236
It occurs, because an skb with a fraglist was freed from the tcp retransmit
queue when it was acked, but a page on that fraglist had PG_Slab set (indicating
it was allocated from the Slab allocator (which means the free path above can't
safely free it via put_page)
We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages.
__nfs4_proc_set_acl just converts each page in the buffer to a page struct via
virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab
bit is set. We can't create a buffer with kmalloc and free it later in the tcp
ack path with put_page, so we need to either:
1) ensure that when we create the list of pages, no page struct has PG_Slab set
or
2) not use a page list to send this data
Given that these buffers can be multiple pages and arbitrarily sized, I think
(1) is the right way to go. I've written the below patch to allocate a page
from the buddy allocator directly and copy the data over to it. This ensures
that we have a put_page free-able page for every entry that winds up on an skb
frag list, so it can be safely freed when the frame is acked. We do a put page
on each entry after the rpc_call_sync call so as to drop our own reference count
to the page, leaving only the ref count taken by tcp_sendpages. This way the
data will be properly freed when the ack comes in
Successfully tested by myself to solve the above oops.
Note, as this is the result of a setacl operation that exceeded a page of data,
I think this amounts to a local DOS triggerable by an uprivlidged user, so I'm
CCing security on this as well.
Change notes:
v2:
Updated to reflect Tronds requests, specifically that buf_to_pages_noslab just
return the number of pages copied to avoid the extra memset. Also made the copy
operation a bit more efficient and removed one of the virt_to_page calls.
v3:
Fixed nits (brackets around *pages++ where none are needed and duplicate
initalization of acl_pgbase variable)
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
---
fs/nfs/nfs4proc.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 42 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 78936a8..5368c79 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -51,6 +51,7 @@
#include <linux/sunrpc/bc_xprt.h>
#include <linux/xattr.h>
#include <linux/utsname.h>
+#include <linux/mm.h>
#include "nfs4_fs.h"
#include "delegation.h"
@@ -3252,6 +3253,35 @@ static void buf_to_pages(const void *buf, size_t buflen,
}
}
+static int buf_to_pages_noslab(const void *buf, size_t buflen,
+ struct page **pages, unsigned int *pgbase)
+{
+ struct page *newpage, **spages;
+ int rc = 0;
+ size_t len;
+ spages = pages;
+
+ do {
+ len = min(PAGE_CACHE_SIZE, buflen);
+ newpage = alloc_page(GFP_KERNEL);
+
+ if (newpage == NULL)
+ goto unwind;
+ memcpy(page_address(newpage), buf, len);
+ buf += len;
+ buflen -= len;
+ *pages++ = newpage;
+ rc++;
+ } while (buflen != 0);
+
+ return rc;
+
+unwind:
+ for(; rc > 0; rc--)
+ __free_page(spages[rc-1]);
+ return -ENOMEM;
+}
+
struct nfs4_cached_acl {
int cached;
size_t len;
@@ -3420,13 +3450,23 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl
.rpc_argp = &arg,
.rpc_resp = &res,
};
- int ret;
+ int ret, i;
if (!nfs4_server_supports_acls(server))
return -EOPNOTSUPP;
+ i = buf_to_pages_noslab(buf, buflen, arg.acl_pages, &arg.acl_pgbase);
+ if (i < 0)
+ return i;
nfs_inode_return_delegation(inode);
- buf_to_pages(buf, buflen, arg.acl_pages, &arg.acl_pgbase);
ret = nfs4_call_sync(server, &msg, &arg, &res, 1);
+
+ /*
+ * Free each page after tx, so the only ref left is
+ * held by the network stack
+ */
+ for (; i > 0; i--)
+ put_page(pages[i-1]);
+
/*
* Acl update can result in inode attribute update.
* so mark the attribute cache invalid.
--
1.7.4
prev parent reply other threads:[~2011-03-05 0:26 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-04 16:44 [PATCH] nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab Neil Horman
2011-03-04 16:58 ` Christoph Hellwig
2011-03-04 17:13 ` J. Bruce Fields
2011-03-04 18:45 ` Neil Horman
2011-03-04 19:33 ` J. Bruce Fields
2011-03-04 19:48 ` [Security] " Linus Torvalds
2011-03-04 20:07 ` J. Bruce Fields
2011-03-04 20:30 ` Jeff Layton
2011-03-04 20:40 ` Trond Myklebust
[not found] ` <20110304153059.79374df7-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-03-04 21:04 ` J. Bruce Fields
2011-03-04 19:01 ` Trond Myklebust
2011-03-04 19:17 ` Neil Horman
2011-03-04 19:25 ` Trond Myklebust
2011-03-04 19:59 ` Neil Horman
2011-03-04 21:09 ` [PATCH] nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v2) Neil Horman
2011-03-04 21:25 ` Trond Myklebust
2011-03-05 0:26 ` Neil Horman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1299284763-18186-1-git-send-email-nhorman@tuxdriver.com \
--to=nhorman@tuxdriver.com \
--cc=Trond.Myklebust@netapp.com \
--cc=jlayton@redhat.com \
--cc=linux-nfs@vger.kernel.org \
--cc=security@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).