From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, Indan Zupancic <indan@nul.nu>,
	Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
	Daniel Phillips <phillips@google.com>,
	Rik van Riel <riel@redhat.com>,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH 4/4] nfs: deadlock prevention for NFS
Date: Fri, 25 Aug 2006 22:36:22 +0200
Message-ID: <1156538183.26945.15.camel@lappy>
In-Reply-To: <1156536880.5927.29.camel@localhost>

On Fri, 2006-08-25 at 16:14 -0400, Trond Myklebust wrote:
> Grumble... If your patches are targetting NFS, could you please at the
> very least Cc nfs@lists.sourceforge.net and/or myself.
Sorry, will make sure you're on the CC list next round.
> On Fri, 2006-08-25 at 17:40 +0200, Peter Zijlstra wrote:
> > Provide a proper a_ops->swapfile() implementation for NFS. This will
> > set the NFS socket to SOCK_VMIO and put the socket reconnection under
> > PF_MEMALLOC (I hope this is enough, otherwise more work needs to be done).
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > ---
> > fs/nfs/file.c | 21 ++++++++++++++++++++-
> > include/linux/sunrpc/xprt.h | 4 +++-
> > net/sunrpc/xprtsock.c | 16 ++++++++++++++++
> > 3 files changed, 39 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6/fs/nfs/file.c
> > ===================================================================
> > --- linux-2.6.orig/fs/nfs/file.c
> > +++ linux-2.6/fs/nfs/file.c
> > @@ -27,6 +27,7 @@
> > #include <linux/slab.h>
> > #include <linux/pagemap.h>
> > #include <linux/smp_lock.h>
> > +#include <net/sock.h>
> >
> > #include <asm/uaccess.h>
> > #include <asm/system.h>
> > @@ -317,7 +318,25 @@ static int nfs_release_page(struct page
> >
> > static int nfs_swapfile(struct address_space *mapping, int enable)
> > {
> > - return 0;
> > + int err = -EINVAL;
> > + struct rpc_clnt *client = NFS_CLIENT(mapping->host);
> > + struct sock *sk = client->cl_xprt->inet;
> > +
> > + if (enable) {
> > + client->cl_xprt->swapper = 1;
> > + /*
> > + * keep one extra sock reference so the reserve won't dip
> > + * when the socket gets reconnected.
> > + */
> > + sk_adjust_memalloc(1, 1);
> > + err = sk_set_vmio(sk);
> > + } else if (client->cl_xprt->swapper) {
> > + client->cl_xprt->swapper = 0;
> > + sk_adjust_memalloc(-1, -1);
> > + err = sk_clear_vmio(sk);
> > + }
> > +
> > + return err;
> > }
>
> This all belongs in net/sunrpc/xprtsock.c. The NFS code has no business
> screwing around with the internals of the sunrpc transport.
Ok, I'll make a function there, and call that.
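Roughly, the idea would be to move the body of nfs_swapfile() behind a single sunrpc entry point. Below is a minimal userspace sketch of that shape, not the actual follow-up patch. The name xs_swapper() is hypothetical, the structs are cut down to the fields used here, and sk_adjust_memalloc(), sk_set_vmio() and sk_clear_vmio() are stubs that only model the helpers from patch 1/4 (their real signatures and behaviour are assumed from the calls quoted above).

```c
#include <errno.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct sock { int vmio; };
struct rpc_xprt {
	struct sock *inet;
	unsigned int swapper:1;
};

/* Stubs modelling the patch 1/4 helpers (assumed signatures). */
static int memalloc_socks;	/* models the reserve accounting */

static void sk_adjust_memalloc(int socks, int tx_reserve_socks)
{
	(void)tx_reserve_socks;
	memalloc_socks += socks;
}

static int sk_set_vmio(struct sock *sk)   { sk->vmio = 1; return 0; }
static int sk_clear_vmio(struct sock *sk) { sk->vmio = 0; return 0; }

/*
 * Hypothetical transport-level helper: same logic as the quoted
 * nfs_swapfile() body, but living next to the transport internals.
 */
int xs_swapper(struct rpc_xprt *xprt, int enable)
{
	int err = -EINVAL;

	if (enable) {
		xprt->swapper = 1;
		/* extra reference so the reserve survives a reconnect */
		sk_adjust_memalloc(1, 1);
		err = sk_set_vmio(xprt->inet);
	} else if (xprt->swapper) {
		xprt->swapper = 0;
		sk_adjust_memalloc(-1, -1);
		err = sk_clear_vmio(xprt->inet);
	}

	return err;
}
```

With something like this in place, nfs_swapfile() would shrink to a single call that passes the client's transport and the enable flag, and fs/nfs/file.c would no longer need to touch cl_xprt->inet or include <net/sock.h>.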
> > const struct address_space_operations nfs_file_aops = {
> > Index: linux-2.6/net/sunrpc/xprtsock.c
> > ===================================================================
> > --- linux-2.6.orig/net/sunrpc/xprtsock.c
> > +++ linux-2.6/net/sunrpc/xprtsock.c
> > @@ -1014,6 +1014,7 @@ static void xs_udp_connect_worker(void *
> > {
> > struct rpc_xprt *xprt = (struct rpc_xprt *) args;
> > struct socket *sock = xprt->sock;
> > + unsigned long pflags = current->flags;
> > int err, status = -EIO;
> >
> > if (xprt->shutdown || xprt->addr.sin_port == 0)
> > @@ -1021,6 +1022,9 @@ static void xs_udp_connect_worker(void *
> >
> > dprintk("RPC: xs_udp_connect_worker for xprt %p\n", xprt);
> >
> > + if (xprt->swapper)
> > + current->flags |= PF_MEMALLOC;
> > +
> > /* Start by resetting any existing state */
> > xs_close(xprt);
> >
> > @@ -1054,6 +1058,9 @@ static void xs_udp_connect_worker(void *
> > xprt->sock = sock;
> > xprt->inet = sk;
> >
> > + if (xprt->swapper)
> > + sk_set_vmio(sk);
> > +
> > write_unlock_bh(&sk->sk_callback_lock);
> > }
> > xs_udp_do_set_buffer_size(xprt);
> > @@ -1061,6 +1068,7 @@ static void xs_udp_connect_worker(void *
> > out:
> > xprt_wake_pending_tasks(xprt, status);
> > xprt_clear_connecting(xprt);
> > + current->flags = pflags;
> > }
> >
> > /*
> > @@ -1097,11 +1105,15 @@ static void xs_tcp_connect_worker(void *
> > {
> > struct rpc_xprt *xprt = (struct rpc_xprt *)args;
> > struct socket *sock = xprt->sock;
> > + unsigned long pflags = current->flags;
> > int err, status = -EIO;
> >
> > if (xprt->shutdown || xprt->addr.sin_port == 0)
> > goto out;
> >
> > + if (xprt->swapper)
> > + current->flags |= PF_MEMALLOC;
> > +
> > dprintk("RPC: xs_tcp_connect_worker for xprt %p\n", xprt);
> >
> > if (!xprt->sock) {
> > @@ -1170,10 +1182,14 @@ static void xs_tcp_connect_worker(void *
> > break;
> > }
> > }
> > +
> > + if (xprt->swapper)
> > + sk_set_vmio(xprt->inet);
> > out:
> > xprt_wake_pending_tasks(xprt, status);
> > out_clear:
> > xprt_clear_connecting(xprt);
> > + current->flags = pflags;
> > }
>
> How does this guarantee that the socket reconnection won't fail?
I was afraid this might not be enough; I really have to go through the
network code.
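For reference, the pattern both connect workers use can be modelled in userspace as below. This is only a sketch of the flag handling and says nothing about whether the reconnect itself can still fail. Everything here is a stand-in: the task struct, the `current` macro, the PF_SKETCH_OTHER flag, the flag values and xs_connect_stub() are made up for illustration. One deliberate difference from the quoted patch: the sketch restores only the PF_MEMALLOC bit instead of writing back the whole saved flags word, so that any other flag changed during the connect is not lost.

```c
/* Illustrative flag values; not taken from the kernel headers. */
#define PF_MEMALLOC	0x00000800UL
#define PF_SKETCH_OTHER	0x00000001UL	/* hypothetical unrelated flag */

/* Userspace stand-ins for task_struct and the `current` pointer. */
struct task { unsigned long flags; };
static struct task fake_current;
#define current (&fake_current)

struct rpc_xprt { unsigned int swapper:1; };

/* Stand-in for the real connect path; records what it observed. */
static int memalloc_seen;

static void xs_connect_stub(struct rpc_xprt *xprt)
{
	(void)xprt;
	memalloc_seen = !!(current->flags & PF_MEMALLOC);
	/* simulate some other flag changing while we connect */
	current->flags |= PF_SKETCH_OTHER;
}

static void connect_worker_sketch(struct rpc_xprt *xprt)
{
	unsigned long pflags = current->flags;

	/* let allocations on the reconnect path dip into the reserves */
	if (xprt->swapper)
		current->flags |= PF_MEMALLOC;

	xs_connect_stub(xprt);

	/* restore only PF_MEMALLOC to its previous state */
	current->flags = (current->flags & ~PF_MEMALLOC) |
			 (pflags & PF_MEMALLOC);
}
```

The quoted patch uses the simpler `current->flags = pflags;`, which is fine as long as nothing else modifies the flags while the worker runs.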
> Also, what about the case of rpc_malloc()? Can't that cause rpciod to
> deadlock when you add NFS swap into the equation?
I will have to plead ignorance for now; I'll look into this on Monday.
At first glance it looks like rpc_malloc() could OR in __GFP_EMERG for
RPC_TASK_SWAPPER tasks.
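As a first guess at what that could look like, here is a small userspace sketch of the mask selection only. It is untested against the real rpc_malloc(): the helper name rpc_malloc_gfp() is hypothetical, GFP_NOFS as the base mask is an assumption, and all numeric values (including the one for __GFP_EMERG) are placeholders rather than the kernel's definitions.

```c
typedef unsigned int gfp_t;

/* Placeholder values for illustration; not the kernel's definitions. */
#define GFP_NOFS		0x0010u
#define __GFP_EMERG		0x8000u
#define RPC_TASK_SWAPPER	0x0004

/* Cut-down stand-in for struct rpc_task. */
struct rpc_task { unsigned short tk_flags; };

/*
 * Hypothetical helper: pick the allocation mask for an RPC buffer.
 * Tasks that carry swap traffic may use the emergency reserve.
 */
static gfp_t rpc_malloc_gfp(const struct rpc_task *task)
{
	gfp_t gfp = GFP_NOFS;	/* assumed base mask */

	if (task->tk_flags & RPC_TASK_SWAPPER)
		gfp |= __GFP_EMERG;

	return gfp;
}
```

Whether that is sufficient depends on the swap-out RPCs actually being tagged RPC_TASK_SWAPPER, which still needs checking.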