linux-nfs.vger.kernel.org archive mirror
From: Kinglong Mee <kinglongmee@gmail.com>
To: Peng Tao <tao.peng@primarydata.com>, linux-nfs@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@primarydata.com>, kinglongmee@gmail.com
Subject: Re: [PATCH v2] NFS41: make close wait for layoutreturn
Date: Wed, 23 Sep 2015 15:55:18 +0800
Message-ID: <56025AE6.6020808@gmail.com>
In-Reply-To: <56025A7D.2060207@gmail.com>

On 9/23/2015 15:53, Kinglong Mee wrote:
> Hi Tao, 
> 
> I hit a panic every time with this patch on 4.3.0-rc2 from Linus's tree:

The export is:
/nfs/pnfs       *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,pnfs,fsid=0)

/nfs/pnfs is the mount point of an XFS filesystem, which supports the block layout.

thanks,
Kinglong Mee 

> 
> # mount -t nfs nfs-server:/ /mnt  <------ an NFSv4.2 mount
> # cat /mnt/test1
> # echo sdfsdijfd > /mnt/test2
> # umount /mnt                     <----- panic here
> 
> [  391.565636] BUG: unable to handle kernel paging request at ffffffffffffffe0
> [  391.565667] IP: [<ffffffffa04fc019>] nfs4_delegreturn_prepare+0x19/0x70 [nfsv4]
> [  391.565696] PGD 1c14067 PUD 1c16067 PMD 0
> [  391.565711] Oops: 0000 [#1]
> [  391.565721] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nfsd xor raid6_pq vmw_balloon auth_rpcgss nfs_acl lockd parport_pc vmw_vmci parport shpchp i2c_piix4 grace sunrpc vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih ata_generic mptbase pata_acpi [last unloaded: fscache]
> [  391.567216] CPU: 0 PID: 498 Comm: kworker/0:1H Tainted: G           OE   4.3.0-rc2+ #257
> [  391.567672] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [  391.568586] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [  391.569038] task: ffff8800362bc8c0 ti: ffff880075264000 task.ti: ffff880075264000
> [  391.569505] RIP: 0010:[<ffffffffa04fc019>]  [<ffffffffa04fc019>] nfs4_delegreturn_prepare+0x19/0x70 [nfsv4]
> [  391.570441] RSP: 0018:ffff880075267cc0  EFLAGS: 00010282
> [  391.570910] RAX: ffffffffa052e700 RBX: ffff880041235000 RCX: 0000000000000001
> [  391.571379] RDX: ffff8800362bcfd0 RSI: ffff880041235000 RDI: 0000000000000000
> [  391.571833] RBP: ffff880075267cd0 R08: 0000000000000000 R09: 0000000000942000
> [  391.572332] R10: ffff8800362bc8c0 R11: ffff880075267d78 R12: ffff880069da5600
> [  391.572739] R13: ffff88007ff49d00 R14: ffffffffa00881b0 R15: ffff880069da5688
> [  391.573142] FS:  0000000000000000(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
> [  391.573552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  391.573964] CR2: ffffffffffffffe0 CR3: 0000000041095000 CR4: 00000000001406f0
> [  391.574413] Stack:
> [  391.574819]  ffff880069da5600 ffffffffa00881b0 ffff880075267ce0 ffffffffa00881c3
> [  391.575283]  ffff880075267d48 ffffffffa0089e44 0000000000010000 ffff880069da5670
> [  391.575711]  0000000000000292 ffffffff810a37fd 0000000100000000 00000000512aefa5
> [  391.576137] Call Trace:
> [  391.576560]  [<ffffffffa00881b0>] ? __rpc_atrun+0x20/0x20 [sunrpc]
> [  391.576996]  [<ffffffffa00881c3>] rpc_prepare_task+0x13/0x20 [sunrpc]
> [  391.577432]  [<ffffffffa0089e44>] __rpc_execute+0x94/0x3f0 [sunrpc]
> [  391.577919]  [<ffffffff810a37fd>] ? process_one_work+0x16d/0x4c0
> [  391.578521]  [<ffffffffa008a1b5>] rpc_async_schedule+0x15/0x20 [sunrpc]
> [  391.579180]  [<ffffffff810a38ac>] process_one_work+0x21c/0x4c0
> [  391.579886]  [<ffffffff810a37fd>] ? process_one_work+0x16d/0x4c0
> [  391.580495]  [<ffffffff810a3b9a>] worker_thread+0x4a/0x440
> [  391.581051]  [<ffffffff810a3b50>] ? process_one_work+0x4c0/0x4c0
> [  391.581484]  [<ffffffff810a3b50>] ? process_one_work+0x4c0/0x4c0
> [  391.581901]  [<ffffffff810a8da5>] kthread+0xf5/0x110
> [  391.582307]  [<ffffffff810a8cb0>] ? kthread_create_on_node+0x240/0x240
> [  391.582760]  [<ffffffff8172d01f>] ret_from_fork+0x3f/0x70
> [  391.583152]  [<ffffffff810a8cb0>] ? kthread_create_on_node+0x240/0x240
> [  391.583584] Code: f5 75 e6 48 89 df e8 67 2f b8 ff eb dc 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 48 8b be d8 01 00 00 48 89 f3 <48> 83 7f e0 00 74 11 4c 89 e6 e8 c8 df 02 00 84 c0 74 05 5b 41
> [  391.584794] RIP  [<ffffffffa04fc019>] nfs4_delegreturn_prepare+0x19/0x70 [nfsv4]
> [  391.585186]  RSP <ffff880075267cc0>
> [  391.585561] CR2: ffffffffffffffe0
> 
> thanks,
> Kinglong Mee
> 
> On 9/22/2015 11:35, Peng Tao wrote:
>> If we send a layoutreturn asynchronously before close, the close
>> might reach the server first and the layoutreturn would fail with
>> BADSTATEID because there is nothing keeping the layout stateid alive.
>>
>> Also, do not pretend to be sending a layoutreturn when we are not.
>>
>> Signed-off-by: Peng Tao <tao.peng@primarydata.com>
>> ---
>> v2: grab lo refcount when doing ROC
>>
>>  fs/nfs/nfs4proc.c | 17 +++++++++++++++++
>>  fs/nfs/pnfs.c     | 35 +++++++++++++++++++++++++----------
>>  fs/nfs/pnfs.h     |  7 +++++++
>>  3 files changed, 49 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 693b903..05f2da4 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -2645,6 +2645,15 @@ out:
>>  	return err;
>>  }
>>  
>> +static bool
>> +nfs4_wait_on_layoutreturn(struct inode *inode, struct rpc_task *task)
>> +{
>> +	if (!nfs_have_layout(inode))
>> +		return false;
>> +
>> +	return pnfs_wait_on_layoutreturn(inode, task);
>> +}
>> +
>>  struct nfs4_closedata {
>>  	struct inode *inode;
>>  	struct nfs4_state *state;
>> @@ -2763,6 +2772,11 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
>>  		goto out_no_action;
>>  	}
>>  
>> +	if (nfs4_wait_on_layoutreturn(inode, task)) {
>> +		nfs_release_seqid(calldata->arg.seqid);
>> +		goto out_wait;
>> +	}
>> +
>>  	if (calldata->arg.fmode == 0)
>>  		task->tk_msg.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_CLOSE];
>>  	if (calldata->roc)
>> @@ -5308,6 +5322,9 @@ static void nfs4_delegreturn_prepare(struct rpc_task *task, void *data)
>>  
>>  	d_data = (struct nfs4_delegreturndata *)data;
>>  
>> +	if (nfs4_wait_on_layoutreturn(d_data->inode, task))
>> +		return;
>> +
>>  	if (d_data->roc)
>>  		pnfs_roc_get_barrier(d_data->inode, &d_data->roc_barrier);
>>  
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index ba12464..8abe271 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1104,20 +1104,15 @@ bool pnfs_roc(struct inode *ino)
>>  			mark_lseg_invalid(lseg, &tmp_list);
>>  			found = true;
>>  		}
>> -	/* pnfs_prepare_layoutreturn() grabs lo ref and it will be put
>> -	 * in pnfs_roc_release(). We don't really send a layoutreturn but
>> -	 * still want others to view us like we are sending one!
>> -	 *
>> -	 * If pnfs_prepare_layoutreturn() fails, it means someone else is doing
>> -	 * LAYOUTRETURN, so we proceed like there are no layouts to return.
>> -	 *
>> -	 * ROC in three conditions:
>> +	/* ROC in two conditions:
>>  	 * 1. there are ROC lsegs
>>  	 * 2. we don't send layoutreturn
>> -	 * 3. no others are sending layoutreturn
>>  	 */
>> -	if (found && !layoutreturn && pnfs_prepare_layoutreturn(lo))
>> +	if (found && !layoutreturn) {
>> +		/* lo ref dropped in pnfs_roc_release() */
>> +		pnfs_get_layout_hdr(lo);
>>  		roc = true;
>> +	}
>>  
>>  out_noroc:
>>  	spin_unlock(&ino->i_lock);
>> @@ -1172,6 +1167,26 @@ void pnfs_roc_get_barrier(struct inode *ino, u32 *barrier)
>>  	spin_unlock(&ino->i_lock);
>>  }
>>  
>> +bool pnfs_wait_on_layoutreturn(struct inode *ino, struct rpc_task *task)
>> +{
>> +	struct nfs_inode *nfsi = NFS_I(ino);
>> +	struct pnfs_layout_hdr *lo;
>> +	bool sleep = false;
>> +
>> +	/* We might not hold a reference on lo, so it must be checked
>> +	 * under i_lock. */
>> +	spin_lock(&ino->i_lock);
>> +	lo = nfsi->layout;
>> +	if (lo && test_bit(NFS_LAYOUT_RETURN, &lo->plh_flags))
>> +		sleep = true;
>> +	spin_unlock(&ino->i_lock);
>> +
>> +	if (sleep)
>> +		rpc_sleep_on(&NFS_SERVER(ino)->roc_rpcwaitq, task, NULL);
>> +
>> +	return sleep;
>> +}
>> +
>>  /*
>>   * Compare two layout segments for sorting into layout cache.
>>   * We want to preferentially return RW over RO layouts, so ensure those
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 78c9351..d1990e9 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -270,6 +270,7 @@ bool pnfs_roc(struct inode *ino);
>>  void pnfs_roc_release(struct inode *ino);
>>  void pnfs_roc_set_barrier(struct inode *ino, u32 barrier);
>>  void pnfs_roc_get_barrier(struct inode *ino, u32 *barrier);
>> +bool pnfs_wait_on_layoutreturn(struct inode *ino, struct rpc_task *task);
>>  void pnfs_set_layoutcommit(struct inode *, struct pnfs_layout_segment *, loff_t);
>>  void pnfs_cleanup_layoutcommit(struct nfs4_layoutcommit_data *data);
>>  int pnfs_layoutcommit_inode(struct inode *inode, bool sync);
>> @@ -639,6 +640,12 @@ pnfs_roc_get_barrier(struct inode *ino, u32 *barrier)
>>  {
>>  }
>>  
>> +static inline bool
>> +pnfs_wait_on_layoutreturn(struct inode *ino, struct rpc_task *task)
>> +{
>> +	return false;
>> +}
>> +
>>  static inline void set_pnfs_layoutdriver(struct nfs_server *s,
>>  					 const struct nfs_fh *mntfh, u32 id)
>>  {
>>
> 
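
The gate this patch adds (CLOSE and DELEGRETURN park themselves while a LAYOUTRETURN is in flight, and pnfs_roc() now takes its own layout reference) can be modeled in userspace. This is a minimal sketch under stated assumptions, not kernel code: the structs, LAYOUT_RETURN_BIT and the function names are hypothetical stand-ins for NFS_I(inode)->layout, NFS_LAYOUT_RETURN, pnfs_wait_on_layoutreturn(), pnfs_roc() and pnfs_roc_release(), and the "sleeping" field only models rpc_sleep_on() parking the task on roc_rpcwaitq. The model also tolerates a NULL inode, which the patch does not. The oops above faults at ffffffffffffffe0 (0 minus 0x20) on a compare against a pointer loaded from the delegreturn data, which suggests, as an inference and not something stated in this thread, that d_data->inode was NULL when nfs_have_layout() dereferenced it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit modeling NFS_LAYOUT_RETURN in plh_flags. */
#define LAYOUT_RETURN_BIT (1u << 0)

/* Hypothetical stand-in for struct pnfs_layout_hdr. */
struct layout_hdr {
	unsigned int flags;
	int refcount;
};

/* Hypothetical stand-in for the inode; layout models NFS_I(inode)->layout. */
struct model_inode {
	struct layout_hdr *layout;
};

/* Hypothetical stand-in for struct rpc_task; sleeping models being
 * parked on roc_rpcwaitq by rpc_sleep_on(). */
struct model_task {
	bool sleeping;
};

/* Models nfs4_wait_on_layoutreturn()/pnfs_wait_on_layoutreturn():
 * returns true and parks the task while a LAYOUTRETURN is in flight.
 * The real code reads the layout under i_lock; this single-threaded
 * model has no lock. */
static bool wait_on_layoutreturn(struct model_inode *ino, struct model_task *task)
{
	/* Extra guard not present in the patch: tolerate a NULL inode. */
	if (ino == NULL || ino->layout == NULL)
		return false;
	if (ino->layout->flags & LAYOUT_RETURN_BIT) {
		task->sleeping = true;
		return true;
	}
	return false;
}

/* Models the v2 pnfs_roc() change: take a layout reference when there
 * are ROC segments (found) and we are not sending a LAYOUTRETURN. */
static bool roc(struct model_inode *ino, bool found, bool layoutreturn)
{
	if (ino->layout != NULL && found && !layoutreturn) {
		ino->layout->refcount++;	/* dropped in roc_release() */
		return true;
	}
	return false;
}

/* Models pnfs_roc_release(): drop the reference taken by roc(). */
static void roc_release(struct model_inode *ino)
{
	assert(ino->layout->refcount > 0);
	ino->layout->refcount--;
}

/* Hypothetical completion hook (not part of the quoted patch): clear
 * the flag and wake every parked task so it can re-check. */
static void layoutreturn_done(struct model_inode *ino,
			      struct model_task *waiters, size_t n)
{
	ino->layout->flags &= ~LAYOUT_RETURN_BIT;
	for (size_t i = 0; i < n; i++)
		waiters[i].sleeping = false;
}
```

In the model, a woken task re-checks by calling wait_on_layoutreturn() again; it only proceeds once the flag is clear, so CLOSE can never overtake an in-flight LAYOUTRETURN.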

Thread overview: 8+ messages
2015-09-22  3:35 [PATCH v2] NFS41: make close wait for layoutreturn Peng Tao
2015-09-23  7:53 ` Kinglong Mee
2015-09-23  7:55   ` Kinglong Mee [this message]
2015-09-23  8:27     ` Kinglong Mee
2015-09-23 12:05       ` Trond Myklebust
2015-09-23 12:45         ` Kinglong Mee
2015-09-23 12:52           ` Trond Myklebust
2015-09-23 12:59             ` Trond Myklebust
