From: Benny Halevy <bhalevy@panasas.com>
To: Andy Adamson <andros@netapp.com>
Cc: "linux-nfs@vger.kernel.org Mailing list" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 0/10] pnfs-submit add layoutget,layoutreturn error handling version 2
Date: Mon, 28 Jun 2010 21:53:08 +0300
Message-ID: <4C28EF94.6000503@panasas.com>
In-Reply-To: <E0AA78BD-59D5-471B-8EA3-A3789C13BAAC@netapp.com>
On Jun. 28, 2010, 19:44 +0300, Andy Adamson <andros@netapp.com> wrote:
> Hi Benny
>
> I have not been able to reproduce this BUG. I've tried against the
> files pyNFS server with return_on_close False as well as True, and
> against a GFS2/pNFS cluster with write layouts turned on.
>
> Patch 0003-SQUASHME-pnfs-submit-clear-page-lseg-on-partial-i-o.patch
> calls put_lseg when I/O to a DS fails. I tested this using the pyNFS
> files layout server and blocking the DS with iptables. I think this is
> the only change in this patch set that would affect the refcounting.
>
> Are you able to reproduce the BUG?
The easiest way I found to reproduce this bug is running the cthon tests
on a locally mounted file system exported over PNFSD_LOCAL_EXPORT.
The test machine is a dual core SMP machine.
Are you testing over a VM? Is it uni-processor?
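
For context, the check that fires is BUG_ON(lo->refcount <= 0) in
put_layout(), quoted further down. Here is a minimal userspace sketch,
not the code in fs/nfs/pnfs.c, of how one put too many reaches a check
like that. The names (layout_sketch, get_layout_sketch,
put_layout_sketch, demo_unbalanced_put) are hypothetical, and the sketch
returns -1 where the kernel would BUG.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the layout object; only the counter is modeled. */
struct layout_sketch {
    int refcount;
};

/* Take a reference. */
static void get_layout_sketch(struct layout_sketch *lo)
{
    lo->refcount++;
}

/*
 * Drop a reference.
 * Returns 1 if this put dropped the last reference, 0 if references
 * remain, and -1 if the count was already <= 0 on entry. The -1 case
 * is the condition a BUG_ON(lo->refcount <= 0) style check would trap.
 */
static int put_layout_sketch(struct layout_sketch *lo)
{
    if (lo->refcount <= 0) {
        fprintf(stderr, "unbalanced put: refcount %d\n", lo->refcount);
        return -1;
    }
    lo->refcount--;
    return lo->refcount == 0 ? 1 : 0;
}

/* One get followed by two puts; returns the result of the second put. */
static int demo_unbalanced_put(void)
{
    struct layout_sketch lo = { .refcount = 0 };
    int rc;

    get_layout_sketch(&lo);        /* refcount: 0 -> 1 */
    rc = put_layout_sketch(&lo);   /* refcount: 1 -> 0, returns 1 */
    assert(rc == 1);
    rc = put_layout_sketch(&lo);   /* one put too many, returns -1 */
    return rc;
}
```

Calling demo_unbalanced_put() returns -1: the first put drops the only
reference and the second finds the count already at zero. The sketch is
single-threaded, so it only shows the imbalance itself, not which code
path is responsible for the extra put.
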
Benny
>
> -->Andy
>
> On Jun 24, 2010, at 1:02 PM, William A. (Andy) Adamson wrote:
>
>> OK - I'll look into it.
>>
>> Sorry I missed today's pNFS call.
>>
>> -->Andy
>>
>> On Thu, Jun 24, 2010 at 9:14 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>>> On Jun. 23, 2010, 22:21 +0300, andros@netapp.com wrote:
>>>> Responded to comments, added 2 cleanup patches
>>>>
>>>> Plus some code cleanup
>>>> 0001-SQUASHME-pnfs-submit-remove-unused-filelayout_mount_.patch
>>>>
>>>> and some bug fixes
>>>> 0002-SQUASHME-pnfs-submit-pnfs_try_to_read-write-commit-u.patch
>>>>
>>>> NOTE: this patch: 0003-SQUASHME-pnfs-submit-tell-commit-to-use-the-MDS.patch
>>>> was replaced by:
>>>> 0003-SQUASHME-pnfs-submit-clear-page-lseg-on-partial-i-o.patch
>>>>
>>>>
>>>> Remove unused (by file layout) encode_layoutreturn io operation
>>>> 0004-SQUASHME-pnfs-submit-remove-encode_layoutreturn.patch
>>>> 0005-SQUASHME-pnfs-submit-add-error-handling-to-layout-re.patch
>>>>
>>>> 0006-SQUASHME-pnfs-submit-handle-assassinated-layoutcommi.patch
>>>>
>>>> Note: pnfs4_proc_layoutget is only called by send_layout() which prints
>>>> the status.
>>>> 0007-SQUASHME-pnfs-submit-add-error-handlers-to-layout-ge.patch
>>>>
>>>> Add back encode_layoutreturn io operation
>>>> 0008-pnfs-post-submit-restore-encode_layoutreturn.patch
>>>>
>>>>
>>>> New patches:
>>>> 0009-SQUASHME-pnfs-submit-don-t-re-initialize-i_lock.patch
>>>>
>>>> This gets rid of a stack frame warning:
>>>> 0010-SQUASHME-pnfs-submit-remove-struct-nfs_server-from-s.patch
>>>>
>>>> Testing:
>>>> ---------
>>>>
>>>> CONFIG_NFS_V4_1 set: NFSv4.0 NFSv4.1 pNFS
>>>> Passes Connectathon tests
>>>>
>>>> Tested layoutget and layoutreturn recovery from NFS4ERR_DEAD_SESSION
>>>> with the pyNFS server and the testclient framework.
>>>>
>>>> Still todo:
>>>>
>>>> Recover from NFS4ERR_BAD_STATEID. Currently layoutreturn, layoutget, and
>>>> layoutcommit do not pass nfs_state to the error handlers.
>>>>
>>>> Handle NFS4ERR_BAD_LAYOUT.
>>>>
>>>> CONFIG_NFS_V4_1 not set: NFSv4.0 mount passes cthon tests.
>>>>
>>>> -->Andy
>>>
>>> Andy, I've hit
>>> BUG_ON(lo->refcount <= 0);
>>> in put_layout() with this patchset.
>>> I'm not sure if it introduced it or not, still investigating...
>>>
>>> Jun 24 12:07:26 tl2 kernel: pnfs_destroy_inode: WARNING: layout.refcount 1
>>> Jun 24 12:07:26 tl2 kernel: ------------[ cut here ]------------
>>> Jun 24 12:07:26 tl2 kernel: kernel BUG at /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/pnfs.c:341!
>>> Jun 24 12:07:26 tl2 kernel: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>>> Jun 24 12:07:26 tl2 kernel: last sysfs file: /sys/module/nfs/initstate
>>> Jun 24 12:07:26 tl2 kernel: CPU 1
>>> Jun 24 12:07:26 tl2 kernel: Modules linked in: nfslayoutdriver nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc osd libosd autofs4 crc32c ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table mperf ext3 jbd dm_mirror dm_region_hash dm_log dm_multipath dm_mod kvm_intel kvm snd_hda_codec_realtek i915 drm_kms_helper drm snd_hda_intel snd_hda_codec snd_hwdep i2c_algo_bit snd_seq i2c_i801 i2c_core snd_seq_device snd_pcm r8169 mii snd_timer sr_mod snd soundcore snd_page_alloc button video output rng_core sg cdrom ata_generic ata_piix libata sd_mod scsi_mod ext4 mbcache jbd2 crc16 uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
>>> Jun 24 12:07:26 tl2 kernel:
>>> Jun 24 12:07:26 tl2 kernel: Pid: 1920, comm: rpciod/1 Not tainted 2.6.35-rc3-pnfs+ #54 G31M4 (MS-7527)/MS-7527
>>> Jun 24 12:07:26 tl2 kernel: RIP: 0010:[<ffffffffa05d0ea4>] [<ffffffffa05d0ea4>] put_layout+0x2f/0xa7 [nfs]
>>> Jun 24 12:07:26 tl2 kernel: RSP: 0018:ffff88007525dd20 EFLAGS: 00010246
>>> Jun 24 12:07:26 tl2 kernel: RAX: 0000000000000000 RBX: ffff8800704b6b78 RCX: 0000000000000066
>>> Jun 24 12:07:26 tl2 kernel: RDX: ffff8800704b69a8 RSI: ffffea0001b931a8 RDI: ffff8800704b6b78
>>> Jun 24 12:07:26 tl2 kernel: RBP: ffff88007525dd30 R08: 0000000000000000 R09: ffff88007356a500
>>> Jun 24 12:07:26 tl2 kernel: R10: ffff88007525dd80 R11: 0000000000000003 R12: ffff8800704b69a8
>>> Jun 24 12:07:26 tl2 kernel: R13: ffff880073854f00 R14: ffff88007356a508 R15: ffff88007356a590
>>> Jun 24 12:07:26 tl2 kernel: FS: 0000000000000000(0000) GS:ffff880001a80000(0000) knlGS:0000000000000000
>>> Jun 24 12:07:26 tl2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> Jun 24 12:07:26 tl2 kernel: CR2: 0000003944279000 CR3: 0000000001698000 CR4: 00000000000406e0
>>> Jun 24 12:07:26 tl2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Jun 24 12:07:26 tl2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Jun 24 12:07:26 tl2 kernel: Process rpciod/1 (pid: 1920, threadinfo ffff88007525c000, task ffff88007d988000)
>>> Jun 24 12:07:26 tl2 kernel: Stack:
>>> Jun 24 12:07:26 tl2 kernel: ffff8800704b6b78 ffff8800704b69a8 ffff88007525dd60 ffffffffa05d203f
>>> Jun 24 12:07:26 tl2 kernel: <0> ffff88007525dd60 ffff880073854f18 ffff880073854f00 ffffffffa05d5880
>>> Jun 24 12:07:26 tl2 kernel: <0> ffff88007525dd80 ffffffffa05bfb5c ffff88007525dd90 ffff88007356a500
>>> Jun 24 12:07:26 tl2 kernel: Call Trace:
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa05d203f>] pnfs_layout_release+0x43/0x68 [nfs]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa05bfb5c>] nfs4_pnfs_layoutreturn_release+0x61/0x8b [nfs]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa056207d>] rpc_release_calldata+0x17/0x19 [sunrpc]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa05621bd>] rpc_free_task+0x5e/0x66 [sunrpc]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa056225d>] rpc_put_task+0x98/0x9c [sunrpc]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa0562ea7>] __rpc_execute+0x205/0x212 [sunrpc]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa0562ef0>] rpc_async_schedule+0x15/0x17 [sunrpc]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff81052cb7>] worker_thread+0x1aa/0x23b
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffffa0562edb>] ? rpc_async_schedule+0x0/0x17 [sunrpc]
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff81056ab7>] ? autoremove_wake_function+0x0/0x39
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff8102f96d>] ? spin_unlock_irqrestore+0xe/0x10
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff81052b0d>] ? worker_thread+0x0/0x23b
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff81056645>] kthread+0x7f/0x87
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff81003a24>] kernel_thread_helper+0x4/0x10
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff810565c6>] ? kthread+0x0/0x87
>>> Jun 24 12:07:26 tl2 kernel: [<ffffffff81003a20>] ? kernel_thread_helper+0x0/0x10
>>> Jun 24 12:07:26 tl2 kernel: Code: 41 54 53 0f 1f 44 00 00 8b 87 24 01 00 00 48 89 fb 48 8d 97 30 fe ff ff 89 c1 c1 f9 08 38 c1 75 04 0f 0b eb fe 8b 07 85 c0 7f 04 <0f> 0b eb fe ff c8 85 c0 89 07 75 67 48 8b 82 48 03 00 00 f6 05
>>> Jun 24 12:07:26 tl2 kernel: RIP [<ffffffffa05d0ea4>] put_layout+0x2f/0xa7 [nfs]
>>> Jun 24 12:07:27 tl2 kernel: RSP <ffff88007525dd20>
>>> Jun 24 12:07:27 tl2 kernel: ---[ end trace 0468384c0ab45a1f ]---
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>