From: Boaz Harrosh <bharrosh@panasas.com>
To: Johannes Schild <JSchild@gmx.de>
Cc: <linux-nfs@vger.kernel.org>, <osd-dev@open-osd.org>
Subject: Re: Questions about Exofs
Date: Wed, 16 May 2012 13:30:34 +0300
Message-ID: <4FB381CA.7090906@panasas.com>
In-Reply-To: <20120516090006.192470@gmx.net>
On 05/16/2012 12:00 PM, Johannes Schild wrote:
> Hi Boaz,
<>
>> Do you see any prints in dmsg regarding iscsi, before the crash?
>
> I see output like this. Always "registered" no unloading execpt after the crash.
>
> [ 4.713107] iscsi: registered transport (tcp)
> #<some output removed>
> [ 4.739465] iscsi: registered transport (cxgb3i)
> #<some output removed>
> [ 4.750756] iscsi: registered transport (cxgb4i)
> #<some output removed>
> [ 4.771300] iscsi: registered transport (bnx2i)
> [ 4.781045] iscsi: registered transport (be2iscsi)
>
<>
>> could you please do:
>> []$ gdb fs/exofs/exofs.ko
>
> [root@ExB osd-repo]# gdb /root/pnfs-repo/fs/exofs/exofs.ko
> GNU gdb (GDB) Fedora (7.3.50.20110722-13.fc16)
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/pnfs-repo/fs/exofs/exofs.ko...done.
>
>> Inside gdb
>>> list *(exofs_free_sbi+0x59)
>
> (gdb) list *(exofs_free_sbi+0x59)
> 0x47a9 is in exofs_free_sbi (include/scsi/osd_ore.h:83).
> 78 /* ore_comp_dev Recievies a logical device index */
> 79 static inline struct osd_dev *ore_comp_dev(
> 80 const struct ore_components *oc, unsigned i)
> 81 {
> 82 BUG_ON((i < oc->first_dev) || (oc->first_dev + oc->numdevs <= i));
> 83 return oc->ods[i - oc->first_dev]->od;
> 84 }
> 85
> 86 static inline void ore_comp_set_dev(
> 87 struct ore_components *oc, unsigned i, struct osd_dev *od)
>
>> and also
>>> list *(exofs_fill_super+0x440)
>
> (gdb) list *(exofs_fill_super+0x440)
> 0x5850 is in exofs_fill_super (fs/exofs/super.c:847).
> 842 dput(sb->s_root);
> 843 sb->s_root = NULL;
> 844 goto free_sbi;
> 845 }
> 846
> 847 _exofs_print_device("Mounting", opts->dev_name,
> 848 ore_comp_dev(&sbi->oc, 0),
> 849 sbi->one_comp.obj.partition);
> 850 return 0;
> 851
> (gdb)
>
OK, I understand: we are calling _exofs_print_device on an array that does
not exist yet.
>>
>> Could you enable CONFIG_EXOFS_DEBUG it's under:
>> miscellaneous-filesystems/exofs in make xconfig
>
> I enabled it.
>
>> Then re-run everything send me the output
>> []$ ./do-osd stop
>
> [root@ExB osd-repo]# ./do-osd stop
> /dev/osd0
> FATAL: Module osd is builtin
>
> Should it be a modul or doesn't matter?
>
It should be fine, although the scripts expect it as a module.
>> []$ ls /dev/osd*
>
> [root@ExB osd-repo]# ls /dev/osd*
> ls: cannot access /dev/osd*: No such file or directory
>
>> []$ ./do-osd
>
> [root@ExB osd-repo]# ./do-osd
> iscsid.service - LSB: Starts and stops login iSCSI daemon.
> Loaded: loaded (/etc/rc.d/init.d/iscsid)
> Active: inactive (dead) since Wed, 16 May 2012 10:46:23 +0200; 3min 11s ago
> Process: 2287 ExecStop=/etc/rc.d/init.d/iscsid stop (code=exited, status=0/SUCCESS)
> Process: 1168 ExecStart=/etc/rc.d/init.d/iscsid start (code=exited, status=0/SUCCESS)
> Main PID: 1213 (code=exited, status=0/SUCCESS)
> CGroup: name=systemd:/system/iscsid.service
> 18446744072101122080
> login into: 192.168.0.1:3260
> 192.168.0.1:3260,1 .root.var.osd-tgt.tgt-1.ExA
>
>> []$ ls /dev/osd*
>
> [root@ExB server]# ls /dev/os*
> /dev/osd1
>
/dev/osd1, interesting. Make sure your scripts are using /dev/osd1.
I suspect this is an artifact of the earlier attempts. On a clean reboot
a single device should be /dev/osd0. The scripts expect that.
>> []$ ./do-exofs format
>> Send me the output of that
>
> ./do-exofs format
> mkexofs_format >>>
No output from the format command? That is not good. mkfs.exofs is
very bad about not saying anything when it fails.
Probably it was formatting /dev/osd0 and we only have /dev/osd1.
> osd stop? >>>
> FATAL: Module osd is builtin
> osd start? >>>
> iscsid.service - LSB: Starts and stops login iSCSI daemon.
> Loaded: loaded (/etc/rc.d/init.d/iscsid)
> Active: inactive (dead) since Wed, 16 May 2012 10:46:23 +0200; 6min ago
> Process: 2287 ExecStop=/etc/rc.d/init.d/iscsid stop (code=exited, status=0/SUCCESS)
> Process: 1168 ExecStart=/etc/rc.d/init.d/iscsid start (code=exited, status=0/SUCCESS)
> Main PID: 1213 (code=exited, status=0/SUCCESS)
> CGroup: name=systemd:/system/iscsid.service
> 18446744072101122080
> login into: 192.168.0.1:3260
> 192.168.0.1:3260,1 .root.var.osd-tgt.tgt-1.ExA
> Logging in to [iface: default, target: .root.var.osd-tgt.tgt-1.ExA, portal: 192.168.0.1,3260] (multiple)
> Login to [iface: default, target: .root.var.osd-tgt.tgt-1.ExA, portal: 192.168.0.1,3260] successful.
>
>> []$ ./do-exofs start
>> Send me the dmesg output of this stage, or if not too big
>> the dmesg output of from before ./do-osd <1>
>
> I pushed it on nopaste:
> http://nopaste.info/cd3c6f9141.html
>
In the dmesg I see:
[ 2516.994781] exofs @parse_options:88: parse_options osdname=d2683732-c906-4ee1-9dbd-c10c27bb40df,pid=0x10000
[ 2516.994808] osd @_mach_odi:261: found device sysid_len=0 osdname=36
[ 2516.994816] osd @_osdv2_req_encode_common:617: OSDv2 execute opcode 0x8885
[ 2516.994831] osd @_init_blk_request:1616: or=ffff880020d7ec00 has_in=1 has_out=0 => 0, ffff88003bbf8a10
The very first read below fails. This is the first read from the super-block object.
Here it gets a -5 (-EIO). If it were an osd-target error you would have
a scsi-sense printout, so it means it is a communication problem.
[ 2516.996034] exofs @exofs_read_kern:245: osd_execute_request() => -5
[ 2516.996041] exofs: Unable to mount exofs on (null) pid=0x10000 err=-5
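For readers unfamiliar with the convention: kernel code reports failures as negative errno values, so the "=> -5" above is -EIO. A minimal user-space sketch of that mapping follows. The helper name kernel_ret_to_text is hypothetical and is not part of exofs or the osd library.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not from exofs): turn a kernel-style return
 * value into readable text. Kernel functions return 0 or a positive
 * value on success and a negated errno constant on failure.
 */
static const char *kernel_ret_to_text(int ret)
{
	if (ret >= 0)
		return "success";
	/* Negate to recover the errno constant, e.g. -5 becomes EIO. */
	return strerror(-ret);
}
```

Calling kernel_ret_to_text(-5) returns the C library's description of EIO on Linux.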
This crash below I should fix. The code does not deal properly with the IO error
and continues on to dmesg-print from an array that does not exist yet.
I will fix that.
[ 2516.996106] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2516.996111] IP: [<ffffffffa033c779>] exofs_free_sbi+0x59/0xa0 [exofs]
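The shape of the problem can be sketched in user space. This is only an illustration under stated assumptions, not the actual fix: the structs are trimmed stand-ins that keep only the fields visible in the osd_ore.h listing quoted earlier, and comp_dev_or_null is a hypothetical helper name.

```c
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures (assumption: only the
 * fields used by the quoted ore_comp_dev() listing are modelled). */
struct osd_dev { int id; };
struct ore_dev { struct osd_dev *od; };
struct ore_components {
	unsigned first_dev;
	unsigned numdevs;
	struct ore_dev **ods;	/* stays NULL until the device table is set up */
};

/*
 * Hypothetical guarded lookup: return NULL instead of dereferencing
 * a device table that was never allocated, which is the state the
 * failed mount leaves behind.
 */
static struct osd_dev *comp_dev_or_null(const struct ore_components *oc,
					unsigned i)
{
	if (!oc->ods)
		return NULL;
	if (i < oc->first_dev || oc->first_dev + oc->numdevs <= i)
		return NULL;
	if (!oc->ods[i - oc->first_dev])
		return NULL;
	return oc->ods[i - oc->first_dev]->od;
}
```

A caller on the error path would then check the result for NULL before printing or releasing the device.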
But the problem still remains: why do we get IO errors from iscsi?
Later we have:
[ 3241.802074] connection1:0: detected conn error (1020)
A disconnect. Do you see any prints on the otgtd side?
If you use the ./up script it might redirect these to a log file;
do "./up log".
[ 3398.831629] Chelsio T3 iSCSI Driver cxgb3i v2.0.0 (Jun. 2010)
[ 3398.831919] iscsi: registered transport (cxgb3i)
[ 3398.836776] Chelsio T4 iSCSI Driver cxgb4i v0.9.1 (Aug. 2010)
[ 3398.836996] iscsi: registered transport (cxgb4i)
[ 3398.841397] cnic: Broadcom NetXtreme II CNIC Driver cnic v2.5.8 (Jan 3, 2012)
[ 3398.845267] Broadcom NetXtreme II iSCSI Driver bnx2i v2.7.0.3 (Jun 15, 2011)
[ 3398.845475] iscsi: registered transport (bnx2i)
[ 3400.201828] scsi4 : iSCSI Initiator over TCP/IP
[ 3400.715101] scsi 4:0:0:0: Object storage IET OSD 0001 PQ: 0 ANSI: 5
[ 3400.718038] osd @__detect_osd:359: start scsi_test_unit_ready ffff880020db3800 ffff880020dfa000 ffff88003974aca0
Right after the crash. So iscsi unloaded and loaded; there was a disconnect.
We must investigate why iscsi has communication problems.
Is the "192.168.0.1:3260" above your host's IP? Are you running the otgtd on
the host and exofs in a VM? That's good, that's what I use all the time.
If you have time you should do two experiments.
1. Please run the "./do-osd test" test and send me the output.
It runs a user-mode test of the osd device and does some
very basic communications.
Note that it will wipe your OSD and you will need to ./do-exofs format again
after it.
2. On the osd-target side you probably ran ./up. The otgtd also supports
non-osd regular disk devices. Could you set up a regular disk
backend as well? Look into "man tgtadm" on how to add a second
disk target.
Once you log in to the target you will see a new /dev/sdX device.
Try to dd into it, and also mkfs and mount an ext FS on it.
Or else investigate why there are iscsi communication problems.
>
>
>>
>>> Just now i am using the 3.3.0 kernel from the linux-pnfs repository.
>>>
That's perfect, it should have everything.
>>
>>
>> When compiling the Kernel, Did you enable CONFIG_PNFSD ?
>> (That is the pNFSD Server Kernel Support)
>
> No pNFSD Server support wasn't enabled, i recompiled and activate it
>
It's fine, for this stage you don't need it.
>
>
>
>> What platform are you using? Distro + ARCH ?
>
> Iam experimenting with Fedora 16 (3.3.0 pnfs kernel) and arch x86_64
>
I use that here too
>
> Thanks for your efforts
> Johannes
Hope that helps. Thanks for the report, we got a bug fix out of it.
Boaz