* About pNFS installation process.
From: Bruno Silva @ 2012-03-15 20:23 UTC
To: linux-nfs
Hello,
I am a PhD student and I intend to conduct performance experiments
with pNFS. First of all, I need to build a pNFS block layout
environment. I apologize for the long mail, but I need to explain all
the steps I took to build the environment. I followed the steps
presented in http://wiki.linux-nfs.org/wiki/index.php/PNFS_Block_Server_Setup_Instructions.
Steps:
(1) Build and install the kernel. I used this tree:
git clone git://linux-nfs.org/~bhalevy/linux-pnfs.git
with these settings in the .config file:
CONFIG_NFSD=m
CONFIG_NFSD_V4=y
CONFIG_PNFSD=y
# CONFIG_PNFSD_LOCAL_EXPORT is not set
CONFIG_PNFSD_BLOCK=y
(2) Build nfs-utils and utils/blkmapd:
git clone git://linux-nfs.org/~bhalevy/pnfs-nfs-utils.git
(3) Export the file system.
-----------------------------------------------------------------
For block access to work properly, the disks must have a signature.
Partition the disks using "parted"; disks partitioned with "fdisk"
don't have the signatures.
I followed the steps below.
parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart 1 <start and end of the partition>
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  53.7GB  53.7GB  ext3         1     msftres
I have tested with the ext4 file system, created with a 4K block size:
# mkfs.ext4 -b 4096 /dev/sdb1
-----------------------------------------------------------------
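The partitioning steps above can also be run non-interactively. This is a sketch, assuming GNU parted's script mode (-s) and a single partition spanning the disk given as "0% 100%"; the function name prepare_pnfs_disk and the RUN variable are hypothetical. By default it only prints the commands, because running them erases the disk.

```bash
#!/bin/bash
# prepare_pnfs_disk is a hypothetical helper, not part of any package.
# It mirrors the interactive parted session and mkfs command above:
# GPT label, one partition named "1" spanning the disk, ext4 with 4K blocks.
# Dry run by default (prints the commands); set RUN=1 to execute them.
# WARNING: executing these commands destroys all data on the disk.
prepare_pnfs_disk() {
    local disk="$1"
    local cmds=(
        "parted -s $disk mklabel gpt"
        "parted -s $disk mkpart 1 0% 100%"
        "mkfs.ext4 -b 4096 ${disk}1"
    )
    local c
    for c in "${cmds[@]}"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Word splitting is intended here; the arguments contain no spaces.
            $c || return 1
        else
            echo "$c"
        fi
    done
}
```

For example, "prepare_pnfs_disk /dev/sdb" prints the three commands and "RUN=1 prepare_pnfs_disk /dev/sdb" executes them. The "${disk}1" partition name fits /dev/sdX disks and would need adjusting for other device naming schemes; check the parted syntax against your parted version.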
I did steps 1, 2 and 3 on all machines (data servers, metadata
server and client).
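Before building a kernel on each machine, it can help to confirm that its .config really carries the step (1) options. A minimal sketch: check_pnfsd_config is a hypothetical helper that only greps for the exact lines listed in step (1), accepting CONFIG_NFSD either as a module or built in.

```bash
#!/bin/bash
# check_pnfsd_config is a hypothetical helper, not a kernel tool.
# Usage: check_pnfsd_config PATH/TO/.config
# Prints one line per missing option; returns 0 only if all are present.
check_pnfsd_config() {
    local config="$1" missing=0 opt
    # CONFIG_NFSD may be a module (m) or built in (y).
    if ! grep -Eq '^CONFIG_NFSD=(m|y)$' "$config"; then
        echo "missing: CONFIG_NFSD=m (or =y)"
        missing=1
    fi
    # The remaining options from step (1) must be built in.
    for opt in CONFIG_NFSD_V4 CONFIG_PNFSD CONFIG_PNFSD_BLOCK; do
        if ! grep -q "^${opt}=y$" "$config"; then
            echo "missing: ${opt}=y"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it from the top of the kernel source tree, e.g. "check_pnfsd_config .config".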
I used the following export options in the metadata server's exports file:
/mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
Then I ran the following script to start the server:
#!/bin/bash
# Unmount /mnt
umount /mnt
#start the service
service tgtd restart
sleep 8
# Create iSCSI target
tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.1992-05.com.emc:openblock
# Expose LUN as iSCSI target
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --backing-store /dev/sdb
# Allow access from all initiators
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL
# show all the details
tgtadm --lld iscsi --op show --mode target
# Mount the partition
mount /dev/sdb1 /mnt
sleep 3
# start the nfs server
service nfs restart
sleep 3
# Start the ctl daemon
cd <CTL_SRC>/ctl/
./ctl -u &
If the data server and metadata server run on the same machine,
everything works normally. My question is: how do I add other pNFS
data servers to the environment? I know it is related to creating
iSCSI targets, but how are the data servers linked with the metadata
server? Is there some configuration file that informs the metadata
server, as in spNFS? How does it work?
Thanks in advance.
--
Bruno Silva
Computer Engineer
Modcs Group
---------------------------------------------------------------------
Facebook goo.gl/QHaZx
Twitter goo.gl/yk4jf
Google+ goo.gl/xIbgk
* Re: About pNFS installation process.
From: Jim Rees @ 2012-03-16 0:16 UTC
To: Bruno Silva; +Cc: linux-nfs
Bruno Silva wrote:
  The steps 1, 2 and 3 i did to all machines (data servers, metadata
  server and client).
You only need blkmapd on the client. Client setup instructions are at
http://wiki.linux-nfs.org/wiki/index.php/Fedora_pNFS_Client_Setup .
I added a link to that page.
* About Direct I/O
From: Alexandre Depoutovitch @ 2012-03-16 18:14 UTC
To: linux-nfs
Hello,
I am trying to do random sector-aligned writes to an NFS-mounted disk. The
performance is an order of magnitude worse than with 4K (file system block
size) aligned I/O.
The reason is that the NFS daemon (Linux kernel 2.6.32) on the server side
always does buffered I/O, which behaves poorly for requests that are not
block-aligned.
Is there a way to tell the NFS daemon to use direct I/O? If not, is it an
implementation limitation, or is there a fundamental problem with using
direct I/O in an NFS server?
Thank you,
Alex
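The difference between sector-aligned and block-aligned writes can be made concrete with a little arithmetic. In this sketch, classify_write is a hypothetical helper that checks whether a write's start and end both fall on 4096-byte boundaries. One plausible explanation for the slowdown, not confirmed in this thread, is that a partial-block write through the page cache may first require reading the rest of that block from disk.

```bash
#!/bin/bash
# classify_write is a hypothetical illustration, not part of any NFS tool.
# Usage: classify_write OFFSET LENGTH [BLOCK_SIZE]
# Prints "block-aligned" if the write starts and ends on block boundaries,
# otherwise "partial-block" (at least one block is only partly covered).
classify_write() {
    local offset="$1" length="$2" bs="${3:-4096}"
    local end=$(( offset + length ))
    if (( offset % bs == 0 && end % bs == 0 )); then
        echo "block-aligned"
    else
        # Through the page cache, a partly covered block may need to be
        # read in before the new bytes can be merged (read-modify-write).
        echo "partial-block"
    fi
}
```

For instance, a 512-byte write at offset 512 is sector-aligned but reports "partial-block", while a 4096-byte write at offset 8192 reports "block-aligned".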
* Re: About Direct I/O
From: J. Bruce Fields @ 2012-03-16 20:35 UTC
To: Alexandre Depoutovitch; +Cc: linux-nfs
On Fri, Mar 16, 2012 at 11:14:04AM -0700, Alexandre Depoutovitch wrote:
> Hello,
> I am trying to do random sector aligned writes to an NFS mounted disk. The
> performance is order of magnitude worse than 4K (file system block size)
> aligned I/O.
> The reason is that NFS demon (Linux kernel 2.6.32) on the server side
> always does buffered I/O, which behaves poorly for block unaligned
> requests.
> Is there a way to tell NFS daemon to use direct I/O?
No.
> If not, is it an implementation limitation or there is a fundamental
> problem with using direct I/O in NFS server?
I'm shamefully ignorant of Direct IO....
If we supported Direct IO, are there heuristics that would let the
server figure out on its own when it helped and when it didn't? Or
would the administrator be stuck trying to figure that out?
Is Direct IO possible from kernel buffers these days? Are there
alignment restrictions?
--b.
* Re: About Direct I/O
From: Myklebust, Trond @ 2012-03-16 20:58 UTC
To: J. Bruce Fields; +Cc: Alexandre Depoutovitch, linux-nfs@vger.kernel.org
On Fri, 2012-03-16 at 16:35 -0400, J. Bruce Fields wrote:
<snip>
> Is Direct IO possible from kernel buffers these days?  Are there
> alignment restrictions?
Work is on its way to allow direct i/o from kernel buffers, but it is
not possible with existing kernels.
Have patience.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: About pNFS installation process.
From: Lev Solomonov @ 2012-03-18 10:51 UTC
To: Bruno Silva; +Cc: linux-nfs
On Thu, Mar 15, 2012 at 22:23, Bruno Silva <bs@cin.ufpe.br> wrote:
<snip>
> 1 17.4kB 53.7GB 53.7GB ext3 1 msftres
> I have tested with ext4 file system, create ext4 file system with
> 4K block size.
>
> # mkfs.ext4 -b 4096 /dev/sdb1
> -----------------------------------------------------------------
i suspect you might encounter some issues with ext4, once you get past
the initial setup problems.
> The steps 1, 2 and 3 i did to all machines (data servers, metadata
> server and client).
assuming iSCSI as the underlying storage model, generally speaking:
* blkmapd and blocklayoutdriver are client-specific (as jim mentioned).
* ctl daemon is MDS-specific.
* iSCSI target is DS-specific.
* both MDS and client will need iSCSI initiator.
you'll need to discover and login to the iSCSI target on the DS from
both the MDS and the client for proper pNFS operation.
security often gets in the way, mind the ACLs on the iSCSI targets and
firewall settings on all three (client/MDS/DS) and between them.
> I adopted the export option in metadata server exports file.
> /mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
<snip>
> If the data server and metadata server have been run on the same
> machine everything works normally.
see above. once you set everything up make sure you're actually running
in pNFS mode, i.e. that client sends/receives file data directly from
the DS over iSCSI, not as a fallback through MDS.
solo.
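Following the role split above, each additional data server exports its own iSCSI target, and both the MDS and the client log in to every one of them. A minimal sketch of that login step with open-iscsi's iscsiadm, assuming SendTargets discovery on the default portal port; the helper names and the IP addresses in the usage line are hypothetical. It prints the commands unless RUN=1 is set.

```bash
#!/bin/bash
# login_to_data_servers is a hypothetical helper, not part of open-iscsi.
# Run it on BOTH the MDS and the client, passing every data server's IP.
# Dry run by default (prints the commands); set RUN=1 to execute them.
login_to_data_servers() {
    local ds
    for ds in "$@"; do
        # SendTargets discovery against this data server's portal...
        run_or_echo iscsiadm -m discovery -t sendtargets -p "$ds"
        # ...then log in to the target(s) recorded for that portal.
        run_or_echo iscsiadm -m node -p "$ds" --login
    done
}

# Execute the command when RUN=1, otherwise just print it.
run_or_echo() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

For example, "login_to_data_servers 192.168.0.11 192.168.0.12" prints a discovery and a login command for each portal. After a real run, "iscsiadm -m session" should list one session per data server; remember the ACL and firewall caveats above.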
* Re: About pNFS installation process.
From: Bruno Silva @ 2012-03-23 17:21 UTC
To: Lev Solomonov; +Cc: linux-nfs
First of all, thanks for your replies.
I still have a few questions:
1. What file system do you suggest? Why is ext4 not recommended?
2. How do I know whether pNFS is set up correctly? Here is the blkmapd output:
[bruno@fedora blkmapd]$ sudo ./blkmapd -f
blkmapd: process_deviceinfo: 2 vols   ***** POINT 1 *******
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string %�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument   ***** POINT 2 *******
blkmapd: Create device pnfs_vol_0 failed
blkmapd: dm_device_create: 1 pnfs_vol_0 0:0
blkmapd: process_deviceinfo: 2 vols
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string %�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument
blkmapd: Create device pnfs_vol_1 failed
blkmapd: dm_device_create: 1 pnfs_vol_1 0:0
blkmapd: process_deviceinfo: 2 vols
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string %�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument
blkmapd: Create device pnfs_vol_2 failed
blkmapd: dm_device_create: 1 pnfs_vol_2 0:0
***** POINT 1 *******
I connected to four data servers using the command
iscsiadm -m discovery -t sendtargets -p <data server IP> -l
But note the blkmapd output "blkmapd: process_deviceinfo: 2 vols". I
believe four volumes should be listed, not two.
***** POINT 2 *******
What does this message mean? "device-mapper: reload ioctl failed: Invalid argument"
Here is the output of "grep nfs /proc/self/mountstats":
[bruno@fedora ~]$ grep nfs /proc/self/mountstats
device sunrpc mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
device 192.168.0.203:/ mounted on /home/bruno/shared with fstype nfs4 statvers=1.0
nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,acl=0x3,sessions,pnfs=not configured
RPC iostats version: 1.0 p/v: 100003/4 (nfs)
My question is: how can I add other data servers to the setup, and how
do I find out where the problem is?
Thanks in advance.
Regards,
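blkmapd output in the foreground (-f) gets long quickly. The sketch below is a hypothetical awk filter, summarize_blkmapd_log, that counts the "process_deviceinfo" records and lists the device-mapper devices whose creation failed. As Lev explains further down the thread, the "2 vols" in each record describes a single device (a simple volume plus a slice), so the number of records, not the "2", is the figure to compare against the expected devices.

```bash
#!/bin/bash
# summarize_blkmapd_log is a hypothetical helper, not part of nfs-utils.
# Reads saved blkmapd output on stdin and prints a two-line summary.
summarize_blkmapd_log() {
    awk '
        # One "process_deviceinfo" line per device info record processed.
        /process_deviceinfo:/ { devices++ }
        # "blkmapd: Create device NAME failed": field 4 is the dm name.
        /Create device .* failed/ { failed = failed " " $4 }
        END {
            printf "deviceinfo records: %d\n", devices
            printf "failed dm devices:%s\n", (failed == "" ? " none" : failed)
        }'
}
```

Applied to the log above (e.g. "summarize_blkmapd_log < blkmapd.log", with the log saved to a file of your choosing), it would report three deviceinfo records and the failed devices pnfs_vol_0, pnfs_vol_1 and pnfs_vol_2.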
* Re: About pNFS installation process.
From: Jim Rees @ 2012-03-23 18:02 UTC
To: Bruno Silva; +Cc: Lev Solomonov, linux-nfs
Blkmapd is finding your iscsi devices, but can't create the mapped device
for some reason. I suspect the geometry is wrong but you'd have to do some
debugging to figure out exactly why.
I guess we should fix or get rid of pretty_sig.
  Date: Thu, 2 Dec 2010 09:41:26 -0500
  From: Jim Rees <rees@umich.edu>
  Subject: Re: [PATCH 4/5] various minor cleanups
  To: Benny Halevy <bhalevy@panasas.com>
  Cc: linux-nfs@vger.kernel.org, peter honeyman <honey@citi.umich.edu>
  ...
  I am glad you are paying attention!
  I am aware of the shortcomings of pretty_sig(). In addition to the
  problems you noted, it also assumes that a signature over 8 bytes long is
  representable as a text string, which is not guaranteed. The code it
  replaced was worse.
  I put this in because for debugging I need to be able to follow a
  signature all the way from my EMC server to the devmapper. pretty_sig()
  simply prints the signature in a way that I can match it up with the
  signature on the server.
  I don't want to spend a lot of time on this, but I also am uneasy leaving
  EMC-specific code in nfs-utils, especially since it can blow up if you
  use it against a non-EMC server. My inclination is to remove this
  debugging code when I no longer need it. I guess at the very least I
  should put in a comment. I am open to suggestions.
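Because pretty_sig() prints binary signatures as text (hence the unreadable bs_string in the log above), one way to follow a signature by hand is to dump the same bytes in hex. This sketch uses dd and od; the helper name dump_sig is hypothetical, and its defaults (16 bytes at offset 568) are simply the bs_length and offset printed by blkmapd, assuming that offset is in bytes. On a disk with 512-byte sectors, 568 = 512 + 56 would fall on the disk GUID field of the GPT header, which would fit the advice to label disks with parted; that reading is an inference, not something stated in the thread.

```bash
#!/bin/bash
# dump_sig is a hypothetical debugging helper, not part of nfs-utils.
# Usage: dump_sig DEVICE [OFFSET] [LENGTH]
# Prints LENGTH bytes starting at byte OFFSET of DEVICE (or of an image
# file) as one lowercase hex string. Defaults come from the blkmapd log.
dump_sig() {
    local dev="$1" offset="${2:-568}" length="${3:-16}"
    # bs=1 makes skip/count byte-granular (slow, but fine for a few bytes).
    dd if="$dev" bs=1 skip="$offset" count="$length" 2>/dev/null |
        od -An -tx1 -v | tr -d ' \n'
    echo
}
```

Running "dump_sig /dev/sde" on the client and the same command against the backing disk on the data server should print identical hex strings if both names refer to the same device.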
* Re: About pNFS installation process.
From: Lev Solomonov @ 2012-04-04 0:20 UTC
To: Bruno Silva; +Cc: linux-nfs
sorry for the delay.
On Fri, Mar 23, 2012 at 19:21, Bruno Silva <bs@cin.ufpe.br> wrote:
> Fist of all, thanks for your replies.
>
> I still have a few questions:
>
> 1. What file system do you suggest? Why ext4 is not recommended?
i'm not aware of a single FS that will work perfectly out-of-the-box with
pNFS blocks layout on linux. in case of ext4/btrfs, there's tension
between the newer, advanced FS features and pNFS block layout underlying
FS requirements.
> 2. How know whether the pNFS is correctly set? Follows the blkmapd output:
IIRC neither client nor MDS nor DS emit blatantly bogus error messages,
so if you see any errors (e.g. 'failed' below) - that's a bad sign.
once you've cleared those, make sure that MDS exports pNFS, e.g.:
grep pnfs /var/lib/nfs/etab
or similar shows your expected exports, and that client mounts those as
v4.1, e.g.:
grep minorversion=1 /proc/mounts
or similar shows the expected mounts.
then use the mounts on client while sniffing the traffic, you'll want to
see the file data move around over iSCSI client<->DS rather than over NFS
client<->MDS.
<snip>
> ***** POINT 1 ******* I connected with four data servers using
> the command iscsiadm-m discovery-t-p SendTargets <IP DATE OF
> server>-l. But, note the output blkmapd "blkmapd: process_deviceinfo:
> 2 vols". I believe that should be listed four volumes, not two;
nope, the "2 vols" is just the single device info: simple+slice, and you
appear to have several of such devices. take a peek around GETDEVICEINFO
in RFC 5661 + 5663.
regardless, DM barfed. what export topology are you aiming for with those
4 volumes?
<snip>
> Follows the output of command of "grep nfs /proc/self/mountstats"
>
> [bruno@fedora ~]$ grep nfs /proc/self/mountstats
> device sunrpc mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
> device 192.168.0.203:/ mounted on /home/bruno/shared with fstype nfs4
> statvers=1.0
> nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,acl=0x3,sessions,pnfs=not configured
> RPC iostats version: 1.0 p/v: 100003/4 (nfs)
the "pnfs=not configured" is bad news (no active pNFS layout driver),
you'll want that to say "pnfs=LAYOUT_BLOCK_VOLUME".
> My question is: How can i add other data servers in the structure and
did you manage to successfully export a single DS iSCSI LUN through MDS
over pNFS?
> how do I find out where the problem is?
DM is likely to have left something in /var/log/messages (or wherever)
on failed reloads. any 'device-mapper' entries there?
solo.
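The checks suggested above can be scripted. The first two are Lev's greps, kept as comments. pnfs_status is a hypothetical helper that prints each NFS mount point with the value of its "pnfs=" field from /proc/self/mountstats (the field format is taken from the output quoted above), and pnfs_block_active, also hypothetical, succeeds only if some mount reports LAYOUT_BLOCK_VOLUME. This is a sketch; it does not replace sniffing the traffic to confirm that file data actually flows over iSCSI between client and DS.

```bash
#!/bin/bash
# Quick checks suggested in the thread:
#   on the MDS:    grep pnfs /var/lib/nfs/etab
#   on the client: grep minorversion=1 /proc/mounts

# pnfs_status (hypothetical helper): print "<mount point> <pnfs value>"
# for every mount in a mountstats file whose "nfsv4:" line has pnfs=.
pnfs_status() {
    awk '
        # "device SRC mounted on MNT with fstype T ..." starts a new mount.
        $1 == "device" { mnt = $5 }
        # The pnfs= value runs to the next comma or to the end of the line.
        $1 == "nfsv4:" && match($0, /pnfs=[^,]*/) {
            print mnt, substr($0, RSTART + 5, RLENGTH - 5)
        }' "${1:-/proc/self/mountstats}"
}

# pnfs_block_active (hypothetical helper): succeed only if at least one
# mount reports the block layout driver.
pnfs_block_active() {
    pnfs_status "$@" | grep -q ' LAYOUT_BLOCK_VOLUME$'
}
```

On the client, "pnfs_status" with no argument reads /proc/self/mountstats; for the mount quoted above it would print "/home/bruno/shared not configured".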