* About pNFS installation process.
@ 2012-03-15 20:23 Bruno Silva
2012-03-16 0:16 ` Jim Rees
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Bruno Silva @ 2012-03-15 20:23 UTC (permalink / raw)
To: linux-nfs
Hello,
I am a PhD student and I intend to run performance experiments
with pNFS. First, I need to build a pNFS block layout
environment. I apologize for the long mail, but I need to explain all
the steps I took to build it. I followed the instructions
at http://wiki.linux-nfs.org/wiki/index.php/PNFS_Block_Server_Setup_Instructions.
Steps:
(1) Build and install the kernel. I used this tree:
git clone git://linux-nfs.org/~bhalevy/linux-pnfs.git
with these options in the .config file:
CONFIG_NFSD=m
CONFIG_NFSD_V4=y
CONFIG_PNFSD=y
# CONFIG_PNFSD_LOCAL_EXPORT is not set
CONFIG_PNFSD_BLOCK=y
(2) Build nfs-utils and utils/blkmapd:
git clone git://linux-nfs.org/~bhalevy/pnfs-nfs-utils.git
(3) Export the file system.
-----------------------------------------------------------------
For block access to work properly, the disks must have a signature.
I partitioned the disks using "parted"; disks partitioned with "fdisk"
don't have the signatures.
I followed the steps below.
parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart 1 <Provide start and end of the partition>
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  53.7GB  53.7GB  ext3         1     msftres
I tested with ext4, creating the file system with a 4K block size.
# mkfs.ext4 -b 4096 /dev/sdb1
-----------------------------------------------------------------
I performed steps 1, 2 and 3 on all machines (data servers, metadata
server and client).
I used the following export options in the metadata server's exports file:
/mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
And I ran the following script to start the server:
#!/bin/bash
# Unmount /mnt
umount /mnt
# Start the iSCSI target service
service tgtd restart
sleep 8
# Create iSCSI target
tgtadm --lld iscsi --op new --mode target --tid 1 \
    -T iqn.1992-05.com.emc:openblock
# Expose LUN as iSCSI target
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \
    --backing-store /dev/sdb
# Allow access from all initiators
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL
# Show all the details
tgtadm --lld iscsi --op show --mode target
# Mount the partition
mount /dev/sdb1 /mnt
sleep 3
# Start the NFS server
service nfs restart
sleep 3
# Start the ctl daemon
cd <CTL_SRC>/ctl/
./ctl -u &
If the data server and metadata server run on the same
machine, everything works normally. My question is how to add other
pNFS data servers to the environment. I know it involves
creating iSCSI targets, but how are the data servers linked to the
metadata server? Is there a configuration file that informs the
metadata server, as in spNFS? How does it work?
Thanks in advance.
--
Bruno Silva
Computer Engineer
Modcs Group
---------------------------------------------------------------------
Facebook goo.gl/QHaZx
Twitter goo.gl/yk4jf
Google+ goo.gl/xIbgk
* Re: About pNFS installation process.
2012-03-15 20:23 About pNFS installation process Bruno Silva
@ 2012-03-16 0:16 ` Jim Rees
2012-03-16 18:14 ` About Direct I/O Alexandre Depoutovitch
2012-03-18 10:51 ` About pNFS installation process Lev Solomonov
2 siblings, 0 replies; 9+ messages in thread
From: Jim Rees @ 2012-03-16 0:16 UTC (permalink / raw)
To: Bruno Silva; +Cc: linux-nfs
Bruno Silva wrote:
  The steps 1, 2 and 3 i did to all machines (data servers, metadata
  server and client).
You only need blkmapd on the client. Client setup instructions are at
http://wiki.linux-nfs.org/wiki/index.php/Fedora_pNFS_Client_Setup . I added
a link to that page.
* About Direct I/O
2012-03-15 20:23 About pNFS installation process Bruno Silva
2012-03-16 0:16 ` Jim Rees
@ 2012-03-16 18:14 ` Alexandre Depoutovitch
2012-03-16 20:35 ` J. Bruce Fields
2012-03-18 10:51 ` About pNFS installation process Lev Solomonov
2 siblings, 1 reply; 9+ messages in thread
From: Alexandre Depoutovitch @ 2012-03-16 18:14 UTC (permalink / raw)
To: linux-nfs
Hello,
I am trying to do random sector-aligned writes to an NFS-mounted disk. The
performance is an order of magnitude worse than with 4K (file system block
size) aligned I/O.
The reason is that the NFS daemon (Linux kernel 2.6.32) on the server side
always does buffered I/O, which behaves poorly for block-unaligned
requests.
Is there a way to tell the NFS daemon to use direct I/O?
If not, is it an implementation limitation, or is there a fundamental
problem with using direct I/O in the NFS server?
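To make the alignment distinction concrete, here is a minimal sketch that counts how many file system blocks a single write covers only partially. The helper name partial_blocks and the 4096-byte default are assumptions for illustration. The thread does not say why buffered I/O is slow here; a common explanation, assumed in the comments, is that a partially covered block that is not already cached typically has to be read before it can be modified.

```shell
#!/bin/sh
# partial_blocks OFFSET LENGTH [BLOCKSIZE]
# Hypothetical helper: print the number of file system blocks that a write
# of LENGTH bytes at byte OFFSET covers only partially (0, 1 or 2).
# Assumption: each such block may need a read before the write completes.
partial_blocks() {
    pb_offset=$1
    pb_length=$2
    pb_bs=${3:-4096}
    pb_end=$((pb_offset + pb_length))
    pb_first=$((pb_offset / pb_bs))
    pb_last=$(( (pb_end - 1) / pb_bs ))
    pb_n=0
    # The write starts in the middle of a block.
    if [ $((pb_offset % pb_bs)) -ne 0 ]; then
        pb_n=$((pb_n + 1))
    fi
    # The write ends in the middle of a block that was not already counted.
    if [ $((pb_end % pb_bs)) -ne 0 ]; then
        if [ $((pb_offset % pb_bs)) -eq 0 ] || [ "$pb_first" -ne "$pb_last" ]; then
            pb_n=$((pb_n + 1))
        fi
    fi
    echo "$pb_n"
}
```

For example, "partial_blocks 512 512" describes a sector-aligned 512-byte write that touches one block partially, while "partial_blocks 0 4096" describes a fully block-aligned write.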
Thank you,
Alex
* Re: About Direct I/O
2012-03-16 18:14 ` About Direct I/O Alexandre Depoutovitch
@ 2012-03-16 20:35 ` J. Bruce Fields
2012-03-16 20:58 ` Myklebust, Trond
0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2012-03-16 20:35 UTC (permalink / raw)
To: Alexandre Depoutovitch; +Cc: linux-nfs
On Fri, Mar 16, 2012 at 11:14:04AM -0700, Alexandre Depoutovitch wrote:
> Hello,
> I am trying to do random sector aligned writes to an NFS mounted disk. The
> performance is order of magnitude worse than 4K (file system block size)
> aligned I/O.
> The reason is that NFS demon (Linux kernel 2.6.32) on the server side
> always does buffered I/O, which behaves poorly for block unaligned
> requests.
> Is there a way to tell NFS daemon to use direct I/O?
No.
> If not, is it an implementation limitation or there is a fundamental
> problem with using direct I/O in NFS server?
I'm shamefully ignorant of Direct IO....
If we supported Direct IO, are there heuristics that would let the
server figure out on its own when it helped and when it didn't? Or
would the administrator be stuck trying to figure that out?
Is Direct IO possible from kernel buffers these days? Are there
alignment restrictions?
--b.
* Re: About Direct I/O
2012-03-16 20:35 ` J. Bruce Fields
@ 2012-03-16 20:58 ` Myklebust, Trond
0 siblings, 0 replies; 9+ messages in thread
From: Myklebust, Trond @ 2012-03-16 20:58 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Alexandre Depoutovitch, linux-nfs@vger.kernel.org
On Fri, 2012-03-16 at 16:35 -0400, J. Bruce Fields wrote:
> On Fri, Mar 16, 2012 at 11:14:04AM -0700, Alexandre Depoutovitch wrote:
> > Hello,
> > I am trying to do random sector aligned writes to an NFS mounted disk. The
> > performance is order of magnitude worse than 4K (file system block size)
> > aligned I/O.
> > The reason is that NFS demon (Linux kernel 2.6.32) on the server side
> > always does buffered I/O, which behaves poorly for block unaligned
> > requests.
> > Is there a way to tell NFS daemon to use direct I/O?
>
> No.
>
> > If not, is it an implementation limitation or there is a fundamental
> > problem with using direct I/O in NFS server?
>
> I'm shamefully ignorant of Direct IO....
>
> If we supported Direct IO, are there heuristics that would let the
> server figure out on its own when it helped and when it didn't?  Or
> would the administrator be stuck trying to figure that out?
>
> Is Direct IO possible from kernel buffers these days?  Are there
> alignment restrictions?
Work is on its way to allow direct i/o from kernel buffers, but it is
not possible with existing kernels.
Have patience.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: About pNFS installation process.
2012-03-15 20:23 About pNFS installation process Bruno Silva
2012-03-16 0:16 ` Jim Rees
2012-03-16 18:14 ` About Direct I/O Alexandre Depoutovitch
@ 2012-03-18 10:51 ` Lev Solomonov
2012-03-23 17:21 ` Bruno Silva
2 siblings, 1 reply; 9+ messages in thread
From: Lev Solomonov @ 2012-03-18 10:51 UTC (permalink / raw)
To: Bruno Silva; +Cc: linux-nfs
On Thu, Mar 15, 2012 at 22:23, Bruno Silva <bs@cin.ufpe.br> wrote:
<snip>
> 1 17.4kB 53.7GB 53.7GB ext3 1 msftres
> I have tested with ext4 file system, create ext4 file system with
> 4K block size.
>
> # mkfs.ext4 -b 4096 /dev/sdb1
> -----------------------------------------------------------------
i suspect you might encounter some issues with ext4, once you get past
the initial setup problems.
> The steps 1, 2 and 3 i did to all machines (data servers, metadata
> server and client).
assuming iSCSI as the underlying storage model, generally speaking:
* blkmapd and blocklayoutdriver are client-specific (as jim mentioned).
* ctl daemon is MDS-specific.
* iSCSI target is DS-specific.
* both MDS and client will need iSCSI initiator.
you'll need to discover and login to the iSCSI target on the DS from
both the MDS and the client for proper pNFS operation.
security often gets in the way, mind the ACLs on the iSCSI targets and
firewall settings on all three (client/MDS/DS) and between them.
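The discover-and-login step described above can be sketched with open-iscsi's iscsiadm. The helper name iscsi_login_cmds and the example address 192.0.2.10 are assumptions for illustration; the IQN is the one from the target script earlier in the thread. The function only prints the commands, so it can be reviewed before being run as root on both the MDS and the client.

```shell
#!/bin/sh
# iscsi_login_cmds DS_IP IQN
# Hypothetical helper: print the iscsiadm commands that discover the targets
# offered by one data server and then log in to the named target.
# It prints only; nothing is executed against the initiator.
iscsi_login_cmds() {
    il_ip=$1
    il_iqn=$2
    echo "iscsiadm -m discovery -t sendtargets -p $il_ip"
    echo "iscsiadm -m node -T $il_iqn -p $il_ip --login"
}
```

A possible use is "iscsi_login_cmds 192.0.2.10 iqn.1992-05.com.emc:openblock | sh", followed by "iscsiadm -m session" to list the sessions that were established.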
> I adopted the export option in metadata server exports file.
> /mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
<snip>
> If the data server and metadata server have been run on the same
> machine everything works normally.
see above. once you set everything up make sure you're actually running
in pNFS mode, i.e. that client sends/receives file data directly from
the DS over iSCSI, not as a fallback through MDS.
solo.
* Re: About pNFS installation process.
2012-03-18 10:51 ` About pNFS installation process Lev Solomonov
@ 2012-03-23 17:21 ` Bruno Silva
2012-03-23 18:02 ` Jim Rees
2012-04-04 0:20 ` Lev Solomonov
0 siblings, 2 replies; 9+ messages in thread
From: Bruno Silva @ 2012-03-23 17:21 UTC (permalink / raw)
To: Lev Solomonov; +Cc: linux-nfs
First of all, thanks for your replies.
I still have a few questions:
1. What file system do you suggest? Why is ext4 not recommended?
2. How can I tell whether pNFS is set up correctly? The blkmapd output follows:
[bruno@fedora blkmapd]$ sudo ./blkmapd -f
blkmapd: process_deviceinfo: 2 vols ***** POINT 1 *******
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string
%�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument ***** POINT 2 *******
blkmapd: Create device pnfs_vol_0 failed
blkmapd: dm_device_create: 1 pnfs_vol_0 0:0
blkmapd: process_deviceinfo: 2 vols
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string
%�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument
blkmapd: Create device pnfs_vol_1 failed
blkmapd: dm_device_create: 1 pnfs_vol_1 0:0
blkmapd: process_deviceinfo: 2 vols
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string
%�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument
blkmapd: Create device pnfs_vol_2 failed
blkmapd: dm_device_create: 1 pnfs_vol_2 0:0
***** POINT 1 ******* I connected to four data servers using
the command iscsiadm -m discovery -t sendtargets -p <IP of data
server> -l. But note the blkmapd output "blkmapd: process_deviceinfo:
2 vols". I believe four volumes should be listed, not two.
***** POINT 2 ******* What does this message mean? "device-mapper:
reload ioctl failed: Invalid argument"
The output of "grep nfs /proc/self/mountstats" follows:
[bruno@fedora ~]$ grep nfs /proc/self/mountstats
device sunrpc mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
device 192.168.0.203:/ mounted on /home/bruno/shared with fstype nfs4
statvers=1.0
nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,acl=0x3,sessions,pnfs=not configured
RPC iostats version: 1.0 p/v: 100003/4 (nfs)
My question is: how can I add other data servers to the setup, and
how do I find out where the problem is?
Thanks in advance.
Regards,
On March 18, 2012 at 07:51, Lev Solomonov <solo@tonian.com> wrote:
>
> On Thu, Mar 15, 2012 at 22:23, Bruno Silva <bs@cin.ufpe.br> wrote:
> <snip>
> > 1 17.4kB 53.7GB 53.7GB ext3 1 msftres
> > I have tested with ext4 file system, create ext4 file system with
> > 4K block size.
> >
> > # mkfs.ext4 -b 4096 /dev/sdb1
> > -----------------------------------------------------------------
>
> i suspect you might encounter some issues with ext4, once you get past
> the initial setup problems.
>
> > The steps 1, 2 and 3 i did to all machines (data servers, metadata
> > server and client).
>
> assuming iSCSI as the underlying storage model, generally speaking:
> * blkmapd and blocklayoutdriver are client-specific (as jim mentioned).
> * ctl daemon is MDS-specific.
> * iSCSI target is DS-specific.
> * both MDS and client will need iSCSI initiator.
>
> you'll need to discover and login to the iSCSI target on the DS from
> both the MDS and the client for proper pNFS operation.
>
> security often gets in the way, mind the ACLs on the iSCSI targets and
> firewall settings on all three (client/MDS/DS) and between them.
>
> > I adopted the export option in metadata server exports file.
> > /mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
> <snip>
> > If the data server and metadata server have been run on the same
> > machine everything works normally.
>
> see above. once you set everything up make sure you're actually running
> in pNFS mode, i.e. that client sends/receives file data directly from
> the DS over iSCSI, not as a fallback through MDS.
>
> solo.
--
Bruno Silva
Computer Engineer
Modcs Group
* Re: About pNFS installation process.
2012-03-23 17:21 ` Bruno Silva
@ 2012-03-23 18:02 ` Jim Rees
2012-04-04 0:20 ` Lev Solomonov
1 sibling, 0 replies; 9+ messages in thread
From: Jim Rees @ 2012-03-23 18:02 UTC (permalink / raw)
To: Bruno Silva; +Cc: Lev Solomonov, linux-nfs
Blkmapd is finding your iscsi devices, but can't create the mapped device
for some reason. I suspect the geometry is wrong but you'd have to do some
debugging to figure out exactly why.
I guess we should fix or get rid of pretty_sig.
Date: Thu, 2 Dec 2010 09:41:26 -0500
From: Jim Rees <rees@umich.edu>
Subject: Re: [PATCH 4/5] various minor cleanups
To: Benny Halevy <bhalevy@panasas.com>
Cc: linux-nfs@vger.kernel.org, peter honeyman <honey@citi.umich.edu>
...
I am glad you are paying attention! I am aware of the shortcomings of
pretty_sig(). In addition to the problems you noted, it also assumes that a
signature over 8 bytes long is representable as a text string, which is not
guaranteed. The code it replaced was worse.
I put this in because for debugging I need to be able to follow a signature
all the way from my EMC server to the devmapper. pretty_sig() simply prints
the signature in a way that I can match it up with the signature on the
server.
I don't want to spend a lot of time on this, but I also am uneasy leaving
EMC-specific code in nfs-utils, especially since it can blow up if you use
it against a non-EMC server. My inclination is to remove this debugging
code when I no longer need it. I guess at the very least I should put in a
comment. I am open to suggestions.
* Re: About pNFS installation process.
2012-03-23 17:21 ` Bruno Silva
2012-03-23 18:02 ` Jim Rees
@ 2012-04-04 0:20 ` Lev Solomonov
1 sibling, 0 replies; 9+ messages in thread
From: Lev Solomonov @ 2012-04-04 0:20 UTC (permalink / raw)
To: Bruno Silva; +Cc: linux-nfs
sorry for the delay.
On Fri, Mar 23, 2012 at 19:21, Bruno Silva <bs@cin.ufpe.br> wrote:
> Fist of all, thanks for your replies.
>
> I still have a few questions:
>
> 1. What file system do you suggest? Why ext4 is not recommended?
i'm not aware of a single FS that will work perfectly out-of-the-box
with pNFS blocks layout on linux. in case of ext4/btrfs, there's tension
between the newer, advanced FS features and pNFS block layout underlying
FS requirements.
> 2. How know whether the pNFS is correctly set? Follows the blkmapd output:
IIRC neither client nor MDS nor DS emit blatantly bogus error messages,
so if you see any errors (e.g. 'failed' below) - that's a bad sign.
once you've cleared those, make sure that MDS exports pNFS, e.g.:
grep pnfs /var/lib/nfs/etab
or similar shows your expected exports, and that client mounts those as
v4.1, e.g.:
grep minorversion=1 /proc/mounts
or similar shows the expected mounts. then use the mounts on client
while sniffing the traffic, you'll want to see the file data move
around over iSCSI client<->DS rather than over NFS client<->MDS.
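The two grep checks above can be wrapped as small functions, sketched here under some assumptions. The function names are hypothetical, the file paths default to the ones given above, and the patterns are slightly stricter than a plain grep: the first matches pnfs only as a whole export option, and the second also accepts vers=4.1, which some kernels print instead of minorversion=1.

```shell
#!/bin/sh
# mds_exports_pnfs [ETAB]
# Succeeds if the export table has an export whose option list contains
# "pnfs" as a complete option (so "no_pnfs" does not count).
mds_exports_pnfs() {
    me_etab=${1:-/var/lib/nfs/etab}
    grep -Eq '[(,]pnfs[,)]' "$me_etab"
}

# client_mounts_v41 [MOUNTS]
# Succeeds if the mount table shows a mount using NFSv4 minor version 1.
client_mounts_v41() {
    cm_mounts=${1:-/proc/mounts}
    grep -Eq 'minorversion=1|vers=4\.1' "$cm_mounts"
}
```

On the MDS, "mds_exports_pnfs && echo exported" reports the export; on the client, "client_mounts_v41 && echo v4.1" reports the mount. Neither check shows where file data actually travels, so the traffic capture described above is still needed.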
<snip>
> ***** POINT 1 ******* I connected with four data servers using
> the command iscsiadm-m discovery-t-p SendTargets <IP DATE OF
> server>-l. But, note the output blkmapd "blkmapd: process_deviceinfo:
> 2 vols". I believe that should be listed four volumes, not two;
nope, the "2 vols" is just the single device info: simple+slice, and you
appear to have several of such devices. take a peek around GETDEVICEINFO
in RFC 5661 + 5663. regardless, DM barfed.
what export topology are you aiming for with those 4 volumes?
<snip>
> Follows the output of command of "grep nfs /proc/self/mountstats"
>
> [bruno@fedora ~]$ grep nfs /proc/self/mountstats
> device sunrpc mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
> device 192.168.0.203:/ mounted on /home/bruno/shared with fstype nfs4
> statvers=1.0
> nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,acl=0x3,sessions,pnfs=not configured
> RPC iostats version: 1.0 p/v: 100003/4 (nfs)
the "pnfs=not configured" is bad news (no active pNFS layout driver),
you'll want that to say "pnfs=LAYOUT_BLOCK_VOLUME".
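A small sketch for pulling just that field out of mountstats, so it is easy to re-check after each configuration change. The function name pnfs_layout is hypothetical, and the pattern assumes the "nfsv4:" line ends with the pnfs= field, as it does in the output quoted above.

```shell
#!/bin/sh
# pnfs_layout [MOUNTSTATS]
# Print the value of the pnfs= field from every "nfsv4:" line,
# one value per NFSv4 mount.
pnfs_layout() {
    pl_stats=${1:-/proc/self/mountstats}
    sed -n 's/^[[:space:]]*nfsv4:.*pnfs=\(.*\)$/\1/p' "$pl_stats"
}
```

With the mount quoted above, this would print "not configured"; once a block layout driver is active it should print LAYOUT_BLOCK_VOLUME instead.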
> My question is: How can i add other data servers in the structure and
did you manage to successfully export a single DS iSCSI LUN through MDS
over pNFS?
> how do I find out where the problem is?
DM is likely to have left something in /var/log/messages (or wherever)
on failed reloads. any 'device-mapper' entries there?
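One way to look for those entries is sketched below. The function name dm_log_entries is hypothetical, and the default log path is the one mentioned above; on systems that log elsewhere, pass a different file.

```shell
#!/bin/sh
# dm_log_entries [LOGFILE] [COUNT]
# Print the last COUNT (default 20) log lines that mention device-mapper.
dm_log_entries() {
    dl_log=${1:-/var/log/messages}
    dl_count=${2:-20}
    grep -i 'device-mapper' "$dl_log" | tail -n "$dl_count"
}
```

Running "dm_log_entries" right after blkmapd reports "Create device ... failed" should show the kernel's reason for rejecting the table, if one was logged. "dmesg | grep -i device-mapper" is an alternative when the kernel ring buffer still holds the message.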
solo.