From: "J. Bruce Fields" <bfields@fieldses.org>
To: Benjamin ESTRABAUD <be@mpstor.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"bc@mpstor.com" <bc@mpstor.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: Issue running buffered writes to a pNFS (NFS 4.1 backed by SAN) filesystem.
Date: Wed, 20 May 2015 15:40:48 -0400
Message-ID: <20150520194048.GA20221@fieldses.org>
In-Reply-To: <555CB5EE.2@mpstor.com>
On Wed, May 20, 2015 at 05:27:26PM +0100, Benjamin ESTRABAUD wrote:
> On 15/05/15 20:20, J. Bruce Fields wrote:
> >On Fri, May 15, 2015 at 10:44:13AM -0700, Benjamin ESTRABAUD wrote:
> >>I've been using pNFS for a little while now, and I am very pleased
> >>with its overall stability and performance.
> >>
> >>A pNFS MDS server was set up with SAN storage on the backend (a RAID0
> >>built on top of multiple LUNs). Clients were given access to the same
> >>RAID0 via the same LUNs on the same SAN.
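(As an aside, for anyone trying to reproduce a setup like this: a quick
sanity check that the client side of the block layout is actually in use,
assuming a stock kernel and nfs-utils, is something along the lines of

  # the pNFS block layout driver should be loaded on the client
  lsmod | grep blocklayoutdriver
  # and the block layout mapping daemon from nfs-utils should be running
  pgrep -a blkmapd

Without blkmapd the client typically cannot map the layout to a local
block device and falls back to writing through the MDS.)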
> >>
> >>However, I've been noticing a small issue with it that prevents me
> >>from using pNFS to its full potential: If I run non-direct IOs (for
> >>instance "dd" without the "oflag=direct" option), IOs run excessively
> >>slowly (3-4MB/sec) and the dd process hangs until forcefully
> >>terminated.
> >
> Sorry for the late reply; I was unavailable for the past few days. I
> have since had time to look at the problem further.
>
> >And that's reproducible every time?
> >
Thanks for the detailed report; a couple of quick questions inline below.
> It is, and here is what is happening more in details:
>
> On the client, "/mnt/pnfs1" is the pNFS mount point. We use NFS v4.1.
>
> * Running dd with bs=512 and no "direct" set on the client:
>
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=512 count=100000000
>
> => Here we get variable performance, dd's average is 100MB/sec, and
> we can see all the IOs going to the SAN block device. nfsstat
> confirms that no IOs are going through the NFS server (no "writes"
> are recorded, only "layoutcommit". Performance is maybe low but at
> this block size we don't really care.
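For reference, a simple way to watch where the writes are going while a
test like this runs (the device name is just an example; substitute the
SAN LUN or RAID0 device as seen on the client) is something like

  # NFS client op counters: WRITE vs LAYOUTGET/LAYOUTCOMMIT
  watch -n 1 'nfsstat -c'
  # block-level traffic on the device the layout points at
  iostat -xm 1 /dev/sdX

When the buffered writes really go out through the layout, the iostat
numbers climb while the nfsstat write counter stays flat.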
>
> * Running dd with bs=512 and "direct" set on the client:
>
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=512 count=100000000 oflag=direct
>
> => Here, funnily enough, all the IOs are sent over NFS. The
> "nfsstat" command shows writes increasing, while the SAN block
> device on the client stays idle. The performance is about 13MB/sec,
> but again that is expected with such a small IO size. The only
> unexpected part is that the small 512-byte IOs are not going through
> the iSCSI SAN.
>
> * Running dd with bs=1M and no "direct" set on the client:
>
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=100000000
>
> => Here the IOs "work" and go through the SAN (no "write" counter
> increasing in "nfsstat", and I can see disk statistics on the block
> device on the client increasing). However, the speed at which the IOs
> go through is really slow (the actual speed recorded on the SAN
> device fluctuates a lot, from 3MB/sec to a lot more). Overall dd is
> not really happy, and "Ctrl-C"ing it takes a long time; in the
> last try it actually caused a kernel panic (see
> http://imgur.com/YpXjvQ3 - sorry about the picture format, I did not
> have dmesg output capture set up and only had access to the VGA
> console). When "dd" finally comes around and terminates, the average
> speed is 200MB/sec.
> Again the SAN block device shows IOs being submitted and "nfsstat"
> shows no "writes" but a few "layoutcommits", showing that the writes
> are not going through the "regular" NFS server.
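If this panics again, one way to capture the full oops without serial
access is netconsole; a rough example, with placeholder addresses and
interface names, would be

  # on the crashing client
  modprobe netconsole netconsole=6665@10.0.0.1/eth0,6666@10.0.0.2/aa:bb:cc:dd:ee:ff
  # on the receiving machine (10.0.0.2 here)
  nc -u -l 6666        # or "nc -u -l -p 6666" depending on netcat flavour

so the stack trace ends up in a terminal or a file rather than a photo of
the screen.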
>
>
> * Running dd with bs=1M and no "direct" set on the client:
I think you meant to leave out the "no" there?
> dd if=/dev/zero of=/mnt/pnfs1/testfile bs=1M count=100000000 oflag=direct
>
> => Here the IOs work much faster (almost twice as fast as with
> "direct" set, or 350+MB/sec) and dd is much more responsive (can
> "Ctrl-C" it almost instantly). Again the SAN block device shows IOs
> being submitted and "nfsstat" shows no "writes" but a few
> "layoutcommits", showing that the writes are not going through the
> "regular" NFS server.
>
> This shows that somehow running with "oflag=direct" causes
> instability and lower performance, at least on this version.
And I think you mean "running without", not "running with"?
Assuming those are just typos, unless I'm missing something.
--b.
>
> Both clients are running Linux 4.1.0-rc2 on CentOS 7.0 and the
> server is running Linux 4.1.0-rc2 on CentOS 7.1.
>
> >Can you get network captures and figure out (for example), whether the
> >slow writes are going over iSCSI or NFS, and if they're returning errors
> >in either case?
> >
> I'm going to do that now (try to locate errors). However, "nfsstat"
> does indicate that the slower writes are going through iSCSI.
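For what it's worth, a capture that covers both paths at once could be as
simple as (interface name is a placeholder)

  tcpdump -i ethX -s 0 -w /tmp/pnfs-test.pcap 'port 2049 or port 3260'

with 2049 being the NFS traffic to the MDS and 3260 the iSCSI traffic to
the SAN; opening the file in wireshark then makes it easy to see which
path the slow writes, and any error replies, are actually taking.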
>
> >>The same behaviour can be observed when laying out an IO file
> >>with fio, for instance, or when using applications which do not set
> >>the O_DIRECT flag. When using direct IO I can observe lots of iSCSI
> >>traffic, at extremely good performance (the same performance as the
> >>SAN gets on "raw" block devices).
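Since fio reproduces it as well, a pair of jobs along these lines (path
and size are only examples) makes the buffered-vs-direct comparison easy
to rerun:

  # buffered writes, the problematic case in this report
  fio --name=buffered --filename=/mnt/pnfs1/fio-test --rw=write --bs=1M \
      --size=8G --ioengine=psync --direct=0 --end_fsync=1
  # O_DIRECT writes, the fast case
  fio --name=direct --filename=/mnt/pnfs1/fio-test --rw=write --bs=1M \
      --size=8G --ioengine=psync --direct=1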
> >>
> >>All the systems are running CentOS 7.0 with a custom kernel 4.1-rc2
> >>(pNFS enabled), apart from the storage nodes, which are running a
> >>custom minimal Linux distro with kernel 3.18.
> >>
> >>The SAN is all 40G Mellanox Ethernet, and we are not using the OFED
> >>driver anywhere (Everything is only "standard" upstream Linux).
> >
> >What's the non-SAN network (that the NFS traffic goes over)?
> >
> The NFS traffic actually also goes through the same SAN: both the
> iSCSI LUNs and the NFS server are accessible over the same 40G
> Ethernet fabric.
>
> Regards,
> Ben.
>
> >--b.
> >
> >>
> >>Would anybody have any ideas where this issue could be coming from?
> >>
> >>Regards, Ben - MPSTOR.
> >
>