* [linux-lvm] Snit fight between LVM, MD and NFSD.
@ 2004-05-06 1:43 Dr. Greg Wettstein
From: Dr. Greg Wettstein @ 2004-05-06 1:43 UTC (permalink / raw)
To: neilb, linux-LVM; +Cc: linux-kernel
Good evening, hope the day is going well for everyone.
We just spent the last 24 hours dealing with a rather strange
situation on one of our big file servers. I wanted to summarize what
happened to find out whether there is a real issue here or whether
this is a "don't do that" type of situation.
The server in question is a dual 1.2 GHz PIII with 1 gigabyte of RAM
running 2.4.26 and providing NFS services to around 100 Linux clients
(IA32/IA64). Storage is implemented with an 8x160 GByte MD-based RAID5
array on a 3ware 7508 controller. LVM is used to carve the MD
device into 5 logical volumes holding ext3 filesystems which serve
as the NFS export sources. LVM is up to date with the relevant
patches from the 1.0.8 distribution.
Clients mount the exports with the following options:
tcp,nfsvers=3,hard,intr,rsize=8192,wsize=8192
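Spelled out, the matching client-side /etc/fstab line would look
something like the following (the server and export names here are
made up):

```
fileserver:/export/home  /home  nfs  tcp,nfsvers=3,hard,intr,rsize=8192,wsize=8192  0 0
```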
Last week one of the drives in the RAID5 array failed. In order to
avoid a double-fault situation we migrated all the physical extents
from the RAID5-based PV to an FC-based PV on the SAN. SAN access is
provided through a QLogic 2300 with firmware 3.02.16 using the 6.06.10
driver from QLogic.
Migration to the FC-based physical volume was uneventful. The faulty
drive was replaced this week and the extents were migrated back from
the FC-based physical volume on an LV-by-LV basis. All of this went
fine until the final 150 GByte LV was migrated.
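For reference, an LV-by-LV extent migration of this sort can be
sketched with the standard LVM commands. All device, VG and LV names
below are made up, and the invocations shown are the common LVM2-era
ones; the 1.0.8 tools may differ slightly in detail:

```shell
# Bring the FC disk into the volume group as a new PV
# (names illustrative throughout).
pvcreate /dev/sdb
vgextend vg_export /dev/sdb

# Relocate one LV's extents from the RAID5 PV to the FC PV;
# repeat per LV for an LV-by-LV migration.
pvmove -n lv_home /dev/md0 /dev/sdb

# Once every LV is off the RAID5 PV, it can be dropped from the VG.
vgreduce vg_export /dev/md0
```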
Early into the migration the load on the box went high (10-12). Both
the pvmove process and the NFSD processes were persistently stuck for
long periods of time in D state. The pvmove process would stick in
get_active_stripe while the NFSD processes were stuck in
log_wait_commit.
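Wait channels like these can be collected without sysrq by walking
/proc directly. A minimal sketch, assuming a Linux /proc; note that
the per-process wchan file is only exposed on newer kernels, so the
script prints '?' where it is absent:

```shell
#!/bin/sh
# Scan /proc for tasks in uninterruptible sleep (state D) and print
# the wait channel if the kernel exposes /proc/<pid>/wchan.
for dir in /proc/[0-9]*; do
    [ -r "$dir/status" ] || continue
    state=$(awk '/^State:/ {print $2}' "$dir/status")
    if [ "$state" = "D" ]; then
        name=$(awk '/^Name:/ {print $2}' "$dir/status")
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        echo "${dir#/proc/} $name ${wchan:-?}"
    fi
done
```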
I/O patterns were very similar for NFS and the pvmove process. NFS
clients would hang for 20-30 seconds followed by a burst of I/O. On
the FC controllers we would see a burst of I/O from the pvmove process
followed by 20-30 seconds of no activity. Interactive performance
on the fileserver was good.
We unmounted almost all of the NFS clients and reduced the situation
to a case where we had 5-7 clients doing modest I/O, mostly listing
directories and other common interactive functions. Load remained
high with the NFSD processes oscillating in and out of D state with
the pvmove process.
We then unmounted all the clients that were accessing the filesystem
supported by the LV which was having its physical extents migrated.
Load patterns remained the same. We then unmounted the filesystem
locally on the server and the load still remained high.
As a final test we stopped NFS services. This caused the pvmove
process to run almost continuously with only occasional D state waits.
We confirmed this by observing almost continuous traffic on the FC
controller. When the pvmove completed, NFS services were restarted,
all clients were remounted, and the server is now running with 80-90
client connections under modest load.
So it would seem that the NFSD processes and the pvmove process were
involved in some type of resource contention problem. I would write
this off as "LVM doesn't work well for NFS-exported filesystems"
except for the fact that we had successfully transferred 250+
gigabytes of filesystems off the box and back onto the box without
trouble before this incident.
I would be interested in any thoughts that anyone may have. We can
set up a testbed to try to re-create the problem if there are
additional diagnostics that would be helpful in figuring out what was
going on.
Best wishes for a productive end of the week.
As always,
Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
4206 N. 19th Ave. Specializing in information infra-structure
Fargo, ND 58102 development.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"There are two ways of constructing a software design. One is to make
it so simple that there are obviously no deficiencies; the other is to
make it so complicated that there are no obvious deficiencies. The
first method is far more difficult."
-- C. A. R. Hoare
The Emperor's Old Clothes
CACM February 1981
* Re: [linux-lvm] Snit fight between LVM, MD and NFSD.
From: Heinz Mauelshagen @ 2004-05-11 11:10 UTC (permalink / raw)
To: greg, LVM general discussion and development; +Cc: hjm
Greg,
this looks very much like the resource contention problem between
NFSD and pvmove (as you assumed below) causing a severe slowdown
of pvmove.
With LVM2/device-mapper the problem is likely to be much less visible,
because of the use of temporary mirrors for data relocation and of
background copies for mirror resynchronization.
IOW: I expect LVM2/device-mapper to be smoother in that regard, but of
course not free of resource contention problems.
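As a rough illustration of this (device names hypothetical): under
LVM2 the relocation is still a single pvmove call, and the temporary
mirror shows up as a device-mapper target whose sync progress can be
inspected while the copy runs:

```shell
# Run the relocation in the background (LVM2 pvmove; names illustrative).
pvmove --background /dev/md0 /dev/sdb

# The temporary pvmove mapping appears as a mirror-style target;
# dmsetup shows its copy counters while the move is in flight.
dmsetup status
```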
Regards,
Heinz -- The LVM Guy --
On Wed, May 05, 2004 at 08:43:45PM -0500, Dr. Greg Wettstein wrote:
> [...]
>
> So it would seem that the NFSD processes and the pvmove process were
> involved in some type of resource contention problem. I would write
> this off to "LVM doesn't work well for NFS exported filesystems"
> except for the fact that we had successfully transferred 250+
> gigabytes of filesystems off the box and back onto the box without
> event before this incident.
>
> [...]
*** Software bugs are stupid.
Nevertheless it needs not so stupid people to solve them ***
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Red Hat GmbH
Consulting Development Engineer Am Sonnenhang 11
56242 Marienrachdorf
Germany
Mauelshagen@RedHat.com +49 2626 141200
FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-