From: Tejas Rao <raot@bnl.gov>
To: Aaron Knister <aaron.s.knister@nasa.gov>
Cc: NeilBrown <neilb@suse.de>, Scott Sinno <scott.sinno@nasa.gov>,
linux-raid@vger.kernel.org
Subject: Re: clustered MD - beyond RAID1
Date: Mon, 21 Dec 2015 21:33:55 -0500
Message-ID: <5678B693.40907@bnl.gov>
In-Reply-To: <5678AC55.7070606@nasa.gov>
On 12/21/2015 20:50, Aaron Knister wrote:
> Hi Tejas et al,
>
> I'm fairly confident in saying that GPFS can have many servers actively
> writing to a given NSD (LUN) at any given time. In our production
> environment the NSDs have 6 servers defined and clients more or less
> write to whichever one their little hearts desire. Do you think it's
> possible that the explicit primary/secondary concept is from an older
> version of GPFS? I'm not sure what the locking granularity is for
> NSDs/disks, but even if it's a single GPFS FS block and that block size
> corresponds to the stripe width of the array I'm pretty nervous relying
> on that assumption for data integrity :)
>
> The use case here is creating effectively highly available block storage
> from shared JBODs for use by VMs on the servers as well as to be
> exported to other nodes. The filesystem we're using for this is actually
> GPFS. The intent was to use RAID6 in an active/active fashion on two
> nodes sharing a common set of disks. The active/active was in an effort
> to simplify the configuration.
You are probably not defining the NSD parameter "servers=ServerList". If
this parameter is not defined, GPFS assumes that the disks are SAN
attached to all the NSD nodes; in that case there is no
primary/secondary server. Of course, there is no risk to data integrity
even if the "servers" parameter is not defined.
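
To make the two configurations concrete, here is a minimal sketch that
renders an NSD stanza with and without a server list. The stanza layout
follows what I understand "mmcrnsd -F" to accept, so check it against the
man page for your GPFS release. The helper nsd_stanza, the device path
/dev/md0, the NSD name nsd_md0 and the hostnames nsd01/nsd02 are all
hypothetical.

```python
def nsd_stanza(device, nsd_name, servers=None, usage="dataAndMetadata"):
    """Render one %nsd stanza as text.

    The servers line is omitted when no server list is given, which is
    the "SAN attached to all NSD nodes" case described above.
    """
    lines = [f"%nsd: device={device}", f"  nsd={nsd_name}"]
    if servers:
        # Assumption: the list order expresses server preference.
        lines.append("  servers=" + ",".join(servers))
    lines.append(f"  usage={usage}")
    return "\n".join(lines)


# Hypothetical names, for illustration only.
with_servers = nsd_stanza("/dev/md0", "nsd_md0", ["nsd01", "nsd02"])
without_servers = nsd_stanza("/dev/md0", "nsd_md0")

print(with_servers)
print()
print(without_servers)
```

Only the first stanza carries a "servers=" line; the second is the
configuration I suspect you are running.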
>
> I'm curious now, Redhat doesn't support SW raid failover? I did some
> googling and found this:
>
> https://access.redhat.com/solutions/231643
>
> While I can't read the solution I have to figure that they're now
> supporting that. I might actually explore that for this project.
https://access.redhat.com/solutions/410203
This article states that md raid is not supported in RHEL6/7 under any
circumstances, including active/passive modes.
>
> -Aaron
>
> On 12/21/15 8:09 PM, Tejas Rao wrote:
>> Each GPFS disk (block device) has a list of servers associated with it.
>> When the first storage server fails (expired disk lease), the storage
>> node is expelled and a different server which also sees the shared
>> storage will do I/O.
>>
>> There is a "leaseRecoveryWait" parameter which tells the filesystem
>> manager to wait for a few seconds to allow the expelled node to complete
>> any I/O in flight to the shared storage device to avoid any out of order
>> I/O. After this wait time, the filesystem manager completes recovery on
>> the failed node, replaying journal logs, freeing up shared tokens/locks
>> etc. After the recovery is complete a different storage node will do
>> I/O. There is a concept of primary/secondary servers for a given block
>> device. The secondary server will only do I/O when the primary server
>> has failed and this has been confirmed.
>>
>> See "servers=ServerList" in the man page for mmcrnsd. (I don't think I am
>> allowed to send web links)
>>
>> We currently have tens of petabytes in production using linux md raid.
>> We are currently not sharing md devices, only hardware raid block
>> devices are shared. In our experience hardware raid controllers are
>> expensive. Linux raid has worked well over the years and performance is
>> very good, as GPFS coalesces I/O into large filesystem blocks (8MB)
>> which, if aligned properly, eliminate RMW (by doing full stripe writes)
>> and the need for NVRAM (unless someone is doing POSIX fsync).
>>
>> In the future, we would prefer to use linux raid (RAID6) in a shared
>> environment shielding us against server failures. Unfortunately we can
>> only do this after Redhat supports such an environment with linux raid.
>> Currently they do not support this even in an active/passive environment
>> (only one server can have an md device assembled and active regardless).
>>
>> Tejas.
>>
>> On 12/21/2015 17:03, NeilBrown wrote:
>> > On Tue, Dec 22 2015, Tejas Rao wrote:
>> >
>> >> GPFS guarantees that only one node will write to a linux block device
>> >> using disk leases.
>> >
>> > Do you have a reference to documentation explaining that?
>> > A few moments searching the internet suggests that a "disk lease" is
>> > much like a heart-beat. A node uses it to say "I'm still alive, please
>> > don't ignore me". I could find no evidence that only one node could
>> > hold a disk lease at any time.
>> >
>> > NeilBrown
>>
>
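
On the full stripe write point quoted above, a minimal sketch of the
alignment arithmetic, assuming RAID6 with two parity chunks per stripe.
The disk counts and the 1 MiB chunk size are illustrative values, not our
production layout, and the helper names are hypothetical.

```python
MIB = 1024 * 1024


def stripe_width(n_disks, chunk_bytes, parity_disks=2):
    """Bytes of data held by one full stripe (RAID6: two parity chunks)."""
    data_disks = n_disks - parity_disks
    if data_disks < 1:
        raise ValueError("array needs at least one data disk")
    return data_disks * chunk_bytes


def is_full_stripe_write(offset, length, n_disks, chunk_bytes):
    """True if the write covers whole stripes only, so no RMW is needed."""
    width = stripe_width(n_disks, chunk_bytes)
    return length > 0 and offset % width == 0 and length % width == 0


# 10 disks, 1 MiB chunk: 8 data disks, so one stripe holds 8 MiB.
print(is_full_stripe_write(0, 8 * MIB, 10, 1 * MIB))   # True
# 12 disks, 1 MiB chunk: one stripe holds 10 MiB, an 8 MiB block is partial.
print(is_full_stripe_write(0, 8 * MIB, 12, 1 * MIB))   # False
```

An 8MB filesystem block only avoids RMW when it starts on a stripe
boundary and its size is a whole multiple of the data stripe width.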