* [linux-lvm] new to cLVM - some principal questions
From: Lentes, Bernd @ 2011-11-22 19:22 UTC
To: LVM general discussion and development

Hi,

I have a bit of experience with LVM, but not with cLVM, so I have some basic questions:

My plan is to set up an HA cluster with two nodes. The resources managed by the cluster are virtual machines (KVM). I have an FC SAN where the VMs will reside. I want to create vdisks in the SAN that are integrated as PVs on both hosts. On top of the PVs I will create a VG, and finally LVs - one LV per VM.

How does this work with cLVM? Do I have to create PV ==> VG ==> LV separately on each host, or does cLVM replicate the information from one host to the other, so that I create the PV, VG and LVs only once on the first node and the configuration is replicated to the second host?

What about resizing an LV, for example? Is that replicated, or do I have to resize twice, once on each host?

Say VM3 is running in the corresponding lv3 on the first host. Is the second host able to access lv3 simultaneously, or is there some kind of locking?

Is it possible to run some VMs on the first host and others on the second (as a kind of load balancing)?

Is it possible to perform a live migration from one host to the other in this scenario?

I will not put a filesystem in the LVs, because I was advised to run the VMs on bare partitions, which is said to be faster.

Thanks for any eye-opening answer.

Bernd

--
Bernd Lentes
Systemadministration
Institut für Entwicklungsgenetik
HelmholtzZentrum münchen
bernd.lentes@helmholtz-muenchen.de
phone: +49 89 3187 1241
fax:   +49 89 3187 3826
http://www.helmholtz-muenchen.de/idg

There is only one thing that is more expensive in the long run than investing in education: not investing in education.
John F. Kennedy
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-22 19:22 [linux-lvm] new to cLVM - some principal questions Lentes, Bernd @ 2011-11-22 19:32 ` Digimer 2011-11-23 15:35 ` Lentes, Bernd 0 siblings, 1 reply; 11+ messages in thread From: Digimer @ 2011-11-22 19:32 UTC (permalink / raw) To: LVM general discussion and development On 11/22/2011 02:22 PM, Lentes, Bernd wrote: > Hi, > > i have a bit experience in LVM, but not in cLVM. So i have some principal questions: > My idea is to establish a HA-Cluster with two nodes. The ressources which are managed by the cluster are virtual machines (KVM). > I have a FC SAN, where the vm's will reside. I want to create vdisks in my SAN which are integrated as a PV in both hosts. On top of the PV's i will create a VG, and finally LV's. For each VM one LV. > > How are things going with cLVM ? Do i have to create PV ==> VG ==> LV seperately ? Or does cLVM replicate the information from one host to the other ? So that i have to create PV, VG and LV only once on the first node and this configuration is replicated to the second host. > > What is about e.g. resizing a LV ? Is this replicated, or do i have to resize twice, on each host ? > > E.g. one host is running VM3 in the corresponding lv3 on the first host. Is the second host able to access lv3 simultaneously or is there a kind of locking ? > > Is it possible to run some vm's on the first host and others on the second (as a kind of load-balancing) ? > > Is it possible to perform a live-migration from one host to the other in this scenario ? > > I will not install a filesystem in the lv's, because i got recommendations to run the vm's in bare partitions, this would be faster. > > > Thanks for any eye-opening answer. > > > Bernd Clustered LVM is, effectively, just normal LVM with external (clustered) locking using DLM. Once built, anything you do on one node will be seen immediately on all other nodes. Mount your iSCSI target as your normally would on all nodes. On one node, with clvmd running, 'pvcreate /dev/foo' then 'vgcreate -c y -n bar /dev/foo'. If you then run 'vgscan' on all other nodes, you'll see the VG you just created. Be absolutely sure you configure fencing in your cluster! If a node falls silent, it must be forcibly removed from the cluster before any recovery can commence. Failed fencing will hang the cluster, and short-circuited fencing will lead to corruption. Finally, yes, you can do live migration between nodes in the same cluster (specifically, they need to be in the same DLM lockspace). I use clvmd quite a bit, feel free to ask if you have any more questions. I also have an in-progress tutorial using clvmd on DRBD, but you could just replace "/dev/drbdX" with the appropriate iSCSI target and the rest is the same. -- Digimer E-Mail: digimer@alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron ^ permalink raw reply [flat|nested] 11+ messages in thread
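(To make the workflow Digimer describes concrete, here is a minimal sketch for a two-node cman/clvmd cluster. Device, VG and LV names are placeholders, and the exact service names and lvm.conf handling vary by distribution and version - on RHEL-like systems the lvmconf helper can edit lvm.conf for you.)

    # On every node: switch LVM to cluster locking and start the stack.
    # locking_type = 3 tells LVM to use clvmd/DLM instead of local file locks;
    # "lvmconf --enable-cluster" does the same edit on RHEL-like systems.
    sed -i 's/^\( *locking_type\) = .*/\1 = 3/' /etc/lvm/lvm.conf
    service cman start        # membership, fencing, DLM
    service clvmd start       # clustered LVM daemon

    # On ONE node only: initialise the shared LUN and create a clustered VG
    # (the VG name is given positionally to vgcreate; -c y marks it clustered).
    pvcreate /dev/mapper/san_vdisk
    vgcreate -c y vg_vms /dev/mapper/san_vdisk

    # One LV per VM, still on that one node.
    lvcreate -L 20G -n lv_vm1 vg_vms
    lvcreate -L 20G -n lv_vm2 vg_vms

    # On the OTHER node: no pvcreate/vgcreate needed, just rescan.
    vgscan
    lvs vg_vms    # the LVs created on node 1 are already visible here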
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-22 19:32 ` Digimer @ 2011-11-23 15:35 ` Lentes, Bernd 2011-11-23 18:20 ` Digimer 0 siblings, 1 reply; 11+ messages in thread From: Lentes, Bernd @ 2011-11-23 15:35 UTC (permalink / raw) To: LVM general discussion and development Digimer wrote: > > On 11/22/2011 02:22 PM, Lentes, Bernd wrote: > > Hi, > > > > i have a bit experience in LVM, but not in cLVM. So i have > some principal questions: > > My idea is to establish a HA-Cluster with two nodes. The > ressources which are managed by the cluster are virtual > machines (KVM). > > I have a FC SAN, where the vm's will reside. I want to > create vdisks in my SAN which are integrated as a PV in both > hosts. On top of the PV's i will create a VG, and finally > LV's. For each VM one LV. > > > > How are things going with cLVM ? Do i have to create PV ==> > VG ==> LV seperately ? Or does cLVM replicate the > information from one host to the other ? So that i have to > create PV, VG and LV only once on the first node and this > configuration is replicated to the second host. > > > > What is about e.g. resizing a LV ? Is this replicated, or > do i have to resize twice, on each host ? > > > > E.g. one host is running VM3 in the corresponding lv3 on > the first host. Is the second host able to access lv3 > simultaneously or is there a kind of locking ? > > > > Is it possible to run some vm's on the first host and > others on the second (as a kind of load-balancing) ? > > > > Is it possible to perform a live-migration from one host to > the other in this scenario ? > > > > I will not install a filesystem in the lv's, because i got > recommendations to run the vm's in bare partitions, this > would be faster. > > > > > > Thanks for any eye-opening answer. > > > > > > Bernd > > Clustered LVM is, effectively, just normal LVM with external > (clustered) > locking using DLM. Once built, anything you do on one node > will be seen > immediately on all other nodes. > > Mount your iSCSI target as your normally would on all nodes. On one > node, with clvmd running, 'pvcreate /dev/foo' then 'vgcreate > -c y -n bar > /dev/foo'. If you then run 'vgscan' on all other nodes, > you'll see the > VG you just created. > > Be absolutely sure you configure fencing in your cluster! If a node > falls silent, it must be forcibly removed from the cluster before any > recovery can commence. Failed fencing will hang the cluster, and > short-circuited fencing will lead to corruption. > > Finally, yes, you can do live migration between nodes in the same > cluster (specifically, they need to be in the same DLM lockspace). > > I use clvmd quite a bit, feel free to ask if you have any more > questions. I also have an in-progress tutorial using clvmd on > DRBD, but > you could just replace "/dev/drbdX" with the appropriate iSCSI target > and the rest is the same. > > -- Hi Digimer, we met already on the DRBD-ML. clvmd must be running on all nodes ? I'm planning to implement fencing. I use two HP Server which support iLO. Using this i can restart a server when the OS is not longer accessible. I think that's a kind of STONITH. Is that what you describe with "short-circuited fencing" ? You recommend not using a STONITH method ? What else can i use for fencing ? What is about concurrent access from both nodes to the same lv ? Is that possible with cLVM ? Does cLVM sync access from the two nodes, or does it lock the lv so that only one has exclusive access to the lv ? Thanks for your answer. 
Bernd
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-23 15:35 ` Lentes, Bernd @ 2011-11-23 18:20 ` Digimer 2011-11-24 16:32 ` Lentes, Bernd 0 siblings, 1 reply; 11+ messages in thread From: Digimer @ 2011-11-23 18:20 UTC (permalink / raw) To: LVM general discussion and development On 11/23/2011 10:35 AM, Lentes, Bernd wrote: > > Digimer wrote: >> >> On 11/22/2011 02:22 PM, Lentes, Bernd wrote: >>> Hi, >>> >>> i have a bit experience in LVM, but not in cLVM. So i have >> some principal questions: >>> My idea is to establish a HA-Cluster with two nodes. The >> ressources which are managed by the cluster are virtual >> machines (KVM). >>> I have a FC SAN, where the vm's will reside. I want to >> create vdisks in my SAN which are integrated as a PV in both >> hosts. On top of the PV's i will create a VG, and finally >> LV's. For each VM one LV. >>> >>> How are things going with cLVM ? Do i have to create PV ==> >> VG ==> LV seperately ? Or does cLVM replicate the >> information from one host to the other ? So that i have to >> create PV, VG and LV only once on the first node and this >> configuration is replicated to the second host. >>> >>> What is about e.g. resizing a LV ? Is this replicated, or >> do i have to resize twice, on each host ? >>> >>> E.g. one host is running VM3 in the corresponding lv3 on >> the first host. Is the second host able to access lv3 >> simultaneously or is there a kind of locking ? >>> >>> Is it possible to run some vm's on the first host and >> others on the second (as a kind of load-balancing) ? >>> >>> Is it possible to perform a live-migration from one host to >> the other in this scenario ? >>> >>> I will not install a filesystem in the lv's, because i got >> recommendations to run the vm's in bare partitions, this >> would be faster. >>> >>> >>> Thanks for any eye-opening answer. >>> >>> >>> Bernd >> >> Clustered LVM is, effectively, just normal LVM with external >> (clustered) >> locking using DLM. Once built, anything you do on one node >> will be seen >> immediately on all other nodes. >> >> Mount your iSCSI target as your normally would on all nodes. On one >> node, with clvmd running, 'pvcreate /dev/foo' then 'vgcreate >> -c y -n bar >> /dev/foo'. If you then run 'vgscan' on all other nodes, >> you'll see the >> VG you just created. >> >> Be absolutely sure you configure fencing in your cluster! If a node >> falls silent, it must be forcibly removed from the cluster before any >> recovery can commence. Failed fencing will hang the cluster, and >> short-circuited fencing will lead to corruption. >> >> Finally, yes, you can do live migration between nodes in the same >> cluster (specifically, they need to be in the same DLM lockspace). >> >> I use clvmd quite a bit, feel free to ask if you have any more >> questions. I also have an in-progress tutorial using clvmd on >> DRBD, but >> you could just replace "/dev/drbdX" with the appropriate iSCSI target >> and the rest is the same. >> >> -- > > Hi Digimer, > > we met already on the DRBD-ML. > clvmd must be running on all nodes ? Yes, but more to the point, they must also be in the same cluster. Even more specifically, they must be in the same DLM lockspace. :) > I'm planning to implement fencing. I use two HP Server which support iLO. Good, fencing is required. It's a good idea to also use a switched PDU as a backup fence device. If the iLO loses power (ie, blown power supply or failed BMC), the fence will fail. 
Having the PDU provides an alternative method to confirm node death and will avoid blocking. That is, when a fence is pending (and it will wait forever for success), DLM will not give out locks so your storage will block. > Using this i can restart a server when the OS is not longer accessible. The cluster, fenced specifically, will do this for you. > I think that's a kind of STONITH. Is that what you describe with "short-circuited fencing" ? Fencing and Stonith are two names for the same thing; Fencing was traditionally used in Red Hat clusters and STONITH in heartbeat/pacemaker clusters. It's arguable which is preferable, but I personally prefer fencing as it more directly describes the goal of "fencing off" (isolating) a failed node from the rest of the cluster. To "short circuit" the fence, I mean return a success message to fenced without actually properly fencing the device. This is an incredibly bad idea that I've seen people try to do in the past. > You recommend not using a STONITH method ? What else can i use for fencing ? I generally use a mix of IPMI (or iLO/RSA/DRAC, effectively the same thing, but vendor-specific) as my primary fence device because it can confirm that the node is off. However, as mentioned above, it will fail if the node it is in dies badly enough. In that case, a switched PDU, like the APC 7900 (http://www.apc.com/products/resource/include/techspec_index.cfm?base_sku=AP7900) makes a perfect backup. I don't use it as primary though because it can only confirm that power has been cut to the specified port(s), not that the node itself is off, leaving room for configuration or cabling errors returning false-positives. It is critical to test PDU fence devices prior to deployment and to ensure that cables are then never moved around after. > What is about concurrent access from both nodes to the same lv ? Is that possible with cLVM ? Yes, that is the whole point. For example, with a cluster-enabled VG, you can create a new LV on one node, and then immediately see that new LV on all other nodes. Keep in mind, this does *not* magically provide cluster awareness to filesystems. For example, you can not use ext3 on a clustered VG->LV on two nodes at once. You will still need a cluster-aware filesystem like GFS2. > Does cLVM sync access from the two nodes, or does it lock the lv so that only one has exclusive access to the lv ? When a node wants access to a clustered LV, it requests a lock from DLM. There are a few types of locks, but let's look at exclusive, which is needed to write to the LV (simplified example). So Node 1 decides it wants to write to an LV. It sends a request to DLM for an exclusive lock on the LV. DLM sees that no other node has a lock, so the lock is granted to Node 1 for that LV's lockspace. Node 1 then proceeds to use the LV as if it was a simple local LV. Meanwhile, Node 2 also wants access to that LV and asks DLM for a lock. This time DLM sees that Node 1 has an exclusive lock in that LV's lockspace and denies the request. Node 2 can not use the LV. At some point, Node 1 finishes and releases the lock. Now Node 2 can re-request the lock, and it will be granted. Now let's talk about how fencing fits; Let's assume that Node 1 hangs or dies while it still holds the lock. The fenced daemon will be triggered and it will notify DLM that there is a problem, and DLM will block all further requests. Next, fenced tries to fence the node using one of it's configured fence methods. 
It will try the first, then the second, then the first again, looping forever until one of the fence calls succeeds. Once a fence call succeeds, fenced notifies DLM that the node is gone and then DLM will clean up any locks formerly held by Node 1. After this, Node 2 can get a lock, despite Node 1 never itself releasing it. Now, let's imagine that a fence agent returned success but the node wasn't actually fenced. Let's also assume that Node 1 was hung, not dead. So DLM thinks that Node 1 was fenced, clears it's old locks and gives a new one to Node 2. Node 2 goes about recovering the filesystem and the proceeds to write new data. At some point later, Node 1 unfreezes, thinks it still has an exclusive lock on the LV and finishes writing to the disk. Voila, you just corrupted your storage. You can apply this to anything using DLM lockspaces, by the way. > Thanks for your answer. Happy to help. :) -- Digimer E-Mail: digimer@alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron ^ permalink raw reply [flat|nested] 11+ messages in thread
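(As a sketch of how the IPMI-plus-PDU layering described above might be tested before going live: the agents below ship with the Red Hat cluster stack of that era, but addresses, logins and plug numbers are placeholders and option names vary between agent versions.)

    # Query the iLO/IPMI interface of the peer node directly.
    # fence_ipmilan is the generic agent; HP iLO can also be driven by fence_ilo.
    fence_ipmilan -a 192.168.10.12 -l fenceuser -p secret -o status

    # Query (or later cut) a specific outlet on the switched PDU used as backup.
    fence_apc -a 192.168.10.20 -l apc -p secret -n 2 -o status

    # Once both devices are defined in cluster.conf, let the cluster itself
    # do the fencing - this exercises the full method/device ordering:
    fence_node node2.example.com

    # Watch fenced's view of the fence domain while testing:
    fence_tool ls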
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-23 18:20 ` Digimer @ 2011-11-24 16:32 ` Lentes, Bernd 2011-11-24 16:46 ` Digimer 0 siblings, 1 reply; 11+ messages in thread From: Lentes, Bernd @ 2011-11-24 16:32 UTC (permalink / raw) To: LVM general discussion and development Digimer wrote: > > Hi Digimer, > > > > we met already on the DRBD-ML. > > clvmd must be running on all nodes ? > > Yes, but more to the point, they must also be in the same > cluster. Even > more specifically, they must be in the same DLM lockspace. :) > > > I'm planning to implement fencing. I use two HP Server > which support iLO. > > Good, fencing is required. It's a good idea to also use a > switched PDU > as a backup fence device. If the iLO loses power (ie, blown > power supply > or failed BMC), the fence will fail. Having the PDU provides an > alternative method to confirm node death and will avoid > blocking. That > is, when a fence is pending (and it will wait forever for > success), DLM > will not give out locks so your storage will block. > > > Using this i can restart a server when the OS is not longer > accessible. > > The cluster, fenced specifically, will do this for you. Yes, that's logical. > > > I think that's a kind of STONITH. Is that what you describe > with "short-circuited fencing" ? > > Fencing and Stonith are two names for the same thing; Fencing was > traditionally used in Red Hat clusters and STONITH in > heartbeat/pacemaker clusters. It's arguable which is > preferable, but I > personally prefer fencing as it more directly describes the goal of > "fencing off" (isolating) a failed node from the rest of the cluster. > > To "short circuit" the fence, I mean return a success message > to fenced > without actually properly fencing the device. This is an > incredibly bad > idea that I've seen people try to do in the past. Strange people who have ideas like that. > > > You recommend not using a STONITH method ? What else can i > use for fencing ? > > I generally use a mix of IPMI (or iLO/RSA/DRAC, effectively the same > thing, but vendor-specific) as my primary fence device because it can > confirm that the node is off. However, as mentioned above, it > will fail > if the node it is in dies badly enough. > > In that case, a switched PDU, like the APC 7900 > (http://www.apc.com/products/resource/include/techspec_index.c > fm?base_sku=AP7900) > makes a perfect backup. I don't use it as primary though > because it can > only confirm that power has been cut to the specified > port(s), not that > the node itself is off, leaving room for configuration or > cabling errors > returning false-positives. It is critical to test PDU fence devices > prior to deployment and to ensure that cables are then never moved > around after. I ordered one. > > > What is about concurrent access from both nodes to the same > lv ? Is that possible with cLVM ? > > Yes, that is the whole point. For example, with a cluster-enabled VG, > you can create a new LV on one node, and then immediately see > that new > LV on all other nodes. > > Keep in mind, this does *not* magically provide cluster awareness to > filesystems. For example, you can not use ext3 on a clustered > VG->LV on > two nodes at once. You will still need a cluster-aware > filesystem like GFS2. I don't have a filesystem. I will install the vm's (using KVM) in bare partitions (lv's). Is that a problem ? I got recommendations this is faster than installing them in partitions with a filesystem. 
> > > Does cLVM sync access from the two nodes, or does it lock > the lv so that only one has exclusive access to the lv ? > > When a node wants access to a clustered LV, it requests a > lock from DLM. > There are a few types of locks, but let's look at exclusive, which is > needed to write to the LV (simplified example). > > So Node 1 decides it wants to write to an LV. It sends a > request to DLM > for an exclusive lock on the LV. DLM sees that no other node > has a lock, > so the lock is granted to Node 1 for that LV's lockspace. Node 1 then > proceeds to use the LV as if it was a simple local LV. > > Meanwhile, Node 2 also wants access to that LV and asks DLM > for a lock. > This time DLM sees that Node 1 has an exclusive lock in that LV's > lockspace and denies the request. Node 2 can not use the LV. > > At some point, Node 1 finishes and releases the lock. Now Node 2 can > re-request the lock, and it will be granted. > > Now let's talk about how fencing fits; > > Let's assume that Node 1 hangs or dies while it still holds the lock. > The fenced daemon will be triggered and it will notify DLM > that there is > a problem, and DLM will block all further requests. Next, > fenced tries > to fence the node using one of it's configured fence methods. It will > try the first, then the second, then the first again, looping forever > until one of the fence calls succeeds. > > Once a fence call succeeds, fenced notifies DLM that the node is gone > and then DLM will clean up any locks formerly held by Node 1. After > this, Node 2 can get a lock, despite Node 1 never itself releasing it. > > Now, let's imagine that a fence agent returned success but the node > wasn't actually fenced. Let's also assume that Node 1 was > hung, not dead. > > So DLM thinks that Node 1 was fenced, clears it's old locks > and gives a > new one to Node 2. Node 2 goes about recovering the > filesystem and the > proceeds to write new data. At some point later, Node 1 unfreezes, > thinks it still has an exclusive lock on the LV and finishes > writing to > the disk. But you said "So DLM thinks that Node 1 was fenced, clears it's old locks and gives a new one to Node 2" How can node 1 get access after unfreezing, when the lock is cleared ? > > Voila, you just corrupted your storage. > > You can apply this to anything using DLM lockspaces, by the way. > > > Thanks for your answer. > > Happy to help. :) > The situation that two nodes offer the same service should normally be prevented by the CRM. Thanks for your very detailed answer. Bernd Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) Ingolstädter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe Geschäftsführer: Prof. Dr. Günther Wess und Dr. Nikolaus Blum Registergericht: Amtsgericht München HRB 6466 USt-IdNr: DE 129521671 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-24 16:32 ` Lentes, Bernd @ 2011-11-24 16:46 ` Digimer 2011-11-25 17:49 ` Lentes, Bernd 0 siblings, 1 reply; 11+ messages in thread From: Digimer @ 2011-11-24 16:46 UTC (permalink / raw) To: LVM general discussion and development On 11/24/2011 11:32 AM, Lentes, Bernd wrote: > Digimer wrote: > >>> Hi Digimer, >>> >>> we met already on the DRBD-ML. >>> clvmd must be running on all nodes ? >> >> Yes, but more to the point, they must also be in the same >> cluster. Even >> more specifically, they must be in the same DLM lockspace. :) >> >>> I'm planning to implement fencing. I use two HP Server >> which support iLO. >> >> Good, fencing is required. It's a good idea to also use a >> switched PDU >> as a backup fence device. If the iLO loses power (ie, blown >> power supply >> or failed BMC), the fence will fail. Having the PDU provides an >> alternative method to confirm node death and will avoid >> blocking. That >> is, when a fence is pending (and it will wait forever for >> success), DLM >> will not give out locks so your storage will block. >> >>> Using this i can restart a server when the OS is not longer >> accessible. >> >> The cluster, fenced specifically, will do this for you. > > Yes, that's logical. > >> >>> I think that's a kind of STONITH. Is that what you describe >> with "short-circuited fencing" ? >> >> Fencing and Stonith are two names for the same thing; Fencing was >> traditionally used in Red Hat clusters and STONITH in >> heartbeat/pacemaker clusters. It's arguable which is >> preferable, but I >> personally prefer fencing as it more directly describes the goal of >> "fencing off" (isolating) a failed node from the rest of the cluster. >> >> To "short circuit" the fence, I mean return a success message >> to fenced >> without actually properly fencing the device. This is an >> incredibly bad >> idea that I've seen people try to do in the past. > > Strange people who have ideas like that. > >> >>> You recommend not using a STONITH method ? What else can i >> use for fencing ? >> >> I generally use a mix of IPMI (or iLO/RSA/DRAC, effectively the same >> thing, but vendor-specific) as my primary fence device because it can >> confirm that the node is off. However, as mentioned above, it >> will fail >> if the node it is in dies badly enough. >> >> In that case, a switched PDU, like the APC 7900 >> (http://www.apc.com/products/resource/include/techspec_index.c >> fm?base_sku=AP7900) >> makes a perfect backup. I don't use it as primary though >> because it can >> only confirm that power has been cut to the specified >> port(s), not that >> the node itself is off, leaving room for configuration or >> cabling errors >> returning false-positives. It is critical to test PDU fence devices >> prior to deployment and to ensure that cables are then never moved >> around after. > > I ordered one. > >> >>> What is about concurrent access from both nodes to the same >> lv ? Is that possible with cLVM ? >> >> Yes, that is the whole point. For example, with a cluster-enabled VG, >> you can create a new LV on one node, and then immediately see >> that new >> LV on all other nodes. >> >> Keep in mind, this does *not* magically provide cluster awareness to >> filesystems. For example, you can not use ext3 on a clustered >> VG->LV on >> two nodes at once. You will still need a cluster-aware >> filesystem like GFS2. > > I don't have a filesystem. I will install the vm's (using KVM) in bare partitions (lv's). > Is that a problem ? 
> I got recommendations this is faster than installing them in partitions with a filesystem. s/filesystem/clustered storage/ Whenever two independent servers access a shared chunk of storage, be it using LVM or an actual filesystem, access *must* be coordinated. Perhaps it is someone less risky, but the risk remains and it is non-trivial. I also install VMs directly using raw LVs, which I also find to be better performing. >>> Does cLVM sync access from the two nodes, or does it lock >> the lv so that only one has exclusive access to the lv ? >> >> When a node wants access to a clustered LV, it requests a >> lock from DLM. >> There are a few types of locks, but let's look at exclusive, which is >> needed to write to the LV (simplified example). >> >> So Node 1 decides it wants to write to an LV. It sends a >> request to DLM >> for an exclusive lock on the LV. DLM sees that no other node >> has a lock, >> so the lock is granted to Node 1 for that LV's lockspace. Node 1 then >> proceeds to use the LV as if it was a simple local LV. >> >> Meanwhile, Node 2 also wants access to that LV and asks DLM >> for a lock. >> This time DLM sees that Node 1 has an exclusive lock in that LV's >> lockspace and denies the request. Node 2 can not use the LV. >> >> At some point, Node 1 finishes and releases the lock. Now Node 2 can >> re-request the lock, and it will be granted. >> >> Now let's talk about how fencing fits; >> >> Let's assume that Node 1 hangs or dies while it still holds the lock. >> The fenced daemon will be triggered and it will notify DLM >> that there is >> a problem, and DLM will block all further requests. Next, >> fenced tries >> to fence the node using one of it's configured fence methods. It will >> try the first, then the second, then the first again, looping forever >> until one of the fence calls succeeds. >> >> Once a fence call succeeds, fenced notifies DLM that the node is gone >> and then DLM will clean up any locks formerly held by Node 1. After >> this, Node 2 can get a lock, despite Node 1 never itself releasing it. >> >> Now, let's imagine that a fence agent returned success but the node >> wasn't actually fenced. Let's also assume that Node 1 was >> hung, not dead. >> >> So DLM thinks that Node 1 was fenced, clears it's old locks >> and gives a >> new one to Node 2. Node 2 goes about recovering the >> filesystem and the >> proceeds to write new data. At some point later, Node 1 unfreezes, >> thinks it still has an exclusive lock on the LV and finishes >> writing to >> the disk. > > But you said "So DLM thinks that Node 1 was fenced, clears it's old locks and gives a > new one to Node 2" How can node 1 get access after unfreezing, when the lock is cleared ? DLM clears the lock, but it has no way of telling Node 1 that the lock is no longer valid (remember, it thinks the node has been ejected from the cluster, removing any communication). Meanwhile, Node 1 has no reason to think that the lock it holds is no longer valid, so it just goes ahead and accesses the storage figuring it has exclusive access still. >> Voila, you just corrupted your storage. >> >> You can apply this to anything using DLM lockspaces, by the way. >> >>> Thanks for your answer. >> >> Happy to help. :) >> > > The situation that two nodes offer the same service should normally be prevented by the CRM. > > Thanks for your very detailed answer. > > Bernd CRM, or any other cluster resource manager, works on the assumption that the nodes are in sync. By definition, a failed node is no longer in sync. 
Take the use-case of a two-node cluster where, by necessity, quorum has been disabled. At some point, the cluster partitions and then either node thinks that it is the sole remaining node. The node that had been backup tries to start the VM while the same VM is still running on the former node. There are other ways that things can go wrong. The important thing to understand is that, once communication has been lost to a node, it *must* be confirmed removed from the cluster before recovery can commence. DLM and the like can only work when all running nodes are working together. Cheers -- Digimer E-Mail: digimer@alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron ^ permalink raw reply [flat|nested] 11+ messages in thread
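(A few commands that may help verify that both nodes really are in the same cluster and the same DLM lockspace before trusting clvmd with live VMs - the tools come with cman/dlm of that era, and the output details differ between versions.)

    # Cluster membership as cman sees it - both nodes should be listed as members.
    cman_tool nodes
    cman_tool status          # quorum, expected votes, two_node flag

    # DLM lockspaces currently joined on this node - after starting clvmd
    # a "clvmd" lockspace should appear here on *both* nodes.
    dlm_tool ls

    # For the two-node special case, cluster.conf typically carries
    #   <cman two_node="1" expected_votes="1"/>
    # which is exactly why fencing must be rock solid: quorum cannot save you.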
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-24 16:46 ` Digimer @ 2011-11-25 17:49 ` Lentes, Bernd 2011-11-25 18:10 ` Digimer 0 siblings, 1 reply; 11+ messages in thread From: Lentes, Bernd @ 2011-11-25 17:49 UTC (permalink / raw) To: LVM general discussion and development Digimer wrote: > >> > >> Fencing and Stonith are two names for the same thing; Fencing was > >> traditionally used in Red Hat clusters and STONITH in > >> heartbeat/pacemaker clusters. It's arguable which is > >> preferable, but I > >> personally prefer fencing as it more directly describes the goal of > >> "fencing off" (isolating) a failed node from the rest of > the cluster. Yes, but "STONITH" is a wonderful acronym. > >> > >> Now let's talk about how fencing fits; > >> > >> Let's assume that Node 1 hangs or dies while it still > holds the lock. > >> The fenced daemon will be triggered and it will notify DLM > >> that there is > >> a problem, and DLM will block all further requests. Next, > >> fenced tries > >> to fence the node using one of it's configured fence > methods. It will > >> try the first, then the second, then the first again, > looping forever > >> until one of the fence calls succeeds. > >> > >> Once a fence call succeeds, fenced notifies DLM that the > node is gone > >> and then DLM will clean up any locks formerly held by Node 1. After > >> this, Node 2 can get a lock, despite Node 1 never itself > releasing it. > >> > >> Now, let's imagine that a fence agent returned success but the node > >> wasn't actually fenced. Let's also assume that Node 1 was > >> hung, not dead. > >> > >> So DLM thinks that Node 1 was fenced, clears it's old locks > >> and gives a > >> new one to Node 2. Node 2 goes about recovering the > >> filesystem and the > >> proceeds to write new data. At some point later, Node 1 unfreezes, > >> thinks it still has an exclusive lock on the LV and finishes > >> writing to > >> the disk. > > > > But you said "So DLM thinks that Node 1 was fenced, clears > it's old locks and gives a > > new one to Node 2" How can node 1 get access after > unfreezing, when the lock is cleared ? > > DLM clears the lock, but it has no way of telling Node 1 that > the lock > is no longer valid (remember, it thinks the node has been > ejected from > the cluster, removing any communication). Meanwhile, Node 1 has no > reason to think that the lock it holds is no longer valid, so it just > goes ahead and accesses the storage figuring it has exclusive > access still. But does DLM not prevent node 1 in this situation accessing the filesystem ? DLM "knows" that the lock from node 1 has been cleared. Can't DLM "say" to node 1: "You think you have a valid lock, but don't have. Sorry, no access !" Bernd Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) Ingolstädter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe Geschäftsführer: Prof. Dr. Günther Wess und Dr. Nikolaus Blum Registergericht: Amtsgericht München HRB 6466 USt-IdNr: DE 129521671 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [linux-lvm] new to cLVM - some principal questions 2011-11-25 17:49 ` Lentes, Bernd @ 2011-11-25 18:10 ` Digimer 2011-12-04 18:58 ` Lentes, Bernd 0 siblings, 1 reply; 11+ messages in thread From: Digimer @ 2011-11-25 18:10 UTC (permalink / raw) To: LVM general discussion and development On 11/25/2011 12:49 PM, Lentes, Bernd wrote: > > Digimer wrote: > > > >>>> >>>> Fencing and Stonith are two names for the same thing; Fencing was >>>> traditionally used in Red Hat clusters and STONITH in >>>> heartbeat/pacemaker clusters. It's arguable which is >>>> preferable, but I >>>> personally prefer fencing as it more directly describes the goal of >>>> "fencing off" (isolating) a failed node from the rest of >> the cluster. > > Yes, but "STONITH" is a wonderful acronym. > > >>>> >>>> Now let's talk about how fencing fits; >>>> >>>> Let's assume that Node 1 hangs or dies while it still >> holds the lock. >>>> The fenced daemon will be triggered and it will notify DLM >>>> that there is >>>> a problem, and DLM will block all further requests. Next, >>>> fenced tries >>>> to fence the node using one of it's configured fence >> methods. It will >>>> try the first, then the second, then the first again, >> looping forever >>>> until one of the fence calls succeeds. >>>> >>>> Once a fence call succeeds, fenced notifies DLM that the >> node is gone >>>> and then DLM will clean up any locks formerly held by Node 1. After >>>> this, Node 2 can get a lock, despite Node 1 never itself >> releasing it. >>>> >>>> Now, let's imagine that a fence agent returned success but the node >>>> wasn't actually fenced. Let's also assume that Node 1 was >>>> hung, not dead. >>>> >>>> So DLM thinks that Node 1 was fenced, clears it's old locks >>>> and gives a >>>> new one to Node 2. Node 2 goes about recovering the >>>> filesystem and the >>>> proceeds to write new data. At some point later, Node 1 unfreezes, >>>> thinks it still has an exclusive lock on the LV and finishes >>>> writing to >>>> the disk. >>> >>> But you said "So DLM thinks that Node 1 was fenced, clears >> it's old locks and gives a >>> new one to Node 2" How can node 1 get access after >> unfreezing, when the lock is cleared ? >> >> DLM clears the lock, but it has no way of telling Node 1 that >> the lock >> is no longer valid (remember, it thinks the node has been >> ejected from >> the cluster, removing any communication). Meanwhile, Node 1 has no >> reason to think that the lock it holds is no longer valid, so it just >> goes ahead and accesses the storage figuring it has exclusive >> access still. > > But does DLM not prevent node 1 in this situation accessing the filesystem ? > DLM "knows" that the lock from node 1 has been cleared. Can't DLM "say" to node 1: > "You think you have a valid lock, but don't have. Sorry, no access !" > > Bernd Nope, it doesn't work that way. There is no way for DLM to tell the server to discard any locks. First of all, DLM thinks the node is gone anyway. Secondly, Node 1 could have hung in the middle of a write. When it recovers, it could be quite literally in the middle of a write which is finished. DLM doesn't act as a barrier to the raw data... it's merely a lock manager. -- Digimer E-Mail: digimer@alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [linux-lvm] new to cLVM - some principal questions
From: Lentes, Bernd @ 2011-12-04 18:58 UTC
To: LVM general discussion and development

Some more questions:

- Is it necessary to have cLVM running as a resource in the cluster?
- And is it possible to have the VMs running on different nodes, not all on one (because of load balancing)?

Remember that I want to run the VMs in bare LVs, without a filesystem. The LVs reside on a SAN.

Bernd
* Re: [linux-lvm] new to cLVM - some principal questions 2011-12-04 18:58 ` Lentes, Bernd @ 2011-12-04 20:44 ` Digimer 2012-02-09 16:20 ` Lentes, Bernd 0 siblings, 1 reply; 11+ messages in thread From: Digimer @ 2011-12-04 20:44 UTC (permalink / raw) To: LVM general discussion and development On 12/04/2011 01:58 PM, Lentes, Bernd wrote: > > Some more questions: > - Is it necessary to have cLVM running as a resource in the cluster ? No, so long as it is running at all, it's fine. I do recommend it though, as you can make anything using clvmd dependent on it having started properly. > - And is it possible having the vm's running on different nodes, not all on one (because of load balancing) ? > Remember that i want to run the vm's in bare lv's, without a filesystem. The lv's reside on a SAN. That's what I do. Mount the same iSCSI target on all VM nodes, then on one node set it up as a PV and then a clustered VG. You will see the new PV and VG on all the other nodes in the cluster. You can then create LVs, setup VMs on them and move the VMs around the nodes using live migration. -- Digimer E-Mail: digimer@alteeve.com Freenode handle: digimer Papers and Projects: http://alteeve.com Node Assassin: http://nodeassassin.org "omg my singularity battery is dead again. stupid hawking radiation." - epitron ^ permalink raw reply [flat|nested] 11+ messages in thread
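(To make that recipe concrete, a sketch of what the per-VM flow might look like once the clustered VG exists. VM, LV and host names are placeholders, and virt-install/virsh options differ slightly between versions.)

    # On whichever node will initially run the VM: carve out a raw LV...
    lvcreate -L 30G -n lv_vm3 vg_vms

    # ...and install the guest straight onto the block device - no filesystem
    # on the host side (hypothetical invocation, adjust to taste):
    virt-install --name vm3 --ram 2048 --vcpus 2 \
        --disk path=/dev/vg_vms/lv_vm3,device=disk,bus=virtio,format=raw \
        --location http://mirror.example.com/centos/6/os/x86_64 \
        --network bridge=br0 --graphics vnc

    # Later, live-migrate it to the other node; both nodes see the same
    # /dev/vg_vms/lv_vm3 thanks to clvmd, so only guest RAM needs to move:
    virsh migrate --live vm3 qemu+ssh://node2.example.com/system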
* Re: [linux-lvm] new to cLVM - some principal questions
From: Lentes, Bernd @ 2012-02-09 16:20 UTC
To: LVM general discussion and development

Digimer wrote:
>
> On 12/04/2011 01:58 PM, Lentes, Bernd wrote:
> >
> > Some more questions:
> > - Is it necessary to have cLVM running as a resource in the cluster ?
>
> No, so long as it is running at all, it's fine. I do recommend it
> though, as you can make anything using clvmd dependent on it having
> started properly.
>
> > - And is it possible having the vm's running on different nodes, not all on one (because of load balancing) ?
> > Remember that i want to run the vm's in bare lv's, without a filesystem. The lv's reside on a SAN.
>
> That's what I do. Mount the same iSCSI target on all VM nodes, then on
> one node set it up as a PV and then a clustered VG. You will see the new
> PV and VG on all the other nodes in the cluster. You can then create
> LVs, setup VMs on them and move the VMs around the nodes using live
> migration.

Hi,

It's me again :-). I lost sight of this thread. Just for clarification:

- Is it possible to have VMs running on different nodes simultaneously in a cLVM environment (one LV per VM, all LVs on the same PV and VG on an FC SAN)? E.g. vm1 on node 1, vm2 on node 2.
- And is cLVM able to lock an LV so that only one host has exclusive access to it, so that corruption is prevented if a second node tries to access the same LV?

Sorry to insist, but this is important for me. This is my basic setup, and I'd like to know whether it is possible, because if not, I have to come up with something new.

Bernd
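(For what it's worth, clustered LVM can provide the per-LV exclusivity asked about here through exclusive activation: an LV in a clustered VG can be activated with an exclusive DLM lock, so activation on any other node fails until it is released. A rough sketch follows; VG/LV names are placeholders, and note this guards against concurrent activation of the LV, not against every possible misuse.)

    # On node 1: activate lv_vm3 exclusively, cluster-wide.
    lvchange -aey vg_vms/lv_vm3

    # On node 2: the same command now fails, because node 1 holds the
    # exclusive DLM lock for this LV (error wording varies by version).
    lvchange -aey vg_vms/lv_vm3

    # When the VM is stopped, release the LV on node 1...
    lvchange -an vg_vms/lv_vm3

    # ...after which node 2 can take the exclusive lock.
    lvchange -aey vg_vms/lv_vm3

Note that during a live migration both source and destination briefly need the LV active, so exclusive activation and live migration do not mix well; in that setup the usual approach is shared activation plus a resource manager that guarantees each VM runs on only one node at a time.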