* [Ocfs2-devel] Re: [Ocfs2-commits] zab commits r2451 - trunk/fs/ocfs2/cluster
       [not found] <200507011649.j61GnMlY022352@oss.oracle.com>
@ 2005-07-01 12:05 ` Lars Marowsky-Bree
  2005-07-01 13:20   ` Zach Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Lars Marowsky-Bree @ 2005-07-01 12:05 UTC (permalink / raw)
  To: ocfs2-devel

On 2005-07-01T11:49:22, svn-commits@oss.oracle.com wrote:

> +/* This quorum hack is only here until we transition to some more rational
> + * approach that is driven from userspace.  Honest.  No foolin'.

Hopefully we can hack this up at OLS/KS.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business	 -- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"


* [Ocfs2-devel] Re: [Ocfs2-commits] zab commits r2451 - trunk/fs/ocfs2/cluster
  2005-07-01 12:05 ` [Ocfs2-devel] Re: [Ocfs2-commits] zab commits r2451 - trunk/fs/ocfs2/cluster Lars Marowsky-Bree
@ 2005-07-01 13:20   ` Zach Brown
  2005-07-01 18:51     ` [Ocfs2-devel] 256 node limit Bruce Schwartz
  0 siblings, 1 reply; 10+ messages in thread
From: Zach Brown @ 2005-07-01 13:20 UTC (permalink / raw)
  To: ocfs2-devel

Lars Marowsky-Bree wrote:
> On 2005-07-01T11:49:22, svn-commits@oss.oracle.com wrote:
> 
> 
>>+/* This quorum hack is only here until we transition to some more rational
>>+ * approach that is driven from userspace.  Honest.  No foolin'.
> 
> Hopefully we can hack this up at OLS/KS.

Agreed, that would be very good.

- z


* [Ocfs2-devel] 256 node limit
  2005-07-01 13:20   ` Zach Brown
@ 2005-07-01 18:51     ` Bruce Schwartz
  2005-07-05 18:23       ` Sunil Mushran
  0 siblings, 1 reply; 10+ messages in thread
From: Bruce Schwartz @ 2005-07-01 18:51 UTC (permalink / raw)
  To: ocfs2-devel

Hi all --

In the "what's new in OCFS2" document at
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2-whats-new.txt
it says that the 256 node limit is a software limit and could be lifted.
Why is that limit there? Are there some algorithms that don't scale nicely
with larger number of nodes? I'm guessing that there is more to it than
saving a byte of RAM in a few data structures.

Thanks,
Bruce

* [Ocfs2-devel] 256 node limit
  2005-07-01 18:51     ` [Ocfs2-devel] 256 node limit Bruce Schwartz
@ 2005-07-05 18:23       ` Sunil Mushran
  2005-07-06 12:38         ` Bruce Schwartz
  0 siblings, 1 reply; 10+ messages in thread
From: Sunil Mushran @ 2005-07-05 18:23 UTC (permalink / raw)
  To: ocfs2-devel

It's actually 255. Yes, that doc needs to be updated.

No, the algorithms did not play much of a role in setting this limit.
I guess you could say the limit is part arbitrary, part practical.
(That extra byte adds up pretty quickly.)


* [Ocfs2-devel] 256 node limit
  2005-07-05 18:23       ` Sunil Mushran
@ 2005-07-06 12:38         ` Bruce Schwartz
  2005-07-06 13:04           ` Wim Coekaerts
  0 siblings, 1 reply; 10+ messages in thread
From: Bruce Schwartz @ 2005-07-06 12:38 UTC (permalink / raw)
  To: ocfs2-devel

Thanks. From looking at the code it appears that the maximum number of nodes
is controlled by some #defines (OCFS2_NODE_MAP_MAX_NODES, O2NM_MAX_NODES)
and that bumping the number up to something like 300 should be a simple 
matter. There is a note in the code that reads: "if we need more, we can do 
a kmalloc for the map" which I would guess addresses the case where you'd 
want thousands of nodes.

Is my reading correct? And would it be a bad idea to try to set up a 300+ 
node OCFS2 system?

Thanks,
Bruce


* [Ocfs2-devel] 256 node limit
  2005-07-06 12:38         ` Bruce Schwartz
@ 2005-07-06 13:04           ` Wim Coekaerts
  2005-07-06 13:51             ` Kurt Hackel
  2005-07-06 13:57             ` Mark Fasheh
  0 siblings, 2 replies; 10+ messages in thread
From: Wim Coekaerts @ 2005-07-06 13:04 UTC (permalink / raw)
  To: ocfs2-devel

it would be entertaining to see how it even works... but uhm, go ahead.
would be a good test; we sure don't have the hardware to do that


* [Ocfs2-devel] 256 node limit
  2005-07-06 13:04           ` Wim Coekaerts
@ 2005-07-06 13:51             ` Kurt Hackel
  2005-07-06 13:57             ` Mark Fasheh
  1 sibling, 0 replies; 10+ messages in thread
From: Kurt Hackel @ 2005-07-06 13:51 UTC (permalink / raw)
  To: ocfs2-devel

Hi,

Be careful here.  In many structures you will find a node number is
represented by a single u8.  Changing the maximum by modifying
O2NM_MAX_NODES will not affect this storage size.  

If you do make the change in the dlm, you will have to:

  1) seek out all of these single-byte values and change them to
     whatever is appropriate for your new upper bound.  For instance,
     if you choose 65535 as the new max, a u16 will suffice.

  2) make sure to reserve one value at the top of your new (unsigned)
     upper bound.  From the example above, a u16 ranges from
     0 - 65535, so use 65535 as your new O2NM_MAX_NODES and keep the
     value 65535 itself reserved (valid node numbers are 0 - 65534).
     This is needed for such things as DLM_LOCK_RES_OWNER_UNKNOWN, an
     unknown nodenum.

  3) properly pad all of your new structs to 64 bit boundaries.

  4) for network structures, modify each of the _to_net and _to_host
     byteordering functions for the new u16.  You will notice that many
     of these functions are empty because they consist only of u8 values
     currently, but they must now be implemented.

  5) change any functions which take a u8 as a parameter.  Fortunately,
     in most cases we're already using a u8, so I think (with
     appropriate gcc pedantic-ness) you can catch those at compile time.

  6) watch out for the dlm_node_iter structure.  These are almost always
     stack-allocated.  At 256 nodes, the node_map portion of these is
     32 bytes wide.  If you do actually bump this to something huge
     (like the example above, 65535), that would be 8k!  So don't just
     go arbitrarily large, or find another way to implement the
     dlm_node_iter functionality.  Keep in mind, the reason for the
     structure we're using is to avoid having to kmalloc in different
     types of ugly codepaths (under spinlock, -ENOMEM too difficult to
     deal with, etc.), so keep an eye out for that.  If you pick 300,
     like you were saying, the size will only go to 38 bytes, up 6 from 
     the current size.
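
As a rough illustration of the arithmetic in point 6 (a sketch, not code from
the dlm source), the helpers below compute how large a per-node bitmap gets
for a given node count. The function names are hypothetical. The second
helper assumes the map is declared as an array of unsigned long, which is how
kernel-style bitmaps are commonly laid out; under that assumption a 300-node
map rounds up a little further than the byte-packed figure.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in one unsigned long on this platform. */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for one bit per node, rounded up to whole bytes. */
static size_t node_map_bytes(size_t max_nodes)
{
    return (max_nodes + CHAR_BIT - 1) / CHAR_BIT;
}

/*
 * Bytes needed when the map is an array of unsigned long.
 * Assumption: the real node_map is declared this way.
 */
static size_t node_map_bytes_in_longs(size_t max_nodes)
{
    size_t longs = (max_nodes + BITS_PER_ULONG - 1) / BITS_PER_ULONG;

    return longs * sizeof(unsigned long);
}
```

With these, node_map_bytes(256) is 32, node_map_bytes(300) is 38 and
node_map_bytes(65535) is 8192, matching the figures above, while
node_map_bytes_in_longs(300) comes out at 40 with either 4-byte or 8-byte
longs.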

To make a long story even longer, what you're asking for is definitely
do-able and probably even desirable but also painful.  In the dlm
source, *most* of the u8 values in the headers are node numbers, so look
for anything along the lines of "_to", "_from", "_node", "_master", 
"_idx", etc. 
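
To make points 2 through 4 a bit more concrete, here is a minimal user-space
sketch of what a widened network structure might look like. Everything in it
is hypothetical: struct example_msg, its fields and EXAMPLE_NODE_UNKNOWN are
made up for illustration and are not taken from the dlm headers, and the
POSIX htons()/htonl() calls stand in for whatever byte-order helpers the
kernel code would actually use.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical reserved value, playing the role of an "unknown node". */
#define EXAMPLE_NODE_UNKNOWN ((uint16_t)0xFFFF)
/* Valid node numbers would then be 0 .. EXAMPLE_MAX_NODES - 1. */
#define EXAMPLE_MAX_NODES 65535

/* Hypothetical wire message; not an actual dlm structure. */
struct example_msg {
    uint16_t node_idx;  /* imagined as a u8 before widening */
    uint16_t master;    /* imagined as a u8 before widening */
    uint32_t flags;
    /* 2 + 2 + 4 = 8 bytes, so this already ends on a 64-bit boundary */
};

_Static_assert(sizeof(struct example_msg) == 8,
               "example_msg must stay padded to a 64-bit boundary");

/* Convert every multi-byte field to network byte order in place. */
static void example_msg_to_net(struct example_msg *m)
{
    m->node_idx = htons(m->node_idx);
    m->master = htons(m->master);
    m->flags = htonl(m->flags);
}

/* Convert every multi-byte field back to host byte order in place. */
static void example_msg_to_host(struct example_msg *m)
{
    m->node_idx = ntohs(m->node_idx);
    m->master = ntohs(m->master);
    m->flags = ntohl(m->flags);
}
```

The point of the sketch is that once a node field is wider than one byte, the
conversion functions can no longer be empty, and the compile-time size check
catches a struct that drifts off its padding.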

Keep in mind, if you don't make these changes and just bump up the
constant, your node numbers above 255 will likely silently wrap and you
will hit corruptions somewhere down the line.
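
The silent wrap is easy to reproduce outside the kernel. In the sketch below
(hypothetical helper names, not dlm code), a node number is stored into a
single byte the way a u8 field would hold it, next to a checked variant that
rejects anything that does not fit or that collides with the reserved value
255.

```c
#include <assert.h>
#include <stdint.h>

/* Store a node number into a one-byte field; only the low 8 bits survive. */
static uint8_t store_node_u8(unsigned int node_num)
{
    return (uint8_t)node_num;
}

/*
 * Checked variant: returns 0 on success, -1 if the node number does not fit
 * in a byte or equals the reserved "invalid node" value 255.
 */
static int store_node_checked(unsigned int node_num, uint8_t *out)
{
    if (node_num >= 255)
        return -1;
    *out = (uint8_t)node_num;
    return 0;
}
```

Here store_node_u8(300) yields 44 and store_node_u8(256) yields 0, so node
300 would quietly be mistaken for node 44 unless every such field is widened.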

Thanks!
-kurt




* [Ocfs2-devel] 256 node limit
  2005-07-06 13:04           ` Wim Coekaerts
  2005-07-06 13:51             ` Kurt Hackel
@ 2005-07-06 13:57             ` Mark Fasheh
  2005-07-06 14:04               ` Mark Fasheh
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Fasheh @ 2005-07-06 13:57 UTC (permalink / raw)
  To: ocfs2-devel

On Wed, Jul 06, 2005 at 11:04:19AM -0700, Wim Coekaerts wrote:
> it would be entertaining to see how it even works... but uhm go ahead.
> would be a good test we sure don't have the hardware to do that
Well the first problem you're likely to hit is that nm and the dlm
represent node numbers as a u8 (dlm more specifically in its network
packets).

Heartbeat uses a u8 nodenum in its heartbeat block, but has the next 3 u8's
reserved for when we move to a larger value for node_num, so simply replacing
that whole bit with a u32 at least wouldn't change the size of the structure.
The next smallest disk value would be the slot map items, which are u16's
each, but I expect we'll have moved to an alternate method of picking a
node's slot by the time that becomes an issue.
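
The heartbeat block remark can be sketched as follows. Both struct layouts
and all field names other than the node number are hypothetical, chosen only
to show that a u8 followed by three reserved u8's occupies the same four
bytes as a single u32, so the fields after it do not move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical current layout: one-byte node number plus 3 reserved bytes. */
struct hb_block_u8 {
    uint64_t seq;
    uint8_t  node_num;
    uint8_t  reserved[3];
    uint32_t checksum;
};

/* Hypothetical widened layout: the same four bytes become one u32. */
struct hb_block_u32 {
    uint64_t seq;
    uint32_t node_num;
    uint32_t checksum;
};

/* The overall size and the position of the following field are unchanged. */
_Static_assert(sizeof(struct hb_block_u8) == sizeof(struct hb_block_u32),
               "widening must not change the block size");
_Static_assert(offsetof(struct hb_block_u8, checksum) ==
               offsetof(struct hb_block_u32, checksum),
               "fields after node_num must not move");
```

If the widened field were stored little-endian and the reserved bytes had
always been written as zero, existing blocks would decode to the same node
number; both of those are assumptions, not something stated in this thread.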

Anyway this is all off the top of my head. It'd be interesting to see what
you run into after having made the structure changes to support > 254 nodes.

Btw, I say > 254 because node number 255 is reserved for 'invalid node num'.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com


* [Ocfs2-devel] 256 node limit
  2005-07-06 13:57             ` Mark Fasheh
@ 2005-07-06 14:04               ` Mark Fasheh
  2005-07-06 16:20                 ` Bruce Schwartz
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Fasheh @ 2005-07-06 14:04 UTC (permalink / raw)
  To: ocfs2-devel

On Wed, Jul 06, 2005 at 11:58:01AM -0700, Mark Fasheh wrote:
> Btw, I say > 254 because node number 255 is reserved for 'invalid node num'.
erf, off by one error... never mind I said that :)
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com


* [Ocfs2-devel] 256 node limit
  2005-07-06 14:04               ` Mark Fasheh
@ 2005-07-06 16:20                 ` Bruce Schwartz
  0 siblings, 0 replies; 10+ messages in thread
From: Bruce Schwartz @ 2005-07-06 16:20 UTC (permalink / raw)
  To: ocfs2-devel

Thanks for all the great responses. I had only skimmed the ocfs2 code and 
hadn't looked at heartbeat, dlm, etc. so I had missed the various u8's. 
Certainly the scope of the problem is much clearer now and I'm glad to hear 
that there isn't anything particularly difficult about it (at least up to 
the point where the number of nodes would require kmalloc instead of 
allocating off the stack). There are just lots of mechanical changes to get
right.

Thanks again and I'll report back if I actually try this.

-Bruce
