Re: weighted distributed processing.


* Re: weighted distributed processing.
From: Clint Byrum @ 2012-05-02 22:42 UTC
  To: Joseph Perry; +Cc: archivematica, gearman, ceph-devel

Excerpts from Joseph Perry's message of Wed May 02 15:05:23 -0700 2012:
> Hello All,
> First off, I'm sending this email to three discussion groups:
> gearman@googlegroups.com - distributed processing library
> ceph-devel@vger.kernel.org - distributed file system
> archivematica@googlegroups.com - my project's discussion list, a 
> distributed processing system.
> 
> I'd like to start a discussion about something I'll refer to as weighted 
> distributed task based processing.
> Presently, we are using gearman's libraries to meet our distributed 
> processing needs. The majority of our processing is file based, and our 
> processing stations are accessing the files over an nfs share. We are 
> looking at replacing the nfs server share with a distributed file 
> system, like ceph.
> 
> It occurs to me that our processing times could theoretically be reduced 
> by assigning tasks to processing clients where the file resides, over 
> places where it would need to be copied over the network. In order for 
> this to happen, the gearman server would need to get file location 
> information from the ceph system.
> 

If I understand the design of Ceph correctly, it spreads I/O at the
block level, not the file level.

So there is little point in weighting, since it seeks to spread the whole
file across all the machines/block devices in the cluster. Even if you
do ask ceph "which servers is file X on", which I'm sure it could tell
you, you will end up with high weights for most of the servers, and no
real benefit.

In this scenario, you're just better off having a really powerful network,
and Ceph will balance the I/O enough that you can scale out the I/O
independently of the compute resources. This seems like a huge win, as
I don't believe most workloads scale at a 1:1 I/O:CPU ratio. 10 Gigabit
switches are still not super cheap, but they are probably cheaper than
software engineer hours.

If your network is not up to the task of transferring all those blocks
around, you probably need to focus instead on something that keeps whole
files in a certain place. One such system would be MogileFS. This has a
database with a list of keys that say where the data lives, and in fact
the protocol the MogileFS tracker uses will tell you all the places a
key lives. You could then place a hint in the payload and have 2 levels
of workers. The pseudocode becomes:

- workers register two queues: 'dispatch_foo' and 'do_foo_$hostname'
- client sends task w/ filename to 'dispatch_foo'
- dispatcher looks at filename, asks mogile where the file is, looks at
  recent queue lengths in gearman, and decides whether it is enough of a
  win to direct the job to the host where the file is, or to farm it out
  to somewhere that is less busy.

This will take a lot of poking at to get tuned right, but it should be
tunable to a single number, the ratio of localized queue length versus
non-localized queue length.
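
The dispatcher decision described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
function name choose_queue, the plain-dict inputs, and the exact way the
ratio is applied are all made up here, and no real Gearman or MogileFS
client calls are shown. The caller is assumed to have already looked up
the file's hosts and the recent queue lengths.

```python
def choose_queue(file_hosts, queue_lengths, ratio=2.0, function="foo"):
    """Pick the 'do_<function>_<hostname>' queue for one job.

    file_hosts    -- hosts reported as holding the file (hypothetical input)
    queue_lengths -- {hostname: recent queue length} (hypothetical input)
    ratio         -- how much longer a local queue may be than the
                     shortest queue before locality stops being a win
    """
    if not queue_lengths:
        raise ValueError("no workers registered")

    # Least busy host overall, regardless of where the file lives.
    idlest = min(queue_lengths, key=queue_lengths.get)

    # Hosts that both hold the file and have a registered worker queue.
    local = [h for h in file_hosts if h in queue_lengths]
    if local:
        best_local = min(local, key=queue_lengths.get)
        # Treat an empty queue as length 1 so the ratio stays meaningful.
        if queue_lengths[best_local] <= ratio * max(queue_lengths[idlest], 1):
            return f"do_{function}_{best_local}"

    # Locality is not worth the wait: farm it out to the idlest host.
    return f"do_{function}_{idlest}"
```

With ratio as the single tuning knob, a higher value favours locality
and a lower value favours spreading work to idle hosts.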

> pseudo:
> gearman client creates a task & includes a weight, of type ceph file
> gearman server identifies the file & polls the ceph system for clients that have this file
> ceph system returns a list of clients that have the file locally
> gearman assigns the task
> .    if there is a client available for processing that has the file locally
> .        assign it there
> .        (that client has local access to the file, still on the ceph system)
> .    else
> .        assign to other client
> .        (that processing client will pull the file from the ceph system over the network)
> 
> 
> I call it a weighted distributed processing system, because it reminds 
> me of a weighted die: the outcome is influenced in a certain direction 
> (in the task assignment).
> 
> I wanted to start this as a discussion, rather than filing feature 
> requests, because of the complex nature of the requests, and the nicer 
> medium for feedback, clarification and refinement.
> 
> I'd be very interested to hear feedback on the idea,
> Joseph Perry
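
For reference, the assignment step in Joseph's quoted pseudocode ("if
there is a client available ... that has the file locally, assign it
there, else assign to other client") might look something like the
sketch below. The function name assign_task and both inputs are
hypothetical; how the list of hosts holding the file would actually be
obtained is left open, and nothing here is an existing Gearman feature.

```python
def assign_task(hosts_with_file, available_workers):
    """Return (worker, is_local) for one task, or (None, False) if nobody is free.

    hosts_with_file   -- hosts reported as holding the file (hypothetical input)
    available_workers -- workers currently free, in preference order
    """
    holders = set(hosts_with_file)

    # Prefer a free worker that already has the file locally.
    for worker in available_workers:
        if worker in holders:
            return worker, True

    # Otherwise any free worker will do; it pulls the file over the network.
    if available_workers:
        return available_workers[0], False

    return None, False
```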


* Re: weighted distributed processing.
From: Greg Farnum @ 2012-05-02 23:26 UTC
  To: Clint Byrum; +Cc: Joseph Perry, archivematica, gearman, ceph-devel



https://groups.google.com/group/gearman/browse_thread/thread/12a1b3aa64f103d1
^ is the Google Groups link for this (ceph-devel doesn't seem to have gotten the original email; at least I didn't!).

Clint is mostly correct: Ceph does not store files in a single location. It's not block-based in the sense of 4K disk blocks, though; instead it breaks files up into chunks of 4MB by default. That default can be raised; our Hadoop bindings break files into 64MB chunks.

And it is possible to retrieve this location data using the cephfs tool:

    ./cephfs
    not enough parameters!
    usage: cephfs path command [options]*
    Commands:
       show_layout    -- view the layout information on a file or dir
       set_layout     -- set the layout on an empty file,
                         or the default layout on a directory
       show_location  -- view the location information on a file
    Options:
       Useful for setting layouts:
        --stripe_unit, -u:   set the size of each stripe
        --stripe_count, -c:  set the number of objects to stripe across
        --object_size, -s:   set the size of the objects to stripe across
        --pool, -p:          set the pool to use
       Useful for getting location data:
        --offset, -l:        the offset to retrieve location data for



I suspect this provides the information you're looking for?
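
To make the chunking concrete, here is a rough sketch of how one might
work out which offsets to ask about and how much of a file each host
ends up holding. It assumes the simplest layout, where each 4MB object
holds one contiguous range of the file (no striping across several
objects). The per-chunk host list is a hypothetical input that a caller
would have collected some other way, for example by querying the
location once per offset; the format of that output is not modelled
here, and the function names are made up for illustration.

```python
from collections import Counter

DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # 4MB default chunk size mentioned above


def chunk_offsets(file_size, object_size=DEFAULT_OBJECT_SIZE):
    """Byte offsets at which each chunk of the file starts."""
    return list(range(0, file_size, object_size))


def locality_share(chunk_hosts):
    """Fraction of a file's chunks held by each host.

    chunk_hosts -- one host name per chunk (hypothetical input)
    """
    counts = Counter(chunk_hosts)
    total = len(chunk_hosts)
    return {host: n / total for host, n in counts.items()}
```

For a file of many chunks the shares tend to even out across hosts,
which is Clint's point about weights; for a file smaller than one chunk
there is a single location and locality could matter more.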

-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: weighted distributed processing.
From: Greg Farnum @ 2012-05-02 23:30 UTC
  To: Joseph Perry; +Cc: ceph-devel

(Trimmed CC:) Apparently neither the Gearman nor the Archivematica list allows posting from non-members, which leads to some wonderful spam from Google and is going to make holding a cross-list conversation... difficult.


