public inbox for linux-nfs@vger.kernel.org
* Re: Performance results with exofs
       [not found] <op.vdxrrgf1unckof@usensfaibisl2e.eng.emc.com>
@ 2010-06-07 16:07 ` Boaz Harrosh
  2010-06-07 16:13   ` Boaz Harrosh
  0 siblings, 1 reply; 13+ messages in thread
From: Boaz Harrosh @ 2010-06-07 16:07 UTC
  To: sfaibish; +Cc: J. Bruce Fields, NFS list

On 06/07/2010 06:24 PM, sfaibish wrote:
> Boaz,
> 
> You mentioned some preliminary performance numbers for NFSv4.1 and
> pNFS during the pNFS call a few weeks back. I thought you had put them
> in an email, but I couldn't find it. Could you re-send it to me, or
> summarize the results in a new email, for comparison with the block
> layout performance? Bruce is also interested, so I CC'd him as well.
> Thanks
> 
> /Sorin
> 

I haven't published the document yet. It's stuck behind my lack of
writing talent and the pNFS bugs du jour.

Basic setup:
- All machines are connected by 1 Gbit links.
- Each client does a dd write of an 8 GB file from /dev/zero.
- 3of8 is the special raid-group arrangement of exofs and objlayout,
  where, out of 8 devices, each file is striped over 3 devices in a
  round-robin fashion. (* marks runs that used a small dirty trick.)
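A minimal sketch of how a 3of8 layout could map file data onto devices. It assumes, hypothetically, that file k starts on device k mod 8, takes the next 3 devices round-robin, and uses a 64 KiB stripe unit (the unit size Boaz mentions later in the thread). The helper names are illustrative; this is not exofs's or objlayout's actual code.

```python
NUM_DEVICES = 8          # the "8" in 3of8
GROUP_WIDTH = 3          # the "3" in 3of8
STRIPE_UNIT = 64 * 1024  # assumption: 64 KiB stripe unit


def group_for_file(file_index: int) -> list[int]:
    """Hypothetical placement: file k starts on device k % NUM_DEVICES and
    uses the next GROUP_WIDTH devices, wrapping around round-robin."""
    start = file_index % NUM_DEVICES
    return [(start + i) % NUM_DEVICES for i in range(GROUP_WIDTH)]


def device_for_offset(file_index: int, offset: int) -> int:
    """RAID-0 style striping: consecutive stripe units of the file go to
    consecutive devices of its group."""
    group = group_for_file(file_index)
    unit_index = offset // STRIPE_UNIT
    return group[unit_index % GROUP_WIDTH]


if __name__ == "__main__":
    print(group_for_file(7))                      # group wraps around the 8 devices
    print(device_for_offset(0, 5 * STRIPE_UNIT))  # sixth stripe unit of file 0
```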

[single client]
1 osd  -  40 MB/s
2 osds -  80 MB/s
4 osds - 114 MB/s (saturation point of the 1 Gbit link)
8 osds - 114 MB/s

[2 clients 8of8 osds]
226 MB/s

[4 clients 8of8 osds]
263 MB/s

[8 clients 8of8 osds]
252 MB/s

[1 client 3of8 osds]
114 MB/s

[2 clients 3of8 osds *]
226 MB/s

[4 clients 3of8 osds *]
417 MB/s

[8 clients 3of8 osds]
405 MB/s

Boaz
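The numbers above can be compared side by side with a short script. It measures each aggregate against an "ideal" of every client saturating its own link at about 114 MB/s (the single-client saturation point); that ideal is an illustrative assumption, and the starred 3of8 runs used the dummy-file trick.

```python
LINK_MBPS = 114  # single-client saturation point of a 1 Gbit link (from the results)

# (clients, layout) -> aggregate MB/s, copied from the results above.
# The 2- and 4-client 3of8 runs are the starred (dummy-file trick) ones.
RESULTS = {
    (2, "8of8"): 226,
    (4, "8of8"): 263,
    (8, "8of8"): 252,
    (1, "3of8"): 114,
    (2, "3of8"): 226,
    (4, "3of8"): 417,
    (8, "3of8"): 405,
}


def per_client(clients: int, layout: str) -> float:
    """Average MB/s each client achieved."""
    return RESULTS[(clients, layout)] / clients


def link_efficiency(clients: int, layout: str) -> float:
    """Aggregate as a fraction of every client saturating its own link."""
    return RESULTS[(clients, layout)] / (clients * LINK_MBPS)


# Print grouped by layout, then by client count.
for clients, layout in sorted(RESULTS, key=lambda key: (key[1], key[0])):
    print(f"{clients} client(s), {layout}: "
          f"{per_client(clients, layout):6.1f} MB/s per client, "
          f"{link_efficiency(clients, layout):4.0%} of ideal")
```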



* Re: Performance results with exofs
  2010-06-07 16:07 ` Performance results with exofs Boaz Harrosh
@ 2010-06-07 16:13   ` Boaz Harrosh
  2010-06-07 17:28     ` sfaibish
  0 siblings, 1 reply; 13+ messages in thread
From: Boaz Harrosh @ 2010-06-07 16:13 UTC
  To: sfaibish; +Cc: J. Bruce Fields, NFS list

On 06/07/2010 07:07 PM, Boaz Harrosh wrote:
> On 06/07/2010 06:24 PM, sfaibish wrote:
>> Boaz,
>>
>> You mentioned some preliminary performance numbers for NFSv4.1 and
>> pNFS during the pNFS call a few weeks back. I thought you had put them
>> in an email, but I couldn't find it. Could you re-send it to me, or
>> summarize the results in a new email, for comparison with the block
>> layout performance? Bruce is also interested, so I CC'd him as well.
>> Thanks
>>
>> /Sorin
>>
> 
> I haven't published the document yet. It's stuck behind my lack of
> writing talent and the pNFS bugs du jour.
> 
> Basic setup:
> - All machines are connected by 1 Gbit links.
> - Each client does a dd write of an 8 GB file from /dev/zero.
> - 3of8 is the special raid-group arrangement of exofs and objlayout,
>   where, out of 8 devices, each file is striped over 3 devices in a
>   round-robin fashion. (* marks runs that used a small dirty trick.)
> 

- All tests over an *empty* filesystem.

> [single client]
> 1 osd  -  40 MB/s
> 2 osds -  80 MB/s
> 4 osds - 114 MB/s (saturation point of the 1 Gbit link)
> 8 osds - 114 MB/s
> 
> [2 clients 8of8 osds]
> 226 MB/s
> 
> [4 clients 8of8 osds]
> 263 MB/s
> 
> [8 clients 8of8 osds]
> 252 MB/s
> 
> [1 client 3of8 osds]
> 114 MB/s
> 
> [2 clients 3of8 osds *]
> 226 MB/s
> 
> [4 clients 3of8 osds *]
> 417 MB/s
> 
> [8 clients 3of8 osds]
> 405 MB/s
> 
> Boaz
> 



* Re: Performance results with exofs
  2010-06-07 16:13   ` Boaz Harrosh
@ 2010-06-07 17:28     ` sfaibish
       [not found]       ` <op.vdxxhfqsunckof-sXut7+96orlxdPWQvOaHCoI83tS8F2Zb0E9HWUfgJXw@public.gmane.org>
  2010-06-07 18:29       ` J. Bruce Fields
  0 siblings, 2 replies; 13+ messages in thread
From: sfaibish @ 2010-06-07 17:28 UTC
  To: Boaz Harrosh; +Cc: J. Bruce Fields, NFS list

Thanks.

/Sorin

-- 
Best Regards

Sorin Faibish
Corporate Distinguished Engineer
Network Storage Group

         EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : sfaibish@emc.com


* Re: Performance results with exofs
       [not found]       ` <op.vdxxhfqsunckof-sXut7+96orlxdPWQvOaHCoI83tS8F2Zb0E9HWUfgJXw@public.gmane.org>
@ 2010-06-07 17:29         ` Boaz Harrosh
  2010-06-07 17:34           ` sfaibish
  0 siblings, 1 reply; 13+ messages in thread
From: Boaz Harrosh @ 2010-06-07 17:29 UTC
  To: sfaibish; +Cc: J. Bruce Fields, NFS list

Show me yours and I'll show you mine, ...! ;-)

Boaz




* Re: Performance results with exofs
  2010-06-07 17:29         ` Boaz Harrosh
@ 2010-06-07 17:34           ` sfaibish
  0 siblings, 0 replies; 13+ messages in thread
From: sfaibish @ 2010-06-07 17:34 UTC
  To: Boaz Harrosh; +Cc: J. Bruce Fields, NFS list

On Mon, 07 Jun 2010 13:29:41 -0400, Boaz Harrosh <bharrosh@panasas.com> wrote:

> Show me yours and I'll show you mine, ...! ;-)
Working on this. I hope we'll have something to share next week. :)






* Re: Performance results with exofs
  2010-06-07 17:28     ` sfaibish
       [not found]       ` <op.vdxxhfqsunckof-sXut7+96orlxdPWQvOaHCoI83tS8F2Zb0E9HWUfgJXw@public.gmane.org>
@ 2010-06-07 18:29       ` J. Bruce Fields
  2010-06-07 18:41         ` Boaz Harrosh
  2010-06-07 18:43         ` Boaz Harrosh
  1 sibling, 2 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-06-07 18:29 UTC
  To: sfaibish; +Cc: Boaz Harrosh, NFS list

>> On 06/07/2010 07:07 PM, Boaz Harrosh wrote:
>>> I haven't published the document yet. It's stuck behind my lack of
>>> writing talent and the pNFS bugs du jour.

Untalented writing we can fix, as long as the details are there!

>>>
>>> Basic setup:
>>> - All machines are connected by 1 Gbit links.
>>> - Each client does a dd write of an 8 GB file from /dev/zero.
>>> - 3of8 is the special raid-group arrangement of exofs and objlayout,
>>>   where, out of 8 devices, each file is striped over 3 devices in a
>>>   round-robin fashion. (* marks runs that used a small dirty trick.)

Random stupid questions:

	- why do you think the 3of8 arrangement is scaling better than
	  the 8of8?
	- Have you tried any other workloads?  (Perfectly reasonable
	  that simple write throughput would be the first thing to
	  check--I'm just curious.)

>>>
>>
>> - All tests over an *empty* filesystem.
>>
>>> [single client]
>>> 1 osd  -  40 MB/s
>>> 2 osds -  80 MB/s
>>> 4 osds - 114 MB/s (saturation point of the 1 Gbit link)
>>> 8 osds - 114 MB/s
>>>
>>> [2 clients 8of8 osds]
>>> 226 MB/s
>>>
>>> [4 clients 8of8 osds]
>>> 263 MB/s
>>>
>>> [8 clients 8of8 osds]
>>> 252 MB/s
>>>
>>> [1 client 3of8 osds]
>>> 114 MB/s
>>>
>>> [2 clients 3of8 osds *]
>>> 226 MB/s
>>>
>>> [4 clients 3of8 osds *]
>>> 417 MB/s

If each OSD has a single gigabit interface and you're striping to 3 of
them, isn't that 417/3 == 139 MB/s each?

(Oh, I see: you must be writing to a different file from each client,
hence you are using all OSDs even if each client is only using 3?)
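To make that arithmetic concrete: if the 4 clients' groups overlap but together cover all 8 OSDs, the load on any one OSD stays well below 139 MB/s. This sketch assumes each client splits its throughput evenly across its 3 OSDs and that the groups start two devices apart; the GROUPS list is hypothetical, not the layout actually used in the test.

```python
from collections import defaultdict


def per_device_load(groups: list[list[int]], per_client_mbps: float) -> dict[int, float]:
    """MB/s arriving at each device when every client splits its throughput
    evenly over the devices in its stripe group."""
    load: dict[int, float] = defaultdict(float)
    for group in groups:
        for device in group:
            load[device] += per_client_mbps / len(group)
    return dict(load)


AGGREGATE_MBPS = 417  # 4 clients, 3of8 (starred run)
CLIENTS = 4

# Hypothetical groups: starting devices spaced two apart, so the four
# 3-wide groups together touch all 8 OSDs.
GROUPS = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 0]]

load = per_device_load(GROUPS, AGGREGATE_MBPS / CLIENTS)
print("all traffic on 3 OSDs:", AGGREGATE_MBPS / 3, "MB/s each")
print("busiest OSD with spread-out groups:", max(load.values()), "MB/s")
```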

--b.

>>>
>>> [8 clients 3of8 osds]
>>> 405 MB/s


* Re: Performance results with exofs
  2010-06-07 18:29       ` J. Bruce Fields
@ 2010-06-07 18:41         ` Boaz Harrosh
  2010-06-07 18:49           ` J. Bruce Fields
  2010-06-07 18:43         ` Boaz Harrosh
  1 sibling, 1 reply; 13+ messages in thread
From: Boaz Harrosh @ 2010-06-07 18:41 UTC
  To: J. Bruce Fields; +Cc: sfaibish, NFS list

On 06/07/2010 09:29 PM, J. Bruce Fields wrote:
>>> On 06/07/2010 07:07 PM, Boaz Harrosh wrote:
>>>> I haven't published the document yet. It's stuck behind my lack of
>>>> writing talent and the pNFS bugs du jour.
> 
> Untalented writing we can fix, as long as the details are there!
> 
>>>>
>>>> Basic setup:
>>>> - All machines are connected by 1 Gbit links.
>>>> - Each client does a dd write of an 8 GB file from /dev/zero.
>>>> - 3of8 is the special raid-group arrangement of exofs and objlayout,
>>>>   where, out of 8 devices, each file is striped over 3 devices in a
>>>>   round-robin fashion. (* marks runs that used a small dirty trick.)
> 
> Random stupid questions:
> 
> 	- why do you think the 3of8 arrangement is scaling better than
> 	  the 8of8?

It's a known problem with network storage clusters. With 8of8, all the
clients exercise all of the nodes at the same time, so they clash on the
network.

With 3of8, each client can still saturate its link (3 was chosen carefully
based on the first test), while different clients talk to different
subsets of the OSDs, so there is a better chance of several client/OSD
pairs each running at 1 Gbit at the same time.
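One reading of "3 was chosen carefully based on the first test": in the single-client results each OSD added roughly 40 MB/s, so the narrowest group that can fill a ~114 MB/s client link is ceil(114 / 40) = 3. A minimal sketch of that calculation, assuming per-OSD throughput adds linearly until the link saturates:

```python
import math

PER_OSD_MBPS = 40  # single client writing to 1 OSD (from the results)
LINK_MBPS = 114    # single client saturates here with 4 or 8 OSDs


def min_width_to_saturate(per_osd_mbps: float, link_mbps: float) -> int:
    """Smallest stripe width whose combined OSD rate reaches the client's link
    limit, assuming per-OSD rates add linearly (as 1 -> 40, 2 -> 80 suggests)."""
    return math.ceil(link_mbps / per_osd_mbps)


print(min_width_to_saturate(PER_OSD_MBPS, LINK_MBPS))
```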

(The dirty trick was to insert dummy files so that the 4-client test would
 exercise all 8 devices. Otherwise the stupid exofs round-robin algorithm
 would only exercise 4+3 devices.)
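A toy model of why spacing out file creation matters. It assumes each newly created file (real or dummy) starts one device after the previous file and occupies 3 consecutive devices. The allocator is hypothetical and does not try to reproduce the exact device count quoted above; it only shows how dummy files between real ones can spread four 3-wide groups over all 8 devices.

```python
NUM_DEVICES = 8
GROUP_WIDTH = 3


def start_devices(num_files: int, dummies_between: int = 0) -> list[int]:
    """Toy allocator: every created file (real or dummy) starts one device
    after the previous one; return the start devices of the real files."""
    step = 1 + dummies_between
    return [(k * step) % NUM_DEVICES for k in range(num_files)]


def devices_covered(starts: list[int]) -> set[int]:
    """Distinct devices touched when each file uses GROUP_WIDTH consecutive
    devices (wrapping around) from its start device."""
    covered: set[int] = set()
    for start in starts:
        covered.update((start + i) % NUM_DEVICES for i in range(GROUP_WIDTH))
    return covered


print(len(devices_covered(start_devices(4))))                     # back-to-back files
print(len(devices_covered(start_devices(4, dummies_between=1))))  # one dummy between each
```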

> 	- Have you tried any other workloads?  (Perfectly reasonable
> 	  that simple write throughput would be the first thing to
> 	  check--I'm just curious.)

Not yet. I've been busy with Bakeathon preparations, but I would very much like to.

Thanks
Boaz

> 
>>>>
>>>
>>> - All tests over an *empty* filesystem.
>>>
>>>> [single client]
>>>> 1 osd  -  40 MB/s
>>>> 2 osds -  80 MB/s
>>>> 4 osds - 114 MB/s (saturation point of the 1 Gbit link)
>>>> 8 osds - 114 MB/s
>>>>
>>>> [2 clients 8of8 osds]
>>>> 226 MB/s
>>>>
>>>> [4 clients 8of8 osds]
>>>> 263 MB/s
>>>>
>>>> [8 clients 8of8 osds]
>>>> 252 MB/s
>>>>
>>>> [1 client 3of8 osds]
>>>> 114 MB/s
>>>>
>>>> [2 clients 3of8 osds *]
>>>> 226 MB/s
>>>>
>>>> [4 clients 3of8 osds *]
>>>> 417 MB/s
> 
> If each OSD has a single gigabit interface and you're striping to 3 of
> them, isn't that 417/3 == 139 MB/s each?
> 
> (Oh, I see: you must be writing to a different file from each client,
> hence you are using all OSDs even if each client is only using 3?)
> 
> --b.
> 
>>>>
>>>> [8 clients 3of8 osds]
>>>> 405 MB/s



* Re: Performance results with exofs
  2010-06-07 18:29       ` J. Bruce Fields
  2010-06-07 18:41         ` Boaz Harrosh
@ 2010-06-07 18:43         ` Boaz Harrosh
  1 sibling, 0 replies; 13+ messages in thread
From: Boaz Harrosh @ 2010-06-07 18:43 UTC
  To: J. Bruce Fields; +Cc: sfaibish, NFS list

On 06/07/2010 09:29 PM, J. Bruce Fields wrote:
>>>> [4 clients 3of8 osds *]
>>>> 417 MB/s
> 
> If each OSD has a single gigabit interface and you're striping to 3 of
> them, isn't that 417/3 == 139 MB/s each?
> 
> (Oh, I see: you must be writing to a different file from each client,
> hence you are using all OSDs even if each client is only using 3?)
> 

Right, plus that little trick from my previous email ;-)

Boaz
> --b.
> 
>>>>
>>>> [8 clients 3of8 osds]
>>>> 405 MB/s



* Re: Performance results with exofs
  2010-06-07 18:41         ` Boaz Harrosh
@ 2010-06-07 18:49           ` J. Bruce Fields
  2010-06-08  5:26             ` Boaz Harrosh
  2010-06-08  6:54             ` Benny Halevy
  0 siblings, 2 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-06-07 18:49 UTC
  To: Boaz Harrosh; +Cc: sfaibish, NFS list

On Mon, Jun 07, 2010 at 09:41:29PM +0300, Boaz Harrosh wrote:
> On 06/07/2010 09:29 PM, J. Bruce Fields wrote:
> >>> On 06/07/2010 07:07 PM, Boaz Harrosh wrote:
> >>>> I haven't published the document yet. It's stuck behind my lack of
> >>>> writing talent and the pNFS bugs du jour.
> > 
> > Untalented writing we can fix, as long as the details are there!
> > 
> >>>>
> >>>> Basic setup:
> >>>> - All machines are connected by 1 Gbit links.
> >>>> - Each client does a dd write of an 8 GB file from /dev/zero.
> >>>> - 3of8 is the special raid-group arrangement of exofs and objlayout,
> >>>>   where, out of 8 devices, each file is striped over 3 devices in a
> >>>>   round-robin fashion. (* marks runs that used a small dirty trick.)
> > 
> > Random stupid questions:
> > 
> > 	- why do you think the 3of8 arrangement is scaling better than
> > 	  the 8of8?
> 
> It's a known problem with network storage clusters. With 8of8, all the
> clients exercise all of the nodes at the same time, so they clash on the
> network.

OK, so if two clients both try to send a stripe of data to the same OSD
at the same time, then, absent a switch that could somehow afford to
queue up a full stripe-unit's worth of data, packets get lost?
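A back-of-the-envelope fluid model of the situation described here: N senders each push one burst at full line rate into a single switch port that drains at that same rate. It ignores TCP windows, retransmission and switch internals, and the 128 KiB port buffer is a hypothetical figure; it only shows why the required buffering grows as (N - 1) times the burst size.

```python
KIB = 1024


def peak_backlog_bytes(senders: int, burst_bytes: int) -> int:
    """While all `senders` transmit at line rate into one port that drains at
    line rate, the queue grows at (senders - 1) * rate for burst_bytes / rate
    seconds, so the peak backlog is (senders - 1) * burst_bytes."""
    return max(0, senders - 1) * burst_bytes


def overflows(senders: int, burst_bytes: int, port_buffer_bytes: int) -> bool:
    """True if the synchronized burst exceeds the port buffer (drops or stalls)."""
    return peak_backlog_bytes(senders, burst_bytes) > port_buffer_bytes


STRIPE_UNIT = 64 * KIB   # one stripe unit per sender
PORT_BUFFER = 128 * KIB  # hypothetical per-port buffer

for n in (2, 4, 8):
    print(n, "senders:", peak_backlog_bytes(n, STRIPE_UNIT) // KIB, "KiB peak,",
          "overflow" if overflows(n, STRIPE_UNIT, PORT_BUFFER) else "fits")
```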

(Also, out of curiosity: do you know of any papers or documentation that
describe that problem in more detail?)

--b.


* Re: Performance results with exofs
  2010-06-07 18:49           ` J. Bruce Fields
@ 2010-06-08  5:26             ` Boaz Harrosh
  2010-06-08  6:54             ` Benny Halevy
  1 sibling, 0 replies; 13+ messages in thread
From: Boaz Harrosh @ 2010-06-08  5:26 UTC
  To: J. Bruce Fields, Welch, Brent; +Cc: sfaibish, NFS list

On 06/07/2010 09:49 PM, J. Bruce Fields wrote:
>>
>> It's a known problem with network storage clusters. With 8of8, all the
>> clients exercise all of the nodes at the same time, so they clash on the
>> network.
> 
> OK, so if two clients both try to send a stripe of data to the same OSD
> at the same time, then, absent a switch that could somehow afford to
> queue up a full stripe-unit's worth of data, packets get lost?
> 

It's TCP, so they don't get lost per se; they just get queued up. And then
there's TCP ramp-up and all that, you know.

We use a 64K stripe unit; with, say, a RAID group of 4-8 devices, that's
256K-512K bytes in a stripe. I don't think a network buffer that big will
help at all; it'll just delay everything more. The best approach is a sound
statistical network strategy that lets the system even out overall. (Or not ...)
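The stripe arithmetic, and the extra delay a buffer of that size would add, can be checked quickly. This assumes a raw 1 Gbit/s link (125,000,000 bytes/s, ignoring framing and protocol overhead) and treats a full stripe as one stripe unit per device in the group.

```python
KIB = 1024
LINK_BYTES_PER_S = 1e9 / 8  # raw 1 Gbit/s, ignoring framing and protocol overhead


def full_stripe_bytes(stripe_unit: int, width: int) -> int:
    """One stripe unit on each device of the group."""
    return stripe_unit * width


def drain_time_ms(buffered_bytes: float, link_bytes_per_s: float = LINK_BYTES_PER_S) -> float:
    """Extra queueing delay seen by a packet waiting behind `buffered_bytes`."""
    return buffered_bytes / link_bytes_per_s * 1000


for width in (4, 8):
    stripe = full_stripe_bytes(64 * KIB, width)
    print(f"width {width}: {stripe // KIB} KiB per stripe, "
          f"about {drain_time_ms(stripe):.1f} ms to drain at 1 Gbit/s")
```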

> (Also, out of curiosity: do you know of any papers or documentation that
> describe that problem in more detail?)
> 

Personally, I'm privileged to learn from the best here at Panasas. 

CC: Brent, can you recommend some good papers to Bruce about RAID
groups and network SAN strategies?

> --b.

Boaz


* Re: Performance results with exofs
  2010-06-07 18:49           ` J. Bruce Fields
  2010-06-08  5:26             ` Boaz Harrosh
@ 2010-06-08  6:54             ` Benny Halevy
  2010-06-08 14:48               ` sfaibish
  1 sibling, 1 reply; 13+ messages in thread
From: Benny Halevy @ 2010-06-08  6:54 UTC
  To: J. Bruce Fields; +Cc: Boaz Harrosh, sfaibish, NFS list

On 2010-06-07 21:49, J. Bruce Fields wrote:
> On Mon, Jun 07, 2010 at 09:41:29PM +0300, Boaz Harrosh wrote:
>> On 06/07/2010 09:29 PM, J. Bruce Fields wrote:
>>>>> On 06/07/2010 07:07 PM, Boaz Harrosh wrote:
>>>>>> I haven't published the document yet. It's stuck behind my lack of
>>>>>> writing talent and the pNFS bugs du jour.
>>>
>>> Untalented writing we can fix, as long as the details are there!
>>>
>>>>>>
>>>>>> Basic setup:
>>>>>> - All machines are connected by 1 Gbit links.
>>>>>> - Each client does a dd write of an 8 GB file from /dev/zero.
>>>>>> - 3of8 is the special raid-group arrangement of exofs and objlayout,
>>>>>>   where, out of 8 devices, each file is striped over 3 devices in a
>>>>>>   round-robin fashion. (* marks runs that used a small dirty trick.)
>>>
>>> Random stupid questions:
>>>
>>> 	- why do you think the 3of8 arrangement is scaling better than
>>> 	  the 8of8?
>>
>> It's a known problem with network storage clusters. With 8of8, all the
>> clients exercise all of the nodes at the same time, so they clash on the
>> network.
> 
> OK, so if two clients both try to send a stripe of data to the same OSD
> at the same time, then, absent a switch that could somehow afford to
> queue up a full stripe-unit's worth of data, packets get lost?
> 
> (Also, out of curiosity: do you know of any papers or documentation that
> describe that problem in more detail?)
> 

A good place to start would be
http://www.pdl.cmu.edu/Incast/

Benny

> --b.
> --


* Re: Performance results with exofs
  2010-06-08  6:54             ` Benny Halevy
@ 2010-06-08 14:48               ` sfaibish
       [not found]                 ` <op.vdzkrtmqunckof-sXut7+96orlxdPWQvOaHCoI83tS8F2Zb0E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: sfaibish @ 2010-06-08 14:48 UTC
  To: Benny Halevy, J. Bruce Fields; +Cc: Boaz Harrosh, NFS list

Problem solved; I sent Bruce 2 relevant papers from CMU and FAST 2009.

/Sorin







* Re: Performance results with exofs
       [not found]                 ` <op.vdzkrtmqunckof-sXut7+96orlxdPWQvOaHCoI83tS8F2Zb0E9HWUfgJXw@public.gmane.org>
@ 2010-06-08 23:15                   ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2010-06-08 23:15 UTC
  To: sfaibish; +Cc: Benny Halevy, Boaz Harrosh, NFS list

On Tue, Jun 08, 2010 at 10:48:55AM -0400, sfaibish wrote:
> Problem solved; I sent Bruce 2 relevant papers from CMU and FAST 2009.

Thanks, all!

--b.

