* Re: RBD Snap removal priority
From: Mike Dawson @ 2013-09-27 18:05 UTC
  To: Travis Rhoden, ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[cc ceph-devel]

Travis,

RBD doesn't behave well when Ceph maintenance operations create spindle 
contention (i.e. 100% util from iostat). More about that below.

Do you run XFS under your OSDs? If so, can you check for extent 
fragmentation? Should be something like:

xfs_db -c frag -r /dev/sdb1

We recently saw fragmentation factors of over 80%, with lots of inodes 
having hundreds of extents. After 24+ hours of defragmenting, we got it 
under control, but we're seeing the fragmentation factor grow by ~1.5% 
daily. We experienced spindle contention issues even after the defrag.
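
For anyone scripting this check across many OSDs, here is a minimal
Python sketch that parses the summary line and recomputes the factor.
It assumes the summary line has the form "actual N, ideal M,
fragmentation factor P%" and that the factor is (actual - ideal) /
actual; verify both against your xfsprogs version. The function names
and sample numbers are hypothetical.

```python
import re

# Assumed shape of the xfs_db "frag" summary line; check it against your xfsprogs.
_FRAG_RE = re.compile(r"actual (\d+), ideal (\d+), fragmentation factor ([\d.]+)%")


def parse_frag(line):
    """Return (actual_extents, ideal_extents, reported_percent) from a summary line."""
    m = _FRAG_RE.search(line)
    if m is None:
        raise ValueError("unrecognized frag output: %r" % line)
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def frag_factor(actual, ideal):
    """Fragmentation factor in percent: the share of extents beyond the ideal count."""
    if actual == 0:
        return 0.0
    return 100.0 * (actual - ideal) / actual
```

Under that assumed formula, a factor of 80% corresponds to roughly five
extents for every ideal one, so the figure is a rough indicator rather
than a direct measure of seek cost.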



Sage, Sam, etc,

I think the real issue is that Ceph has several states where it performs 
what I would call "maintenance operations" that saturate the underlying 
storage without properly yielding to client i/o (which should have a 
higher priority).

I have experienced or seen reports of Ceph maintenance affecting rbd 
client i/o in many ways:

- QEMU/RBD Client I/O Stalls or Halts Due to Spindle Contention from 
Ceph Maintenance [1]
- Recovery and/or Backfill Cause QEMU/RBD Reads to Hang [2]
- rbd snap rm (Travis' report below)

[1] http://tracker.ceph.com/issues/6278
[2] http://tracker.ceph.com/issues/6333

I think this family of issues speaks to the need for Ceph to have more 
visibility into the underlying storage's limitations (especially spindle 
contention) when performing known expensive maintenance operations.
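
To make the idea concrete, here is a minimal Python sketch of
utilization-aware throttling. It is an illustration only, not something
Ceph is described as doing in this thread. It assumes the busy-time
counter comes from the "time spent doing I/Os (ms)" field of
/proc/diskstats, sampled twice; the 80% threshold, the linear back-off,
and both function names are hypothetical.

```python
def percent_util(io_ticks_before_ms, io_ticks_after_ms, interval_ms):
    """Percent of the sampling interval the device was busy, in the style of iostat %util."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = io_ticks_after_ms - io_ticks_before_ms
    # Clamp to 100 because sampling jitter can push the ratio slightly above it.
    return min(100.0, 100.0 * busy_ms / interval_ms)


def maintenance_delay(util_percent, threshold=80.0, max_delay_s=1.0):
    """Hypothetical policy: seconds to pause before issuing the next maintenance op."""
    if util_percent <= threshold:
        return 0.0
    # Scale linearly from 0 at the threshold up to max_delay_s at 100% util.
    return max_delay_s * (util_percent - threshold) / (100.0 - threshold)
```

With these made-up defaults, a disk that was busy for 900 ms of a
1000 ms sample would delay each maintenance op by half a second, while
a disk under 80% util would not be throttled at all.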

Thanks,
Mike Dawson

On 9/27/2013 12:25 PM, Travis Rhoden wrote:
> Hello everyone,
>
> I'm running a Cuttlefish cluster that hosts a lot of RBDs.  I recently
> removed a snapshot of a large one (rbd snap rm -- 12TB), and I noticed
> that all of the clients had markedly decreased performance.  iostat
> on the OSD nodes showed most disks pegged at 100% util.
>
> I know there are thread priorities that can be set for clients vs
> recovery, but I'm not sure what deleting a snapshot falls under.  I
> couldn't really find anything relevant.  Is there anything I can tweak
> to lower the priority of such an operation?  I didn't need it to
> complete fast, as "rbd snap rm" returns immediately and the actual
> deletion is done asynchronously.  I'd be fine with it taking longer at
> a lower priority, but as it stands now it brings my cluster to a crawl
> and is causing issues with several VMs.
>
> I see an "osd snap trim thread timeout" option in the docs -- Is the
> operation occurring here what you would call snap trimming?  If so, any
> chance of adding an option for "osd snap trim priority" just like
> there is for osd client op and osd recovery op?
>
> Hope what I am saying makes sense...
>
>   - Travis


* Re: RBD Snap removal priority
From: Travis Rhoden @ 2013-09-27 19:37 UTC
  To: Mike Dawson
  Cc: ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Mike,

Thanks for the info.  I had seen some of the previous reports of
reduced performance during various recovery tasks (and certainly
experienced them) but you summarized them all quite nicely.

Yes, I'm running XFS on the OSDs.  I checked fragmentation on a few of
my OSDs -- all came back ~38% (better than I thought!).

 - Travis

