v0.75 released

* v0.75 released
@ 2014-01-15  3:42 Sage Weil
  2014-01-16 13:51 ` Ilya Dryomov
       [not found] ` <alpine.DEB.2.00.1401141941070.8325-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 2 replies; 10+ messages in thread
From: Sage Weil @ 2014-01-15  3:42 UTC (permalink / raw)
  To: ceph-devel, ceph-users

This is a big release, with lots of infrastructure going in for
firefly.  The big items include a prototype standalone frontend for
radosgw (which does not require apache or fastcgi), tracking for read
activity on the osds (to inform tiering decisions), preliminary cache
pool support (no snapshots yet), and lots of bug fixes and other work
across the tree to get ready for the next batch of erasure coding
patches.

For comparison, here are the diff stats for the last few versions::

 v0.75 291 files changed, 82713 insertions(+), 33495 deletions(-)
 v0.74 192 files changed, 17980 insertions(+), 1062 deletions(-)
 v0.73 148 files changed, 4464 insertions(+), 2129 deletions(-)

Upgrading
~~~~~~~~~

* The 'osd pool create ...' syntax has changed for erasure pools.

* The default CRUSH rules and layouts are now using the latest and
  greatest tunables and defaults.  Clusters using the old values will
  now present with a health WARN state.  This can be disabled by
  adding 'mon warn on legacy crush tunables = false' to ceph.conf.
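
As a rough illustration of the setting above, the sketch below embeds a
ceph.conf fragment and checks it with Python's configparser.  The option
name and value are quoted from the note; placing it under a [mon] section
and the helper name are assumptions made for this example, and the parser
here is not Ceph's own config parser.

```python
import configparser

# Hypothetical ceph.conf fragment. The [mon] section placement is an
# assumption; the option name and value come from the upgrade note.
CEPH_CONF_FRAGMENT = """
[mon]
mon warn on legacy crush tunables = false
"""


def legacy_tunables_warning_enabled(conf_text: str) -> bool:
    """Return True unless the fragment explicitly disables the warning."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # The note implies the warning is on by default, so fall back to True.
    return parser.getboolean(
        "mon", "mon warn on legacy crush tunables", fallback=True
    )


print(legacy_tunables_warning_enabled(CEPH_CONF_FRAGMENT))
```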

Notable Changes
~~~~~~~~~~~~~~~

* common: bloom filter improvements (Sage Weil)
* common: fix config variable substitution (Loic Dachary)
* crush, osd: s/rep/replicated/ for less confusion (Loic Dachary)
* crush: refactor descend_once behavior; support set_choose*_tries for 
  replicated rules (Sage Weil)
* librados: fix throttle leak (and eventual deadlock) (Josh Durgin)
* librados: read directly into user buffer (Rutger ter Borg)
* librbd: fix use-after-free aio completion bug #5426 (Josh Durgin)
* librbd: localize/distribute parent reads (Sage Weil)
* mds: fix Resetter locking (Alexandre Oliva)
* mds: fix cap migration behavior (Yan, Zheng)
* mds: fix client session flushing (Yan, Zheng)
* mds: fix many many multi-mds bugs (Yan, Zheng)
* misc portability work (Noah Watkins)
* mon, osd: create erasure style crush rules (Loic Dachary, Sage Weil)
* mon: 'osd crush show-tunables' (Sage Weil)
* mon: clean up initial crush rule creation (Loic Dachary)
* mon: improve (replicated or erasure) pool creation UX (Loic Dachary)
* mon: infrastructure to handle mixed-version mon cluster and cli/rest API 
  (Greg Farnum)
* mon: mkfs now idempotent (Loic Dachary)
* mon: only seed new osdmaps to current OSDs (Sage Weil)
* mon: track osd features in OSDMap (Joao Luis, David Zafman)
* mon: warn if crush has non-optimal tunables (Sage Weil)
* mount.ceph: add -n for autofs support (Steve Stock)
* msgr: fix messenger restart race (Xihui He)
* osd, librados: fix full cluster handling (Josh Durgin)
* osd: add HitSet tracking for read ops (Sage Weil, Greg Farnum)
* osd: backfill to osds not in acting set (David Zafman)
* osd: enable new hashpspool layout by default (Sage Weil)
* osd: erasure plugin benchmarking tool (Loic Dachary)
* osd: fix XFS detection (Greg Farnum, Sushma Gurram)
* osd: fix copy-get omap bug (Sage Weil)
* osd: fix linux kernel version detection (Ilya Dryomov)
* osd: fix memstore segv (Haomai Wang)
* osd: fix several bugs with tier infrastructure
* osd: fix throttle thread (Haomai Wang)
* osd: preliminary cache pool support (no snaps) (Greg Farnum, Sage Weil)
* rados tool: fix listomapvals (Josh Durgin)
* rados: add 'crush location', smart replica selection/balancing (Sage 
  Weil)
* rados: some performance optimizations (Yehuda Sadeh)
* rbd: add rbdmap support for upstart (Laurent Barbe)
* rbd: expose kernel rbd client options via 'rbd map' (Ilya Dryomov)
* rbd: fix bench-write command (Haomai Wang)
* rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov)
* rgw: allow multiple frontends (Yehuda Sadeh)
* rgw: convert bucket info to new format on demand (Yehuda Sadeh)
* rgw: fix misc CORS bugs (Robin H. Johnson)
* rgw: prototype mongoose frontend (Yehuda Sadeh)

You can get v0.75 from the usual locations:

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.75.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy

* Re: v0.75 released
  2014-01-15  3:42 v0.75 released Sage Weil
@ 2014-01-16 13:51 ` Ilya Dryomov
       [not found]   ` <CALFYKtAz5G4y1RUoqJmyr5psuh1NvD=E_7qSPm7WL+8Ra_bJ5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found] ` <alpine.DEB.2.00.1401141941070.8325-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Ilya Dryomov @ 2014-01-16 13:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Ceph Development, ceph-users

On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil <sage@inktank.com> wrote:
>
> [...]
>
> * rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov)

Just a note, v0.75 simply adds some of the infrastructure, the actual
support for this will arrive with kernel 3.14.  The theoretical limit
is 65536 mapped devices, although I admit I haven't tried mapping more
than ~4000 at once.

Thanks,

                Ilya

* Re: v0.75 released
       [not found]   ` <CALFYKtAz5G4y1RUoqJmyr5psuh1NvD=E_7qSPm7WL+8Ra_bJ5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-17  0:05     ` Christian Balzer
       [not found]       ` <20140117090518.3f988b32-9yhXNL7Kh0lSCLKNlHTxZM8NsWr+9BEh@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Christian Balzer @ 2014-01-17  0:05 UTC (permalink / raw)
  To: Ceph Development, ceph-users-Qp0mS5GaXlQ

On Thu, 16 Jan 2014 15:51:17 +0200 Ilya Dryomov wrote:

> On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:
> >
> > [...]
> >
> > * rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov)
> 
> Just a note, v0.75 simply adds some of the infrastructure, the actual
> support for this will arrive with kernel 3.14.  The theoretical limit
> is 65536 mapped devices, although I admit I haven't tried mapping more
> than ~4000 at once.
> 
Just for clarification, this is for the client side when using the kernel
module, right?

Not looking at more than about 150 devices per compute node now, but that
might change and there is also the case of failovers...

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi-FW+hd8ioUD0@public.gmane.org   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

* Re: v0.75 released
       [not found]       ` <20140117090518.3f988b32-9yhXNL7Kh0lSCLKNlHTxZM8NsWr+9BEh@public.gmane.org>
@ 2014-01-17  9:20         ` Ilya Dryomov
       [not found]           ` <CALFYKtBizOcjv9i5EkW7HLBdsNUzBCZ1oO1ZTaDWh8cwgK-tww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ilya Dryomov @ 2014-01-17  9:20 UTC (permalink / raw)
  To: Christian Balzer; +Cc: Ceph Development, ceph-users-Qp0mS5GaXlQ

On Fri, Jan 17, 2014 at 2:05 AM, Christian Balzer <chibi-FW+hd8ioUD0@public.gmane.org> wrote:
> On Thu, 16 Jan 2014 15:51:17 +0200 Ilya Dryomov wrote:
>
>> On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:
>> >
>> > [...]
>> >
>> > * rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov)
>>
>> Just a note, v0.75 simply adds some of the infrastructure, the actual
>> support for this will arrive with kernel 3.14.  The theoretical limit
>> is 65536 mapped devices, although I admit I haven't tried mapping more
>> than ~4000 at once.
>>
> Just for clarification, this is for the client side when using the kernel
> module, right?
>
> Not looking at more than about 150 devices per compute node now, but that
> might change and there is also the case of failovers...

Yes, this is how many 'rbd map ...'s a single rbd kernel module (and
therefore a single compute node) can handle.  Kernels 3.13 and below
can handle ~130-150, depending on the machine.

Thanks,

                Ilya

* Re: v0.75 released
       [not found]           ` <CALFYKtBizOcjv9i5EkW7HLBdsNUzBCZ1oO1ZTaDWh8cwgK-tww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-17 10:02             ` Ilya Dryomov
  0 siblings, 0 replies; 10+ messages in thread
From: Ilya Dryomov @ 2014-01-17 10:02 UTC (permalink / raw)
  To: Christian Balzer; +Cc: Ceph Development, ceph-users-Qp0mS5GaXlQ

On Fri, Jan 17, 2014 at 11:20 AM, Ilya Dryomov <ilya.dryomov-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:
> On Fri, Jan 17, 2014 at 2:05 AM, Christian Balzer <chibi-FW+hd8ioUD0@public.gmane.org> wrote:
>> On Thu, 16 Jan 2014 15:51:17 +0200 Ilya Dryomov wrote:
>>
>>> On Wed, Jan 15, 2014 at 5:42 AM, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:
>>> >
>>> > [...]
>>> >
>>> > * rbd: support for 4096 mapped devices, up from ~250 (Ilya Dryomov)
>>>
>>> Just a note, v0.75 simply adds some of the infrastructure, the actual
>>> support for this will arrive with kernel 3.14.  The theoretical limit
>>> is 65536 mapped devices, although I admit I haven't tried mapping more
>>> than ~4000 at once.
>>>
>> Just for clarification, this is for the client side when using the kernel
>> module, right?
>>
>> Not looking at more than about 150 devices per compute node now, but that
>> might change and there is also the case of failovers...
>
> Yes, this is how many 'rbd map ...'s a single rbd kernel module (and
> therefore a single compute node) can handle.  Kernels 3.13 and below
> can handle ~130-150, depending on the machine.

Sorry, typoed.  ~230-250.
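
For the failover case raised earlier in the thread, a minimal planning
sketch follows.  The limits are the figures quoted in this thread; the
helper names and the assumption that a surviving node takes over every
mapping of each failed peer are hypothetical and only serve the example.

```python
# Figures quoted in this thread; the planning helpers are hypothetical.
LIMIT_KERNEL_3_13 = 230    # lower bound of "~230-250", kernels 3.13 and below
LIMIT_THEORETICAL = 65536  # theoretical limit once kernel 3.14 support lands


def mapped_after_failover(devices_per_node: int, failed_peers: int = 1) -> int:
    """Worst case: one node also maps every device of `failed_peers` nodes."""
    return devices_per_node * (1 + failed_peers)


def fits(devices_per_node: int, limit: int, failed_peers: int = 1) -> bool:
    """True if the worst-case mapping count stays within `limit`."""
    return mapped_after_failover(devices_per_node, failed_peers) <= limit


print(fits(150, LIMIT_KERNEL_3_13))   # 150 devices per node, one peer fails
print(fits(150, LIMIT_THEORETICAL))
```

Under these assumptions, 150 devices per node would exceed the older
limit after a single failover but stay far below the theoretical one.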

Thanks,

                Ilya

* Re: v0.75 released
       [not found] ` <alpine.DEB.2.00.1401141941070.8325-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2014-01-18  6:50   ` Alexandre Oliva
       [not found]     ` <ord2jpvl4e.fsf-o1YuAO9g/txBDLzU/O5InQ@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Alexandre Oliva @ 2014-01-18  6:50 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-Qp0mS5GaXlQ

On Jan 15, 2014, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:

>  v0.75 291 files changed, 82713 insertions(+), 33495 deletions(-)

> Upgrading
> ~~~~~~~~~

I suggest adding:

  * All (replicated?) pools will likely fail scrubbing because the
    per-pool dirty object counts, introduced in 0.75, won't match.  This
    inconsistency is cleared by a pg repair; unfortunately this is about
    as expensive as a deep-scrub, and it's not automatically scheduled
    or retried, like scrubs and deep-scrubs.

I suppose after the dirty counts are brought to sync, the next scrub
won't find inconsistent counts again, but I haven't got to that point
yet.

What surprised me was the huge number of objects marked as dirty!  It
was at least 14k out of 70k objects in each data pool, and even more in
metadata pools, but it's not like I have messed with this many objects
recently.  Could something be amiss there?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist     Red Hat Brazil Toolchain Engineer

* Re: v0.75 released
       [not found]     ` <ord2jpvl4e.fsf-o1YuAO9g/txBDLzU/O5InQ@public.gmane.org>
@ 2014-01-18  6:58       ` Mark Kirkwood
  2014-01-19  5:00       ` Sage Weil
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Kirkwood @ 2014-01-18  6:58 UTC (permalink / raw)
  To: Alexandre Oliva, Sage Weil
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-Qp0mS5GaXlQ

On 18/01/14 19:50, Alexandre Oliva wrote:
> On Jan 15, 2014, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:
>
>>   v0.75 291 files changed, 82713 insertions(+), 33495 deletions(-)
>
>> Upgrading
>> ~~~~~~~~~
>
> I suggest adding:
>
>    * All (replicated?) pools will likely fail scrubbing because the
>      per-pool dirty object counts, introduced in 0.75, won't match.  This
>      inconsistency is cleared by a pg repair; unfortunately this is about
>      as expensive as a deep-scrub, and it's not automatically scheduled
>      or retried, like scrubs and deep-scrubs.
>
> I suppose after the dirty counts are brought to sync, the next scrub
> won't find inconsistent counts again, but I haven't got to that point
> yet.
>
> What surprised me was the huge number of objects marked as dirty!  It
> was at least 14k out of 70k objects in each data pool, and even more in
> metadata pools, but it's not like I have messed with this many objects
> recently.  Could something be amiss there?
>

I think stat mismatches are also going to require folk to run repairs.

Regards

Mark

* Re: v0.75 released
       [not found]     ` <ord2jpvl4e.fsf-o1YuAO9g/txBDLzU/O5InQ@public.gmane.org>
  2014-01-18  6:58       ` Mark Kirkwood
@ 2014-01-19  5:00       ` Sage Weil
  2014-01-19  5:27         ` Sage Weil
  1 sibling, 1 reply; 10+ messages in thread
From: Sage Weil @ 2014-01-19  5:00 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-Qp0mS5GaXlQ

On Sat, 18 Jan 2014, Alexandre Oliva wrote:
> On Jan 15, 2014, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:
> 
> >  v0.75 291 files changed, 82713 insertions(+), 33495 deletions(-)
> 
> > Upgrading
> > ~~~~~~~~~
> 
> I suggest adding:
> 
>   * All (replicated?) pools will likely fail scrubbing because the
>     per-pool dirty object counts, introduced in 0.75, won't match.  This
>     inconsistency is cleared by a pg repair; unfortunately this is about
>     as expensive as a deep-scrub, and it's not automatically scheduled
>     or retried, like scrubs and deep-scrubs.
> 
> I suppose after the dirty counts are brought to sync, the next scrub
> won't find inconsistent counts again, but I haven't got to that point
> yet.

Whoops!  Yeah...

> What surprised me was the huge number of objects marked as dirty!  It
> was at least 14k out of 70k objects in each data pool, and even more in
> metadata pools, but it's not like I have messed with this many objects
> recently.  Could something be amiss there?

The dirty state was introduced back in 0.71.  It's just the stats total 
that was added in this release... that probably explains your situation.  

Which also means this will bite anybody who ran emperor, too.  I think I 
need to introduce some pool flag or something indicating whether the dirty 
stats should be scrubbed or not, set only on new pools?

sage

* Re: v0.75 released
  2014-01-19  5:00       ` Sage Weil
@ 2014-01-19  5:27         ` Sage Weil
       [not found]           ` <alpine.DEB.2.00.1401182125150.17005-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Sage Weil @ 2014-01-19  5:27 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: ceph-devel, ceph-users

On Sat, 18 Jan 2014, Sage Weil wrote:
> On Sat, 18 Jan 2014, Alexandre Oliva wrote:
> > On Jan 15, 2014, Sage Weil <sage@inktank.com> wrote:
> > 
> > >  v0.75 291 files changed, 82713 insertions(+), 33495 deletions(-)
> > 
> > > Upgrading
> > > ~~~~~~~~~
> > 
> > I suggest adding:
> > 
> >   * All (replicated?) pools will likely fail scrubbing because the
> >     per-pool dirty object counts, introduced in 0.75, won't match.  This
> >     inconsistency is cleared by a pg repair; unfortunately this is about
> >     as expensive as a deep-scrub, and it's not automatically scheduled
> >     or retried, like scrubs and deep-scrubs.
> > 
> > I suppose after the dirty counts are brought to sync, the next scrub
> > won't find inconsistent counts again, but I haven't got to that point
> > yet.
> 
> Whoops!  Yeah...
> 
> > What surprised me was the huge number of objects marked as dirty!  It
> > was at least 14k out of 70k objects in each data pool, and even more in
> > metadata pools, but it's not like I have messed with this many objects
> > recently.  Could something be amiss there?
> 
> The dirty state was introduced back in 0.71.  It's just the stats total 
> that was added in this release... that probably explains your situation.  
> 
> Which also means this will bite anybody who ran emperor, too.  I think I 
> need to introduce some pool flag or something indicating whether the dirty 
> stats should be scrubbed or not, set only on new pools?

Pushed wip-7184.  This still needs to be tested, but I probably won't get 
to it until Monday.  If someone wants to give it a go (on a non-production 
cluster), that'd be great!

sage

* Re: v0.75 released
       [not found]           ` <alpine.DEB.2.00.1401182125150.17005-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2014-01-21 15:33             ` Alexandre Oliva
  0 siblings, 0 replies; 10+ messages in thread
From: Alexandre Oliva @ 2014-01-21 15:33 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-Qp0mS5GaXlQ

On Jan 19, 2014, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:

> On Sat, 18 Jan 2014, Sage Weil wrote:

>> Which also means this will bite anybody who ran emperor, too.  I think I 
>> need to introduce some pool flag or something indicating whether the dirty 
>> stats should be scrubbed or not, set only on new pools?

> Pushed wip-7184.

How about clearing the dirty_stats_invalid flag if, at the end of a
scrub, the dirty stats didn't mismatch?

Then old PGs that were marked inconsistent under 0.75 and then got
repaired, still with v13 stats, won't have to go through *another*
round of repair before dirty mismatches are taken seriously again.
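
A toy model of the suggestion, to make the intended behaviour concrete.
Only the dirty_stats_invalid name comes from this thread; the PGStats
class, the finish_scrub function, and the assumption that mismatches are
ignored while the flag is set are hypothetical and are not taken from
wip-7184 or the Ceph source.

```python
from dataclasses import dataclass


@dataclass
class PGStats:
    """Hypothetical stand-in for a PG's recorded stats."""
    num_objects_dirty: int            # dirty count recorded in the stats
    dirty_stats_invalid: bool = True  # assumed set on PGs with old stats


def finish_scrub(stats: PGStats, scrubbed_dirty: int, repair: bool = False) -> bool:
    """Return True if the scrub should report the PG as inconsistent."""
    mismatch = stats.num_objects_dirty != scrubbed_dirty
    if not mismatch:
        # The suggestion: a matching count shows the stats can be trusted,
        # so clear the flag without requiring a repair.
        stats.dirty_stats_invalid = False
        return False
    if repair:
        # A repair rewrites the count and makes the stats valid.
        stats.num_objects_dirty = scrubbed_dirty
        stats.dirty_stats_invalid = False
        return False
    # Mismatch without repair: only taken seriously once the stats are valid.
    return not stats.dirty_stats_invalid


pg = PGStats(num_objects_dirty=10)
print(finish_scrub(pg, 10), pg.dirty_stats_invalid)
print(finish_scrub(pg, 12))
```

In this model a PG whose count already matches has its flag cleared by an
ordinary scrub, so a later mismatch is reported right away.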

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist     Red Hat Brazil Toolchain Engineer

