* long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Loic Dachary @ 2015-05-08 9:07 UTC (permalink / raw)
To: Jason Dillaman; +Cc: Ceph Development
Hi Jason,
There is a long-running (12h+) job at
http://pulpito.ceph.com/loic-2015-05-07_09:46:27-rbd-hammer-backports---basic-multi/878799/
which runs
rbd/thrash/{base/install.yaml clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/default.yaml workloads/rbd_fsx_cache_writeback.yaml}
on the current hammer-backports branch, which contains a few librbd backports:
http://tracker.ceph.com/issues/11492#Teuthology-run-commit-commita79146fc3cae28bf4c07478fb4566b06942da60d-hammer-backports-branch-May-2015
I don't see an obvious cause for the error in the logs. Does it ring a bell? Is it supposed to take that long? Note that all other jobs completed successfully (see http://pulpito.ceph.com/loic-2015-05-07_09:46:27-rbd-hammer-backports---basic-multi/ for details).
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
* Re: long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Jason Dillaman @ 2015-05-08 12:36 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
Nice catch. It is deadlocked in a code path between handling a watch/notify error from librados and flushing the internal cache. The issue already appears to exist in the hammer branch.
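For readers unfamiliar with this class of bug, here is a minimal sketch of a lock-order inversion between two code paths. It is not the librbd code: the lock names (`owner_lock`, `cache_lock`) and thread names are hypothetical, and the actual locks involved are not named in this thread. Timeouts are used only so that the demo terminates instead of hanging the way the job did.

```python
import threading


def run_lock_inversion_demo(timeout: float = 0.2) -> dict:
    """Show two threads blocking each other by taking two locks in opposite order."""
    owner_lock = threading.Lock()   # hypothetical: state touched by the watch/notify error path
    cache_lock = threading.Lock()   # hypothetical: state touched by the cache flush path
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    acquired_second = {}

    def worker(name, first, second):
        with first:
            # Make sure each thread already holds its first lock.
            both_hold_first.wait()
            # With a plain acquire() this call would block forever.
            got = second.acquire(timeout=timeout)
            acquired_second[name] = got
            # Keep holding the first lock until the other thread has also tried.
            both_tried_second.wait()
            if got:
                second.release()

    threads = [
        threading.Thread(target=worker,
                         args=("watch_error_handler", owner_lock, cache_lock)),
        threading.Thread(target=worker,
                         args=("cache_flusher", cache_lock, owner_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired_second


if __name__ == "__main__":
    print(run_lock_inversion_demo())
```

Neither thread can obtain its second lock while the other still holds it, which is the general shape of a hang that shows no error in the logs.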
--
Jason Dillaman
Red Hat
dillaman@redhat.com
http://www.redhat.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Loic Dachary @ 2015-05-08 14:34 UTC (permalink / raw)
To: Jason Dillaman; +Cc: Ceph Development
Hi,
On 08/05/2015 14:36, Jason Dillaman wrote:
> Nice catch. It is deadlocked in a code path between handling a watch/notify error from librados and flushing the internal cache. The issue already appears to exist in the hammer branch.
I'm glad it helped you figure out a bug :-) From the point of view of the upcoming v0.94.2, do you think we should wait until this is fixed? Or is it rare enough that it can wait until v0.94.3? I'm asking because this bug seems to be the only potential blocker for v0.94.2. It's ok either way, just let me know.
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
* Re: long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Jason Dillaman @ 2015-05-08 15:03 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development, Josh Durgin
Since it sounds like the next hammer release is more or less ready, my vote would be that the fix can wait until the next release. Since it will only occur if the new exclusive lock feature is enabled (disabled on images by default) and the connection between librbd and the OSD is reset with writeback data waiting in the cache, it sounds like a rare enough issue.
Final decision rests w/ Josh.
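To make the two static preconditions concrete, here is a small sketch that checks an image's feature bitmask for the exclusive lock bit. The bit values mirror the librbd feature flags as I understand them and are redefined locally as an assumption; `may_hit_deadlock` is a hypothetical helper, and it cannot capture the third condition (a connection reset while dirty data sits in the cache), which is a runtime event.

```python
# Feature bits redefined locally for illustration (assumed to match librbd).
RBD_FEATURE_LAYERING = 1 << 0
RBD_FEATURE_STRIPINGV2 = 1 << 1
RBD_FEATURE_EXCLUSIVE_LOCK = 1 << 2


def exclusive_lock_enabled(features: int) -> bool:
    """Return True if the exclusive lock bit is set in the feature bitmask."""
    return bool(features & RBD_FEATURE_EXCLUSIVE_LOCK)


def may_hit_deadlock(features: int, writeback_cache: bool) -> bool:
    """Hypothetical helper: True only if both static preconditions hold.

    A connection reset with dirty writeback data is still required to
    trigger the hang, so True means "exposed", not "will deadlock".
    """
    return exclusive_lock_enabled(features) and writeback_cache


if __name__ == "__main__":
    layering_only = RBD_FEATURE_LAYERING
    with_lock = RBD_FEATURE_LAYERING | RBD_FEATURE_EXCLUSIVE_LOCK
    print(may_hit_deadlock(layering_only, writeback_cache=True))
    print(may_hit_deadlock(with_lock, writeback_cache=True))
```

In practice the bitmask would come from the image itself, for example from the Python bindings' `Image.features()` call, rather than being built by hand.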
--
Jason Dillaman
Red Hat
dillaman@redhat.com
http://www.redhat.com
* Re: long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Loic Dachary @ 2015-05-08 15:13 UTC (permalink / raw)
To: Jason Dillaman; +Cc: Ceph Development, Josh Durgin
On 08/05/2015 17:03, Jason Dillaman wrote:
> Since it sounds like the next hammer release is more or less ready, my vote would be that the fix can wait until the next release. Since it will only occur if the new exclusive lock feature is enabled (disabled on images by default) and the connection between librbd and the OSD is reset with writeback data waiting in the cache, it sounds like a rare enough issue.
>
> Final decision rests w/ Josh.
Ok :-) Would you be so kind as to point me to the corresponding issue in the tracker for reference?
--
Loïc Dachary, Artisan Logiciel Libre
* Re: long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Josh Durgin @ 2015-05-08 16:07 UTC (permalink / raw)
To: Loic Dachary, Jason Dillaman; +Cc: Ceph Development
On 05/08/2015 08:13 AM, Loic Dachary wrote:
>
>
> On 08/05/2015 17:03, Jason Dillaman wrote:
>> Since it sounds like the next hammer release is more or less ready, my vote would be that the fix can wait until the next release. Since it will only occur if the new exclusive lock feature is enabled (disabled on images by default) and the connection between librbd and the OSD is reset with writeback data waiting in the cache, it sounds like a rare enough issue.
>>
>> Final decision rests w/ Josh.
>
> Ok :-) Would you be so kind as to point me to the corresponding issue in the tracker for reference ?
I agree with Jason, this doesn't need to block the release. I think the
issue he refers to is the last sentence in
http://tracker.ceph.com/issues/11537
Josh
* Re: long running workloads/rbd_fsx_cache_writeback.yaml on hammer
From: Yuri Weinstein @ 2015-05-08 16:57 UTC (permalink / raw)
To: Josh Durgin; +Cc: Loic Dachary, Jason Dillaman, Ceph Development
In which suite(s) do we anticipate failures for next week's hammer QE validation?
Thx
YuriW