* [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
@ 2013-12-26 14:29 Li Wang
2013-12-26 14:29 ` [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit Li Wang
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
The following script can easily panic the kernel:
#!/bin/bash
mount -t ceph -o fsc MONADDR:/ cephfs
rm -rf cephfs/foo
dd if=/dev/zero of=cephfs/foo bs=8 count=512
echo 3 > /proc/sys/vm/drop_caches
dd if=cephfs/foo of=/dev/null bs=8 count=1024
This happens because, when a page is written into fscache, the code
asserts that the write position does not exceed
object->store_limit_l, which is supposed to equal inode->i_size.
In the current implementation, however, object->store_limit_l is not
synchronized with the new inode->i_size immediately after a file
write. This opens a race: a later attempt to write a page into
fscache can find that the write position already exceeds
object->store_limit_l, trip the ASSERT, and panic the kernel.
This patch series fixes it.
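To make the race concrete, here is a minimal userspace model of the failing
check. It is only a sketch, not the actual cachefiles code: the helper names
(append(), cache_write_page()), the MODEL_PAGE_SIZE constant, and the exact
assert form are assumptions chosen to mirror the description above.

/*
 * Userspace model of the race, NOT kernel code.  "i_size" stands in for
 * inode->i_size and "store_limit" for object->store_limit_l; the 4096-byte
 * page size and the assert form are assumptions for illustration only.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096ULL

static uint64_t i_size;       /* grows on every buffered write             */
static uint64_t store_limit;  /* only refreshed by an attr-changed update  */

static void append(uint64_t bytes)
{
	i_size += bytes;
	/* Without this series nothing refreshes store_limit here; patch 2 */
	/* adds a call that would bring it back in line with i_size.       */
}

static void cache_write_page(uint64_t index)
{
	uint64_t pos = index * MODEL_PAGE_SIZE;

	/* cachefiles insists the write position lies inside the store limit */
	assert(pos < store_limit);
	printf("page %llu cached\n", (unsigned long long)index);
}

int main(void)
{
	int i;

	for (i = 0; i < 512; i++)
		append(8);           /* dd bs=8 count=512: i_size grows to 4096 */

	/* Reading the file back pushes uncached pages into fscache; with   */
	/* store_limit still at its stale value the assert fires.           */
	cache_write_page(1);         /* the page at [4096, 8192)             */
	return 0;
}

Compiled and run, the assert aborts on the cache_write_page() call, which is
the userspace analogue of the kernel panic described above.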
Yunchuan Wen (3):
Ceph fscache: Add an interface to synchronize object store limit
Ceph fscache: Update object store limit after writing
Ceph fscache: Wait for completion of object initialization
fs/ceph/cache.c | 1 +
fs/ceph/cache.h | 10 ++++++++++
fs/ceph/file.c | 3 +++
3 files changed, 14 insertions(+)
--
1.7.9.5
* [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
@ 2013-12-26 14:29 ` Li Wang
2013-12-26 14:29 ` [PATCH 2/3] Ceph fscache: Update object store limit after file writing Li Wang
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen, Min Chen, Li Wang
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Add an interface to explicitly synchronize object->store_limit[_l]
with inode->i_size.
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
fs/ceph/cache.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index ba94940..262106b 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -48,6 +48,12 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page);
void ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
void ceph_queue_revalidate(struct inode *inode);
+static inline void ceph_fscache_update_objectsize(struct inode *inode)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ fscache_attr_changed(ci->fscache);
+}
+
static inline void ceph_fscache_invalidate(struct inode *inode)
{
fscache_invalidate(ceph_inode(inode)->fscache);
@@ -127,6 +133,10 @@ static inline void ceph_readpage_to_fscache(struct inode *inode,
{
}
+static inline void ceph_fscache_update_objectsize(struct inode *inode)
+{
+}
+
static inline void ceph_fscache_invalidate(struct inode *inode)
{
}
--
1.7.9.5
* [PATCH 2/3] Ceph fscache: Update object store limit after file writing
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
2013-12-26 14:29 ` [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit Li Wang
@ 2013-12-26 14:29 ` Li Wang
2013-12-26 14:29 ` [PATCH 3/3] Ceph fscache: Wait for completion of object initialization Li Wang
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
3 siblings, 0 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen, Min Chen, Li Wang
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Synchronize object->store_limit[_l] with the new inode->i_size after file writes.
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
fs/ceph/file.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3de8982..b6df7ab 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -786,6 +786,7 @@ retry_snap:
goto retry_snap;
}
} else {
+ loff_t old_size = inode->i_size;
/*
* No need to acquire the i_truncate_mutex. Because
* the MDS revokes Fwb caps before sending truncate
@@ -796,6 +797,8 @@ retry_snap:
written = generic_file_buffered_write(iocb, iov, nr_segs,
pos, &iocb->ki_pos,
count, 0);
+ if (inode->i_size > old_size)
+ ceph_fscache_update_objectsize(inode);
mutex_unlock(&inode->i_mutex);
}
--
1.7.9.5
* [PATCH 3/3] Ceph fscache: Wait for completion of object initialization
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
2013-12-26 14:29 ` [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit Li Wang
2013-12-26 14:29 ` [PATCH 2/3] Ceph fscache: Update object store limit after file writing Li Wang
@ 2013-12-26 14:29 ` Li Wang
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
3 siblings, 0 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen, Min Chen, Li Wang
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
The object store limit needs to be updated after writing, and this can
only be done once the corresponding object has been initialized.
Object initialization is currently done asynchronously, which
introduces a race: if a file is opened and then written to
immediately, the initialization may not have completed yet, and the
code will reach the ASSERT in fscache_submit_exclusive_op(),
triggering a kernel BUG.
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
fs/ceph/cache.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 8c44fdd..834f9f3 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -205,6 +205,7 @@ void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
ci->fscache = fscache_acquire_cookie(fsc->fscache,
&ceph_fscache_inode_object_def,
ci, true);
+ fscache_check_consistency(ci->fscache);
done:
mutex_unlock(&inode->i_mutex);
--
1.7.9.5
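To illustrate the ordering this patch relies on, here is a minimal userspace
sketch. It is a model only, not kernel code: object_initialized,
async_object_init() and update_store_limit() are invented names, and
pthread_join() merely stands in for the wait that fscache_check_consistency()
provides before the write path touches the object.

/*
 * Userspace model of the initialization race, NOT kernel code.
 * async_object_init() plays the part of the asynchronous fscache object
 * setup started at open time; treating readiness as a plain flag and the
 * wait as pthread_join() are assumptions for illustration only.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static bool object_initialized;

static void *async_object_init(void *arg)
{
	(void)arg;
	usleep(1000);                /* initialization completes a bit later */
	object_initialized = true;
	return NULL;
}

static void update_store_limit(void)
{
	/* Patches 1 and 2 run this after a write; it only makes sense once */
	/* the object exists, hence the ASSERT named in the changelog.      */
	assert(object_initialized);
}

int main(void)
{
	pthread_t init;

	pthread_create(&init, NULL, async_object_init, NULL); /* open()     */

	/* Without this patch, a write may reach update_store_limit() here, */
	/* before initialization has finished.  The fix waits first:        */
	pthread_join(init, NULL);    /* models fscache_check_consistency()  */

	update_store_limit();        /* now guaranteed to succeed           */
	return 0;
}

Build with -pthread; with the join in place the assert can no longer fire,
which is the ordering guarantee the one-line change above provides.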
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
` (2 preceding siblings ...)
2013-12-26 14:29 ` [PATCH 3/3] Ceph fscache: Wait for completion of object initialization Li Wang
@ 2013-12-26 22:51 ` Milosz Tanski
2013-12-27 1:41 ` Yunchuan Wen
2013-12-28 3:51 ` Li Wang
3 siblings, 2 replies; 10+ messages in thread
From: Milosz Tanski @ 2013-12-26 22:51 UTC (permalink / raw)
To: Li Wang
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
Li,
I looked at the patchset. Am I correct that this only happens when we
enable caching in the write path?
- Milosz
--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
@ 2013-12-27 1:41 ` Yunchuan Wen
2013-12-28 3:51 ` Li Wang
1 sibling, 0 replies; 10+ messages in thread
From: Yunchuan Wen @ 2013-12-27 1:41 UTC (permalink / raw)
To: Milosz Tanski
Cc: Li Wang, ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel
Hi Milosz,
I am not sure how to enable caching in the write path.
I just compiled the latest kernel with "Enable Ceph client caching
support", started cachefilesd, and mounted cephfs with -o fsc.
After that, the kernel panics easily when the script reaches "dd
if=cephfs/foo of=/dev/null bs=8 count=1024".
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
2013-12-27 1:41 ` Yunchuan Wen
@ 2013-12-28 3:51 ` Li Wang
2014-01-03 14:43 ` Milosz Tanski
1 sibling, 1 reply; 10+ messages in thread
From: Li Wang @ 2013-12-28 3:51 UTC (permalink / raw)
To: Milosz Tanski
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
Hi Milosz,
As far as I know, fscache does not currently act as a write cache for
Ceph; the only exception is a call to ceph_readpage_to_fscache() in
ceph_writepage(), which is unrelated to our test case. According to
our observation, the test case never goes through ceph_writepage();
it goes through ceph_writepages() instead. So I do not think this is
related to caching in the write path.
Let me explain the panic in more detail:
(1) dd if=/dev/zero of=cephfs/foo bs=8 count=512
(2) echo 3 > /proc/sys/vm/drop_caches
(3) dd if=cephfs/foo of=/dev/null bs=8 count=1024
Step (1) repeatedly appends to the file, so ceph_aio_write() keeps
updating inode->i_size; these updates, however, are not immediately
reflected in object->store_limit_l. In step (3), when we start
reading the second page at [4096, 8192), Ceph finds that the page is
not cached in fscache and decides to write it there. During that
write, cachefiles_write_page() finds that
object->store_limit_l < 4096 (page->index << 12), which causes the
panic. Does that make sense?
Cheers,
Li Wang
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-28 3:51 ` Li Wang
@ 2014-01-03 14:43 ` Milosz Tanski
2014-03-03 16:54 ` Milosz Tanski
0 siblings, 1 reply; 10+ messages in thread
From: Milosz Tanski @ 2014-01-03 14:43 UTC (permalink / raw)
To: Li Wang
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
I'm going to look at the patches and the issue in full detail. In the
meantime, do you have the oops backtrace? I have some other
fscache patches that haven't made it upstream yet that might have been
masking this issue for me.
--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2014-01-03 14:43 ` Milosz Tanski
@ 2014-03-03 16:54 ` Milosz Tanski
2014-03-06 22:01 ` Sage Weil
0 siblings, 1 reply; 10+ messages in thread
From: Milosz Tanski @ 2014-03-03 16:54 UTC (permalink / raw)
To: Li Wang
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
Hey guys, I'm terribly sorry, but apparently this got stuck in my
drafts folder for 3 months. Since we've been running this on both our
test and prod clusters, I would say it is sufficiently tested.
I've looked at the code and tested it for a week now on a test cluster,
and it looks good. It fixes a real problem, and I think we should
push it to mainline. Thanks for fixing this, Li! The reason I was not
seeing the problem myself is that I had other fscache patches that
were masking it :/
Thanks,
- Milosz
P.S.: Sorry for the double mail; the first one was not sent as text. I
apparently do not know how to use Gmail.
--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2014-03-03 16:54 ` Milosz Tanski
@ 2014-03-06 22:01 ` Sage Weil
0 siblings, 0 replies; 10+ messages in thread
From: Sage Weil @ 2014-03-06 22:01 UTC (permalink / raw)
To: Milosz Tanski
Cc: Li Wang, ceph-devel, linux-fsdevel@vger.kernel.org, linux-kernel,
Yunchuan Wen
Hi Milosz,
Thanks, I've added this to the testing branch!
sage