* [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
@ 2013-12-26 14:29 Li Wang
2013-12-26 14:29 ` [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit Li Wang
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
The following script can easily panic the kernel:
#!/bin/bash
mount -t ceph -o fsc MONADDR:/ cephfs
rm -rf cephfs/foo
dd if=/dev/zero of=cephfs/foo bs=8 count=512
echo 3 > /proc/sys/vm/drop_caches
dd if=cephfs/foo of=/dev/null bs=8 count=1024
This happens because, when a page is written into fscache, the code
asserts that the write position does not exceed
object->store_limit_l, which is supposed to equal inode->i_size.
In the current implementation, however, object->store_limit_l is not
synchronized with the new inode->i_size immediately after a file
write. This opens a race: a later attempt to write a page into
fscache can find that the write position already exceeds
object->store_limit_l, trip the ASSERT, and panic the kernel.
This patch series fixes it.
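To make the race concrete, here is a minimal userspace model of the failing
check. It is only a sketch, not the actual cachefiles code: the helper names
(append(), cache_write_page()), the MODEL_PAGE_SIZE constant, and the exact
assert form are assumptions chosen to mirror the description above.

/*
 * Userspace model of the race, NOT kernel code.  "i_size" stands in for
 * inode->i_size and "store_limit" for object->store_limit_l; the 4096-byte
 * page size and the assert form are assumptions for illustration only.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096ULL

static uint64_t i_size;       /* grows on every buffered write             */
static uint64_t store_limit;  /* only refreshed by an attr-changed update  */

static void append(uint64_t bytes)
{
	i_size += bytes;
	/* Without this series nothing refreshes store_limit here; patch 2 */
	/* adds a call that would bring it back in line with i_size.       */
}

static void cache_write_page(uint64_t index)
{
	uint64_t pos = index * MODEL_PAGE_SIZE;

	/* cachefiles insists the write position lies inside the store limit */
	assert(pos < store_limit);
	printf("page %llu cached\n", (unsigned long long)index);
}

int main(void)
{
	int i;

	for (i = 0; i < 512; i++)
		append(8);           /* dd bs=8 count=512: i_size grows to 4096 */

	/* Reading the file back pushes uncached pages into fscache; with   */
	/* store_limit still at its stale value the assert fires.           */
	cache_write_page(1);         /* the page at [4096, 8192)             */
	return 0;
}

Compiled and run, the assert aborts on the cache_write_page() call, which is
the userspace analogue of the kernel panic described above.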
Yunchuan Wen (3):
Ceph fscache: Add an interface to synchronize object store limit
Ceph fscache: Update object store limit after writing
Ceph fscache: Wait for completion of object initialization
fs/ceph/cache.c | 1 +
fs/ceph/cache.h | 10 ++++++++++
fs/ceph/file.c | 3 +++
3 files changed, 14 insertions(+)
--
1.7.9.5
* [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
@ 2013-12-26 14:29 ` Li Wang
2013-12-26 14:29 ` [PATCH 2/3] Ceph fscache: Update object store limit after file writing Li Wang
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen, Min Chen, Li Wang
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Add an interface to explicitly synchronize object->store_limit[_l]
with inode->i_size.
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
fs/ceph/cache.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index ba94940..262106b 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -48,6 +48,12 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page);
void ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
void ceph_queue_revalidate(struct inode *inode);
+static inline void ceph_fscache_update_objectsize(struct inode *inode)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ fscache_attr_changed(ci->fscache);
+}
+
static inline void ceph_fscache_invalidate(struct inode *inode)
{
fscache_invalidate(ceph_inode(inode)->fscache);
@@ -127,6 +133,10 @@ static inline void ceph_readpage_to_fscache(struct inode *inode,
{
}
+static inline void ceph_fscache_update_objectsize(struct inode *inode)
+{
+}
+
static inline void ceph_fscache_invalidate(struct inode *inode)
{
}
--
1.7.9.5
* [PATCH 2/3] Ceph fscache: Update object store limit after file writing
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
2013-12-26 14:29 ` [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit Li Wang
@ 2013-12-26 14:29 ` Li Wang
2013-12-26 14:29 ` [PATCH 3/3] Ceph fscache: Wait for completion of object initialization Li Wang
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
3 siblings, 0 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen, Min Chen, Li Wang
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Synchronize object->store_limit[_l] with the new inode->i_size after file writes.
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
fs/ceph/file.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3de8982..b6df7ab 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -786,6 +786,7 @@ retry_snap:
goto retry_snap;
}
} else {
+ loff_t old_size = inode->i_size;
/*
* No need to acquire the i_truncate_mutex. Because
* the MDS revokes Fwb caps before sending truncate
@@ -796,6 +797,8 @@ retry_snap:
written = generic_file_buffered_write(iocb, iov, nr_segs,
pos, &iocb->ki_pos,
count, 0);
+ if (inode->i_size > old_size)
+ ceph_fscache_update_objectsize(inode);
mutex_unlock(&inode->i_mutex);
}
--
1.7.9.5
* [PATCH 3/3] Ceph fscache: Wait for completion of object initialization
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
2013-12-26 14:29 ` [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit Li Wang
2013-12-26 14:29 ` [PATCH 2/3] Ceph fscache: Update object store limit after file writing Li Wang
@ 2013-12-26 14:29 ` Li Wang
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
3 siblings, 0 replies; 10+ messages in thread
From: Li Wang @ 2013-12-26 14:29 UTC (permalink / raw)
To: ceph-devel
Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel,
Yunchuan Wen, Min Chen, Li Wang
From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
The object store limit needs to be updated after writing, and this can
only be done once the corresponding object has been initialized.
Object initialization is currently done asynchronously, which
introduces a race: if a file is opened and then written to
immediately, the initialization may not have completed yet, and the
code will reach the ASSERT in fscache_submit_exclusive_op(),
triggering a kernel BUG.
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
fs/ceph/cache.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 8c44fdd..834f9f3 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -205,6 +205,7 @@ void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
ci->fscache = fscache_acquire_cookie(fsc->fscache,
&ceph_fscache_inode_object_def,
ci, true);
+ fscache_check_consistency(ci->fscache);
done:
mutex_unlock(&inode->i_mutex);
--
1.7.9.5
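To illustrate the ordering this patch relies on, here is a minimal userspace
sketch. It is a model only, not kernel code: object_initialized,
async_object_init() and update_store_limit() are invented names, and
pthread_join() merely stands in for the wait that fscache_check_consistency()
provides before the write path touches the object.

/*
 * Userspace model of the initialization race, NOT kernel code.
 * async_object_init() plays the part of the asynchronous fscache object
 * setup started at open time; treating readiness as a plain flag and the
 * wait as pthread_join() are assumptions for illustration only.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static bool object_initialized;

static void *async_object_init(void *arg)
{
	(void)arg;
	usleep(1000);                /* initialization completes a bit later */
	object_initialized = true;
	return NULL;
}

static void update_store_limit(void)
{
	/* Patches 1 and 2 run this after a write; it only makes sense once */
	/* the object exists, hence the ASSERT named in the changelog.      */
	assert(object_initialized);
}

int main(void)
{
	pthread_t init;

	pthread_create(&init, NULL, async_object_init, NULL); /* open()     */

	/* Without this patch, a write may reach update_store_limit() here, */
	/* before initialization has finished.  The fix waits first:        */
	pthread_join(init, NULL);    /* models fscache_check_consistency()  */

	update_store_limit();        /* now guaranteed to succeed           */
	return 0;
}

Build with -pthread; with the join in place the assert can no longer fire,
which is the ordering guarantee the one-line change above provides.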
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-26 14:29 [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Li Wang
` (2 preceding siblings ...)
2013-12-26 14:29 ` [PATCH 3/3] Ceph fscache: Wait for completion of object initialization Li Wang
@ 2013-12-26 22:51 ` Milosz Tanski
2013-12-27 1:41 ` Yunchuan Wen
2013-12-28 3:51 ` Li Wang
3 siblings, 2 replies; 10+ messages in thread
From: Milosz Tanski @ 2013-12-26 22:51 UTC (permalink / raw)
To: Li Wang
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
Li,
I looked at the patchset. Am I correct that this only happens when we
enable caching in the write path?
- Milosz
--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
@ 2013-12-27 1:41 ` Yunchuan Wen
2013-12-28 3:51 ` Li Wang
1 sibling, 0 replies; 10+ messages in thread
From: Yunchuan Wen @ 2013-12-27 1:41 UTC (permalink / raw)
To: Milosz Tanski
Cc: Li Wang, ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel
Hi Milosz,
I am not sure how to enable caching in the write path.
I just compiled the latest kernel with "Enable Ceph client caching
support", started cachefilesd, and mounted cephfs with -o fsc.
After that, the kernel panics easily when the script reaches "dd
if=cephfs/foo of=/dev/null bs=8 count=1024".
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-26 22:51 ` [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race Milosz Tanski
2013-12-27 1:41 ` Yunchuan Wen
@ 2013-12-28 3:51 ` Li Wang
2014-01-03 14:43 ` Milosz Tanski
1 sibling, 1 reply; 10+ messages in thread
From: Li Wang @ 2013-12-28 3:51 UTC (permalink / raw)
To: Milosz Tanski
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
Hi Milosz,
As far as I know, fscache does not currently act as a write cache for
Ceph; the only exception is a call to ceph_readpage_to_fscache() in
ceph_writepage(), which is unrelated to our test case. According to
our observation, the test case never goes through ceph_writepage();
it goes through ceph_writepages() instead. So I do not think this is
related to caching in the write path.
Let me explain the panic in more detail:
(1) dd if=/dev/zero of=cephfs/foo bs=8 count=512
(2) echo 3 > /proc/sys/vm/drop_caches
(3) dd if=cephfs/foo of=/dev/null bs=8 count=1024
Step (1) repeatedly appends to the file, so ceph_aio_write() keeps
updating inode->i_size; these updates, however, are not immediately
reflected in object->store_limit_l. In step (3), when we start
reading the second page at [4096, 8192), Ceph finds that the page is
not cached in fscache and decides to write it there. During that
write, cachefiles_write_page() finds that
object->store_limit_l < 4096 (page->index << 12), which causes the
panic. Does that make sense?
Cheers,
Li Wang
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2013-12-28 3:51 ` Li Wang
@ 2014-01-03 14:43 ` Milosz Tanski
2014-03-03 16:54 ` Milosz Tanski
0 siblings, 1 reply; 10+ messages in thread
From: Milosz Tanski @ 2014-01-03 14:43 UTC (permalink / raw)
To: Li Wang
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
I'm going to look at the patches and the issue in full detail. In the
meantime, do you have the oops backtrace? I have some other
fscache patches that haven't made it upstream yet that might have been
masking this issue for me.
--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2014-01-03 14:43 ` Milosz Tanski
@ 2014-03-03 16:54 ` Milosz Tanski
2014-03-06 22:01 ` Sage Weil
0 siblings, 1 reply; 10+ messages in thread
From: Milosz Tanski @ 2014-03-03 16:54 UTC (permalink / raw)
To: Li Wang
Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org,
linux-kernel, Yunchuan Wen
Hey guys, I'm terribly sorry, but apparently this got stuck in my
drafts folder for 3 months. Since we've been running this on both our
test and prod clusters, I would say it is sufficiently tested.
I've looked at the code and tested it for a week now on a test cluster,
and it looks good. It fixes a real problem, and I think we should
push it to mainline. Thanks for fixing this, Li! The reason I was not
seeing the problem myself is that I had other fscache patches that
were masking it :/
Thanks,
- Milosz
P.S.: Sorry for the double mail; the first one was not sent as text. I
apparently do not know how to use Gmail.
--
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
2014-03-03 16:54 ` Milosz Tanski
@ 2014-03-06 22:01 ` Sage Weil
0 siblings, 0 replies; 10+ messages in thread
From: Sage Weil @ 2014-03-06 22:01 UTC (permalink / raw)
To: Milosz Tanski
Cc: Li Wang, ceph-devel, linux-fsdevel@vger.kernel.org, linux-kernel,
Yunchuan Wen
Hi Milosz,
Thanks, I've added this to the testing branch!
sage