* [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Li Wang @ 2013-12-26 14:29 UTC
  To: ceph-devel
  Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel, Yunchuan Wen

From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>

The following script can easily panic the kernel:

#!/bin/bash
mount -t ceph -o fsc MONADDR:/ cephfs
rm -rf cephfs/foo
dd if=/dev/zero of=cephfs/foo bs=8 count=512
echo 3 > /proc/sys/vm/drop_caches
dd if=cephfs/foo of=/dev/null bs=8 count=1024

When writing a page into fscache, the code asserts that the write
position does not exceed object->store_limit_l, which is supposed to be
equal to inode->i_size. In the current implementation, however,
object->store_limit_l is not synchronized with the new inode->i_size
immediately after a file write. This introduces a race: writing a new
page into fscache can hit the ASSERT that the write position has
exceeded object->store_limit_l and panic the kernel. This patch series
fixes it.

Yunchuan Wen (3):
  Ceph fscache: Add an interface to synchronize object store limit
  Ceph fscache: Update object store limit after writing
  Ceph fscache: Wait for completion of object initialization

 fs/ceph/cache.c |  1 +
 fs/ceph/cache.h | 10 ++++++++++
 fs/ceph/file.c  |  3 +++
 3 files changed, 14 insertions(+)

-- 
1.7.9.5
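To make the failure mode concrete before reading the patches, here is a
small user-space model of the race described above. It is only an
illustrative sketch: the struct and function names are invented, and the
assert() stands in for the kernel-side ASSERT in the cachefiles write
path, so none of this is actual kernel code.

/* Illustrative user-space model of the race -- not kernel code. */
#include <assert.h>
#include <stdio.h>

#define MODEL_PAGE_SHIFT 12

struct model_object {
	long long store_limit_l;	/* the cache's (stale) copy of the file size */
};

struct model_inode {
	long long i_size;		/* the real, up-to-date file size */
};

/* Models the cache backend writing one page of the file into the cache. */
static void model_cache_write_page(const struct model_object *obj,
				   unsigned long page_index)
{
	long long pos = (long long)page_index << MODEL_PAGE_SHIFT;

	/* Stands in for the kernel ASSERT that panics when it fails. */
	assert(pos < obj->store_limit_l);
	printf("cached page %lu at offset %lld\n", page_index, pos);
}

int main(void)
{
	struct model_inode inode = { .i_size = 0 };
	struct model_object obj = { .store_limit_l = 0 };

	/* (1) dd if=/dev/zero of=cephfs/foo bs=8 count=512: i_size grows to 4096 ... */
	inode.i_size = 8LL * 512;
	/* ... but nothing told the cache, so store_limit_l still holds the old value. */
	printf("i_size=%lld, store_limit_l=%lld\n", inode.i_size, obj.store_limit_l);

	/* (3) reading the file back tries to populate the cache; caching the page
	 * at offset 4096 trips the assertion because store_limit_l is stale. */
	model_cache_write_page(&obj, 1);
	return 0;
}

Compiled and run, the assert() fires on the page at offset 4096, which
mirrors the panic the script above provokes.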
* [PATCH 1/3] Ceph fscache: Add an interface to synchronize object store limit
  From: Li Wang @ 2013-12-26 14:29 UTC
  To: ceph-devel
  Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel, Yunchuan Wen, Min Chen, Li Wang

From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>

Add an interface to explicitly synchronize object->store_limit[_l]
with inode->i_size.

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 fs/ceph/cache.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index ba94940..262106b 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -48,6 +48,12 @@ void ceph_readpage_to_fscache(struct inode *inode, struct page *page);
 void ceph_invalidate_fscache_page(struct inode* inode, struct page *page);
 void ceph_queue_revalidate(struct inode *inode);
 
+static inline void ceph_fscache_update_objectsize(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	fscache_attr_changed(ci->fscache);
+}
+
 static inline void ceph_fscache_invalidate(struct inode *inode)
 {
 	fscache_invalidate(ceph_inode(inode)->fscache);
@@ -127,6 +133,10 @@ static inline void ceph_readpage_to_fscache(struct inode *inode,
 {
 }
 
+static inline void ceph_fscache_update_objectsize(struct inode *inode)
+{
+}
+
 static inline void ceph_fscache_invalidate(struct inode *inode)
 {
 }
-- 
1.7.9.5
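For context on why the one-line helper above is enough: as far as the
3.x FS-Cache API goes, fscache_attr_changed() queues an
attribute-changed operation that makes the cache backend re-query the
object's size through the cookie definition's get_attr callback and
refresh its store limit from that value, and Ceph's inode cookie
definition reports i_size there. The sketch below illustrates such a
cookie definition; the sketch_-prefixed names are invented for
illustration and this is not the exact code in fs/ceph/cache.c.

/*
 * Minimal sketch of a netfs cookie definition whose get_attr callback
 * reports i_size -- illustrative only, not the exact Ceph code.
 */
static void sketch_inode_get_attr(const void *cookie_netfs_data, uint64_t *size)
{
	const struct ceph_inode_info *ci = cookie_netfs_data;

	/* This is the size the cache backend adopts as its store limit. */
	*size = i_size_read(&ci->vfs_inode);
}

static const struct fscache_cookie_def sketch_inode_object_def = {
	.name		= "CEPH.inode",
	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
	.get_attr	= sketch_inode_get_attr,
	/* .get_key, .get_aux, .check_aux, etc. omitted from this sketch */
};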
* [PATCH 2/3] Ceph fscache: Update object store limit after file writing
  From: Li Wang @ 2013-12-26 14:29 UTC
  To: ceph-devel
  Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel, Yunchuan Wen, Min Chen, Li Wang

From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>

Synchronize object->store_limit[_l] with the new inode->i_size after
file writing.

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 fs/ceph/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3de8982..b6df7ab 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -786,6 +786,7 @@ retry_snap:
 			goto retry_snap;
 		}
 	} else {
+		loff_t old_size = inode->i_size;
 		/*
 		 * No need to acquire the i_truncate_mutex. Because
 		 * the MDS revokes Fwb caps before sending truncate
@@ -796,6 +797,8 @@ retry_snap:
 		written = generic_file_buffered_write(iocb, iov, nr_segs,
 						      pos, &iocb->ki_pos,
 						      count, 0);
+		if (inode->i_size > old_size)
+			ceph_fscache_update_objectsize(inode);
 		mutex_unlock(&inode->i_mutex);
 	}
 
-- 
1.7.9.5
* [PATCH 3/3] Ceph fscache: Wait for completion of object initialization
  From: Li Wang @ 2013-12-26 14:29 UTC
  To: ceph-devel
  Cc: Sage Weil, Milosz Tanski, linux-fsdevel, linux-kernel, Yunchuan Wen, Min Chen, Li Wang

From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>

The object store limit needs to be updated after writing, which can
only be done once the corresponding object has been initialized.
Object initialization is currently done asynchronously, which
introduces a race: if a file is opened and then immediately written,
the initialization may not have completed, and the code will reach the
ASSERT in fscache_submit_exclusive_op() and trigger a kernel bug.

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
---
 fs/ceph/cache.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 8c44fdd..834f9f3 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -205,6 +205,7 @@ void ceph_fscache_register_inode_cookie(struct ceph_fs_client* fsc,
 	ci->fscache = fscache_acquire_cookie(fsc->fscache,
 					     &ceph_fscache_inode_object_def,
 					     ci, true);
+	fscache_check_consistency(ci->fscache);
 
 done:
 	mutex_unlock(&inode->i_mutex);
-- 
1.7.9.5
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Milosz Tanski @ 2013-12-26 22:51 UTC
  To: Li Wang
  Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org, linux-kernel, Yunchuan Wen

Li,

I looked at the patchset. Am I correct that this only happens when we
enable caching in the write path?

- Milosz

On Thu, Dec 26, 2013 at 9:29 AM, Li Wang <liwang@ubuntukylin.com> wrote:
> From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
>
> The following script can easily panic the kernel:
>
> #!/bin/bash
> mount -t ceph -o fsc MONADDR:/ cephfs
> rm -rf cephfs/foo
> dd if=/dev/zero of=cephfs/foo bs=8 count=512
> echo 3 > /proc/sys/vm/drop_caches
> dd if=cephfs/foo of=/dev/null bs=8 count=1024
>
> When writing a page into fscache, the code asserts that the write
> position does not exceed object->store_limit_l, which is supposed to be
> equal to inode->i_size. In the current implementation, however,
> object->store_limit_l is not synchronized with the new inode->i_size
> immediately after a file write. This introduces a race: writing a new
> page into fscache can hit the ASSERT that the write position has
> exceeded object->store_limit_l and panic the kernel. This patch series
> fixes it.
>
> Yunchuan Wen (3):
>   Ceph fscache: Add an interface to synchronize object store limit
>   Ceph fscache: Update object store limit after writing
>   Ceph fscache: Wait for completion of object initialization
>
>  fs/ceph/cache.c |  1 +
>  fs/ceph/cache.h | 10 ++++++++++
>  fs/ceph/file.c  |  3 +++
>  3 files changed, 14 insertions(+)
>
> --
> 1.7.9.5

-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Yunchuan Wen @ 2013-12-27 1:41 UTC
  To: Milosz Tanski
  Cc: Li Wang, ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org, linux-kernel

Hi Milosz,

I am not sure how to enable caching in the write path. I just compiled
the latest kernel with "Enable Ceph client caching support", started
cachefilesd, and mounted cephfs with -o fsc. The kernel then panics
easily when the script reaches "dd if=cephfs/foo of=/dev/null bs=8
count=1024".

2013/12/27 Milosz Tanski <milosz@adfin.com>:
> Li,
>
> I looked at the patchset. Am I correct that this only happens when we
> enable caching in the write path?
>
> - Milosz
>
> On Thu, Dec 26, 2013 at 9:29 AM, Li Wang <liwang@ubuntukylin.com> wrote:
>> From: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
>>
>> The following script can easily panic the kernel:
>>
>> #!/bin/bash
>> mount -t ceph -o fsc MONADDR:/ cephfs
>> rm -rf cephfs/foo
>> dd if=/dev/zero of=cephfs/foo bs=8 count=512
>> echo 3 > /proc/sys/vm/drop_caches
>> dd if=cephfs/foo of=/dev/null bs=8 count=1024
>>
>> [...]
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Li Wang @ 2013-12-28 3:51 UTC
  To: Milosz Tanski
  Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org, linux-kernel, Yunchuan Wen

Hi Milosz,

As far as I know, fscache does not currently act as a write cache for
Ceph, apart from a call to ceph_readpage_to_fscache() in
ceph_writepage(), which is unrelated to our test case. According to our
observation, the test case never goes through ceph_writepage(); it goes
through ceph_writepages() instead. So I do not think this is related to
caching in the write path.

Let me explain the panic in more detail:

(1) dd if=/dev/zero of=cephfs/foo bs=8 count=512
(2) echo 3 > /proc/sys/vm/drop_caches
(3) dd if=cephfs/foo of=/dev/null bs=8 count=1024

Step (1) repeatedly appends to the file, so ceph_aio_write() repeatedly
updates inode->i_size; however, these updates are not immediately
reflected in object->store_limit_l. In step (3), when we start reading
the second page at [4096, 8192), Ceph finds that the page is not cached
in fscache and decides to write it into fscache. During this process,
cachefiles_write_page() finds that object->store_limit_l < 4096
(page->index << 12), which causes the panic. Does that make sense?

Cheers,
Li Wang

On 2013/12/27 6:51, Milosz Tanski wrote:
> Li,
>
> I looked at the patchset. Am I correct that this only happens when we
> enable caching in the write path?
>
> - Milosz
>
> On Thu, Dec 26, 2013 at 9:29 AM, Li Wang <liwang@ubuntukylin.com> wrote:
>> [...]
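Spelling out the arithmetic in the explanation above (assuming 4 KiB
pages, which is where the shift by 12 comes from), a trivial
stand-alone program; the numbers are taken from the test script, the
variable names are invented:

/* Worked numbers for the test case, assuming 4 KiB pages. */
#include <stdio.h>

int main(void)
{
	long long write_len = 8LL * 512;   /* step (1): dd bs=8 count=512  -> 4096 bytes */
	long long read_len  = 8LL * 1024;  /* step (3): dd bs=8 count=1024 -> 8192 bytes */
	long long page1_off = 1LL << 12;   /* second page [4096, 8192) starts at 4096 */

	printf("bytes written in (1): %lld\n", write_len);
	printf("bytes requested in (3): %lld\n", read_len);
	printf("offset of the second page: %lld\n", page1_off);
	/* The panic fires when store_limit_l is still below this offset. */
	return 0;
}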
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Milosz Tanski @ 2014-01-03 14:43 UTC
  To: Li Wang
  Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org, linux-kernel, Yunchuan Wen

I'm going to look at the patches and the issue in full detail. In the
meantime, do you have the oops backtrace? I have some other fscache
patches that haven't made it upstream yet that might have been masking
this issue for me.

On Fri, Dec 27, 2013 at 10:51 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> Hi Milosz,
>
> As far as I know, fscache does not currently act as a write cache for
> Ceph, apart from a call to ceph_readpage_to_fscache() in
> ceph_writepage(), which is unrelated to our test case. According to our
> observation, the test case never goes through ceph_writepage(); it goes
> through ceph_writepages() instead. So I do not think this is related to
> caching in the write path.
>
> Let me explain the panic in more detail:
>
> (1) dd if=/dev/zero of=cephfs/foo bs=8 count=512
> (2) echo 3 > /proc/sys/vm/drop_caches
> (3) dd if=cephfs/foo of=/dev/null bs=8 count=1024
>
> Step (1) repeatedly appends to the file, so ceph_aio_write() repeatedly
> updates inode->i_size; however, these updates are not immediately
> reflected in object->store_limit_l. In step (3), when we start reading
> the second page at [4096, 8192), Ceph finds that the page is not cached
> in fscache and decides to write it into fscache. During this process,
> cachefiles_write_page() finds that object->store_limit_l < 4096
> (page->index << 12), which causes the panic. Does that make sense?
>
> Cheers,
> Li Wang
>
> On 2013/12/27 6:51, Milosz Tanski wrote:
>> [...]

-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Milosz Tanski @ 2014-03-03 16:54 UTC
  To: Li Wang
  Cc: ceph-devel, Sage Weil, linux-fsdevel@vger.kernel.org, linux-kernel, Yunchuan Wen

Hey guys, I'm terribly sorry, but apparently this got stuck in my
drafts folder for 3 months. Since we've been running this on both our
test and prod clusters, I would say it is sufficiently tested.

I've looked at the code and tested this for a week now on a test
cluster and it looks good. It fixes a real problem and I think we
should push it to mainline. Thanks for fixing this, Li! The reason I
was not seeing it is that I had other fscache patches that were
masking the problem :/

Thanks,
- Milosz

P.S.: Sorry for the double mail; the first one was not sent as text. I
apparently do not know how to use gmail.

On Fri, Jan 3, 2014 at 9:43 AM, Milosz Tanski <milosz@adfin.com> wrote:
> I'm going to look at the patches and the issue in full detail. In the
> meantime, do you have the oops backtrace? I have some other fscache
> patches that haven't made it upstream yet that might have been masking
> this issue for me.
>
> On Fri, Dec 27, 2013 at 10:51 PM, Li Wang <liwang@ubuntukylin.com> wrote:
>> [...]

-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH 0/3] Ceph fscache: Fix kernel panic due to a race
  From: Sage Weil @ 2014-03-06 22:01 UTC
  To: Milosz Tanski
  Cc: Li Wang, ceph-devel, linux-fsdevel@vger.kernel.org, linux-kernel, Yunchuan Wen

Hi Milosz,

Thanks, I've added this to the testing branch!

sage

On Mon, 3 Mar 2014, Milosz Tanski wrote:
> Hey guys, I'm terribly sorry, but apparently this got stuck in my
> drafts folder for 3 months. Since we've been running this on both our
> test and prod clusters, I would say it is sufficiently tested.
>
> I've looked at the code and tested this for a week now on a test
> cluster and it looks good. It fixes a real problem and I think we
> should push it to mainline. Thanks for fixing this, Li! The reason I
> was not seeing it is that I had other fscache patches that were
> masking the problem :/
>
> Thanks,
> - Milosz
>
> P.S.: Sorry for the double mail; the first one was not sent as text. I
> apparently do not know how to use gmail.
>
> On Fri, Jan 3, 2014 at 9:43 AM, Milosz Tanski <milosz@adfin.com> wrote:
> > [...]