linux-fsdevel.vger.kernel.org archive mirror
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: David Howells <dhowells@redhat.com>
Cc: "idryomov@gmail.com" <idryomov@gmail.com>,
	Alex Markuze <amarkuze@redhat.com>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: RE: [PATCH v2] ceph: Fix kernel crash in generic/397 test
Date: Wed, 29 Jan 2025 19:16:59 +0000
Message-ID: <67fa0e9f45d0ca52a2f6a21c1fea1fd14e589847.camel@ibm.com>
In-Reply-To: <3669136.1738158062@warthog.procyon.org.uk>

On Wed, 2025-01-29 at 13:41 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> 
> > > Do you want me to push a branch with my tracepoints that I'm using somewhere
> > > that you can grab it?
> > 
> > Sounds good! Maybe it can help me. :)
> 
> Take a look at:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/  
> 
> The "ceph-folio" branch has Willy's folio conversion patches plus a tracing
> patch plus a patch that's an unsuccessful attempt by me to fix the hang I was
> seeing.
> 
> The tracepoint I'm using (netfs_folio) takes a folio pointer, so it was easier
> to do it on top of Willy's patches.
> 
> The "netfs-crypto" branch has my patches to implement content crypto in
> netfslib.  I've tested them to some extent with AFS, but the test code I have
> in AFS only supports crypto of files where the file is an exact multiple of
> page size as AFS doesn't support any sort of xattr and so I can't store the
> real EOF pointer so simply.
> 
> The "ceph-iter" branch has my patches on top of a merge of those two
> (excluding the debugging patches) to try and convert ceph to fully using
> netfslib and to pass an iterator all the way down to the socket, aiming to
> reduce the number of data types to basically two.
> 

Great! Thanks a lot.

I believe I have found all of the current issues in ceph_writepages_start().
So, I need to clean up the current messy state of the fix and of the method
itself. Let me do this cleanup, test the fix (it could still have some issues),
and then share the patch.

As far as I can see, there are several issues in ceph_writepages_start():
(1) There is a double lock issue (the reason for the hang).
(2) folio_wait_writeback() is called in the wrong place.
(3) ceph_inc_osd_stopping_blocker() does not guarantee that we wait for the
flush of all dirty memory pages to finish. It looks racy to me, but I need to
check this more carefully by testing.
(4) The folio_batch of dirty pages found by filemap_get_folios_tag() is not
processed properly. This is why some dirty pages are simply never processed and
we still have dirty pages after unmount.
(5) The whole ceph_writepages_start() method is huge and messy for my taste,
and this is the reason for all of these issues (the unreasonable complexity
makes the logic of the method hard to follow).
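
To illustrate issue (4), here is a minimal userspace sketch in C of the batch
processing pattern I have in mind. It is not the actual ceph code and it does
not use any kernel API: struct toy_cache, toy_get_dirty_batch(),
toy_writeback_all(), toy_count_dirty(), NR_PAGES and BATCH_SIZE are all
hypothetical names that exist only for this example. The point is simply that
every entry of a batch has to be processed before the next lookup, and that
the lookup has to be repeated until it returns nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES   64   /* size of the toy "page cache" (assumption) */
#define BATCH_SIZE 15   /* capacity of one batch (assumption) */

/* Toy model: one dirty flag per page index. */
struct toy_cache {
	bool dirty[NR_PAGES];
};

/*
 * Hypothetical stand-in for a tagged lookup: collect up to BATCH_SIZE
 * dirty indices beginning at *start, and advance *start past the last
 * index that was examined.
 */
static size_t toy_get_dirty_batch(const struct toy_cache *c, size_t *start,
				  size_t batch[BATCH_SIZE])
{
	size_t n = 0;
	size_t i = *start;

	while (i < NR_PAGES && n < BATCH_SIZE) {
		if (c->dirty[i])
			batch[n++] = i;
		i++;
	}
	*start = i;
	return n;
}

/*
 * "Write back" every dirty page: process all entries of each batch and
 * keep looking up new batches until none is returned.
 * Returns the number of pages that were cleaned.
 */
static size_t toy_writeback_all(struct toy_cache *c)
{
	size_t batch[BATCH_SIZE];
	size_t start = 0, written = 0, n;

	while ((n = toy_get_dirty_batch(c, &start, batch)) > 0) {
		assert(n <= BATCH_SIZE);
		for (size_t i = 0; i < n; i++) {
			/* No entry of the batch is skipped. */
			c->dirty[batch[i]] = false;
			written++;
		}
	}
	return written;
}

/* Count the pages that are still dirty. */
static size_t toy_count_dirty(const struct toy_cache *c)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PAGES; i++)
		if (c->dirty[i])
			n++;
	return n;
}
```

If the inner loop stops early while the outer loop still moves on to the next
lookup, the remaining entries of the batch stay dirty in this model. That is
the kind of leftover I mean in (4).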

Thanks,
Slava.

