From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: David Howells <dhowells@redhat.com>
Cc: Xiubo Li <xiubli@redhat.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
Alex Markuze <amarkuze@redhat.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"jlayton@kernel.org" <jlayton@kernel.org>,
"idryomov@gmail.com" <idryomov@gmail.com>,
"netfs@lists.linux.dev" <netfs@lists.linux.dev>
Subject: RE: Ceph and Netfslib
Date: Mon, 23 Dec 2024 23:13:47 +0000 [thread overview]
Message-ID: <690826facef0310d7f44cf522deeed979b6ff287.camel@ibm.com> (raw)
In-Reply-To: <3992139.1734551286@warthog.procyon.org.uk>
On Wed, 2024-12-18 at 19:48 +0000, David Howells wrote:
> Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> > >
<skipped>
> > >
> >
> > >
> > > Thirdly, I was under the impression that, for any given
> > > page/folio,
> > > only the
> > > head snapshot could be altered - and that any older snapshot must
> > > be
> > > flushed
> > > before we could allow that.
> > >
> > >
As far as I can see, ceph_dirty_folio() attaches [1] a pointer to struct
ceph_snap_context to folio->private. So it seems a folio cannot have an
associated snapshot context until it is marked dirty.
Conversely, ceph_invalidate_folio() detaches [2] the ceph_snap_context
from folio->private, writepage_nounlock() detaches [3] it from the page,
and writepages_finish() detaches [4] it from the page. So, technically
speaking, a folio/page should have an associated snapshot context only
while it is dirty.
The struct ceph_snap_context represents a set of existing snapshots:
struct ceph_snap_context {
	refcount_t nref;
	u64 seq;
	u32 num_snaps;
	u64 snaps[];
};
The snapshot context is prepared by build_snap_context(), and the set of
existing snapshots includes: (1) the parent inode's snapshots [5], (2) the
inode's own snapshots [6], and (3) prior parent snapshots [7].
* When a snapshot is taken (that is, when the client receives
* notification that a snapshot was taken), each inode with caps and
* with dirty pages (dirty pages implies there is a cap) gets a new
* ceph_cap_snap in the i_cap_snaps list (which is sorted in ascending
* order, new snaps go to the tail).
So, ceph_dirty_folio() takes the latest ceph_cap_snap:
	if (__ceph_have_pending_cap_snap(ci)) {
		struct ceph_cap_snap *capsnap =
			list_last_entry(&ci->i_cap_snaps,
					struct ceph_cap_snap,
					ci_item);
		snapc = ceph_get_snap_context(capsnap->context);
		capsnap->dirty_pages++;
	} else {
		BUG_ON(!ci->i_head_snapc);
		snapc = ceph_get_snap_context(ci->i_head_snapc);
		++ci->i_wrbuffer_ref_head;
	}
* On writeback, we must submit writes to the osd IN SNAP ORDER. So,
* we look for the first capsnap in i_cap_snaps and write out pages in
* that snap context _only_. Then we move on to the next capsnap,
* eventually reaching the "live" or "head" context (i.e., pages that
* are not yet snapped) and are writing the most recently dirtied
* pages
For example, writepage_nounlock() implements this check [8]:

	oldest = get_oldest_context(inode, &ceph_wbc, snapc);
	if (snapc->seq > oldest->seq) {
		doutc(cl, "%llx.%llx page %p snapc %p not writeable - noop\n",
		      ceph_vinop(inode), page, snapc);
		/* we should only noop if called by kswapd */
		WARN_ON(!(current->flags & PF_MEMALLOC));
		ceph_put_snap_context(oldest);
		redirty_page_for_writepage(wbc, page);
		return 0;
	}
	ceph_put_snap_context(oldest);
So, we should flush all dirty pages/folios in snapshot order. But I am
not sure that we modify a snapshot by making pages/folios dirty. I think
we simply add a capsnap to the list and build a new snapshot context when
a new snapshot is created.
> > > Fourthly, the ceph_snap_context struct holds a list of snaps.
> > > Does
> > > it really
> > > need to, or is just the most recent snap for which the folio
> > > holds
> > > changes
> > > sufficient?
> > >
> >
> >
As far as I can see, the main goal of ceph_snap_context is to account for
all snapshots that cover a particular inode and all of its parents. And
all of these inodes could have dirty pages. So, the responsibility of
ceph_snap_context is to make it possible to flush dirty folios/pages in
snapshot order for all inodes in the hierarchy.
I could be missing some details. :) But I hope the answer helps.
Thanks,
Slava.
[1] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L127
[2] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L157
[3] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L800
[4] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L911
[5] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L391
[6] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L399
[7] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/snap.c#L402
[8] https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/addr.c#L695