From: Greg Kurz <groug@kaod.org>
To: Eric Blake <eblake@redhat.com>
Cc: antonios.motakis@huawei.com, qemu-devel@nongnu.org,
veaceslav.falico@huawei.com, Eduard.Shishkin@huawei.com,
andy.wangguoli@huawei.com, Jani.Kokkonen@huawei.com,
cota@braap.org, berrange@redhat.com
Subject: Re: [Qemu-devel] [PATCH 3/4] 9pfs: stat_to_qid: use device as input to qid.path
Date: Fri, 9 Feb 2018 18:57:49 +0100 [thread overview]
Message-ID: <20180209185749.7ecbd272@bahia.lan> (raw)
In-Reply-To: <3c73d37f-36f6-e487-7add-34c18310bda7@redhat.com>
On Fri, 9 Feb 2018 10:06:05 -0600
Eric Blake <eblake@redhat.com> wrote:
> On 02/09/2018 09:13 AM, Greg Kurz wrote:
> > On Thu, 8 Feb 2018 19:00:18 +0100
> > <antonios.motakis@huawei.com> wrote:
> >
> >> From: Antonios Motakis <antonios.motakis@huawei.com>
> >>
> >> To support multiple devices on the 9p share, and avoid
> >> qid path collisions we take the device id as input
> >> to generate a unique QID path. The lowest 48 bits of
> >> the path will be set equal to the file inode, and the
> >> top bits will be uniquely assigned based on the top
> >> 16 bits of the inode and the device id.
>
> >> +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
> >> + * to a unique QID path (64 bits). To avoid having to map and keep track
> >> + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
> >> + * the device id to the 16 highest bits of the QID path. The 48 lowest bits
> >> + * of the QID path equal to the lowest bits of the inode number.
> >> + *
> >> + * This takes advantage of the fact that inode number are usually not
> >> + * random but allocated sequentially, so we have fewer items to keep
> >> + * track of.
> >> + */
> >> +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
> >> + uint64_t *path)
> >> +{
> >> + QppEntry lookup = {
> >> + .dev = stbuf->st_dev,
> >> + .ino_prefix = (uint16_t) (stbuf->st_ino >> 48)
> >> + }, *val;
> >> + uint32_t hash = qpp_hash(lookup);
> >> +
> >> + val = qht_lookup(&pdu->s->qpp_table, qpp_lookup_func, &lookup, hash);
> >> +
> >> + if (!val) {
> >> + if (pdu->s->qp_prefix_next == 0) {
> >> + /* we ran out of prefixes */
> >> + return -ENFILE;
> >
> > Not sure this errno would make sense for guest syscalls that don't open
> > file descriptors... Maybe ENOENT ?
> >
> > Cc'ing Eric for insights.
>
> Actually, it makes sense to me:
>
> ENFILE 23 Too many open files in system
>
> You only get here if the hash table filled up, which means there are
> indeed too many open files (even if this syscall wasn't opening a file).
> In fact, ENFILE is usually associated with running into ulimit
> restrictions, and come to think of it, you are more likely to hit your
> ulimit than you are to run into a hash collision (so this code may be
> very hard to reach in practice, but if you do reach it, it behaves
> similarly to what you were more likely to hit in the first place).
>
> ENOENT implies the file doesn't exist - but here, the file exists but we
> can't open it because we're out of resources for tracking it.
>
Only the host knows that the file exists actually. If stat_to_qid() for
this path returns ENOENT, then the file shouldn't be visible in the
guest.
I say "shouldn't" instead of "isn't" because I now realize that
v9fs_do_readdir(), which is used to implement getdents() in the
guest, sends a degraded QID, hand-crafted without calling
stat_to_qid() at all... :-\
/*
* Fill up just the path field of qid because the client uses
* only that. To fill the entire qid structure we will have
* to stat each dirent found, which is expensive
*/
size = MIN(sizeof(dent->d_ino), sizeof(qid.path));
memcpy(&qid.path, &dent->d_ino, size);
/* Fill the other fields with dummy values */
qid.type = 0;
qid.version = 0;
Antonios, your patchset should probably also address this.
The problem is that the dirent we get from v9fs_co_readdir()
only has the inode number, so I guess we must build a dummy
struct stat with:
stbuf.st_ino = dent->d_ino
stbuf.st_dev = st_dev of the parent directory
The st_dev of the parent directory could be obtained in
v9fs_readdir() and passed to v9fs_do_readdir(). Another
solution could be to cache the QID in the V9fsFidState
of the parent directory when it is open.
Also, if we hit a collision while reading the directory, I'm
afraid the remaining entries won't be read at all. I'm not
sure this is really what we want.
> Also, POSIX permits returning specific errno codes that aren't otherwise
> listed for a syscall if the usual semantics of that errno code are
> indeed the reason for the failure (you should prefer to fail with errno
> codes documented by POSIX where possible, but POSIX does not limit you
> to just that set).
>
Ok, then ENFILE wouldn't be that bad in the end.
Thanks for your POSIX expertise :)
Cheers,
--
Greg
next prev parent reply other threads:[~2018-02-09 17:58 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-08 18:00 [Qemu-devel] [PATCH 0/4] QID path collision fix antonios.motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 1/4] 9pfs: V9fsQID: set type of version and path to unsigned antonios.motakis
2018-02-09 12:37 ` Greg Kurz
2018-02-16 10:19 ` Antonios Motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 2/4] 9pfs: check for file device to avoid QID path collisions antonios.motakis
2018-02-09 13:03 ` Greg Kurz
2018-02-16 10:19 ` Antonios Motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 3/4] 9pfs: stat_to_qid: use device as input to qid.path antonios.motakis
2018-02-09 15:13 ` Greg Kurz
2018-02-09 16:06 ` Eric Blake
2018-02-09 17:57 ` Greg Kurz [this message]
2018-02-16 10:20 ` Antonios Motakis
2018-02-16 11:21 ` Greg Kurz
2018-02-09 21:58 ` Emilio G. Cota
2018-02-16 10:19 ` Antonios Motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 4/4] 9pfs: stat_to_qid: implement slow path antonios.motakis
2018-02-09 15:22 ` Greg Kurz
2018-02-09 21:47 ` Emilio G. Cota
2018-02-16 10:28 ` Antonios Motakis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180209185749.7ecbd272@bahia.lan \
--to=groug@kaod.org \
--cc=Eduard.Shishkin@huawei.com \
--cc=Jani.Kokkonen@huawei.com \
--cc=andy.wangguoli@huawei.com \
--cc=antonios.motakis@huawei.com \
--cc=berrange@redhat.com \
--cc=cota@braap.org \
--cc=eblake@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=veaceslav.falico@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).