qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kurz <groug@kaod.org>
To: Eric Blake <eblake@redhat.com>
Cc: antonios.motakis@huawei.com, qemu-devel@nongnu.org,
	veaceslav.falico@huawei.com, Eduard.Shishkin@huawei.com,
	andy.wangguoli@huawei.com, Jani.Kokkonen@huawei.com,
	cota@braap.org, berrange@redhat.com
Subject: Re: [Qemu-devel] [PATCH 3/4] 9pfs: stat_to_qid: use device as input to qid.path
Date: Fri, 9 Feb 2018 18:57:49 +0100	[thread overview]
Message-ID: <20180209185749.7ecbd272@bahia.lan> (raw)
In-Reply-To: <3c73d37f-36f6-e487-7add-34c18310bda7@redhat.com>

On Fri, 9 Feb 2018 10:06:05 -0600
Eric Blake <eblake@redhat.com> wrote:

> On 02/09/2018 09:13 AM, Greg Kurz wrote:
> > On Thu, 8 Feb 2018 19:00:18 +0100
> > <antonios.motakis@huawei.com> wrote:
> >   
> >> From: Antonios Motakis <antonios.motakis@huawei.com>
> >>
> >> To support multiple devices on the 9p share, and avoid
> >> qid path collisions we take the device id as input
> >> to generate a unique QID path. The lowest 48 bits of
> >> the path will be set equal to the file inode, and the
> >> top bits will be uniquely assigned based on the top
> >> 16 bits of the inode and the device id.  
> 
> >> +/* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
> >> + * to a unique QID path (64 bits). To avoid having to map and keep track
> >> + * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
> >> + * the device id to the 16 highest bits of the QID path. The 48 lowest bits
> >> + * of the QID path equal to the lowest bits of the inode number.
> >> + *
> >> + * This takes advantage of the fact that inode number are usually not
> >> + * random but allocated sequentially, so we have fewer items to keep
> >> + * track of.
> >> + */
> >> +static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
> >> +                                uint64_t *path)
> >> +{
> >> +    QppEntry lookup = {
> >> +        .dev = stbuf->st_dev,
> >> +        .ino_prefix = (uint16_t) (stbuf->st_ino >> 48)
> >> +    }, *val;
> >> +    uint32_t hash = qpp_hash(lookup);
> >> +
> >> +    val = qht_lookup(&pdu->s->qpp_table, qpp_lookup_func, &lookup, hash);
> >> +
> >> +    if (!val) {
> >> +        if (pdu->s->qp_prefix_next == 0) {
> >> +            /* we ran out of prefixes */
> >> +            return -ENFILE;  
> > 
> > Not sure this errno would make sense for guest syscalls that don't open
> > file descriptors... Maybe ENOENT ?
> > 
> > Cc'ing Eric for insights.  
> 
> Actually, it makes sense to me:
> 
> ENFILE 23 Too many open files in system
> 
> You only get here if the hash table filled up, which means there are 
> indeed too many open files (even if this syscall wasn't opening a file). 
>   In fact, ENFILE is usually associated with running into ulimit 
> restrictions, and come to think of it, you are more likely to hit your 
> ulimit than you are to run into a hash collision (so this code may be 
> very hard to reach in practice, but if you do reach it, it behaves 
> similarly to what you were more likely to hit in the first place).
> 
> ENOENT implies the file doesn't exist - but here, the file exists but we 
> can't open it because we're out of resources for tracking it.
> 

Only the host knows that the file exists actually. If stat_to_qid() for
this path returns ENOENT, then the file shouldn't be visible in the
guest.

I say "shouldn't" instead of "isn't" because I now realize that
v9fs_do_readdir(), which is used to implement getdents() in the
guest, sends a degraded QID, hand-crafted without calling
stat_to_qid() at all... :-\

        /*
         * Fill up just the path field of qid because the client uses
         * only that. To fill the entire qid structure we will have
         * to stat each dirent found, which is expensive
         */
        size = MIN(sizeof(dent->d_ino), sizeof(qid.path));
        memcpy(&qid.path, &dent->d_ino, size);
        /* Fill the other fields with dummy values */
        qid.type = 0;
        qid.version = 0;


Antonios, your patchset should probably also address this.

The problem is that the dirent we get from v9fs_co_readdir()
only has the inode number, so I guess we must build a dummy
struct stat with:

    stbuf.st_ino = dent->d_ino
    stbuf.st_dev = st_dev of the parent directory

The st_dev of the parent directory could be obtained in
v9fs_readdir() and passed to v9fs_do_readdir(). Another
solution could be to cache the QID in the V9fsFidState
of the parent directory when it is open.

Also, if we hit a collision while reading the directory, I'm
afraid the remaining entries won't be read at all. I'm not
sure this is really what we want.

> Also, POSIX permits returning specific errno codes that aren't otherwise 
> listed for a syscall if the usual semantics of that errno code are 
> indeed the reason for the failure (you should prefer to fail with errno 
> codes documented by POSIX where possible, but POSIX does not limit you 
> to just that set).
> 

Ok, then ENFILE wouldn't be that bad in the end. 

Thanks for your POSIX expertise :)

Cheers,

--
Greg

  reply	other threads:[~2018-02-09 17:58 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-08 18:00 [Qemu-devel] [PATCH 0/4] QID path collision fix antonios.motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 1/4] 9pfs: V9fsQID: set type of version and path to unsigned antonios.motakis
2018-02-09 12:37   ` Greg Kurz
2018-02-16 10:19     ` Antonios Motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 2/4] 9pfs: check for file device to avoid QID path collisions antonios.motakis
2018-02-09 13:03   ` Greg Kurz
2018-02-16 10:19     ` Antonios Motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 3/4] 9pfs: stat_to_qid: use device as input to qid.path antonios.motakis
2018-02-09 15:13   ` Greg Kurz
2018-02-09 16:06     ` Eric Blake
2018-02-09 17:57       ` Greg Kurz [this message]
2018-02-16 10:20         ` Antonios Motakis
2018-02-16 11:21           ` Greg Kurz
2018-02-09 21:58     ` Emilio G. Cota
2018-02-16 10:19       ` Antonios Motakis
2018-02-08 18:00 ` [Qemu-devel] [PATCH 4/4] 9pfs: stat_to_qid: implement slow path antonios.motakis
2018-02-09 15:22   ` Greg Kurz
2018-02-09 21:47     ` Emilio G. Cota
2018-02-16 10:28       ` Antonios Motakis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180209185749.7ecbd272@bahia.lan \
    --to=groug@kaod.org \
    --cc=Eduard.Shishkin@huawei.com \
    --cc=Jani.Kokkonen@huawei.com \
    --cc=andy.wangguoli@huawei.com \
    --cc=antonios.motakis@huawei.com \
    --cc=berrange@redhat.com \
    --cc=cota@braap.org \
    --cc=eblake@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=veaceslav.falico@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).