Re: [Qemu-devel] [RFC] qid path collision issues in 9pfs

From: Greg Kurz <groug@kaod.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: Antonios Motakis <antonios.motakis@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Veaceslav Falico <veaceslav.falico@huawei.com>,
	Jani Kokkonen <Jani.Kokkonen@huawei.com>,
	Eduard Shishkin <Eduard.Shishkin@huawei.com>,
	"vfalico@gmail.com" <vfalico@gmail.com>,
	"Wangguoli (Andy)" <andy.wangguoli@huawei.com>,
	Jiangyiwen <jiangyiwen@huawei.com>,
	"zhangwei (CR)" <zhangwei555@huawei.com>
Subject: Re: [Qemu-devel] [RFC] qid path collision issues in 9pfs
Date: Wed, 24 Jan 2018 14:30:31 +0100
Message-ID: <20180124143031.7fc9c90f@bahia.lan>
In-Reply-To: <20180120220349.GA20376@flamenco>

Thanks, Emilio, for providing these valuable suggestions! :)

On Sat, 20 Jan 2018 17:03:49 -0500
"Emilio G. Cota" <cota@braap.org> wrote:

> On Fri, Jan 19, 2018 at 19:05:06 -0500, Emilio G. Cota wrote:
> > > > > On Fri, 12 Jan 2018 19:32:10 +0800
> > > > > Antonios Motakis <antonios.motakis@huawei.com> wrote:  
> > > > Since inodes are not completely random, and we usually have a handful of device IDs,
> > > > we get a much smaller number of entries to track in the hash table.
> > > > 
> > > > So what this would give:
> > > > (1) Would be faster and take less memory than mapping the full (inode_nr, dev_id)
> > > > tuple to unique QID paths
> > > > (2) Guaranteed not to run out of bits when inode numbers fit in the lowest
> > > > 54 bits and we have fewer than 1024 devices.
> > > > (3) When we get beyond this limit, there is a chance we run out of bits to
> > > > allocate new QID paths, but we can detect this and refuse to serve the offending
> > > > files instead of allowing a collision.
> > > > 
> > > > We could tweak the prefix size to match the scenarios that we consider more likely,
> > > > but I think close to 10-16 bits sounds reasonable enough. What do you think?  
> > 
> > Assuming assumption (2) is very likely to be true, I'd suggest
> > dropping the intermediate hash table altogether, and simply refuse
> > to work with any files that do not meet (2).
> > 
> > That said, the naive solution of having a large hash table with all entries
> > in it might be worth a shot.  
> 
> hmm but that would still take a lot of memory.
> 
> Given assumption (2), a good compromise would be the following,
> taking into account that the total number of qids is unlikely to
> come even close to 2**64:
> - bit 63: 0/1 determines "fast" or "slow" encoding
> - 62-0:
>   - fast (trivial) encoding: when assumption (2) is met
>     - 62-53: device id (it fits because of (2))
>     - 52-0: inode (it fits because of (2))

And as pointed out by Eduard, we may have to take the mount id into account
as well if we want to support bind mounts in the exported directory. My
understanding is that mount ids are allocated incrementally and reused once
the associated fs gets unmounted: if we assume that the host doesn't have
more than 1024 mounts, we would need 10 bits to encode the mount id.

The fast encoding could then be something like this, which leaves 43 bits
for the inode:

62-53: mount id (10 bits)
52-43: device id (10 bits)
42-0: inode (43 bits)
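
For illustration only, here is a minimal C sketch of that packing. The
helper name qid_path_fast, the macro names, and the choice of 0 in bit 63
to mark the fast encoding are assumptions made for this example, not
existing 9pfs code. How the mount and device ids are obtained is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field widths taken from the layout above. */
#define QID_INODE_BITS 43
#define QID_DEV_BITS   10
#define QID_MNT_BITS   10

/*
 * Hypothetical helper: pack (mount id, device id, inode) into bits 62-0
 * of a qid path, leaving bit 63 clear (assumed here to mean "fast").
 * Returns false when any value does not fit in its field, so the caller
 * can fall back to another encoding or refuse to serve the file.
 */
static bool qid_path_fast(uint64_t mnt_id, uint64_t dev_id, uint64_t ino,
                          uint64_t *path)
{
    if ((mnt_id >> QID_MNT_BITS) || (dev_id >> QID_DEV_BITS) ||
        (ino >> QID_INODE_BITS)) {
        return false;
    }
    *path = (mnt_id << (QID_DEV_BITS + QID_INODE_BITS)) |
            (dev_id << QID_INODE_BITS) |
            ino;
    return true;
}
```

The range check is the important part: a value that overflows its field
is rejected rather than silently truncated, which is what would
reintroduce collisions.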

>   - slow path: assumption (2) isn't met. Then, assign incremental
>     IDs in the [0,2**63-1] range and track them in a hash table.
> 
> Choosing 10 or whatever other number of bits for the device id is of
> course TBD, as you pointed out, Antonios.
> 
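
To make the quoted slow path concrete, a rough C sketch follows. Everything
in it is an assumption for illustration: the helper name qid_path_slow, the
fixed-size linear-probing table, the (dev, ino) key, and the choice of 1 in
bit 63 to mark the slow encoding. A real implementation would more likely
use a growable hash table.

```c
#include <stdbool.h>
#include <stdint.h>

#define QID_SLOW_BIT   (1ULL << 63)   /* assumed: bit 63 set means "slow" */
#define SLOW_TABLE_SZ  1024           /* arbitrary size, power of two */

struct slow_entry {
    bool used;
    uint64_t dev;
    uint64_t ino;
    uint64_t id;      /* incremental ID handed out on first lookup */
};

static struct slow_entry slow_table[SLOW_TABLE_SZ];
static uint64_t slow_next_id;

/*
 * Hypothetical helper: return a stable qid path for (dev, ino), assigning
 * the next incremental ID the first time the pair is seen. Returns false
 * when the table is full, so the file is refused instead of risking a
 * collision.
 */
static bool qid_path_slow(uint64_t dev, uint64_t ino, uint64_t *path)
{
    uint64_t h = (dev * 0x9E3779B97F4A7C15ULL) ^ ino;

    for (unsigned i = 0; i < SLOW_TABLE_SZ; i++) {
        struct slow_entry *e = &slow_table[(h + i) & (SLOW_TABLE_SZ - 1)];

        if (!e->used) {
            /* First empty slot on the probe path: the pair is new. */
            e->used = true;
            e->dev = dev;
            e->ino = ino;
            e->id = slow_next_id++;
        }
        if (e->dev == dev && e->ino == ino) {
            *path = QID_SLOW_BIT | e->id;
            return true;
        }
    }
    return false;
}
```

Entries are never removed in this sketch, which keeps linear probing
correct but also means the memory cost grows with every file that misses
the fast path.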

This is only a best-effort fallback in QEMU. The right way to
address the issue would really be to extend the protocol to have
bigger qids (e.g., 64 bits for the inode, 32 for the device and 32 for
the mount).

Cheers,

--
Greg

> Something like this will give you great performance and 0 memory
> overhead for the majority of cases if (2) indeed holds.
> 
> 		Emilio
