All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tommi Virtanen <tommi.virtanen@dreamhost.com>
To: Gregory Farnum <gregory.farnum@dreamhost.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: long object names
Date: Thu, 21 Apr 2011 15:00:44 -0700	[thread overview]
Message-ID: <20110421220044.GB26545@dreamer> (raw)
In-Reply-To: <99672323C62444DFA181A0659E4D37FC@gmail.com>

On Thu, Apr 21, 2011 at 01:03:57PM -0700, Gregory Farnum wrote:
> I like what Yehuda has here for its relative simplicity

It's far from simple.

Let's look at the unlink path:


static int lfn_unlink(const char *pathname)
{
  const char *filename;
  char short_fn[PATH_MAX];
  char short_fn2[PATH_MAX];
  int r, i, exist, err;
  int path_len;
  int is_lfn;

** helper function to split the path to dir and file, figure out a
** short name for this longname, count the lenght of the directory
** part of the path and other things; loops through the candidates,
** comparing against the xattr
  r = lfn_get(pathname, short_fn, sizeof(short_fn), &filename, &exist, &is_lfn);
  if (r < 0)
    return r;
** if the filename  wasn't actually too long, take the easy way out
  if (!is_lfn)
    return unlink(pathname);
  if (!exist) {
    errno = ENOENT;
    return -1;
  }

** actual file unlink here
  err = unlink(short_fn);
  if (err < 0)
    return err;

** and then, rename all the collisions, one by one, because they have
** a sequential number in them!
  path_len = filename - pathname;
  memcpy(short_fn2, pathname, path_len);

** this loop finds the highest sequential number in this hash
** collision bucket, saves it in i
  for (i = r + 1; ; i++) {
    struct stat buf;
    int ret;

    build_filename(&short_fn2[path_len], sizeof(short_fn2) - path_len, filename, i);
    ret = stat(short_fn2, &buf);
    if (ret < 0) {
      if (i == r + 1)
        return 0;

      break;
    }
  }

** and then the highest seq number munged filename gets renamed to
** fill the gap we left behind
  build_filename(&short_fn2[path_len], sizeof(short_fn2) - path_len, filename, i - 1);
  generic_dout(0) << "renaming " << short_fn2 << " -> " << short_fn << dendl;

  if (rename(short_fn2, short_fn) < 0) {
    generic_derr << "ERROR: could not rename " << short_fn2 << " -> " << short_fn << dendl;
    assert(0);
  }

  return 0;
}


Now, imagine a colliding file create between the stat and the rename
-> boom. This is not the only race in there.

The underlying problem is that you're constructing an atomic operation
out of multiple underlying operations, and you're not obsessively
careful about ordering them. Once you get obsessive about ordering
them, the extra directory my scheme creates will seem very cheap.

If you say that's not relevant because of some locking that the OSD
does, then 1) you're building a lot of assumptions on the locking
never changing 2) I can construct similar bugs with a single actor,
with a crash at the wrong moment.

Simple code makes Tv happy. You don't want an unhappy Tv all up in
your codebase.

-- 
:(){ :|:&};:

  parent reply	other threads:[~2011-04-21 22:00 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-21  4:42 long object names Sage Weil
2011-04-21 18:56 ` Tommi Virtanen
2011-04-21 19:27   ` Colin McCabe
2011-04-21 19:32     ` Tommi Virtanen
2011-04-21 20:03       ` Gregory Farnum
2011-04-21 21:09         ` Colin McCabe
2011-04-21 21:23           ` Yehuda Sadeh Weinraub
2011-04-21 21:44             ` Colin McCabe
2011-04-21 21:54               ` Yehuda Sadeh Weinraub
2011-04-21 22:01                 ` Colin McCabe
2011-04-21 22:58                   ` Zenon Panoussis
2011-04-21 23:04                     ` Yehuda Sadeh Weinraub
2011-04-21 22:00         ` Tommi Virtanen [this message]
2011-04-21 22:23           ` Gregory Farnum
2011-04-21 22:25           ` Yehuda Sadeh Weinraub
2011-04-21 23:07             ` Tommi Virtanen
2011-04-22 15:44               ` Sage Weil
2011-04-22 16:34                 ` Tommi Virtanen
2011-04-22 17:36                 ` Colin McCabe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110421220044.GB26545@dreamer \
    --to=tommi.virtanen@dreamhost.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=gregory.farnum@dreamhost.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.