All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	Gabriel Krisman Bertazi <krisman@suse.de>,
	viro@zeniv.linux.org.uk, linux-f2fs-devel@lists.sourceforge.net,
	ebiggers@kernel.org, linux-fsdevel@vger.kernel.org,
	jaegeuk@kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
Date: Tue, 21 Nov 2023 00:12:15 -0500	[thread overview]
Message-ID: <20231121051215.GA335601@mit.edu> (raw)
In-Reply-To: <CAHk-=wh+o0Zkzn=mtF6nB1b-EEcod-y4+ZWtAe7=Mi1v7RjUpg@mail.gmail.com>

On Mon, Nov 20, 2023 at 07:03:13PM -0800, Linus Torvalds wrote:
> On Mon, 20 Nov 2023 at 18:29, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > It's a bit complicated, yes. But no, doing things one unicode
> > character at a time is just bad bad bad.
> 
> Put another way: the _point_ of UTF-8 is that ASCII is still ASCII.
> It's literally why UTF-8 doesn't suck.
> 
> So you can still compare ASCII strings as-is.
> 
> No, that doesn't help people who are really using other locales, and
> are actively using complicated characters.
> 
> But it very much does mean that you can compare "Bad" and "bad" and
> never ever look at any unicode translation ever.

Yeah, agreed, that would be a nice optimization.  However, in the
unfortunate case where (a) it's non-ASCII, and (b) the input string is
non-normalized and/or differs in case, we end up scanning some portion
of the two strings twice; once doing the strcmp, and once doing the
Unicode slow path.

That being said, given that even in the case where we're dealing with
non-ASCII strings, in the fairly common case where the program is
doing a readdir() followed by a open() or stat(), the filename will be
byte-identical and so a strcmp() will suffice.

So I agree that it's a nice optimization.  It'd be interesting how
much such an optimization would actually show up in various
benchmarks.  It'd have to be something that was really metadata-heavy,
or else the filenamea lookups would get drowned out.

   	    	      	      	    	- Ted

WARNING: multiple messages have this Message-ID (diff)
From: "Theodore Ts'o" <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Gabriel Krisman Bertazi <krisman@suse.de>,
	Christian Brauner <brauner@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net, ebiggers@kernel.org,
	viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
	jaegeuk@kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
Date: Tue, 21 Nov 2023 00:12:15 -0500	[thread overview]
Message-ID: <20231121051215.GA335601@mit.edu> (raw)
In-Reply-To: <CAHk-=wh+o0Zkzn=mtF6nB1b-EEcod-y4+ZWtAe7=Mi1v7RjUpg@mail.gmail.com>

On Mon, Nov 20, 2023 at 07:03:13PM -0800, Linus Torvalds wrote:
> On Mon, 20 Nov 2023 at 18:29, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > It's a bit complicated, yes. But no, doing things one unicode
> > character at a time is just bad bad bad.
> 
> Put another way: the _point_ of UTF-8 is that ASCII is still ASCII.
> It's literally why UTF-8 doesn't suck.
> 
> So you can still compare ASCII strings as-is.
> 
> No, that doesn't help people who are really using other locales, and
> are actively using complicated characters.
> 
> But it very much does mean that you can compare "Bad" and "bad" and
> never ever look at any unicode translation ever.

Yeah, agreed, that would be a nice optimization.  However, in the
unfortunate case where (a) it's non-ASCII, and (b) the input string is
non-normalized and/or differs in case, we end up scanning some portion
of the two strings twice; once doing the strcmp, and once doing the
Unicode slow path.

That being said, given that even in the case where we're dealing with
non-ASCII strings, in the fairly common case where the program is
doing a readdir() followed by a open() or stat(), the filename will be
byte-identical and so a strcmp() will suffice.

So I agree that it's a nice optimization.  It'd be interesting how
much such an optimization would actually show up in various
benchmarks.  It'd have to be something that was really metadata-heavy,
or else the filenamea lookups would get drowned out.

   	    	      	      	    	- Ted


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2023-11-21  5:12 UTC|newest]

Thread overview: 130+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-16  5:07 [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs Gabriel Krisman Bertazi
2023-08-16  5:07 ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-16  5:07 ` [PATCH v6 1/9] ecryptfs: Reject casefold directory inodes Gabriel Krisman Bertazi
2023-08-16  5:07   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-16  5:07 ` [PATCH v6 2/9] 9p: Split ->weak_revalidate from ->revalidate Gabriel Krisman Bertazi
2023-08-16  5:07   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-16  5:07 ` [PATCH v6 3/9] fs: Expose name under lookup to d_revalidate hooks Gabriel Krisman Bertazi
2023-08-16  5:07   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-11-22 20:59   ` Al Viro
2023-11-22 20:59     ` [f2fs-dev] " Al Viro
2023-08-16  5:07 ` [PATCH v6 4/9] fs: Add DCACHE_CASEFOLDED_NAME flag Gabriel Krisman Bertazi
2023-08-16  5:07   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-11-22 20:32   ` Al Viro
2023-11-22 20:32     ` [f2fs-dev] " Al Viro
2023-08-16  5:07 ` [PATCH v6 5/9] libfs: Validate negative dentries in case-insensitive directories Gabriel Krisman Bertazi
2023-08-16  5:07   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-11-22 20:20   ` Al Viro
2023-11-22 20:20     ` [f2fs-dev] " Al Viro
2023-08-16  5:08 ` [PATCH v6 6/9] libfs: Chain encryption checks after case-insensitive revalidation Gabriel Krisman Bertazi
2023-08-16  5:08   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-16  5:08 ` [PATCH v6 7/9] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops Gabriel Krisman Bertazi
2023-08-16  5:08   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-16  5:08 ` [PATCH v6 8/9] ext4: Enable negative dentries on case-insensitive lookup Gabriel Krisman Bertazi
2023-08-16  5:08   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-16  5:08 ` [PATCH v6 9/9] f2fs: " Gabriel Krisman Bertazi
2023-08-16  5:08   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-17 17:06 ` [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs Eric Biggers
2023-08-17 17:06   ` [f2fs-dev] " Eric Biggers
2023-08-21 15:52   ` Christian Brauner
2023-08-21 15:52     ` [f2fs-dev] " Christian Brauner
2023-08-21 18:53     ` Gabriel Krisman Bertazi
2023-08-21 18:53       ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-08-22  9:03       ` Christian Brauner
2023-08-22  9:03         ` [f2fs-dev] " Christian Brauner
2023-10-24 22:20         ` Gabriel Krisman Bertazi
2023-10-24 22:20           ` Gabriel Krisman Bertazi
2023-10-25 13:32 ` Christian Brauner
2023-10-25 13:32   ` [f2fs-dev] " Christian Brauner
2023-10-25 15:19   ` Gabriel Krisman Bertazi
2023-10-25 15:19     ` Gabriel Krisman Bertazi
2023-11-19 23:11   ` [f2fs-dev] " Gabriel Krisman Bertazi
2023-11-19 23:11     ` Gabriel Krisman Bertazi
     [not found]   ` <655a9634.630a0220.d50d7.5063SMTPIN_ADDED_BROKEN@mx.google.com>
2023-11-20 15:06     ` Christian Brauner
2023-11-20 15:06       ` Christian Brauner
2023-11-20 16:59       ` Gabriel Krisman Bertazi
2023-11-20 16:59         ` Gabriel Krisman Bertazi
2023-11-20 18:07       ` Linus Torvalds
2023-11-20 18:07         ` Linus Torvalds
2023-11-21  2:02         ` Theodore Ts'o
2023-11-21  2:02           ` Theodore Ts'o
2023-11-21  2:29           ` Linus Torvalds
2023-11-21  2:29             ` Linus Torvalds
2023-11-21  3:03             ` Linus Torvalds
2023-11-21  3:03               ` Linus Torvalds
2023-11-21  5:12               ` Theodore Ts'o [this message]
2023-11-21  5:12                 ` Theodore Ts'o
2023-11-22 21:04                 ` Al Viro
2023-11-22 21:04                   ` Al Viro
2023-11-21  2:27         ` Al Viro
2023-11-21  2:27           ` Al Viro
2023-11-22 21:19           ` Al Viro
2023-11-22 21:19             ` Al Viro
2023-11-23  0:18             ` Linus Torvalds
2023-11-23  0:18               ` Linus Torvalds
2023-11-23  5:09               ` Al Viro
2023-11-23  5:09                 ` Al Viro
2023-11-23 15:57               ` Gabriel Krisman Bertazi
2023-11-23 15:57                 ` Gabriel Krisman Bertazi
2023-11-23 17:12                 ` Al Viro
2023-11-23 17:12                   ` Al Viro
2023-11-23 17:37                   ` Gabriel Krisman Bertazi
2023-11-23 17:37                     ` Gabriel Krisman Bertazi
2023-11-23 18:24                     ` Al Viro
2023-11-23 18:24                       ` Al Viro
2023-11-23 19:06                       ` Gabriel Krisman Bertazi
2023-11-23 19:06                         ` Gabriel Krisman Bertazi
2023-11-23 19:53                         ` Al Viro
2023-11-23 19:53                           ` Al Viro
2023-11-23 20:15                           ` Al Viro
2023-11-23 20:15                             ` Al Viro
2023-11-24 15:20                           ` Gabriel Krisman Bertazi
2023-11-24 15:20                             ` Gabriel Krisman Bertazi
2023-11-28  0:02                             ` Al Viro
2023-11-28  0:02                               ` Al Viro
2023-11-23 21:52                         ` Al Viro
2023-11-23 21:52                           ` Al Viro
2023-11-24 15:22                           ` Gabriel Krisman Bertazi
2023-11-24 15:22                             ` Gabriel Krisman Bertazi
2023-11-25 22:01                             ` Al Viro
2023-11-25 22:01                               ` Al Viro
2023-11-26  4:52                               ` Al Viro
2023-11-26  4:52                                 ` Al Viro
2023-11-26 18:41                                 ` fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-26 18:41                                   ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27  6:38                                   ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-27  6:38                                     ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 15:47                                     ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Eric W. Biederman
2023-11-27 15:47                                       ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Eric W. Biederman
2023-11-27 16:01                                       ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Eric W. Biederman
2023-11-27 16:01                                         ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Eric W. Biederman
2023-11-27 17:25                                         ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-27 17:25                                           ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 18:26                                           ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-27 18:26                                             ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 16:03                                       ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-27 16:03                                         ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 16:14                                         ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-27 16:14                                           ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 18:19                                           ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Eric W. Biederman
2023-11-27 18:19                                             ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Eric W. Biederman
2023-11-27 18:43                                             ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-27 18:43                                               ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 16:33                                     ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Christian Brauner
2023-11-27 16:33                                       ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Christian Brauner
2023-11-29  4:53                                     ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Al Viro
2023-11-29  4:53                                       ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-29 10:21                                       ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Christian Brauner
2023-11-29 10:21                                         ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Christian Brauner
2023-11-29 15:19                                       ` fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] " Eric W. Biederman
2023-11-29 15:19                                         ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Eric W. Biederman
     [not found]               ` <655f7665.df0a0220.58a21.e84fSMTPIN_ADDED_BROKEN@mx.google.com>
2023-11-23 16:41                 ` [f2fs-dev] " Linus Torvalds
2023-11-23 16:41                   ` Linus Torvalds
2023-11-23  1:12             ` Al Viro
2023-11-23  1:12               ` Al Viro
2023-11-23  1:22               ` Al Viro
2023-11-23  1:22                 ` Al Viro
2023-11-22  3:30         ` Gabriel Krisman Bertazi
2023-11-22  3:30           ` Gabriel Krisman Bertazi
2024-01-16 19:02 ` patchwork-bot+f2fs
2024-01-16 19:02   ` patchwork-bot+f2fs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231121051215.GA335601@mit.edu \
    --to=tytso@mit.edu \
    --cc=brauner@kernel.org \
    --cc=ebiggers@kernel.org \
    --cc=jaegeuk@kernel.org \
    --cc=krisman@suse.de \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.