From: Luben Tuikov <ltuikov@yahoo.com>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] gitweb.cgi: Use File::MMagic; "a=blob" action knows the blob/file type
Date: Sat, 8 Jul 2006 18:17:56 -0700 (PDT) [thread overview]
Message-ID: <20060709011756.92415.qmail@web31815.mail.mud.yahoo.com> (raw)
In-Reply-To: <7vzmfksjpe.fsf@assigned-by-dhcp.cox.net>
--- Junio C Hamano <junkio@cox.net> wrote:
> Luben Tuikov <ltuikov@yahoo.com> writes:
>
> > Use File::MMagic to determine the MIME type of a blob/file.
> > The variable magic_mime_file holds the location of the
> > "magic.mime" file, usually "/usr/share/file/magic.mime".
> > If not defined, the magic numbers internally stored in the
> > File::MMagic module are used.
>
> I am sorry to ask you this, but would you mind redoing this
> patch without File::MMagic bits?
Yeah, no problem, will do.
> I think giving "a=blob" an
> ability to automatically switch to git_blob_plain is a good
> addition (as is your earlier patch to give a direct link to
> reach blob_plain from the list), so let's have that part in
> first.
Ok.
> I haven't applied your earlier one but it will appear in
> "next" shortly.
Ok.
> Existing filename based mimetypes_guess should be a lot cheaper
> than exploding a blob and feeding it to File::MMagic. I was
> hoping File::MMagic to be used when we cannot guess the content
> type that way (i.e. when mimetypes_guess returns undef or
> application/octet-stream).
The MIME guessing is used in git_blob_plain(), where the blob
is already exploded, since a file descriptor is passed to the MIME
guessing routines.
Initially I tried using the already opened file descriptor to
the blob, but File::MMagic v1.27 always returns "text/plain"
on that. I was astounded by this and a 5 line perl script
using that same module confirmed my suspicion that the module
or "seek" is broken in perl or in FileHandle. I.e.
checktype_filename() succeeds, but checktype_filehandle()
always returns "text/plain".
Had simply reading the file descriptor succeeded, then we'd not
need to explode blobs in $git_temp.
The other reason I decided to completely go with File::MMagic is
that if I'm going to use it to decide the mime type after
the default method didn't succeed, why not forget the default
method and just use File::MMagic in general -- seemed like a logical
simplification.
> Since the repository owner can correct misidentification by the
> standard /etc/mime.types by supplying a custom per-repository
> $mimetypes_file (modulo that the current implementation of
> mimetype_guess_file does not allow it if the file does not have
> an extension that is specific enough), File::MMagic might be an
> overkill, especially if used in the way this patch does. To
> allow finer grained differentiation that cannot be done with
> file extensions alone (e.g. some files may have .dat extension
> but one can be VCD mpeg wrapped in RIFF, and another can be a
> Z-machine story file), it might be simpler to allow the
> repository owner to specify full $file_name for such an ambiguous
> file in their custom $mimetypes_file, and try to match it in
> mimetype_guess_file sub. That way we may not even need to use
> File::MMagic.
This is true, but I wonder how many people are going to go and
create their own mime.type files. Most of the time people would
do minimal gitweb.cgi changes not even specifying $mimetypes_file
nor even $magic_mime_file.
> Are there cases where only $hash is given without $file_name?
> If so we may need to fall back on File::MMagic in such a case
> after all, but get_blob_mimetype sub copies the whole blob to a
> temporary file to work around a problem with version 1.27 you
> state in the comment -- this is way too much (and nobody seems
> to clean up the tempfile). Looking at magic.mime, I suspect we
> might be able to get away with the first 4k bytes or so at most
> (the largest offset except iso9660 image is "Biff5" appearing at
> 2114 to signal an Excel spreadsheet).
I was hoping that tmpwatch(8) would clean up the blobs in Linux/UNIX.
But we can also delete them after the lookup. Anothing thing is that
getting a blob isn't that often and when it happens, after the MIME
lookup it is already in the pagecache.
Luben
prev parent reply other threads:[~2006-07-09 1:18 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-07-08 4:10 [PATCH] gitweb.cgi: Use File::MMagic; "a=blob" action knows the blob/file type Luben Tuikov
2006-07-08 6:18 ` Junio C Hamano
2006-07-09 1:17 ` Luben Tuikov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060709011756.92415.qmail@web31815.mail.mud.yahoo.com \
--to=ltuikov@yahoo.com \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox