Git development
 help / color / mirror / Atom feed
From: Luben Tuikov <ltuikov@yahoo.com>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] gitweb.cgi: Use File::MMagic; "a=blob" action knows the blob/file type
Date: Sat, 8 Jul 2006 18:17:56 -0700 (PDT)	[thread overview]
Message-ID: <20060709011756.92415.qmail@web31815.mail.mud.yahoo.com> (raw)
In-Reply-To: <7vzmfksjpe.fsf@assigned-by-dhcp.cox.net>

--- Junio C Hamano <junkio@cox.net> wrote:
> Luben Tuikov <ltuikov@yahoo.com> writes:
> 
> > Use File::MMagic to determine the MIME type of a blob/file.
> > The variable magic_mime_file holds the location of the
> > "magic.mime" file, usually "/usr/share/file/magic.mime".
> > If not defined, the magic numbers internally stored in the
> > File::MMagic module are used.
> 
> I am sorry to ask you this, but would you mind redoing this
> patch without File::MMagic bits?

Yeah, no problem, will do.

> I think giving "a=blob" an
> ability to automatically switch to git_blob_plain is a good
> addition (as is your earlier patch to give a direct link to
> reach blob_plain from the list), so let's have that part in
> first.

Ok.

> I haven't applied your earlier one but it will appear in
> "next" shortly.

Ok.

> Existing filename based mimetypes_guess should be a lot cheaper
> than exploding a blob and feeding it to File::MMagic.  I was
> hoping File::MMagic to be used when we cannot guess the content
> type that way (i.e. when mimetypes_guess returns undef or
> application/octet-stream).

The MIME guessing is used in git_blob_plain(), where the blob
is already exploded, since a file descriptor is passed to the MIME
guessing routines.

Initially I tried using the already opened file descriptor to
the blob, but File::MMagic v1.27 always returns "text/plain"
on that.  I was astounded by this and a 5 line perl script
using that same module confirmed my suspicion that the module
or "seek" is broken in perl or in FileHandle.  I.e.
checktype_filename() succeeds, but checktype_filehandle()
always returns "text/plain".

Had simply reading the file descriptor succeeded, then we'd not
need to explode blobs in $git_temp.

The other reason I decided to completely go with File::MMagic is
that if I'm going to use it to decide the mime type after
the default method didn't succeed, why not forget the default
method and just use File::MMagic in general -- seemed like a logical
simplification.

> Since the repository owner can correct misidentification by the
> standard /etc/mime.types by supplying a custom per-repository
> $mimetypes_file (modulo that the current implementation of
> mimetype_guess_file does not allow it if the file does not have
> an extension that is specific enough), File::MMagic might be an
> overkill, especially if used in the way this patch does.  To
> allow finer grained differentiation that cannot be done with
> file extensions alone (e.g. some files may have .dat extension
> but one can be VCD mpeg wrapped in RIFF, and another can be a
> Z-machine story file), it might be simpler to allow the
> repository owner to specify full $file_name for such an ambiguous
> file in their custom $mimetypes_file, and try to match it in
> mimetype_guess_file sub.  That way we may not even need to use
> File::MMagic.

This is true, but I wonder how many people are going to go and
create their own mime.type files.  Most of the time people would
do minimal gitweb.cgi changes not even specifying $mimetypes_file
nor even $magic_mime_file.

> Are there cases where only $hash is given without $file_name?
> If so we may need to fall back on File::MMagic in such a case
> after all, but get_blob_mimetype sub copies the whole blob to a
> temporary file to work around a problem with version 1.27 you
> state in the comment -- this is way too much (and nobody seems
> to clean up the tempfile).  Looking at magic.mime, I suspect we
> might be able to get away with the first 4k bytes or so at most
> (the largest offset except iso9660 image is "Biff5" appearing at
> 2114 to signal an Excel spreadsheet).

I was hoping that tmpwatch(8) would clean up the blobs in Linux/UNIX.
But we can also delete them after the lookup.  Anothing thing is that
getting a blob isn't that often and when it happens, after the MIME
lookup it is already in the pagecache.

    Luben

      reply	other threads:[~2006-07-09  1:18 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-08  4:10 [PATCH] gitweb.cgi: Use File::MMagic; "a=blob" action knows the blob/file type Luben Tuikov
2006-07-08  6:18 ` Junio C Hamano
2006-07-09  1:17   ` Luben Tuikov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060709011756.92415.qmail@web31815.mail.mud.yahoo.com \
    --to=ltuikov@yahoo.com \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox