git.vger.kernel.org archive mirror
* FW: Windows. Git, and Dedupe
@ 2013-03-18 21:20 Josh Rowe
  2013-03-19 21:08 ` René Scharfe
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Rowe @ 2013-03-18 21:20 UTC (permalink / raw)
  To: git@vger.kernel.org

Windows probably isn’t the most popular platform for Git developers ☺, but here goes…

On Windows with an NTFS volume with Deduplication enabled, Git believes that deduplicated files are symlinks.  It then fails to be able to do anything with the file.  This can be repro-ed by creating an NTFS volume with dedup, creating some duplicate files, verifying that a few files are deduped, and trying to add and commit the files via git.

Jmr


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: FW: Windows. Git, and Dedupe
  2013-03-18 21:20 FW: Windows. Git, and Dedupe Josh Rowe
@ 2013-03-19 21:08 ` René Scharfe
  2013-03-19 21:36   ` Josh Rowe
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2013-03-19 21:08 UTC (permalink / raw)
  To: Josh Rowe; +Cc: git@vger.kernel.org, msysgit

Am 18.03.2013 22:20, schrieb Josh Rowe:
> On Windows with an NTFS volume with Deduplication enabled, Git
> believes that deduplicated files are symlinks.  It then fails to be
> able to do anything with the file.  This can be repro-ed by creating
> an NTFS volume with dedup, creating some duplicate files, verifying
> that a few files are deduped, and trying to add and commit the files
> via git.

Both Single Instance Storage[1] and Data Deduplication[2] (introduced
with Windows Server 2012) seem to be server-only features.  How about
keeping regular git repositories with checked-out files on client
disks and using the server only for bare repositories (without working
tree)?

When I tried to add a symbolic link created with mklink on Windows 8,
the mingw version of git refused because readlink(2) is not
supported.  This seems to be sufficient to reproduce the issue.

I couldn't test the Cygwin version, though, because http://cygwin.com
doesn't respond at the moment.

But a working readlink(2) wouldn't help anyway, I guess.  I imagine
that the reparse points used for deduplication point into a magic
block store which performs garbage collection of content that is no
longer referenced -- which probably means that a recreated "symlink"
may point to blocks that have been deleted in the meantime.

Perhaps you need a way to ask git to always follow symlinks instead
of trying to store their target specification.

René


[1] http://technet.microsoft.com/en-us/library/dd573308%28v=ws.10%29.aspx
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/hh769303%28v=vs.85%29.aspx


* RE: FW: Windows. Git, and Dedupe
  2013-03-19 21:08 ` René Scharfe
@ 2013-03-19 21:36   ` Josh Rowe
  2013-03-20 19:54     ` René Scharfe
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Rowe @ 2013-03-19 21:36 UTC (permalink / raw)
  To: René Scharfe; +Cc: git@vger.kernel.org, msysgit@googlegroups.com

Yes, Dedup is in fact a Server-only feature.  However, there are lots of people using the Server SKU as development workstations (especially here at Microsoft <g>).  There are also some sysadmins that I know of who use git and download sysadmin scripts via git to Servers.  Finally, I would hazard a guess that it's possible to mount an NTFS filesystem containing deduped files from a Server machine onto a client SKU and access those files.  (I'm not on the NTFS team, and haven't tried it.)  So I think there are good reasons to support reparse points on Windows.  

The reparse point could be decoded as being a non-symlink reparse item using; in those cases, treating the file as an "ordinary" file would be appropriate.

For example, see the following.  The reparse tag value for symlinks is IO_REPARSE_TAG_SYMLINK (0xa000000c) and for deduped files is (IO_REPARSE_TAG_DEDUP) 0x80000013.  The value can be discovered from the information at [1].  
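
As a rough illustration of the distinction described above, the following Python sketch classifies the two tag values quoted in this message. The helper names are hypothetical. The two bit masks are the ones tested by the Win32 macros IsReparseTagMicrosoft and IsReparseTagNameSurrogate. This is not how git itself decides anything, only a sketch of the idea that a reparse point is a symlink only when its tag says so.

```python
# Reparse tag values quoted in the message above.
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_DEDUP = 0x80000013

# Flag bits inside a 32-bit reparse tag (masks used by the Win32 macros
# IsReparseTagMicrosoft and IsReparseTagNameSurrogate).
MICROSOFT_BIT = 0x80000000
NAME_SURROGATE_BIT = 0x20000000


def describe_reparse_tag(tag):
    """Hypothetical helper: list the flag names set in a reparse tag."""
    flags = []
    if tag & MICROSOFT_BIT:
        flags.append("Microsoft")
    if tag & NAME_SURROGATE_BIT:
        flags.append("Name Surrogate")
    return flags


def should_treat_as_symlink(tag):
    """Hypothetical policy: only the symlink tag gets link semantics."""
    return tag == IO_REPARSE_TAG_SYMLINK


if __name__ == "__main__":
    for tag in (IO_REPARSE_TAG_SYMLINK, IO_REPARSE_TAG_DEDUP):
        print(hex(tag), describe_reparse_tag(tag), should_treat_as_symlink(tag))
```

Under this policy a deduplicated file (tag 0x80000013) would fall through to ordinary-file handling, which is the behaviour proposed in the paragraph above.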

I admit to not having looked at the git code nor being familiar with mingw.  Are native Win32 calls supported in the git codebase?

Jmr


[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365740(v=vs.85).aspx


PS I:\temp> cmd /c mklink x y
symbolic link created for x <<===>> y
PS I:\temp> fsutil reparsepoint query x
Reparse Tag Value : 0xa000000c
Tag value: Microsoft
Tag value: Name Surrogate
Tag value: Symbolic Link

Reparse Data Length: 0x00000010
Reparse Data:
0000:  02 00 02 00 00 00 02 00  01 00 00 00 79 00 79 00  ............y.y.
PS I:\temp> fsutil reparsepoint query x.txt
Reparse Tag Value : 0x80000013
Tag value: Microsoft

Reparse Data Length: 0x0000007c
Reparse Data:
0000:  01 02 7c 00 00 00 00 00  66 9c 1a 01 00 00 00 00  ..|.....f.......
0010:  00 00 01 00 00 00 00 00  cb eb c5 00 6a 97 63 4d  ............j.cM
0020:  97 9c 13 0c 41 8e ed 8b  40 00 40 00 40 00 00 00  ....A...@.@.@...
0030:  d3 b9 a8 d4 e4 c6 cd 01  55 ca 02 00 00 00 05 00  ........U.......
0040:  70 ac 21 04 00 00 05 00  01 00 00 00 88 8d 00 00  p.!.............
0050:  c8 30 00 00 00 00 00 00  c8 44 db 94 6c 88 9a d4  .0.......D..l...
0060:  0a a9 01 3a 1f 80 80 8d  ea 0d 53 d7 36 49 b9 a4  ...:......S.6I..
0070:  82 a2 b9 4e 2a 16 4b a1  2e d9 f3 dd              ...N*.K.....
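
For readers curious about the 16 data bytes fsutil printed for the symlink x, here is a small Python sketch that decodes them. It assumes the layout of the symbolic-link member of the Win32 REPARSE_DATA_BUFFER structure (four little-endian 16-bit offset/length fields, a 32-bit flags field, then a UTF-16 path buffer). The function name and the returned dictionary keys are hypothetical. The dedup payload is not decoded here because its format is not described anywhere in this thread.

```python
import struct

# The 16 reparse data bytes printed by fsutil for the symlink "x".
SYMLINK_REPARSE_DATA = bytes.fromhex("02000200000002000100000079007900")

# Flag value meaning "target path is relative" in the symlink buffer.
SYMLINK_FLAG_RELATIVE = 0x1


def parse_symlink_reparse_data(data):
    """Hypothetical helper: decode the symlink-specific reparse payload."""
    # Four little-endian USHORTs and one ULONG: 12 bytes of fixed header.
    sub_off, sub_len, print_off, print_len, flags = struct.unpack_from(
        "<HHHHI", data, 0
    )
    path_buffer = data[12:]
    # Offsets and lengths are in bytes, relative to the path buffer.
    substitute = path_buffer[sub_off:sub_off + sub_len].decode("utf-16-le")
    printed = path_buffer[print_off:print_off + print_len].decode("utf-16-le")
    return {
        "substitute_name": substitute,
        "print_name": printed,
        "relative": bool(flags & SYMLINK_FLAG_RELATIVE),
    }


if __name__ == "__main__":
    print(parse_symlink_reparse_data(SYMLINK_REPARSE_DATA))
```

Both names decode to y, the target given to mklink, and the relative flag is set.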


* Re: FW: Windows. Git, and Dedupe
  2013-03-19 21:36   ` Josh Rowe
@ 2013-03-20 19:54     ` René Scharfe
  2013-03-20 20:43       ` Josh Rowe
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2013-03-20 19:54 UTC (permalink / raw)
  To: Josh Rowe; +Cc: git@vger.kernel.org, msysgit@googlegroups.com

Am 19.03.2013 22:36, schrieb Josh Rowe:
> Yes, Dedup is in fact a Server-only feature.

Is there an easier way to reproduce the issue than registering and 
downloading the Windows Server 2012 evaluation version?  It's not that 
hard, admittedly, but still.

> The reparse point could be decoded as being a non-symlink reparse
> item using; in those cases, treating the file as an "ordinary"
> file would be appropriate.
>
> For example, see the following. The reparse tag value for symlinks
> is IO_REPARSE_TAG_SYMLINK (0xa000000c) and for deduped files is
> (IO_REPARSE_TAG_DEDUP) 0x80000013.

That's interesting and invalidates my initial checks with mklink, 
because if I read compat/mingw.c [1] correctly then git handles symlinks 
on Windows in a special way, but should treat dedup reparse points as 
normal files already.

Hrm, but probably st_size is set to zero for them.  Do the deduped files 
appear as empty?  "git ls-tree -r HEAD" would show them with a hash of 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.  If true then how do we get 
their real content sizes using Win32 API calls?
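
The hash mentioned here is the object ID git assigns to an empty blob. A minimal Python sketch of how a blob ID is derived (SHA-1 over a "blob <size>" header, a NUL byte, and the content) shows where that value comes from. The function name is hypothetical and the sketch covers only SHA-1 repositories.

```python
import hashlib


def git_blob_hash(content):
    """Hypothetical helper: compute the SHA-1 object ID of a git blob."""
    # Git hashes a header of the form b"blob <size>\0" followed by the content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # A file read as zero bytes would be stored under this ID.
    print(git_blob_hash(b""))
```

If deduplicated files were being read with a size of zero, they would all show up in "git ls-tree -r HEAD" under the ID this function returns for empty input.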

By the way, what does the command "git version" return for you?

Thanks,
René


[1] https://git.kernel.org/cgit/git/git.git/tree/compat/mingw.c#n427

-- 
-- 
*** Please reply-to-all at all times ***
*** (do not pretend to know who is subscribed and who is not) ***
*** Please avoid top-posting. ***
The msysGit Wiki is here: https://github.com/msysgit/msysgit/wiki - Github accounts are free.

You received this message because you are subscribed to the Google
Groups "msysGit" group.
To post to this group, send email to msysgit@googlegroups.com
To unsubscribe from this group, send email to
msysgit+unsubscribe@googlegroups.com
For more options, and view previous threads, visit this group at
http://groups.google.com/group/msysgit?hl=en_US?hl=en

--- 
You received this message because you are subscribed to the Google Groups "msysGit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to msysgit+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


* RE: FW: Windows. Git, and Dedupe
  2013-03-20 19:54     ` René Scharfe
@ 2013-03-20 20:43       ` Josh Rowe
  2013-03-20 21:45         ` René Scharfe
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Rowe @ 2013-03-20 20:43 UTC (permalink / raw)
  To: René Scharfe; +Cc: git@vger.kernel.org, msysgit@googlegroups.com

If you have Win8 or HyperV 2012, I can ship you a small NTFS .vhd with some deduped files.  I'm not sure if that will be readable, but I would hazard a guess that it would be.  It definitely will not be readable on Win7.  

I'm using:

PS C:\> git version
git version 1.8.0.msysgit.0

I don't see any changes related to this in the file log since the original code was added in 2010.  I do notice that mingw_fstat doesn't do anything special with symlinks; I don't know where that is used.  

The file sizes show up as their original size with Windows tools (powershell, Win32, cmd, .Net, etc).  git ls-tree -r HEAD does not show that hash code for files that are not intentionally empty.  

Jmr



* Re: FW: Windows. Git, and Dedupe
  2013-03-20 20:43       ` Josh Rowe
@ 2013-03-20 21:45         ` René Scharfe
  0 siblings, 0 replies; 6+ messages in thread
From: René Scharfe @ 2013-03-20 21:45 UTC (permalink / raw)
  To: Josh Rowe; +Cc: git@vger.kernel.org, msysgit@googlegroups.com

Am 20.03.2013 21:43, schrieb Josh Rowe:
> If you have Win8 or HyperV 2012, I can ship you a small NTFS .vhd
> with some deduped files.  I'm not sure if that will be readable, but
> I would hazard a guess that it would be.  It definitely will not be
> readable on Win7.

It would be nice if you could upload it to an FTP server or website and 
post a public link so that the real git-on-Windows developers can get it 
as well.  You can also send it to me personally and I'll see if I can 
mount it using Windows 8 and how far I get from there.  In any case, 
please make sure there's no sensitive or private data in the VHD file.

How big is it after compression using (preferably) 7-Zip or ZIP?

> I'm using:
>
> PS C:\> git version
> git version 1.8.0.msysgit.0

That means compat/mingw.c is directly relevant to you; more about MinGW, 
MSys and git at http://msysgit.github.com/ and http://mingw.org/.

> The file sizes show up as their original size with Windows tools
> (powershell, Win32, cmd, .Net, etc).  git ls-tree -r HEAD does not
> show that hash code for files that are not intentionally empty.

So we can likely (hopefully) get the sizes of deduped files with the 
same API calls as for regular ones.  Which makes me even more puzzled 
about why git treats the two kinds differently.

René

