* [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
@ 2016-12-11 23:34 Beat Bolli
  2016-12-11 23:34 ` [PATCH 2/3] update_unicode.sh: remove the plane filters Beat Bolli
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-11 23:34 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli
We need to track the new commits in uniset, otherwise their and our code
get out of sync.
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
Junio, these go on top of my bb/unicode-9.0 branch, please.
Thanks!
 update_unicode.sh | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/update_unicode.sh b/update_unicode.sh
index 4c1ec8d..9ca7d8b 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -14,6 +14,11 @@ fi &&
 		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
 	if ! test -d uniset; then
 		git clone https://github.com/depp/uniset.git
+	else
+	(
+		cd uniset &&
+		git pull
+	)
 	fi &&
 	(
 		cd uniset &&
-- 
2.7.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] update_unicode.sh: remove the plane filters
  2016-12-11 23:34 [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Beat Bolli
@ 2016-12-11 23:34 ` Beat Bolli
  2016-12-11 23:34 ` [PATCH 3/3] update_unicode.sh: restore hexadecimal output Beat Bolli
  2016-12-12 5:53 ` [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Torsten Bögershausen
  2 siblings, 0 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-11 23:34 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

The uniset upstream has accepted my patches that eliminate the Unicode
plane offsets from the output in '--32' mode.

Remove the corresponding filter in update_unicode.sh.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 update_unicode.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index 9ca7d8b..e595bf8 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -31,11 +31,10 @@ fi &&
 	UNICODE_DIR=. && export UNICODE_DIR &&
 	cat >$UNICODEWIDTH_H <<-EOF
 	static const struct interval zero_width[] = {
-		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-		  grep -v plane)
+		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD)
 	};
 	static const struct interval double_width[] = {
-		$(uniset/uniset --32 eaw:F,W | grep -v plane)
+		$(uniset/uniset --32 eaw:F,W)
 	};
 	EOF
 )
-- 
2.7.2

^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] update_unicode.sh: restore hexadecimal output
  2016-12-11 23:34 [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Beat Bolli
  2016-12-11 23:34 ` [PATCH 2/3] update_unicode.sh: remove the plane filters Beat Bolli
@ 2016-12-11 23:34 ` Beat Bolli
  2016-12-12 5:53 ` [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Torsten Bögershausen
  2 siblings, 0 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-11 23:34 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

The uniset upstream has decided that decimal numbers are The True Way,
so let's convert them back to the usual format that's closer to the
U+nnnn standard.

The generated unicode_width.h file again looks exactly the same as two
commits ago.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 update_unicode.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/update_unicode.sh b/update_unicode.sh
index e595bf8..d7720d5 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -5,6 +5,12 @@
 #Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
 #Cf Format a format control character
 #
+
+dec_to_hex() {
+	# convert any decimal numbers to 4-digit hex
+	perl -pe 's/(\d+)/sprintf("0x%04X", $1)/ge'
+}
+
 UNICODEWIDTH_H=../unicode_width.h
 if ! test -d unicode; then
 	mkdir unicode
@@ -29,7 +35,7 @@ fi &&
 		make
 	) &&
 	UNICODE_DIR=. && export UNICODE_DIR &&
-	cat >$UNICODEWIDTH_H <<-EOF
+	dec_to_hex >$UNICODEWIDTH_H <<-EOF
 	static const struct interval zero_width[] = {
 		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD)
 	};
-- 
2.7.2

^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-11 23:34 [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Beat Bolli
  2016-12-11 23:34 ` [PATCH 2/3] update_unicode.sh: remove the plane filters Beat Bolli
  2016-12-11 23:34 ` [PATCH 3/3] update_unicode.sh: restore hexadecimal output Beat Bolli
@ 2016-12-12 5:53 ` Torsten Bögershausen
  2016-12-12 8:54   ` Beat Bolli
  2 siblings, 1 reply; 11+ messages in thread
From: Torsten Bögershausen @ 2016-12-12 5:53 UTC (permalink / raw)
  To: Beat Bolli, git

On 2016-12-12 00:34, Beat Bolli wrote:
> We need to track the new commits in uniset, otherwise their and our code
> get out of sync.
>
> Signed-off-by: Beat Bolli <dev+git@drbeat.li>
> ---
>
> Junio, these go on top of my bb/unicode-9.0 branch, please.
>
> Thanks!
>
>  update_unicode.sh | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/update_unicode.sh b/update_unicode.sh
> index 4c1ec8d..9ca7d8b 100755
> --- a/update_unicode.sh
> +++ b/update_unicode.sh
> @@ -14,6 +14,11 @@ fi &&
>  		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
>  	if ! test -d uniset; then
>  		git clone https://github.com/depp/uniset.git
> +	else
> +	(
> +		cd uniset &&
> +		git pull
If upstream has accepted your patches, that's nice.

Minor question, especially to the next commit:
Should we make sure to checkout the exact version, which has been tested?
In this case cb97792880625e24a9f581412d03659091a0e54f

And this is for both a fresh clone and the git pull
needs to be replaced by
git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f

(Which of course is a shell variable)

^ permalink raw reply [flat|nested] 11+ messages in thread
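The pinning suggested above could look roughly like the following sketch. The function name fetch_pinned and the variables UNISET_URL and UNISET_REV are hypothetical; only the repository URL and the commit id are taken from this thread. It clones when the directory is missing, fetches otherwise, and in both cases detaches HEAD at the pinned commit instead of following upstream.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch series:
# make sure directory $3 holds a checkout of repository $1 at commit $2.
fetch_pinned() {
	url=$1 rev=$2 dir=$3
	if ! test -d "$dir"; then
		git clone "$url" "$dir"
	else
		git -C "$dir" fetch origin
	fi &&
	# Detach HEAD at the tested commit rather than tracking a branch.
	git -C "$dir" checkout --detach "$rev"
}

# Values quoted in this thread; the variable names are illustrative.
UNISET_URL=https://github.com/depp/uniset.git
UNISET_REV=cb97792880625e24a9f581412d03659091a0e54f
# Left commented out because it needs network access:
# fetch_pinned "$UNISET_URL" "$UNISET_REV" uniset
```

Keeping the commit id in a single variable means a later update to a newer tested uniset revision is a one-line change.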
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 5:53 ` [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists Torsten Bögershausen
@ 2016-12-12 8:54   ` Beat Bolli
  2016-12-12 18:12     ` Torsten Bögershausen
  0 siblings, 1 reply; 11+ messages in thread
From: Beat Bolli @ 2016-12-12 8:54 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

On 2016-12-12 06:53, Torsten Bögershausen wrote:
> On 2016-12-12 00:34, Beat Bolli wrote:
>> We need to track the new commits in uniset, otherwise their and our code
>> get out of sync.
>>
>> Signed-off-by: Beat Bolli <dev+git@drbeat.li>
>> ---
>>
>> Junio, these go on top of my bb/unicode-9.0 branch, please.
>>
>> Thanks!
>>
>>  update_unicode.sh | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/update_unicode.sh b/update_unicode.sh
>> index 4c1ec8d..9ca7d8b 100755
>> --- a/update_unicode.sh
>> +++ b/update_unicode.sh
>> @@ -14,6 +14,11 @@ fi &&
>>  		http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
>>  	if ! test -d uniset; then
>>  		git clone https://github.com/depp/uniset.git
>> +	else
>> +	(
>> +		cd uniset &&
>> +		git pull
> If upstream has accepted your patches, that's nice.
>
> Minor question, especially to the next commit:
> Should we make sure to checkout the exact version, which has been tested?
> In this case cb97792880625e24a9f581412d03659091a0e54f
>
> And this is for both a fresh clone and the git pull
> needs to be replaced by
> git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f
>
>
> (Which of course is a shell variable)

I was actually wondering what the policy was for adding submodules to
the Git repo, but then decided against it. Another option would be to
fork uniset on GitHub and just let it stay on a working commit.

Junio, what's your stance on this?

Beat

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 8:54 ` Beat Bolli
@ 2016-12-12 18:12   ` Torsten Bögershausen
  2016-12-12 18:33     ` Junio C Hamano
  2016-12-12 19:24     ` Beat Bolli
  0 siblings, 2 replies; 11+ messages in thread
From: Torsten Bögershausen @ 2016-12-12 18:12 UTC (permalink / raw)
  To: Beat Bolli, Torsten Bögershausen; +Cc: git

>> Minor question, especially to the next commit:
>> Should we make sure to checkout the exact version, which has been tested?
>> In this case cb97792880625e24a9f581412d03659091a0e54f
>>
>> And this is for both a fresh clone and the git pull
>> needs to be replaced by
>> git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f
>>
>>
>> (Which of course is a shell variable)
>
> I was actually wondering what the policy was for adding submodules to the Git repo,
> but then decided against it. Another option would be to fork uniset on GitHub and
> just let it stay on a working commit.
>
> Junio, what's your stance on this?
>
> Beat

If I run ./update_unicode.sh on the latest master of https://github.com/depp/uniset.git ,
commit a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like this:

-{ 0x0300, 0x036F },

+{ 768, 879 },

IOW, all hex values are printed as decimal values.
Not a problem for the compiler, but for the human
to check the unicode tables.

So I think we should "pin" the version of uniset.

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 18:12 ` Torsten Bögershausen
@ 2016-12-12 18:33   ` Junio C Hamano
  2016-12-12 23:50     ` Beat Bolli
  2016-12-12 19:24   ` Beat Bolli
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2016-12-12 18:33 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Beat Bolli, git

Torsten Bögershausen <tboegi@web.de> writes:

> If I run ./update_unicode.sh on the latest master of
> https://github.com/depp/uniset.git , commit
> a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like
> this:
>
> -{ 0x0300, 0x036F },
>
> +{ 768, 879 },
>
> IOW, all hex values are printed as decimal values.
> Not a problem for the compiler, but for the human
> to check the unicode tables.
>
> So I think we should "pin" the version of uniset.

Sure, and I'd rather see the update-unicode.sh script moved
somewhere in contrib/ while at it. Those who are interested in
keeping up with the unicode standard are a tiny minority of the
developer population, and most of us would treat the built width
table as the source (after all, that is what we ship).

To be bluntly honest, I'd rather not see "update-unicode.sh"
download and build uniset at all. It's as if po/ hierarchy shipping
with its own script to download and build msgmerge--that's madness.
Needless to say, shipping the sources for uniset embedded in our
project tree (either as a snapshot-fork or as a submodule) is even
worse. Those who want to muck with po/ are expected to have
msgmerge and friends. Why not expect the same for those who want to
update the unicode width table?

I'd rather see a written instruction telling which snapshot to get
and from where to build and place on their $PATH in the README file,
sitting next to the update-unicode.sh script in contrib/uniwidth/
directory, for those who are interested in building the width table
"from the source", and the update-unicode.sh script to assume that
uniset is available.

^ permalink raw reply [flat|nested] 11+ messages in thread
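A script that assumes uniset is already installed, as proposed above, would typically begin with a pre-flight check. The following is a minimal sketch of that idea only; require_tool is a hypothetical helper name, and the error wording and the commented-out uniset call are illustrative rather than taken from any actual reroll.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail early, with a pointer to the README,
# when a required tool cannot be found on $PATH.
require_tool() {
	if ! command -v "$1" >/dev/null 2>&1; then
		echo "error: '$1' not found on \$PATH; see the README for build instructions" >&2
		return 1
	fi
}

# In the script, the check would guard the table generation, e.g.:
# require_tool uniset && uniset --32 eaw:F,W
```

With such a check the script no longer needs to clone or build anything itself, which is the same expectation po/ places on msgmerge.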
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 18:33 ` Junio C Hamano
@ 2016-12-12 23:50   ` Beat Bolli
  2016-12-13 6:16     ` Torsten Bögershausen
  0 siblings, 1 reply; 11+ messages in thread
From: Beat Bolli @ 2016-12-12 23:50 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

On 12.12.16 19:33, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>> If I run ./update_unicode.sh on the latest master of
>> https://github.com/depp/uniset.git , commit
>> a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like
>> this:
>>
>> -{ 0x0300, 0x036F },
>>
>> +{ 768, 879 },
>>
>> IOW, all hex values are printed as decimal values.
>> Not a problem for the compiler, but for the human
>> to check the unicode tables.
>>
>> So I think we should "pin" the version of uniset.
>
> Sure, and I'd rather see the update-unicode.sh script moved
> somewhere in contrib/ while at it. Those who are interested in
> keeping up with the unicode standard are a tiny minority of the
> developer population, and most of us would treat the built width
> table as the source (after all, that is what we ship).
>
> To be bluntly honest, I'd rather not see "update-unicode.sh"
> download and build uniset at all. It's as if po/ hierarchy shipping
> with its own script to download and build msgmerge--that's madness.
> Needless to say, shipping the sources for uniset embedded in our
> project tree (either as a snapshot-fork or as a submodule) is even
> worse. Those who want to muck with po/ are expected to have
> msgmerge and friends. Why not expect the same for those who want to
> update the unicode width table?
>
> I'd rather see a written instruction telling which snapshot to get
> and from where to build and place on their $PATH in the README file,
> sitting next to the update-unicode.sh script in contrib/uniwidth/
> directory, for those who are interested in building the width table
> "from the source", and the update-unicode.sh script to assume that
> uniset is available.
>

OK. So please don't merge bb/unicode-9.0 to next yet; I'll prepare a
reroll following your description.

Torsten, is this alright with you?

Cheers, Beat

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 23:50 ` Beat Bolli
@ 2016-12-13 6:16   ` Torsten Bögershausen
  2016-12-13 6:42     ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Torsten Bögershausen @ 2016-12-13 6:16 UTC (permalink / raw)
  To: Beat Bolli, git

>> Sure, and I'd rather see the update-unicode.sh script moved
>> somewhere in contrib/ while at it. Those who are interested in
>> keeping up with the unicode standard are a tiny minority of the
>> developer population, and most of us would treat the built width
>> table as the source (after all, that is what we ship).
>>
>> To be bluntly honest, I'd rather not see "update-unicode.sh"
>> download and build uniset at all. It's as if po/ hierarchy shipping
>> with its own script to download and build msgmerge--that's madness.
>> Needless to say, shipping the sources for uniset embedded in our
>> project tree (either as a snapshot-fork or as a submodule) is even
>> worse. Those who want to muck with po/ are expected to have
>> msgmerge and friends. Why not expect the same for those who want to
>> update the unicode width table?
>>
>> I'd rather see a written instruction telling which snapshot to get
>> and from where to build and place on their $PATH in the README file,
>> sitting next to the update-unicode.sh script in contrib/uniwidth/
>> directory, for those who are interested in building the width table
>> "from the source", and the update-unicode.sh script to assume that
>> uniset is available.

OK with the contrib - that's an improvement.

About the instructions how to download and compile:
(we don't need to change the $PATH, do we ?)
I don't know.

The typical instructions I have seen are a sequence of shell commands
to be executed, which hopefully simply work by doing "copy-and-paste".
I find this error-prone, as you may lose the last character while
moving the mouse, or don't check the error message or return codes.
Having a pre-baked shell script, which does use "&&" is in that way
more attractive,
and the README can be as simple as run "update-unicode.sh" and that's it.

uniset is a small project and where should we put it ?
a) inside the Git tree?
b) /tmp ?
c) into the $HOME directory ?
d) /usr/local

a) is quick and dirty
b) probably OK
c) Not sure about that
d) Needs super user rights

Can we try to find a good place ?

"contrib/uniwidth/" may be difficult to find, how about contrib/update-unicode ?

> OK. So please don't merge bb/unicode-9.0 to next yet; I'll prepare a
> reroll following your description.
>
> Torsten, is this alright with you?
sure
> Cheers, Beat

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-13 6:16 ` Torsten Bögershausen
@ 2016-12-13 6:42   ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2016-12-13 6:42 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Beat Bolli, git

Torsten Bögershausen <tboegi@web.de> writes:

> The typical instructions I have seen are a sequence of shell commands
> to be executed, which hopefully simply work by doing "copy-and-paste".
> I find this error-prone, as you may lose the last character while
> moving the mouse, or don't check the error message or return codes.
> Having a pre-baked shell script, which does use "&&" is in that way
> more attractive,
> and the README can be as simple as run "update-unicode.sh" and that's it.

That's OK as well.

> "contrib/uniwidth/" may be difficult to find, how about contrib/update-unicode ?

This, too.

And as long as .gitignore pattern is set up correctly there, I do
not think we terribly mind "git clone ..from..there.." into it,
either.

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/3] update_unicode.sh: update the uniset repo if it exists
  2016-12-12 18:12 ` Torsten Bögershausen
  2016-12-12 18:33   ` Junio C Hamano
@ 2016-12-12 19:24   ` Beat Bolli
  1 sibling, 0 replies; 11+ messages in thread
From: Beat Bolli @ 2016-12-12 19:24 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

On 12.12.16 19:12, Torsten Bögershausen wrote:
>
>>> Minor question, especially to the next commit:
>>> Should we make sure to checkout the exact version, which has been tested?
>>> In this case cb97792880625e24a9f581412d03659091a0e54f
>>>
>>> And this is for both a fresh clone and the git pull
>>> needs to be replaced by
>>> git fetch && git checkout cb97792880625e24a9f581412d03659091a0e54f
>>>
>>>
>>> (Which of course is a shell variable)
>>
>> I was actually wondering what the policy was for adding submodules to the Git repo,
>> but then decided against it. Another option would be to fork uniset on GitHub and
>> just let it stay on a working commit.
>>
>> Junio, what's your stance on this?
>>
>> Beat
>
> If I run ./update_unicode.sh on the latest master of https://github.com/depp/uniset.git ,
> commit a5fac4a091857dd5429cc2d, I get a diff in unicode_width.h like this:
>
> -{ 0x0300, 0x036F },
>
> +{ 768, 879 },
>
> IOW, all hex values are printed as decimal values.
> Not a problem for the compiler, but for the human
> to check the unicode tables.
>
> So I think we should "pin" the version of uniset.

That's what patch 3/3 fixes.

^ permalink raw reply [flat|nested] 11+ messages in thread