2020-10-05 05:19:33
*https://github.com/snowblossomcoin/channels/compare/fb55dea8ef6d...58624358f2b8*
https://github.com/snowblossomcoin/channels/commit/58624358f2b8fb23a0355724ff2cb107c8cc824b - Add unicode file test

GitHub
2020-10-05 07:34:10
now that starts seeming sane again, test and debug
i guess you could actually sorta do debug driven testing and keep adding well isolated unit tests to different layers of the onion to see where it goes wrong? not much else one can do in a transform pipeline to ensure one keeps getting it right

Rotonen
2020-10-05 15:12:29
when it breaks, it breaks at the new File(parent, string), or the File.getName()

Fireduck
2020-10-05 15:13:05
I think I only saw it because I was running an older java 8 on my laptop at the time

Fireduck
2020-10-05 15:18:42
does `File()` have a baked in assumption about the encoding?

Rotonen
2020-10-05 15:19:09
or some platform specificity

Rotonen
2020-10-05 15:19:24
and unicode (even UTF-8) filenames are funky anyway cross platform - some things are NFC, some things are NFKD

Rotonen
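
To make the normalization point concrete, here is a minimal sketch using the JDK's `java.text.Normalizer`. The class and method names are made up for illustration. It shows two spellings of the same visible character that differ as Java strings until both are brought to one normalization form.

```java
import java.text.Normalizer;

class NormalizationDemo {
    // Decomposes precomposed characters into base character plus combining marks.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Compares two names after bringing both to the same form (NFC here).
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u00e9";     // e with acute accent as one code point
        String decomposed = "e\u0301";  // e followed by a combining acute accent

        System.out.println("raw equals:       " + composed.equals(decomposed));
        System.out.println("equal after NFC:  " + sameAfterNfc(composed, decomposed));
        System.out.println("NFD of composed == decomposed: " + nfd(composed).equals(decomposed));
    }
}
```

A filesystem that normalizes names on write can hand back the other spelling, so a plain `equals` on the listed name may fail even though nothing was corrupted.
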
2020-10-05 16:41:54
I was thinking there might be some weird filesystem interaction, not sure yet

Fireduck
2020-10-05 17:43:10
many low level apis are horribly linux / ext3 specific in most programming languages

Rotonen
2020-10-05 17:43:22
baseline guess: try to find a higher level abstraction?

Rotonen
2020-10-05 17:46:28
`java.io.File` is really old. It might be that the `java.nio.file` stuff will work better. I should try that.

Fireduck
2020-10-05 17:46:30
http://fireduck.com/java/java-se-8/docs/api/java/nio/file/package-summary.html

Fireduck
2020-10-05 17:47:37
unit test the whole onion on each layer, will give you massive confidence going forwards

Rotonen
2020-10-05 17:47:53
and you should assert byte for byte on each layer

Rotonen
2020-10-05 17:48:01
I have a unit test that reproduces the problem pretty simply

Fireduck
2020-10-05 17:48:13
otherwise it's super easy to create subtle corruptions on such transform pipelines

Rotonen
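
A rough sketch of what asserting byte for byte on each layer could look like. The two layers here (UTF-8 text encoding and Base64 wrapping) are hypothetical stand-ins, not the actual pipeline in the channels code. The idea is that each layer gets its own round-trip check, so a failure names the layer that broke.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class LayeredRoundTrip {
    // Layer 1 (hypothetical): text <-> UTF-8 bytes.
    static byte[] encodeText(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String decodeText(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    // Layer 2 (hypothetical): raw bytes <-> Base64 bytes.
    static byte[] wrap(byte[] raw) { return Base64.getEncoder().encode(raw); }
    static byte[] unwrap(byte[] wrapped) { return Base64.getDecoder().decode(wrapped); }

    // Fails with the layer name when the two arrays differ in any byte.
    static void check(String layer, byte[] expected, byte[] actual) {
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError(layer + ": expected " + Arrays.toString(expected)
                    + " but got " + Arrays.toString(actual));
        }
    }

    // Runs one check per layer, then one end-to-end check.
    static void verify(String input) {
        byte[] text = encodeText(input);
        check("text layer", text, encodeText(decodeText(text)));
        check("wrap layer", text, unwrap(wrap(text)));
        check("end to end", text, encodeText(decodeText(unwrap(wrap(text)))));
    }

    public static void main(String[] args) {
        verify("\u65e5\u672c\u8a9e");
        System.out.println("all layers round-trip");
    }
}
```
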
2020-10-05 17:48:27
that's a good start, glad to hear of that

Rotonen
2020-10-05 17:48:52
also just asserting byte for byte on an end to end level helps notice you broke it, sure

Rotonen
2020-10-05 17:49:00
still a huge time saver to know what you broke

Rotonen
2020-10-05 17:49:34
*I* didn't break anything :wink:

Fireduck
2020-10-05 17:51:10
i'm hoping the craziest of things will eventually be built on that

Rotonen
2020-10-05 17:52:05
yeah, me too

Fireduck
2020-10-05 17:54:15
The basic failure is, take a unicode string. Make a directory with that string as the name. Read the directory name, see that it is different

Fireduck
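
A minimal sketch of that failure as a standalone program, not the actual test from the repository. It tries the round trip with both `java.io.File` and `java.nio.file`, and prints the JVM's encoding properties, since the environment a build tool provides may differ from an interactive shell (that part is a guess, not something confirmed in this conversation). `sun.jnu.encoding` is an implementation-specific property and may be absent on some JVMs.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class DirNameRoundTrip {
    // Creates a directory called `name` via java.io.File and returns the name as listed back.
    static String viaIoFile(String name) {
        try {
            File parent = Files.createTempDirectory("rt-io").toFile();
            File dir = new File(parent, name);
            if (!dir.mkdir()) {
                throw new IOException("mkdir failed for " + dir);
            }
            String[] listed = parent.list();
            if (listed == null || listed.length != 1) {
                throw new IOException("unexpected listing of " + parent);
            }
            return listed[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same round trip through java.nio.file. resolve() may throw InvalidPathException
    // when the name cannot be represented in the platform's file name encoding.
    static String viaNio(String name) {
        try {
            Path parent = Files.createTempDirectory("rt-nio");
            Files.createDirectory(parent.resolve(name));
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
                for (Path entry : entries) {
                    return entry.getFileName().toString();
                }
            }
            throw new IOException("nothing listed in " + parent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c\u8a9e"; // three Japanese characters, written as escapes
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("java.io.File round trip ok: " + name.equals(viaIoFile(name)));
        try {
            System.out.println("java.nio.file round trip ok: " + name.equals(viaNio(name)));
        } catch (RuntimeException e) {
            System.out.println("java.nio.file rejected the name: " + e);
        }
    }
}
```

Running the same class once from a shell and once under the build tool, then comparing the two printed encodings, would be one cheap way to narrow down where the difference comes from.
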
2020-10-05 17:54:29
If I run the test outside of bazel, it passes

Fireduck
2020-10-05 17:54:33
inside bazel, it fails

Fireduck
2020-10-05 17:55:09
that "name is different" can actually just be how the filesystem in question works

Rotonen
2020-10-05 17:55:33
if it were a different encoding of the same string, I'd agree

Fireduck
2020-10-05 17:55:51
but it is replacing all the Japanese with question marks

Fireduck
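
One way to get exactly this symptom, offered as a hypothesis rather than a diagnosis: if the name is encoded with a charset that has no Japanese characters, `String.getBytes` substitutes the charset's replacement byte, which for US-ASCII is `?`. The sketch below uses an illustrative class name and only standard JDK calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Encodes and decodes with the same charset; unmappable characters are replaced on encode.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    // Reports up front whether the charset can represent the whole string.
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c\u8a9e";
        System.out.println("ASCII can encode: " + canEncode(name, StandardCharsets.US_ASCII));
        System.out.println("via ASCII: " + roundTrip(name, StandardCharsets.US_ASCII));
        System.out.println("via UTF-8 intact: " + name.equals(roundTrip(name, StandardCharsets.UTF_8)));
    }
}
```

This kind of loss is not reversible: once the bytes are `?`, the original characters are gone, which is different from a name that is merely stored in another normalization form.
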
2020-10-05 17:56:01
'same string' is also more complicated than one would like to know of

Rotonen
2020-10-05 17:56:26
right. I would be fine with that, but it is doing something really stupid

Fireduck
2020-10-05 17:56:39
i suppose it's just taken the UTF-8 input byte for byte and used those bytes for the filename, and then those, when read in the appropriate encoding for the filesystem, don't represent anything you can render

Rotonen
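
For comparison, the scenario described here (UTF-8 bytes stored as-is, then read back under a different encoding) looks like the sketch below. ISO-8859-1 is picked only as an example of "the wrong encoding", and the class name is illustrative. Unlike the question-mark case, the original bytes survive, so the damage can be undone.

```java
import java.nio.charset.StandardCharsets;

class MisreadDemo {
    // Encodes as UTF-8 but decodes the bytes as ISO-8859-1, one char per byte.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses misread(): recovers the original bytes and decodes them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c\u8a9e";
        String garbled = misread(name);
        System.out.println("chars before: " + name.length() + ", after misread: " + garbled.length());
        System.out.println("repair restores original: " + name.equals(repair(garbled)));
    }
}
```
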
2020-10-05 17:57:20
that's the trouble with lower level APIs, they assume you know what you're doing and they do exactly what you tell them to do :smile:

Rotonen
2020-10-05 17:57:50
You think if I picked a specific normalization first it might pass?

Fireduck
2020-10-05 17:58:00
not sure

Rotonen
2020-10-05 17:58:00
(depending on the filesystem)

Fireduck
2020-10-05 17:58:13
try to do the byte for byte comparison on the filesystem itself on the name

Rotonen
2020-10-05 17:58:30
as in trap the test in a debugger after the write and dig in

Rotonen
2020-10-05 17:58:36
from outside of the test, on the system side

Rotonen
2020-10-05 17:59:02
yeah, I have a print code point method to help me see what is actually going on

Fireduck
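
A helper of that kind might look roughly like the following. This is a sketch with made-up names, not the method mentioned above. It walks the string by code point rather than by `char`, so characters outside the Basic Multilingual Plane print as one value instead of two surrogates.

```java
class CodePoints {
    // Renders a string as space-separated U+XXXX values, one per code point.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("\u65e5\u672c"));  // the intended name
        System.out.println(describe("??"));            // what a lossy round trip returns
    }
}
```
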
2020-10-05 17:59:33
and having done multi platform file stuff, the only thing i know for sure is that i'll never want to try to do that stuff manually myself, the jungle is too thick - find community wisdom you can rely on (most likely java.nio or some other upstream api would provide that)

Rotonen
2020-10-05 17:59:59
yeah, I have high hopes for nio

Fireduck
2020-10-05 18:00:03
no, i meant more along the lines of break a hex editor out and *actually* see what's stored on disk, as bytes

Rotonen
2020-10-05 18:02:51
if those match what your input is, then you're in the mess i'm thinking of

Rotonen
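
To have something to compare the on-disk bytes against, a small hex formatter for the expected UTF-8 encoding of the name can help. This is a sketch with illustrative names; the on-disk side still has to come from a hex editor or a similar tool outside the JVM.

```java
import java.nio.charset.StandardCharsets;

class Utf8Hex {
    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    // The bytes one would expect on disk if the name were stored as UTF-8.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u65e5\u672c")); // E6 97 A5 E6 9C AC
        System.out.println(utf8Hex("??"));           // 3F 3F
    }
}
```

If the disk shows `3F` bytes, the characters were already lost before the write; if it shows the `E6 ...` sequence, the write was fine and the problem is on the read side.
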
2020-10-05 18:03:12
but, if it just goes away with a more modern API, probably not worth bothering about, unless curious

Rotonen