2020-10-05 05:19:33
*https://github.com/snowblossomcoin/channels/compare/fb55dea8ef6d...58624358f2b8*
https://github.com/snowblossomcoin/channels/commit/58624358f2b8fb23a0355724ff2cb107c8cc824b - Add unicode file test
GitHub
2020-10-05 07:34:10
now that starts seeming sane again, test and debug
i guess you could actually sorta do debug driven testing and keep adding well isolated unit tests to different layers of the onion to see where it goes wrong? not much else one can do in a transform pipeline to ensure one keeps getting it right
Rotonen
2020-10-05 15:12:29
when it breaks, it breaks at the new File(parent, string), or the File.getName()
Fireduck
2020-10-05 15:13:05
I think I only saw it because I was running an older java 8 on my laptop at the time
Fireduck
2020-10-05 15:18:42
does `File()` have a baked in assumption about the encoding?
Rotonen
2020-10-05 15:19:09
or some platform specificity
Rotonen
2020-10-05 15:19:24
and unicode (even UTF-8) filenames are funky anyway cross platform - some things are NFC, some things are NFKD
Rotonen
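A minimal sketch of the normalization point above, using `java.text.Normalizer` from the standard library. The class name and sample string are made up for illustration; it only shows that two strings that render the same can hold different code points until they are normalized to the same form.

```java
import java.text.Normalizer;

final class NormalizationSketch {
    // Returns the NFC (composed) form of a string.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Returns the NFD (decomposed) form of a string.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9";     // e-acute stored as a single code point
        String decomposed = "cafe\u0301";  // plain 'e' followed by a combining acute accent

        // Same rendering, different code points.
        System.out.println(composed.equals(decomposed));            // false
        // Equal once both sides are brought to the same normalization form.
        System.out.println(nfc(composed).equals(nfc(decomposed)));  // true
    }
}
```

`Normalizer.Form` also offers `NFKC` and `NFKD` if compatibility forms are what the filesystem in question hands back.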
2020-10-05 16:41:54
I was thinking there might be some weird filesystem interaction, not sure yet
Fireduck
2020-10-05 17:43:10
many low level apis are horribly linux / ext3 specific in most programming languages
Rotonen
2020-10-05 17:43:22
baseline guess: try to find a higher level abstraction?
Rotonen
2020-10-05 17:46:28
java.io.File is really old. It might be that java.nio.file stuff will work better. I should try that.
Fireduck
2020-10-05 17:46:30
http://fireduck.com/java/java-se-8/docs/api/java/nio/file/package-summary.html
Fireduck
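A rough sketch of what the `java.nio.file` variant of the directory round trip could look like. The class and method names are hypothetical, the sample name is made up, and nothing here claims that switching APIs fixes the problem. It assumes `parent` is an empty directory, so the first listed entry is the one just created.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

final class NioDirNameSketch {
    // Creates a directory called `name` under the (empty) `parent` and returns
    // the name that a directory listing reports for it.
    static String createAndReadBack(Path parent, String name) throws IOException {
        Files.createDirectory(parent.resolve(name));
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
            for (Path entry : entries) {
                return entry.getFileName().toString();
            }
        }
        throw new IOException("directory not found after creation");
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("nio_name_test");
        String name = "\u65e5\u672c\u8a9e"; // hypothetical sample name
        try {
            String readBack = createAndReadBack(parent, name);
            System.out.println("round trip ok: " + name.equals(readBack));
        } catch (InvalidPathException e) {
            // Thrown when the name cannot be converted to a path on this platform.
            System.out.println("name not representable: " + e.getMessage());
        }
    }
}
```

The design point is that `Path` can refuse an unrepresentable name with `InvalidPathException`, which makes this kind of failure visible as an exception where it occurs.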
2020-10-05 17:47:37
unit test the whole onion on each layer, will give you massive confidence going forwards
Rotonen
2020-10-05 17:47:53
and you should assert byte for byte on each layer
Rotonen
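One way to sketch the byte for byte assertion per layer. The helper names and the UTF-8 round trip standing in for a pipeline layer are assumptions for illustration, not code from the repository.

```java
import java.nio.charset.StandardCharsets;

final class ByteCompareSketch {
    // Returns the index of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, the shorter length is returned.
    static int firstMismatch(byte[] expected, byte[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) return i;
        }
        return expected.length == actual.length ? -1 : common;
    }

    // Fails with the layer name and offset, so a broken layer identifies itself.
    static void assertSameBytes(String layer, byte[] expected, byte[] actual) {
        int at = firstMismatch(expected, actual);
        if (at >= 0) throw new AssertionError(layer + ": bytes differ at offset " + at);
    }

    public static void main(String[] args) {
        byte[] input = "\u65e5\u672c\u8a9e".getBytes(StandardCharsets.UTF_8);
        // Stand-in for one pipeline layer: decode and re-encode as UTF-8.
        byte[] output = new String(input, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        assertSameBytes("utf8 round trip", input, output);
        System.out.println("layer ok");
    }
}
```

Reporting the offset and the layer is what turns "something broke" into "this layer broke here".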
2020-10-05 17:48:01
I have a unit test that reproduces the problem pretty simply
Fireduck
2020-10-05 17:48:13
otherwise it's super easy to create subtle corruptions on such transform pipelines
Rotonen
2020-10-05 17:48:27
that's a good start, glad to hear of that
Rotonen
2020-10-05 17:48:52
also just asserting byte for byte on an end to end level helps notice you broke it, sure
Rotonen
2020-10-05 17:49:00
still a huge time saver to know what you broke
Rotonen
2020-10-05 17:49:34
*I* didn't break anything :wink:
Fireduck
2020-10-05 17:51:10
i'm hoping the craziest of things will eventually be built on that
Rotonen
2020-10-05 17:52:05
yeah, me too
Fireduck
2020-10-05 17:54:15
The basic failure is, take a unicode string. Make a directory with that string as the name. Read the directory name, see that it is different
Fireduck
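The failure described above could be reproduced along these lines with `java.io.File`. This is a sketch, not the actual unit test: the class name, the sample string, and the choice of `File.list()` to read the name back are assumptions. The environment printout is only a guess at what might differ between a bazel run and a plain run.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

final class DirNameRoundTrip {
    // Makes a directory called `name` under the (empty) `parent` and returns
    // the name that listing `parent` reports.
    static String createAndReadBack(File parent, String name) throws IOException {
        File dir = new File(parent, name);
        if (!dir.mkdir()) throw new IOException("could not create " + dir);
        String[] names = parent.list();
        if (names == null || names.length != 1) throw new IOException("unexpected listing");
        return names[0];
    }

    public static void main(String[] args) throws IOException {
        File parent = Files.createTempDirectory("io_name_test").toFile();
        String name = "\u65e5\u672c\u8a9e"; // hypothetical sample name

        String readBack = createAndReadBack(parent, name);
        System.out.println("same name: " + name.equals(readBack));

        // Guesswork diagnostics: locale settings the JVM may derive its
        // filename encoding from. sun.jnu.encoding is an internal,
        // implementation-specific property and may be null on some JVMs.
        System.out.println("LANG=" + System.getenv("LANG"));
        System.out.println("LC_ALL=" + System.getenv("LC_ALL"));
        System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));
    }
}
```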
2020-10-05 17:54:29
If I run the test outside of bazel, it passes
Fireduck
2020-10-05 17:54:33
inside bazel, it fails
Fireduck
2020-10-05 17:55:09
that "name is different" can actually just be how the filesystem in question works
Rotonen
2020-10-05 17:55:33
if it were a different encoding of the same string, I'd agree
Fireduck
2020-10-05 17:55:51
but it is replacing all the Japanese with question marks
Fireduck
2020-10-05 17:56:01
'same string' is also more complicated than one would like to know of
Rotonen
2020-10-05 17:56:26
right. I would be fine with that, but it is doing something really stupid
Fireduck
2020-10-05 17:56:39
i suppose it's just taken the UTF-8 input byte for byte and used those bytes for the filename, and then those, when read in the appropriate encoding for the filesystem, don't represent anything you can render
Rotonen
2020-10-05 17:57:20
that's the trouble with lower level APIs, they assume you know what you're doing and they do exactly what you tell them to do :smile:
Rotonen
2020-10-05 17:57:50
You think if I picked a specific normalization first it might pass?
Fireduck
2020-10-05 17:58:00
not sure
Rotonen
2020-10-05 17:58:00
(depending on the filesystem)
Fireduck
2020-10-05 17:58:13
try to do the byte for byte comparison on the filesystem itself on the name
Rotonen
2020-10-05 17:58:30
as in trap the test in a debugger after the write and dig in
Rotonen
2020-10-05 17:58:36
from outside of the test, on the system side
Rotonen
2020-10-05 17:59:02
yeah, I have a print code point method to help me see what is actually going on
Fireduck
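A print code point helper could look roughly like this. The actual method is not shown in the chat, so the name and output format here are assumptions.

```java
import java.util.stream.Collectors;

final class CodePointDump {
    // Renders each code point of the string as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(codePoints("\u65e5\u672c\u8a9e")); // U+65E5 U+672C U+8A9E
        System.out.println(codePoints("???"));                // U+003F U+003F U+003F
    }
}
```

Using `codePoints()` and not `chars()` keeps characters outside the Basic Multilingual Plane as one entry each.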
2020-10-05 17:59:33
and having done multi platform file stuff, the only thing i know for sure is that i'll never want to try to do that stuff manually myself, the jungle is too thick - find community wisdom you can rely on (most likely java.nio or some other upstream api would provide that)
Rotonen
2020-10-05 17:59:59
yeah, I have high hopes for nio
Fireduck
2020-10-05 18:00:03
no, i meant more along the lines of break a hex editor out and *actually* see what's stored on disk, as bytes
Rotonen
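To have something to compare a hex dump of the on-disk name against, the expected bytes can be printed from Java. This sketch (hypothetical class name and sample character) also shows one way question marks can arise: encoding to a charset that cannot represent the characters substitutes `?` (0x3f). Whether that is what happens inside bazel is not established here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameBytesSketch {
    // Encodes the string in the given charset and renders the bytes as lowercase hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u65e5"; // hypothetical one-character sample
        System.out.println(hex(name, StandardCharsets.UTF_8));    // e6 97 a5
        // String.getBytes replaces unmappable characters with the charset's
        // replacement byte, which is '?' for US-ASCII.
        System.out.println(hex(name, StandardCharsets.US_ASCII)); // 3f
    }
}
```

If the bytes on disk are `3f` per character, the original name was already lost before it reached the filesystem; if they are the UTF-8 bytes, the loss happens when reading the name back.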
2020-10-05 18:02:51
if those match what your input is, then you're in the mess i'm thinking of
Rotonen
2020-10-05 18:03:12
but, if it just goes away with a more modern API, probably not worth bothering about, unless curious
Rotonen