archive - snowblossom - slack - dev

2020-10-05 05:19:33

GitHub

*https://github.com/snowblossomcoin/channels/compare/fb55dea8ef6d...58624358f2b8* https://github.com/snowblossomcoin/channels/commit/58624358f2b8fb23a0355724ff2cb107c8cc824b - Add unicode file test

2020-10-05 07:34:10

Rotonen

now that starts seeming sane again. test and debug, i guess? you could actually sorta do debug-driven testing: keep adding well-isolated unit tests to different layers of the onion to see where it goes wrong. not much else one can do in a transform pipeline to ensure one keeps getting it right

2020-10-05 15:12:29

Fireduck

when it breaks, it breaks at the `new File(parent, string)`, or the `File.getName()`

2020-10-05 15:13:05

Fireduck

I think I only saw it because I was running an older java 8 on my laptop at the time

2020-10-05 15:18:42

Rotonen

does `File()` have a baked in assumption about the encoding?

2020-10-05 15:19:09

Rotonen

or some platform specificity

2020-10-05 15:19:24

Rotonen

and Unicode (even UTF-8) filenames are funky anyway cross-platform - some things are NFC, some things are NFD
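
A small sketch of why normalization matters for file names, using the standard `java.text.Normalizer`: the same visible name can end up as different byte sequences depending on the normalization form. The sample name `café` is arbitrary. Which form a particular filesystem actually produces is platform-specific and is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public final class NormalizationBytes {
    // UTF-8 byte length of `s` after normalizing it to `form`.
    static int utf8Length(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String name = "caf\u00E9"; // "cafe" with a precomposed e-acute (U+00E9)
        // NFC keeps e-acute as one code point: bytes C3 A9.
        System.out.println(utf8Length(name, Normalizer.Form.NFC)); // 5
        // NFD splits it into 'e' + U+0301 (combining acute): bytes 65 CC 81.
        System.out.println(utf8Length(name, Normalizer.Form.NFD)); // 6
    }
}
```

Both strings render identically, so a name that "looks the same" after a round trip can still fail a byte-level comparison.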

2020-10-05 16:41:54

Fireduck

I was thinking there might be some weird filesystem interaction, not sure yet

2020-10-05 17:43:10

Rotonen

many low level apis are horribly linux / ext3 specific in most programming languages

2020-10-05 17:43:22

Rotonen

baseline guess: try to find a higher level abstraction?

2020-10-05 17:46:28

Fireduck

`java.io.File` is really old. It might be that the `java.nio.file` stuff will work better. I should try that.

2020-10-05 17:46:30

Fireduck

http://fireduck.com/java/java-se-8/docs/api/java/nio/file/package-summary.html
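
A minimal sketch of the create-then-read-back round trip written against `java.nio.file` rather than `java.io.File`. The class name, method name, and sample name are made up for illustration. NIO still has to turn the name into bytes for the OS using the JVM's file-name charset, so this is an experiment rather than a guaranteed fix. On Linux, an unmappable name may surface as an `InvalidPathException` instead of being silently altered, which at least fails loudly.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

public final class NioRoundTrip {
    // Create a directory called `name` under `parent`, then list `parent`
    // and return the child's name as the filesystem reports it back.
    // Assumes `parent` is empty beforehand (e.g. a fresh temp directory).
    static String roundTrip(Path parent, String name) throws IOException {
        Files.createDirectory(parent.resolve(name));
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
            Iterator<Path> it = entries.iterator();
            if (!it.hasNext()) {
                throw new IOException("no entries in " + parent);
            }
            // The only entry is the directory created above.
            return it.next().getFileName().toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("unicode-test");
        String name = "\u65E5\u672C\u8A9E"; // three kanji, an arbitrary sample name
        String back = roundTrip(parent, name);
        System.out.println(name.equals(back) ? "same" : "different: " + back);
    }
}
```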

2020-10-05 17:47:37

Rotonen

unit test the whole onion on each layer, will give you massive confidence going forwards

2020-10-05 17:47:53

Rotonen

and you should assert byte for byte on each layer
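
One reading of "assert byte for byte": encode both sides with an explicit charset and report where they first differ, instead of comparing `String`s. `firstByteMismatch` and `assertSameBytes` are hypothetical helper names, and UTF-8 is assumed as the expected encoding. The loop is written by hand so it also works on Java 8.

```java
import java.nio.charset.StandardCharsets;

public final class ByteAssert {
    // Index of the first differing byte between the UTF-8 encodings of
    // `expected` and `actual`; -1 if identical. If one is a prefix of the
    // other, returns the length of the shorter one.
    static int firstByteMismatch(String expected, String actual) {
        byte[] a = expected.getBytes(StandardCharsets.UTF_8);
        byte[] b = actual.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    // Fail with the byte offset, which is more useful than "strings differ".
    static void assertSameBytes(String expected, String actual) {
        int at = firstByteMismatch(expected, actual);
        if (at != -1) {
            throw new AssertionError("byte mismatch at offset " + at);
        }
    }

    public static void main(String[] args) {
        assertSameBytes("\u65E5\u672C", "\u65E5\u672C"); // passes
        System.out.println(firstByteMismatch("\u65E5\u672C", "??")); // 0
    }
}
```

Calling something like `assertSameBytes` at the boundary of each layer narrows a corruption down to the layer that introduced it.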

2020-10-05 17:48:01

Fireduck

I have a unit test that reproduces the problem pretty simply

2020-10-05 17:48:13

Rotonen

otherwise it's super easy to create subtle corruptions on such transform pipelines

2020-10-05 17:48:27

Rotonen

that's a good start, glad to hear of that

2020-10-05 17:48:52

Rotonen

also, just asserting byte for byte at the end-to-end level helps you notice that you broke it, sure

2020-10-05 17:49:00

Rotonen

still a huge time saver to know what you broke

2020-10-05 17:49:34

Fireduck

*I* didn't break anything :wink:

2020-10-05 17:51:10

Rotonen

i'm hoping the craziest of things will eventually be built on that

2020-10-05 17:52:05

Fireduck

yeah, me too

2020-10-05 17:54:15

Fireduck

The basic failure is: take a Unicode string, make a directory with that string as the name, read the directory name back, and see that it is different
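
A minimal sketch of that reproduction with `java.io.File`, based only on the description above. The real snowblossom test isn't shown in this thread, so the class name, helper, and sample name are all hypothetical. On a JVM whose file-name charset can't represent the name, this round trip may come back altered rather than throw.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public final class FileRoundTrip {
    // Make a directory called `name` inside `parent` via new File(parent, name),
    // then list `parent` and return what File.getName() reports.
    // Assumes `parent` is empty beforehand.
    static String roundTrip(File parent, String name) throws IOException {
        File dir = new File(parent, name);
        if (!dir.mkdir()) {
            throw new IOException("mkdir failed for " + dir);
        }
        File[] children = parent.listFiles();
        if (children == null || children.length != 1) {
            throw new IOException("unexpected listing of " + parent);
        }
        return children[0].getName();
    }

    public static void main(String[] args) throws IOException {
        File parent = Files.createTempDirectory("unicode-test").toFile();
        String name = "\u65E5\u672C\u8A9E"; // three kanji, an arbitrary sample name
        String back = roundTrip(parent, name);
        System.out.println(name.equals(back) ? "same" : "different: " + back);
    }
}
```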

2020-10-05 17:54:29

Fireduck

If I run the test outside of bazel, it passes

2020-10-05 17:54:33

Fireduck

inside bazel, it fails
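
A quick diagnostic for the inside-vs-outside-bazel difference: print the JVM's encoding-related properties and the locale variables in both environments and compare. `sun.jnu.encoding` is an internal, unsupported OpenJDK property (the charset used for file names). That it differs under bazel's test runner, for example because the test environment lacks a UTF-8 `LANG`/`LC_ALL`, is an assumption to check, not something established in this thread.

```java
import java.nio.charset.Charset;

public final class EncodingReport {
    public static void main(String[] args) {
        // Default charset for text I/O.
        System.out.println("defaultCharset   = " + Charset.defaultCharset());
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        // Internal OpenJDK property: charset used to encode file names.
        // May be null on other JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        // On Unix-like systems the JVM derives the file-name charset from the
        // process locale, which these variables control.
        for (String var : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            System.out.println(var + " = " + System.getenv(var));
        }
    }
}
```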

2020-10-05 17:55:09

Rotonen

that "name is different" can actually just be how the filesystem in question works

2020-10-05 17:55:33

Fireduck

if it were a different encoding of the same string, I'd agree

2020-10-05 17:55:51

Fireduck

but it is replacing all the Japanese with question marks
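
Question marks are exactly what `String.getBytes(Charset)` produces when the target charset can't represent a character: it substitutes the charset's replacement byte, which for US-ASCII is `?`. A sketch of that mechanism, under the unconfirmed assumption that the name is being encoded with an ASCII file-name charset when run under bazel:

```java
import java.nio.charset.StandardCharsets;

public final class QuestionMarks {
    // Encode with US-ASCII and decode back, as a file name would be if the
    // JVM's file-name charset were ASCII. getBytes(Charset) replaces each
    // unmappable character with the charset's replacement byte, '?'.
    static String throughAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("dir-\u65E5\u672C\u8A9E")); // dir-???
    }
}
```

Unlike a mis-decoding, this loses information: once the name is `???` on disk, no normalization or re-decoding can recover it.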

2020-10-05 17:56:01

Rotonen

'same string' is also more complicated than one would like to know

2020-10-05 17:56:26

Fireduck

right. I would be fine with that, but it is doing something really stupid

2020-10-05 17:56:39

Rotonen

i suppose it's just taken the UTF-8 input byte for byte and used those bytes for the filename, and then those, when read in the appropriate encoding for the filesystem, don't represent anything you can render
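
For contrast, a sketch of the case described here: the UTF-8 bytes are stored as-is but read back with a different charset. ISO-8859-1 is an arbitrary stand-in for "the filesystem's encoding". This produces mojibake rather than `?`, and because ISO-8859-1 maps every byte to exactly one char, the original name can still be recovered.

```java
import java.nio.charset.StandardCharsets;

public final class Mojibake {
    // Read UTF-8 bytes as if they were ISO-8859-1: one char per byte.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo misread(): recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u65E5\u672C";                     // two kanji, 6 UTF-8 bytes
        String garbled = misread(name);
        System.out.println(garbled.length());             // 6
        System.out.println(repair(garbled).equals(name)); // true
    }
}
```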

2020-10-05 17:57:20

Rotonen

that's the trouble with lower level APIs, they assume you know what you're doing and they do exactly what you tell them to do :smile:

2020-10-05 17:57:50

Fireduck

You think if I picked a specific normalization first it might pass?

2020-10-05 17:58:00

Rotonen

not sure

2020-10-05 17:58:00

Fireduck

(depending on the filesystem)
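
A sketch of the normalization idea with `java.text.Normalizer`: bring both the name that was written and the name that was read back into one form (NFC here, an arbitrary choice) before comparing. This only absorbs differences such as NFC versus NFD; it cannot undo characters that were already replaced with `?`. `sameName` is a hypothetical helper.

```java
import java.text.Normalizer;

public final class NormalizedCompare {
    // Compare two names after putting both into Unicode NFC form.
    static boolean sameName(String written, String readBack) {
        return Normalizer.normalize(written, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(readBack, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u304C";        // hiragana GA, precomposed
        String decomposed = "\u304B\u3099"; // hiragana KA + combining dakuten
        System.out.println(composed.equals(decomposed));   // false
        System.out.println(sameName(composed, decomposed)); // true
        System.out.println(sameName(composed, "?"));        // false
    }
}
```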

2020-10-05 17:58:13

Rotonen

try doing the byte-for-byte comparison on the name as stored on the filesystem itself

2020-10-05 17:58:30

Rotonen

as in trap the test in a debugger after the write and dig in

2020-10-05 17:58:36

Rotonen

from outside of the test, on the system side

2020-10-05 17:59:02

Fireduck

yeah, I have a print code point method to help me see what is actually going on
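
The actual helper isn't shown in the thread. A hypothetical version (`describeCodePoints` is an invented name) might print each code point as `U+XXXX`, so a substituted `?` (U+003F) stands out immediately from the real characters:

```java
import java.util.stream.Collectors;

public final class CodePoints {
    // Render each Unicode code point of `s` as U+XXXX, space-separated.
    // Uses codePoints() so supplementary characters show as one entry.
    static String describeCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(describeCodePoints("\u65E5?")); // U+65E5 U+003F
    }
}
```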

2020-10-05 17:59:33

Rotonen

and having done multi platform file stuff, the only thing i know for sure is that i'll never want to try to do that stuff manually myself, the jungle is too thick - find community wisdom you can rely on (most likely java.nio or some other upstream api would provide that)

2020-10-05 17:59:59

Fireduck

yeah, I have high hopes for nio

2020-10-05 18:00:03

Rotonen

no, i meant more along the lines of break a hex editor out and *actually* see what's stored on disk, as bytes
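
To have something to compare against what a hex editor (or something like `ls | xxd` on the system side) shows, a sketch that prints the expected UTF-8 bytes of the intended name. If the on-disk bytes match, the name was stored correctly and the problem is on the read side; if the disk shows `3F` bytes, the `?` substitution happened on the way in. UTF-8 as the intended on-disk encoding is an assumption.

```java
import java.nio.charset.StandardCharsets;

public final class HexName {
    // Hex-dump the UTF-8 encoding of `s`, space-separated, e.g. "E6 97 A5" for U+65E5.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to print as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u65E5\u672C")); // E6 97 A5 E6 9C AC
    }
}
```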

2020-10-05 18:02:51

Rotonen

if those match what your input is, then you're in the mess i'm thinking of

2020-10-05 18:03:12

Rotonen

but, if it just goes away with a more modern API, probably not worth bothering about, unless curious