now that it's starting to seem sane again: test and debug. i guess you could actually sorta do debug-driven testing, i.e. keep adding well-isolated unit tests at different layers of the onion to see where it goes wrong? not much else one can do in a transform pipeline to make sure it keeps coming out right
when it breaks, it breaks at `new File(parent, string)` or at `File.getName()`
I think I only saw it because I was running an older java 8 on my laptop at the time
does `File()` have a baked in assumption about the encoding?
or some platform specificity
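A quick way to check whether `File` is picking up an encoding assumption from the environment is to dump the JVM's encoding settings and diff the output between a run inside the test runner and a run outside it. This is a sketch: `EncodingReport` is a made-up name, and `sun.jnu.encoding` is an undocumented OpenJDK property (as far as I know, the one OpenJDK uses to convert file names to bytes on Unix, derived from the locale at JVM startup), so it may be null on other JVMs.

```java
import java.nio.charset.Charset;

// Hypothetical helper: collects the encoding-related settings that file name
// handling may depend on. "sun.jnu.encoding" is OpenJDK-specific and
// undocumented, so it can be null on other JVMs.
final class EncodingReport {
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + "\nsun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
                + "\ndefaultCharset=" + Charset.defaultCharset().name()
                + "\nLANG=" + System.getenv("LANG")
                + "\nLC_ALL=" + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        // Run once inside the test runner and once outside, then compare.
        System.out.println(report());
    }
}
```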
and unicode (even UTF-8) filenames are funky cross-platform anyway: some things are NFC, some things are NFD
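To see why "same string" gets slippery, here's a sketch using `java.text.Normalizer` with a Japanese example: ガ can be stored as one precomposed code point (U+30AC) or as カ plus a combining voiced sound mark (U+30AB U+3099). Those compare unequal as Java strings but equal after NFC normalization. For context, macOS HFS+ stores file names in a decomposed form (a variant of NFD), while Linux filesystems generally store whatever bytes they're given. The class name is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical demo: the "same" visible text can be different code point
// sequences, and only compares equal after normalizing both sides.
final class NormalizationDemo {
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u30AC";          // ガ as a single code point
        String decomposed = "\u30AB\u3099";  // カ + combining voiced sound mark
        System.out.println(composed.equals(decomposed));        // false
        System.out.println(sameAfterNfc(composed, decomposed)); // true
    }
}
```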
I was thinking there might be some weird filesystem interaction, not sure yet
in most programming languages, many low-level APIs are horribly Linux/ext3-specific
baseline guess: try to find a higher level abstraction?
`java.io.File` is really old. It might be that the `java.nio.file` stuff will work better. I should try that.
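A sketch of the same round trip with `java.nio.file` instead of `java.io.File` (Java 7+). `NioRoundTrip` and `createAndReadBack` are hypothetical names. One possible advantage, if I understand OpenJDK on Unix correctly: when the name can't be encoded in the platform's file-name encoding, resolving the `Path` throws `InvalidPathException` instead of silently substituting `?`, which at least makes the failure loud. Worth verifying on your JDK.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical helper: creates a directory named `name` under a fresh temp
// directory via java.nio.file, then reads the name back by listing the parent.
final class NioRoundTrip {
    static String createAndReadBack(String name) throws IOException {
        Path parent = Files.createTempDirectory("nio-roundtrip");
        // May throw InvalidPathException if `name` can't be encoded.
        Path dir = Files.createDirectory(parent.resolve(name));
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
            for (Path entry : entries) {
                return entry.getFileName().toString(); // the only entry
            }
        } finally {
            // Runs after the stream is closed; clean up both directories.
            Files.delete(dir);
            Files.delete(parent);
        }
        throw new IllegalStateException("created directory was not listed");
    }

    public static void main(String[] args) throws IOException {
        String name = "\u65e5\u672c\u8a9e"; // 日本語
        try {
            String back = createAndReadBack(name);
            System.out.println(name.equals(back) ? "round trip OK" : "MISMATCH, read back: " + back);
        } catch (InvalidPathException e) {
            System.out.println("cannot encode name: " + e.getMessage());
        }
    }
}
```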
unit test the whole onion, layer by layer; it'll give you massive confidence going forward
and you should assert byte for byte on each layer
I have a unit test that reproduces the problem pretty simply
otherwise it's super easy to create subtle corruptions on such transform pipelines
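A minimal sketch of a per-layer byte-for-byte assertion, in plain Java so it doesn't assume a particular test framework. `ByteAssert.assertSameBytes` is a hypothetical helper; the point is that the failure message names the layer and the first differing offset, so an end-to-end failure can be narrowed down to the layer that introduced it.

```java
import java.util.Arrays;

// Hypothetical test helper: compares two byte arrays and, on mismatch, reports
// the layer name and the first differing offset.
final class ByteAssert {
    static void assertSameBytes(String layer, byte[] expected, byte[] actual) {
        if (Arrays.equals(expected, actual)) {
            return;
        }
        int n = Math.min(expected.length, actual.length);
        int i = 0;
        while (i < n && expected[i] == actual[i]) {
            i++;
        }
        // If one array is a prefix of the other, report "<end>" for the shorter one.
        String exp = i < expected.length ? String.format("0x%02x", expected[i] & 0xff) : "<end>";
        String act = i < actual.length ? String.format("0x%02x", actual[i] & 0xff) : "<end>";
        throw new AssertionError(layer + ": first difference at byte " + i
                + " (expected " + exp + ", got " + act + ")");
    }
}
```

You'd call it after each stage, e.g. `assertSameBytes("encode", expected, encoded)`, then again after the write and after the read-back.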
that's a good start, glad to hear of that
also, just asserting byte for byte end to end helps you notice that you broke it, sure
still a huge time saver to know what you broke
*I* didn't break anything :wink:
i'm hoping the craziest of things will eventually be built on that
yeah, me too
The basic failure is: take a unicode string, make a directory with that string as the name, read the directory name back, and see that it is different.
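The repro described here, sketched with `java.io.File` to match the original code path. Names are hypothetical; it creates a fresh temp directory, makes one child directory with the given name, and returns the name as listed back from disk, so comparing it with the input shows whether the name survived.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical repro: mkdir with a given name via java.io.File, then read the
// name back by listing the parent directory.
final class FileRoundTrip {
    static String createAndReadBack(String name) throws IOException {
        File parent = Files.createTempDirectory("file-roundtrip").toFile();
        File dir = new File(parent, name);
        if (!dir.mkdir()) {
            throw new IOException("mkdir failed for " + dir);
        }
        try {
            String[] children = parent.list();
            if (children == null || children.length != 1) {
                throw new IOException("unexpected listing of " + parent);
            }
            return children[0]; // the name as the filesystem reports it
        } finally {
            dir.delete();
            parent.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "\u65e5\u672c\u8a9e"; // 日本語
        String back = createAndReadBack(name);
        System.out.println(name.equals(back) ? "round trip OK" : "MISMATCH, read back: " + back);
    }
}
```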
If I run the test outside of bazel, it passes
inside bazel, it fails
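If the difference between inside and outside the runner turns out to be the environment, one hypothesis worth checking is that the runner doesn't pass `LANG`/`LC_ALL` through, so the JVM falls back to an ASCII-only file-name encoding. A sketch of a precondition check that fails fast with a clear reason instead of a confusing mismatch (hypothetical names; it relies on the OpenJDK-specific `sun.jnu.encoding` property and falls back to the default charset when that's absent):

```java
import java.nio.charset.Charset;

// Hypothetical precondition check: can the JVM's file-name encoding represent
// this name at all? Uses the OpenJDK-specific "sun.jnu.encoding" property,
// falling back to the default charset if it is missing or unsupported.
final class FileNameEncodingCheck {
    static Charset fileNameCharset() {
        String jnu = System.getProperty("sun.jnu.encoding");
        return jnu != null && Charset.isSupported(jnu) ? Charset.forName(jnu) : Charset.defaultCharset();
    }

    static boolean canEncode(Charset cs, String name) {
        return cs.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c\u8a9e"; // 日本語
        Charset cs = fileNameCharset();
        System.out.println(cs.name() + " can encode name: " + canEncode(cs, name));
    }
}
```

If the locale is the cause, Bazel's `--test_env` flag (e.g. `--test_env=LANG=C.UTF-8`, assuming that locale exists on the machine) is one way to pass a UTF-8 locale into the test; this is a guess, not verified against this particular failure.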
that "name is different" can actually just be how the filesystem in question works
if it were a different encoding of the same string, I'd agree
but it is replacing all the Japanese with question marks
'same string' is also more complicated than one would like to know
right. I would be fine with that, but it is doing something really stupid
i suppose it just took the UTF-8 input byte for byte and used those bytes as the filename, and those bytes, when read in the filesystem's encoding, don't represent anything you can render
that's the trouble with lower level APIs, they assume you know what you're doing and they do exactly what you tell them to do :smile:
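One way to tell the two theories apart: if the UTF-8 bytes were passed through unchanged and then decoded with the wrong charset, you'd expect mojibake, not question marks. Question marks are what Java's `String.getBytes(Charset)` produces when the target charset can't represent a character (it substitutes the charset's replacement, `?` for US-ASCII), which is consistent with the name being squeezed through a narrower charset. A small sketch showing both cases (class name hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: lossy encoding gives '?', wrong-charset decoding gives mojibake.
final class LossyEncodingDemo {
    // Encode then decode with the same charset; unmappable chars become the
    // charset's replacement ('?' for US-ASCII).
    static String throughCharset(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    // Keep the UTF-8 bytes but decode them as ISO-8859-1: one char per byte.
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c\u8a9e.txt"; // 日本語.txt
        System.out.println(throughCharset(name, StandardCharsets.US_ASCII));         // ???.txt
        System.out.println(throughCharset(name, StandardCharsets.UTF_8).equals(name)); // true
        System.out.println(utf8ReadAsLatin1(name).length());                          // 13
    }
}
```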
You think if I picked a specific normalization first it might pass?
not sure
(depending on the filesystem)
try doing the byte-for-byte comparison of the name on the filesystem itself
as in, trap the test in a debugger after the write and dig in
from outside of the test, on the system side
yeah, I have a print code point method to help me see what is actually going on
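Something like that print-code-point helper, as a Java 8 sketch (name hypothetical): it renders each code point as `U+XXXX`, so a decomposed name, a precomposed one, and one that was mangled into `?` (U+003F) are easy to tell apart.

```java
import java.util.stream.Collectors;

// Hypothetical helper: renders each Unicode code point as U+XXXX, space-separated.
final class CodePoints {
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(describe("\u30AC"));       // U+30AC
        System.out.println(describe("\u30AB\u3099")); // U+30AB U+3099
        System.out.println(describe("?"));            // U+003F
    }
}
```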
and having done multi-platform file stuff, the only thing i know for sure is that i never want to do that stuff manually myself; the jungle is too thick. find community wisdom you can rely on (most likely `java.nio` or some other upstream API would provide that)
yeah, I have high hopes for nio
no, i meant more along the lines of break a hex editor out and *actually* see what's stored on disk, as bytes
if those match what your input is, then you're in the mess i'm thinking of
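For the on-disk check, it helps to have the expected bytes in the same form a hex dump shows. A sketch (hypothetical name) that prints the UTF-8 encoding of the input as hex; compare it with, e.g., `ls | xxd` run in the parent directory from outside the test. If the disk shows `3f 3f 3f` where you expected something like `e6 97 a5 ...`, the name was already replaced with `?` before it reached the filesystem.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the UTF-8 bytes of a string as space-separated hex,
// for side-by-side comparison with a hex dump of the on-disk name.
final class HexBytes {
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u65e5")); // e6 97 a5
    }
}
```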
but if it just goes away with a more modern API, it's probably not worth bothering about, unless you're curious