2019-10-02 04:54:04
*https://github.com/snowblossomcoin/snowblossom/compare/3b8219d51ee9...2925f91d7972*
https://github.com/snowblossomcoin/snowblossom/commit/2925f91d7972f9137f7717f66f85ec900ea3a971 - unit tests for ids and fbo index
GitHub
2019-10-02 05:15:32
*https://github.com/snowblossomcoin/snowblossom/compare/2925f91d7972...a0eb84697a62*
https://github.com/snowblossomcoin/snowblossom/commit/a0eb84697a62af82fba142f2a7bdab9b58083159 - Adding GRPC calls
GitHub
2019-10-02 14:14:27
and for the record, i was not kidding about the orangutang
Rotonen
2019-10-02 16:20:36
Is it in Unicode?
Fireduck
2019-10-02 17:14:27
yes
Rotonen
2019-10-02 17:18:03
all emoji are by definition unicode
Rotonen
2019-10-02 18:06:50
I was thinking that emoji were small pictures used to convey feelings, but I think you are technically correct
Fireduck
2019-10-02 18:06:52
the best kind of correct
Fireduck
2019-10-02 18:23:24
Ok, mysql does something crazy with strings by default
Fireduck
2019-10-02 18:23:43
which ignores caps, and collapses weird accent marks
Fireduck
2019-10-02 18:23:54
and that is what I want to use for normalizing name strings
Fireduck
2019-10-02 18:24:01
but I have no idea what the hell it is
Fireduck
2019-10-02 18:31:52
that’s called collation
Rotonen
2019-10-02 18:32:12
do not confuse that with unicode normalization
Rotonen
2019-10-02 18:32:33
cool
Fireduck
2019-10-02 18:32:33
collation is just for searching and sorting
Rotonen
2019-10-02 18:32:51
see also: LC_COLLATE
Rotonen
2019-10-02 18:33:39
it’s for locale specific stuff like ’should a and ä sort as the same character’
Rotonen
2019-10-02 18:34:23
yeah
Fireduck
2019-10-02 18:34:24
it encodes culture based expectations of software behaviour, not actual byte level representations of characters
Rotonen
2019-10-02 18:34:58
If someone registers Fräd, I want that to be the same as if someone registered frad
Fireduck
2019-10-02 18:35:15
not very globally inclusive
Rotonen
2019-10-02 18:35:34
To be clear, people can use the exact string they register
Fireduck
2019-10-02 18:35:43
i’d not like to see that be restricted to latin scripts either
Rotonen
2019-10-02 18:35:44
it is only for the uniqueness bucketing that it would be collated down
Fireduck
2019-10-02 18:36:49
you’ll disappoint a lot of nordic and german people that way - some genuinely different names will collapse to one that way
Rotonen
2019-10-02 18:37:03
understood
Fireduck
2019-10-02 18:37:20
plus everyone who is not on a flat latin script, so the majority of the global population
Rotonen
2019-10-02 18:38:01
I don't think I would be preventing anyone from using any unicode string they like
Fireduck
2019-10-02 18:38:36
so what do you collapse the orangutang to?
Rotonen
2019-10-02 18:38:49
probably just orangutang
Fireduck
2019-10-02 18:39:05
I'll absolutely put that in the unit tests for it
Fireduck
2019-10-02 18:39:31
for hard mode, try family emojis with mixed skin tone modifiers
Rotonen
2019-10-02 18:39:43
oh god
Fireduck
2019-10-02 18:40:07
yeah, good idea
Fireduck
2019-10-02 18:40:47
i like unicode
Rotonen
2019-10-02 18:40:53
me too
Fireduck
2019-10-02 18:41:14
yet you want to collapse to ASCII
Rotonen
2019-10-02 18:41:31
i’ll be ASCII 0x0B then
Rotonen
2019-10-02 18:41:42
only the things that collapse to ASCII
Fireduck
2019-10-02 18:41:56
I suspect I am communicating poorly because I don't know the terminology in this space
Fireduck
2019-10-02 18:42:47
my argument: treat user input just as a pile of bytes, do less, allow for more
Rotonen
2019-10-02 18:43:19
I mostly agree with that. I am just trying to avoid people impersonating others easily by picking strange letters
Fireduck
2019-10-02 18:43:26
with one caveat of normalize to NFC so people cannot have identical renderings from multiple inputs
Rotonen
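For reference, a minimal sketch of the NFC caveat above, using the JDK's `java.text.Normalizer`. The class and method names (`NfcNameDemo`, `toNfc`) are hypothetical and not part of snowblossom. It shows two inputs that render identically but differ as raw strings, and that NFC maps both to the same sequence without folding accents or case.

```java
import java.text.Normalizer;

final class NfcNameDemo {
    // Hypothetical helper: canonical composition only, no case or accent folding.
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "Fr\u00E4d"; // a-umlaut as a single code point
        String decomposed = "Fra\u0308d"; // 'a' followed by a combining diaeresis

        // The raw strings differ even though they render the same.
        System.out.println("raw equal: " + precomposed.equals(decomposed));
        // After NFC both inputs are the same sequence of code points.
        System.out.println("NFC equal: " + toNfc(precomposed).equals(toNfc(decomposed)));
    }
}
```

Note that under this scheme "Fräd" and "frad" stay distinct names, which is the behaviour Rotonen is arguing for.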
2019-10-02 18:43:26
I know I won't be able to prevent that completely
Fireduck
2019-10-02 18:43:34
you can
Rotonen
2019-10-02 18:43:42
NFC or NFKC?
Fireduck
2019-10-02 18:44:02
K is always crap and allows for nonsense
Rotonen
2019-10-02 18:44:13
pick NFC or NFD
Rotonen
2019-10-02 18:44:49
composed is ’minimal set of bytes which can represent’ and decomposed is the opposite
Rotonen
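A small sketch of how the forms differ, again with `java.text.Normalizer`. `NormalFormsDemo` and `codePoints` are hypothetical names used for illustration only. NFC and NFD differ in whether accents are composed; the K forms additionally replace compatibility characters such as ligatures, which is the lossy behaviour being objected to here.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class NormalFormsDemo {
    // Number of code points in the input after normalizing to the given form.
    static int codePoints(String s, Form form) {
        String n = Normalizer.normalize(s, form);
        return n.codePointCount(0, n.length());
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00E4";  // precomposed a-umlaut
        String ligature = "\uFB01"; // the single-character "fi" ligature

        System.out.println("NFC code points: " + codePoints(aUmlaut, Form.NFC));
        System.out.println("NFD code points: " + codePoints(aUmlaut, Form.NFD));

        // NFC leaves the ligature alone; NFKC rewrites it to plain letters.
        System.out.println("NFC keeps ligature: "
                + Normalizer.normalize(ligature, Form.NFC).equals(ligature));
        System.out.println("NFKC gives: " + Normalizer.normalize(ligature, Form.NFKC));
    }
}
```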
2019-10-02 18:45:57
or just hash the input and that’s the read id?
Rotonen
2019-10-02 18:46:27
The hashing is being done, mostly to get me a fixed length item in the table
Fireduck
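A sketch of the "hash to get a fixed length item" idea, assuming NFC normalization followed by SHA-256 over the UTF-8 bytes. The choice of SHA-256, the UTF-8 encoding, and the names `NameKeyDemo` and `nameKey` are assumptions for illustration; the chat does not say which hash or encoding the real table uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.Normalizer;

final class NameKeyDemo {
    // Hypothetical: NFC-normalize, then hash to a fixed 32-byte table key.
    static byte[] nameKey(String name) {
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(nfc.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+1F9A7 is the orangutan emoji; any Unicode string yields a 32-byte key.
        String orangutan = new String(Character.toChars(0x1F9A7));
        System.out.println("key length: " + nameKey(orangutan).length);
    }
}
```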
2019-10-02 18:47:10
gitstyle short hash as discordstyle tail and done?
Rotonen
2019-10-02 18:47:54
tail too short, easy to brute force a collision
Fireduck
2019-10-02 18:48:20
poc attack demo required to continue discussion
Rotonen
2019-10-02 18:48:39
ha, I might misunderstand how the tails are made
Fireduck
2019-10-02 18:48:51
if you are counting entries and everyone has the same count, no problem
Fireduck
2019-10-02 18:49:01
iirc git just takes beginning and end of the sha
Rotonen
2019-10-02 18:49:04
if it is the hash of something, then someone can generate keys until it hashes to the same short tail
Fireduck
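A rough sketch of the concern above: if the tail is just the last few hex digits of a hash over attacker-chosen input, the attacker can keep generating candidates until the tail matches. Everything here is hypothetical (the `key-N` candidate format, SHA-256, taking the tail from the end of the digest); it only illustrates that the search cost grows as 16 to the power of the tail length, so short tails are cheap to hit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ShortTailCollisionDemo {
    // Last `hexChars` hex digits of SHA-256(input), an assumed tail scheme.
    static String tailOf(String input, int hexChars) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.substring(hex.length() - hexChars);
    }

    // Try candidate inputs until one shares the target's tail.
    static String findCollision(String target, int hexChars) {
        String wanted = tailOf(target, hexChars);
        for (long i = 0; ; i++) {
            String candidate = "key-" + i;
            if (!candidate.equals(target) && tailOf(candidate, hexChars).equals(wanted)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        // A 4-digit tail needs about 65,536 tries on average.
        System.out.println(findCollision("victim-key", 4));
    }
}
```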
2019-10-02 18:49:34
can play silly games like tail hash = hmac(tx_id, block_id, name)
Fireduck
2019-10-02 18:49:46
so it is hard to know what the tail will be unless you are also the miner
Fireduck
2019-10-02 18:49:51
and willing to throw away good blocks
Fireduck
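One way the `hmac(tx_id, block_id, name)` idea could be sketched with the JDK's `javax.crypto.Mac`. Which value acts as the HMAC key, the HmacSHA256 algorithm, and the names `TailTagDemo` and `tail` are all assumptions; the chat only gives the three inputs. Here the block id is used as the key, so the tail is not known until the transaction lands in a block.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class TailTagDemo {
    // Hypothetical: tail = first `hexChars` hex digits of HMAC-SHA256(key=blockId, txId || name).
    // Assumes txId has a fixed length, so the concatenation is unambiguous.
    static String tail(byte[] blockId, byte[] txId, String name, int hexChars) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(blockId, "HmacSHA256"));
            mac.update(txId);
            mac.update(name.getBytes(StandardCharsets.UTF_8));
            byte[] digest = mac.doFinal();
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, hexChars);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] blockId = {1, 2, 3, 4}; // placeholder values, not real ids
        byte[] txId = {9, 8, 7, 6};
        System.out.println("fireduck#" + tail(blockId, txId, "fireduck", 8));
    }
}
```

A registrant who is not the miner cannot predict the block id, so they cannot grind for a chosen tail ahead of time.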
2019-10-02 18:49:56
yes, but an attack requires identical rendering input AND a hash collision
Rotonen
2019-10-02 18:50:48
*prefix suffix hash collision
Rotonen
2019-10-02 18:51:47
Think any of those collator modes will help?
Fireduck
2019-10-02 18:53:29
that’s ultimately more to do with how to build search engines and human facing sortable tables of any sort
Rotonen
2019-10-02 18:54:01
seems java would like to decompose and sort over more bytes - can make sense
Rotonen
2019-10-02 18:55:28
and the strength values are for very complex tiered rulesets of sorting, such as some multi lingual academic library (books on shelves kind) would need
Rotonen
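For context, a sketch of what the collator strengths do in `java.text.Collator`. The helper name `sameAt` and the choice of `Locale.ENGLISH` are assumptions for illustration. At PRIMARY strength only base letters count, so accent and case differences are ignored; at TERTIARY they are not. The result is locale dependent by design, which is why it suits sorting better than identity.

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorStrengthDemo {
    // True if the two strings compare as equal at the given collation strength.
    static boolean sameAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        // Decompose accented characters so precomposed and combining forms agree.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String accented = "Fr\u00E4d"; // "Fräd"
        System.out.println("PRIMARY:  " + sameAt(Collator.PRIMARY, accented, "frad"));
        System.out.println("TERTIARY: " + sameAt(Collator.TERTIARY, accented, "frad"));
    }
}
```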
2019-10-02 18:56:28
and i’ve been against using collation as a tool, just normalize to NFC and tail tag with a hash
Rotonen
2019-10-02 18:56:55
gitstyle, hard auth with long, most use cases good enough with short
Rotonen
2019-10-02 18:58:29
and an input collision is fine as the human facing ones are still unique?
Rotonen
2019-10-02 18:59:15
and an attack poc welcome on a prefix-suffix collision on something which’d render the same
Rotonen
2019-10-02 19:39:37
the important thing to remember is that if it doesn't have a tail, it's not a monkey
Fireduck
2019-10-02 19:41:35
f̬r̸̹ạ̴̳̱̻͔̗̞͠d͇̝̮̠̮̟̪̹́
Fireduck
2019-10-02 20:19:40
you also have to remember that the naive user you are trying to protect would not pick anything with staggering diacritics
Rotonen
2019-10-02 20:20:38
good, cause my collation currently can't handle them. :wink:
Fireduck
2019-10-02 20:20:49
I'm thinking about what you've said, it makes a lot of sense
Fireduck
2019-10-02 20:21:03
but I do want to collapse case at least, I think
Fireduck
2019-10-02 20:34:25
Assert.assertEquals("fireduck", ForBenefitOfUtil.normalize("fireduck"), ForBenefitOfUtil.normalize("𝓕ire𝐃uc𝐤"));
Fireduck
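One possible way to make an assertion like that pass, sketched with NFKC plus lowercasing. This is not how `ForBenefitOfUtil.normalize` is implemented (the chat does not show it); `CompatFoldDemo` and `fold` are hypothetical names. NFKC maps the mathematical bold and script letters to their plain ASCII counterparts, at the cost of the lossy compatibility folding discussed earlier, and it leaves accents in place.

```java
import java.text.Normalizer;
import java.util.Locale;

final class CompatFoldDemo {
    // Hypothetical fold: compatibility composition, then locale-independent lowercasing.
    static String fold(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    // Build a one-character string from a code point outside the BMP.
    private static String cp(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+1D4D5 bold script F, U+1D403 bold D, U+1D424 bold k.
        String fancy = cp(0x1D4D5) + "ire" + cp(0x1D403) + "uc" + cp(0x1D424);
        System.out.println(fold(fancy));
        // Accents survive: this fold does not merge "Fräd" with "frad".
        System.out.println(fold("Fr\u00E4d"));
    }
}
```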
2019-10-02 20:37:10
That one is pretty obviously different, but if the font rendering is a little wonky, it could look quite similar.
Fireduck
2019-10-02 20:37:23
Especially things like bold small letters vs regular
Fireduck