archive - snowblossom - slack - dev

2019-10-02 04:54:04

GitHub

*https://github.com/snowblossomcoin/snowblossom/compare/3b8219d51ee9...2925f91d7972* https://github.com/snowblossomcoin/snowblossom/commit/2925f91d7972f9137f7717f66f85ec900ea3a971 - unit tests for ids and fbo index

2019-10-02 05:15:32

GitHub

*https://github.com/snowblossomcoin/snowblossom/compare/2925f91d7972...a0eb84697a62* https://github.com/snowblossomcoin/snowblossom/commit/a0eb84697a62af82fba142f2a7bdab9b58083159 - Adding GRPC calls

2019-10-02 14:14:27

Rotonen

and for the record, i was not kidding about the orangutang

2019-10-02 16:20:36

Fireduck

Is it in Unicode?

2019-10-02 17:14:27

Rotonen

yes

2019-10-02 17:18:03

Rotonen

all emoji are per definition unicode

2019-10-02 18:06:50

Fireduck

I was thinking that emoji was small pictures used to convey feelings, but I think you are technically correct

2019-10-02 18:06:52

Fireduck

the best kind of correct

2019-10-02 18:23:24

Fireduck

Ok, mysql does something crazy with strings by default

2019-10-02 18:23:43

Fireduck

which ignores caps, and collapses weird accent marks

2019-10-02 18:23:54

Fireduck

and that is what I want to use for normalizing name strings

2019-10-02 18:24:01

Fireduck

but I have no idea what the hell it is

2019-10-02 18:31:52

Rotonen

that’s called collation

2019-10-02 18:32:12

Rotonen

do not confuse that with unicode normalization

2019-10-02 18:32:33

Fireduck

cool

2019-10-02 18:32:33

Rotonen

collation is just for searching and sorting

2019-10-02 18:32:51

Rotonen

see also: LC_COLLATE

2019-10-02 18:33:39

Rotonen

it’s for locale specific stuff like ’should a and ä sort as the same character’

2019-10-02 18:34:23

Fireduck

yeah

2019-10-02 18:34:24

Rotonen

it encodes culture based expectations of software behaviour, not actual byte level representations of characters

2019-10-02 18:34:58

Fireduck

If someone registers Fräd, I want that to be the same as if someone registered frad
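A minimal sketch of the "uniqueness bucket" Fireduck describes. It assumes the key is built by canonically decomposing the name (NFD), dropping combining marks, and lower-casing. `bucketKey` is a hypothetical helper, not the snowblossom code. It only approximates a case- and accent-insensitive MySQL collation and does not reproduce it. The registered string itself is kept as typed, and only the key is folded.

```java
import java.text.Normalizer;
import java.util.Locale;

class BucketKey {
    // Hypothetical uniqueness key: decompose, strip combining marks, lower-case
    static String bucketKey(String name) {
        // NFD splits precomposed letters, e.g. U+00E4 becomes 'a' + U+0308
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // \p{M} matches combining marks (Mn, Mc, Me), i.e. the accents
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Locale.ROOT keeps the result independent of the server's default locale
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String registered = "Fr\u00e4d";                                  // F r a-umlaut d
        String stacked = "f\u032cr\u0338\u0339\u1ea1\u0334d\u0347\u035d"; // piled-up diacritics
        System.out.println(bucketKey(registered)); // frad
        System.out.println(bucketKey(stacked));    // frad
    }
}
```

This fold is uneven. "Mäki" and "Maki" land in the same bucket, while "Jørgen" stays distinct from "Jorgen" because ø (U+00F8) has no canonical decomposition to strip.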

2019-10-02 18:35:15

Rotonen

not very globally inclusive

2019-10-02 18:35:34

Fireduck

To be clear, people can use the exact string they register

2019-10-02 18:35:43

Rotonen

i’d not like to see that be restricted to latin scripts either

2019-10-02 18:35:44

Fireduck

it is only for the uniqueness bucketing that it would be collated down

2019-10-02 18:36:49

Rotonen

you’ll disappoint a lot of nordic and german people that way - some genuinely different names will collapse to one that way

2019-10-02 18:37:03

Fireduck

understood

2019-10-02 18:37:20

Rotonen

plus everyone who is not on a flat latin script, so the majority of the global population

2019-10-02 18:38:01

Fireduck

I don't think I would be preventing anyone from using any unicode string they like

2019-10-02 18:38:36

Rotonen

so what do you collapse the orangutang to?

2019-10-02 18:38:49

Fireduck

probably just orangutang

2019-10-02 18:39:05

Fireduck

I'll absolutely put that in the unit tests for it

2019-10-02 18:39:31

Rotonen

for hard mode, try family emojis with mixed skin tone modifiers
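Why family emoji are "hard mode": a family with mixed skin tones is a zero-width-joiner (ZWJ) sequence of several code points, not one character. The skin tone modifiers (U+1F3FB to U+1F3FF) are symbols, not combining marks. A mark-stripping fold therefore passes the whole sequence through unchanged, and the orangutan (U+1F9A7) does the same. In the sketch below, `fold` is a hypothetical stand-in for a name normalizer that decomposes, strips marks and lower-cases.

```java
import java.text.Normalizer;
import java.util.Locale;

class EmojiFold {
    // Orangutan: one code point, two UTF-16 chars
    static final String ORANGUTAN = new StringBuilder().appendCodePoint(0x1F9A7).toString();

    // Woman (medium skin tone) + ZWJ + girl (dark skin tone)
    static final String FAMILY = new StringBuilder()
        .appendCodePoint(0x1F469).appendCodePoint(0x1F3FD) // WOMAN + skin tone modifier
        .appendCodePoint(0x200D)                            // ZERO WIDTH JOINER
        .appendCodePoint(0x1F467).appendCodePoint(0x1F3FF) // GIRL + skin tone modifier
        .toString();

    // Hypothetical fold: NFD, drop combining marks, lower-case
    static String fold(String s) {
        String d = Normalizer.normalize(s, Normalizer.Form.NFD);
        return d.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fold(ORANGUTAN).equals(ORANGUTAN));       // true
        System.out.println(FAMILY.codePointCount(0, FAMILY.length())); // 5
        System.out.println(FAMILY.length());                        // 9
        System.out.println(fold(FAMILY).equals(FAMILY));            // true
    }
}
```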

2019-10-02 18:39:43

Fireduck

oh god

2019-10-02 18:40:07

Fireduck

yeah, good idea

2019-10-02 18:40:47

Rotonen

i like unicode

2019-10-02 18:40:53

Fireduck

me too

2019-10-02 18:41:14

Rotonen

yet you want to collapse to ASCII

2019-10-02 18:41:31

Rotonen

i’ll be ASCII 0x0B then

2019-10-02 18:41:42

Fireduck

only the things that collapse to ASCII

2019-10-02 18:41:56

Fireduck

I suspect I am communicating poorly because I don't know the terminology in this space

2019-10-02 18:42:47

Rotonen

my argument: treat user input just as a pile of bytes, do less, allow for more

2019-10-02 18:43:19

Fireduck

I mostly agree with that. I am just trying to avoid people impersonating others easily by picking strange letters

2019-10-02 18:43:26

Rotonen

with one caveat of normalize to NFC so people cannot have identical renderings from multiple inputs
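A sketch of Rotonen's NFC caveat using the JDK's `java.text.Normalizer`. Three different code point sequences all render as "Å", and NFC maps every one of them to U+00C5. A uniqueness check on NFC output therefore cannot be fooled by them. NFC only covers canonical equivalents. Distinct look-alike letters, such as Latin U+0061 and Cyrillic U+0430, stay different.

```java
import java.text.Normalizer;

class NfcDemo {
    // Three inputs that render as the same letter
    static final String PRECOMPOSED = "\u00c5"; // LATIN CAPITAL LETTER A WITH RING ABOVE
    static final String COMBINING = "A\u030a";  // 'A' + COMBINING RING ABOVE
    static final String ANGSTROM = "\u212b";    // ANGSTROM SIGN

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // The raw strings differ...
        System.out.println(PRECOMPOSED.equals(COMBINING));       // false
        // ...but all three normalize to U+00C5
        System.out.println(nfc(COMBINING).equals(PRECOMPOSED)); // true
        System.out.println(nfc(ANGSTROM).equals(PRECOMPOSED));  // true
    }
}
```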

2019-10-02 18:43:26

Fireduck

I know I won't be able to prevent that completely

2019-10-02 18:43:34

Rotonen

you can

2019-10-02 18:43:42

Fireduck

NFC or NFKC?

2019-10-02 18:44:02

Rotonen

K is always crap and allows for nonsense

2019-10-02 18:44:13

Rotonen

pick NFC or NFD
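The "K" (compatibility) forms also fold characters that are only loosely equivalent. For example, the ﬁ ligature (U+FB01) becomes "fi" and superscript two (U+00B2) becomes "2". NFC leaves both alone. A small sketch with `java.text.Normalizer`:

```java
import java.text.Normalizer;

class CompatDemo {
    static String nfc(String s)  { return Normalizer.normalize(s, Normalizer.Form.NFC); }
    static String nfkc(String s) { return Normalizer.normalize(s, Normalizer.Form.NFKC); }

    public static void main(String[] args) {
        String ligature = "\ufb01"; // LATIN SMALL LIGATURE FI
        String squared = "x\u00b2"; // 'x' + SUPERSCRIPT TWO
        System.out.println(nfc(ligature).equals(ligature)); // true: NFC keeps the ligature
        System.out.println(nfkc(ligature));                 // fi
        System.out.println(nfc(squared).equals(squared));   // true
        System.out.println(nfkc(squared));                  // x2
    }
}
```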

2019-10-02 18:44:49

Rotonen

composed is ’minimal set of bytes which can represent’ and decomposed is the opposite
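A quick look at the composed/decomposed difference, measured in UTF-16 code units. NFC keeps "ä" as one precomposed character, and NFD splits it into a base letter plus a combining mark.

```java
import java.text.Normalizer;

class FormLengths {
    static int nfcLength(String s) { return Normalizer.normalize(s, Normalizer.Form.NFC).length(); }
    static int nfdLength(String s) { return Normalizer.normalize(s, Normalizer.Form.NFD).length(); }

    public static void main(String[] args) {
        String name = "Fr\u00e4d"; // F r a-umlaut d
        System.out.println(nfcLength(name)); // 4: U+00E4 stays a single char
        System.out.println(nfdLength(name)); // 5: 'a' + U+0308 COMBINING DIAERESIS
    }
}
```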

2019-10-02 18:45:57

Rotonen

or just hash the input and that’s the real id?

2019-10-02 18:46:27

Fireduck

The hashing is being done, mostly to get me a fixed length item in the table
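A sketch of hashing a name into a fixed-length table key. It assumes the name is NFC-normalized and UTF-8 encoded before hashing, so that canonically equivalent inputs share a key, and it uses SHA-256. The log does not say whether snowblossom normalizes first or which hash it uses. `nameKey` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.Normalizer;
import java.util.Arrays;

class NameKey {
    // Hypothetical: fixed-length (32-byte) key for a name of any length
    static byte[] nameKey(String name) {
        String nfc = Normalizer.normalize(name, Normalizer.Form.NFC);
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return sha.digest(nfc.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 ships with every JDK
        }
    }

    public static void main(String[] args) {
        byte[] a = nameKey("Fr\u00e4d");  // precomposed a-umlaut
        byte[] b = nameKey("Fra\u0308d"); // 'a' + combining diaeresis
        System.out.println(a.length);            // 32
        System.out.println(Arrays.equals(a, b)); // true
        System.out.println(nameKey("a much longer display name with spaces").length); // 32
    }
}
```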

2019-10-02 18:47:10

Rotonen

gitstyle short hash as discordstyle tail and done?

2019-10-02 18:47:54

Fireduck

tail too short, easy to brute force a collision

2019-10-02 18:48:20

Rotonen

poc attack demo required to continue discussion

2019-10-02 18:48:39

Fireduck

ha, I might misunderstand how the tails are made

2019-10-02 18:48:51

Fireduck

if you are counting entries and everyone has the same count, no problem

2019-10-02 18:49:01

Rotonen

iirc git just takes beginning and end of the sha

2019-10-02 18:49:04

Fireduck

if it is the hash of something, then someone can generate keys until it hashes to the same short tail
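Fireduck's brute-force concern, made concrete as the kind of demo Rotonen asks for. This sketch assumes the tail is the first 4 hex characters (16 bits) of SHA-256 over the input, and it uses an incrementing counter to stand in for "generate a new key". With those assumptions, matching a victim's tail takes about 2^16 hashes on average, and each extra hex character multiplies that by 16. `tail` and `forge` are hypothetical and do not model snowblossom's real scheme. Rotonen's counterpoint still applies: a real impersonation also needs a name that renders identically, and this demo only covers the hash half.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TailCollision {
    static final int TAIL_HEX_CHARS = 4; // 16-bit tail (assumption)
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Hypothetical short tail: leading hex chars of SHA-256(input)
    static String tail(String input) {
        byte[] h;
        try {
            h = MessageDigest.getInstance("SHA-256").digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        char[] out = new char[TAIL_HEX_CHARS];
        for (int i = 0; i < TAIL_HEX_CHARS; i++) {
            int b = h[i / 2] & 0xff;
            out[i] = HEX[(i % 2 == 0) ? (b >>> 4) : (b & 0x0f)]; // high nibble, then low
        }
        return new String(out);
    }

    // Keep trying fresh inputs until the tail matches the victim's
    static String forge(String victimInput, String attackerName) {
        String target = tail(victimInput);
        for (long nonce = 0; nonce < 100_000_000L; nonce++) {
            String candidate = attackerName + "|" + nonce;
            if (tail(candidate).equals(target)) return candidate;
        }
        return null; // vanishingly unlikely for a 16-bit tail
    }

    public static void main(String[] args) {
        String victim = "fireduck|victim-key";
        String forged = forge(victim, "fireduck");
        System.out.println(tail(victim) + " " + forged + " " + tail(forged));
    }
}
```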

2019-10-02 18:49:34

Fireduck

can play silly games like tail hash = hmac(tx_id, block_id, name)

2019-10-02 18:49:46

Fireduck

so it is hard to know what the tail will be unless you are also the miner

2019-10-02 18:49:51

Fireduck

and willing to throw away good blocks
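A sketch of the keyed-tail idea. It assumes HMAC-SHA256 keyed with the block id over the tx id followed by the name, truncated to a short hex tail. The block id is not known until the registration is mined, so an attacker cannot grind for a matching tail in advance unless they also mine the block. The field order, the fixed-length tx id, and the 4-character tail are illustrative assumptions. `HmacTail.tail` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class HmacTail {
    static final int TAIL_HEX_CHARS = 4; // illustrative tail length

    // Hypothetical: tail = HMAC-SHA256(key = blockId, msg = txId || UTF-8(name))
    // txId is assumed fixed-length, so the concatenation is unambiguous
    static String tail(byte[] txId, byte[] blockId, String name) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(blockId, "HmacSHA256"));
            mac.update(txId);
            byte[] h = mac.doFinal(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < TAIL_HEX_CHARS / 2; i++) {
                sb.append(String.format("%02x", h[i] & 0xff));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] txId = new byte[32]; // placeholder tx id
        System.out.println(tail(txId, new byte[] {1}, "fireduck"));
        System.out.println(tail(txId, new byte[] {2}, "fireduck")); // different block, unrelated tail
    }
}
```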

2019-10-02 18:49:56

Rotonen

yes, but an attack requires identical rendering input AND a hash collision

2019-10-02 18:50:48

Rotonen

*prefix suffix hash collision

2019-10-02 18:51:40

Fireduck

http://fireduck.com/java/java-se-8/docs/api/java/text/Collator.html

2019-10-02 18:51:47

Fireduck

Think any of those collator modes will help?

2019-10-02 18:53:29

Rotonen

that’s ultimately more to do with how to build search engines and human facing sortable tables of any sort

2019-10-02 18:54:01

Rotonen

seems java would like to decompose and sort over more bytes - can make sense

2019-10-02 18:55:28

Rotonen

and the strength values are for very complex tiered rulesets of sorting, such as some multi lingual academic library (books on shelves kind) would need
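For reference on what the strength values do, here is a sketch with `java.text.Collator`, the class Fireduck links. At PRIMARY strength "Fräd" and "frad" compare equal. SECONDARY also distinguishes accents, and TERTIARY also distinguishes case. Canonical decomposition is switched on so that a precomposed "ä" is compared as "a" plus a combining mark. The exact assignment of differences to strengths depends on the locale passed to `getInstance`.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {
    // True when the collator treats a and b as equal at the given strength
    static boolean same(String a, String b, int strength) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION); // U+00E4 -> 'a' + U+0308
        c.setStrength(strength);
        return c.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String fraed = "Fr\u00e4d";
        System.out.println(same(fraed, "frad", Collator.PRIMARY));    // true: base letters only
        System.out.println(same(fraed, "Frad", Collator.SECONDARY));  // false: accent counts
        System.out.println(same("Frad", "frad", Collator.SECONDARY)); // true: case still ignored
        System.out.println(same("Frad", "frad", Collator.TERTIARY));  // false: case counts
    }
}
```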

2019-10-02 18:56:28

Rotonen

and i’ve been against using collation as a tool, just normalize to NFC and tail tag with a hash

2019-10-02 18:56:55

Rotonen

gitstyle, hard auth with long, most use cases good enough with short
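A sketch of the scheme Rotonen proposes. The NFC-normalized name is kept as-is and tagged with a hash-derived tail, shown short for everyday use, while the full hash is available for strict checks, much like git's abbreviated commit ids. All of the following are illustrative assumptions: the `#` separator, SHA-256, hashing the NFC name together with a fixed-length owner id so that identical names still get distinct tails, the 6-character short form, and the `Handle` class itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.Normalizer;

class Handle {
    static final int SHORT_HEX = 6; // illustrative short-tail length

    static String nfc(String s) { return Normalizer.normalize(s, Normalizer.Form.NFC); }

    // Hypothetical long id: hex SHA-256 over UTF-8(NFC(name)) followed by owner bytes
    // (owner is assumed fixed-length, e.g. 32 bytes, so the input is unambiguous)
    static String fullId(String name, byte[] owner) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(nfc(name).getBytes(StandardCharsets.UTF_8));
            byte[] h = sha.digest(owner);
            StringBuilder sb = new StringBuilder();
            for (byte b : h) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Human-facing form: NFC name + '#' + short prefix of the long id
    static String display(String name, byte[] owner) {
        return nfc(name) + "#" + fullId(name, owner).substring(0, SHORT_HEX);
    }

    // Strict check against the long id, like passing a full commit hash
    static boolean matchesFull(String claimedId, String name, byte[] owner) {
        return fullId(name, owner).equals(claimedId);
    }

    public static void main(String[] args) {
        byte[] owner = new byte[32]; // placeholder owner id
        System.out.println(display("fireduck", owner));
        System.out.println(fullId("fireduck", owner));
    }
}
```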

2019-10-02 18:58:29

Rotonen

and an input collision is fine as the human facing ones are still unique?

2019-10-02 18:59:15

Rotonen

and an attack poc welcome on a prefix-suffix collision on something which’d render the same

2019-10-02 19:39:37

Fireduck

the important thing to remember is that if it doesn't have a tail, it's not a monkey

2019-10-02 19:41:35

Fireduck

f̬r̸̹ạ̴̳̱̻͔̗̞͠d͇̝̮̠̮̟̪̹́

2019-10-02 20:19:40

Rotonen

you also have to remember that the naive user you are trying to protect would not pick anything with staggering diacritics

2019-10-02 20:20:38

Fireduck

good, cause my collation currently can't handle them. :wink:

2019-10-02 20:20:49

Fireduck

I'm thinking about what you've said, it makes a lot of sense

2019-10-02 20:21:03

Fireduck

but I do want to collapse case at least, I think
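If case is the only thing collapsed, one detail still matters in Java: lower-case with `Locale.ROOT` so the result does not depend on the server's default locale. The sketch below also shows two well-known surprises in the JDK's case mappings. `collapseCase` is a hypothetical helper name.

```java
import java.util.Locale;

class CaseFold {
    // Locale-independent case collapse for a uniqueness key (sketch)
    static String collapseCase(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(collapseCase("FireDuck")); // fireduck
        // Under Turkish rules, capital I lowers to dotless i (U+0131)
        System.out.println("I".toLowerCase(Locale.forLanguageTag("tr")).equals("\u0131")); // true
        // Upper-casing is not a clean inverse: sharp s (U+00DF) becomes "SS"
        System.out.println("stra\u00dfe".toUpperCase(Locale.ROOT)); // STRASSE
    }
}
```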

2019-10-02 20:34:25

Fireduck

Assert.assertEquals("fireduck", ForBenefitOfUtil.normalize("fireduck"), ForBenefitOfUtil.normalize("𝓕ire𝐃uc𝐤"));

2019-10-02 20:37:10

Fireduck

That one is pretty obviously different, but if the font rendering is a little wonky, it could look quite similar.

2019-10-02 20:37:23

Fireduck

Especially things like bold small letters vs regular
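One way the `ForBenefitOfUtil.normalize` assertion above could pass is NFKC followed by lower-casing. This is a guess, not that method's actual implementation. The mathematical bold and script letters in "𝓕ire𝐃uc𝐤" (U+1D4D5, U+1D403, U+1D424) have compatibility decompositions, so NFKC maps them to plain F, D and k. This is the K form Rotonen calls crap. NFC alone leaves these letters untouched. `LookalikeFold` is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

class LookalikeFold {
    // Hypothetical normalizer: NFKC folds compatibility look-alikes, then lower-case
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    // "FireDuck" spelled with math-alphanumeric letters, built from code points
    static final String FANCY = new StringBuilder()
        .appendCodePoint(0x1D4D5).append("ire") // MATHEMATICAL BOLD SCRIPT CAPITAL F
        .appendCodePoint(0x1D403).append("uc")  // MATHEMATICAL BOLD CAPITAL D
        .appendCodePoint(0x1D424)               // MATHEMATICAL BOLD SMALL K
        .toString();

    public static void main(String[] args) {
        System.out.println(fold(FANCY)); // fireduck
        System.out.println(Normalizer.normalize(FANCY, Normalizer.Form.NFC).equals(FANCY)); // true
    }
}
```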