archive - snowblossom - slack - general

2019-01-21 05:09:45

Fireduck

@asdf Is that you?

2019-01-21 05:10:39

asdf

Ack

2019-01-21 05:12:00

Fireduck

I am working on a new miner, the threading is getting complicated

2019-01-21 05:12:21

Fireduck

it should close the gap between NVME and ram mining. Not entirely, but closer anyways.

2019-01-21 05:25:10

Humphrey

how much difference is there between NVME and RAM mining so far?

2019-01-21 05:25:57

Fireduck

Currently, pretty big. A good RAM miner would be about 4 MH/s. A solid NVME miner would be about 100kh/s.

2019-01-21 05:26:13

Fireduck

It should be noted that the RAM miner will probably cost at least $7k or so, and the NVME could probably be $400

2019-01-21 05:26:48

Fireduck

But with my new setup, in theory that same NVME miner rig could do maybe 700kh/s

2019-01-21 05:27:11

Fireduck

Assuming it has an ok chunk of ram to work with (like 32gb or so)

2019-01-21 05:27:58

Fireduck

but that is numbers in a spreadsheet right now. I need to write a good bit of code with a fairly crafty thread model without overwhelming the garbage collection or creating a NUMA nightmare

2019-01-21 05:28:00

Fireduck

so we shall see
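
The figures quoted above can be put on a common scale by dividing hashrate by hardware cost. The sketch below uses only the rough numbers from the chat (4 MH/s for about $7k, 100 kh/s for about $400, and a projected 700 kh/s on the same $400 rig); the class and method names are made up for illustration, and the projected case ignores the cost of the extra RAM mentioned.

```java
final class HashPerDollar {
    /** Hashes per second bought by each dollar of hardware. */
    static double hashesPerDollar(double hashesPerSecond, double costUsd) {
        return hashesPerSecond / costUsd;
    }

    public static void main(String[] args) {
        // Rough estimates quoted in the chat, not measurements.
        double ram = hashesPerDollar(4_000_000, 7_000);   // RAM rig
        double nvme = hashesPerDollar(100_000, 400);      // current NVMe rig
        double nvmeNew = hashesPerDollar(700_000, 400);   // projected with the new miner
        System.out.printf("RAM: %.0f  NVMe: %.0f  NVMe (projected): %.0f h/s per USD%n",
                ram, nvme, nvmeNew);
    }
}
```

On those numbers the RAM rig comes out at roughly 571 h/s per dollar, the current NVMe rig at 250, and the projected NVMe rig at 1750.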

2019-01-21 05:29:48

Humphrey

really a big gap

2019-01-21 05:31:18

Humphrey

I am new here, and figuring out how to get a start since I don't have RAM OR NVME in hand at the moment.

2019-01-21 05:31:42

Fireduck

You can run a node and buy a little SNOW on an exchange

2019-01-21 05:31:51

Fireduck

you don't need to mine

2019-01-21 05:32:13

Fireduck

but if you want to mine, if you have a computer with a free M.2 slot, you can get an NVME for not much money

2019-01-21 05:32:26

Fireduck

The Intel 760P line seems like the best bang for buck

2019-01-21 05:32:34

Fireduck

the Samsung 970 PRO are nice as well

2019-01-21 05:45:42

Humphrey

OK, thanks

2019-01-21 05:46:43

Humphrey

How much SNOW would I get, by estimation, if I buy an Intel 760P to mine?

2019-01-21 05:47:48

Fireduck

Assuming you get 56kh/s, use the calculator on http://snowblossom-explorer.org/

2019-01-21 05:54:12

Humphrey

really not much, less than 0.5 SNOW/day even if I have 100kh/s, according to the calculator.

2019-01-21 05:54:23

Fireduck

yeah
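
An estimate like the calculator's can be sketched as the miner's share of network hashrate times the coins issued per day. The code below is only an illustration of that proportionality: the network hashrate, block reward and blocks per day in `main` are placeholder values, not real Snowblossom parameters, and the explorer's calculator may compute things differently.

```java
final class RewardEstimate {
    /**
     * Expected coins per day for a miner with a given share of the network hashrate.
     * Ignores variance, pool fees and difficulty changes.
     */
    static double coinsPerDay(double myHashrate, double networkHashrate,
                              double blockReward, double blocksPerDay) {
        return myHashrate / networkHashrate * blockReward * blocksPerDay;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only (NOT real network figures).
        double perDay = coinsPerDay(100_000, 1_000_000_000, 50, 144);
        System.out.printf("%.3f coins/day%n", perDay);
    }
}
```

Because the estimate is linear in the miner's hashrate, doubling the hashrate doubles the expected daily reward, all else being equal.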

2019-01-21 05:55:34

Humphrey

can I connect to a testnet to test my hardware performance?

2019-01-21 05:56:03

Fireduck

For mining? Not really. The testnet uses very small fields so you'll end up with it all in cache

2019-01-21 05:56:11

Fireduck

No reason to not test mining performance on mainnet

2019-01-21 05:56:35

Humphrey

OK, got it

2019-01-21 15:02:07

Humphrey

Anyone try NVME RAID 0 before?

2019-01-21 15:02:45

Rotonen

yes, do raid 1 until you cannot fit the field anymore

2019-01-21 15:03:14

Rotonen

Intel VROC would be interesting, AFAIK no one tried that yet

2019-01-21 15:06:16

Humphrey

how much improvement?

2019-01-21 17:24:48

Rotonen

nothing unexpected either way

2019-01-21 21:55:23

offmenu

Hey everyone, been a while since I've been around here. Just saying hi! I see we're on a new field

2019-01-21 22:01:58

offmenu

have you guys settled on the logo yet?

2019-01-21 22:02:10

offmenu

i might throw something together if you're still open to submissions

2019-01-21 22:09:18

Fireduck

Not really settled

2019-01-21 22:38:15

Fireduck

@Rotonen I need some sort of guide on NUMA programming

2019-01-21 22:38:35

Fireduck

I think you have an idea of where I am at, but I am just guessing on how to write code to work well in a NUMA setup

2019-01-21 22:39:12

Fireduck

Basically, trying to reduce the number of cross thread locks, since they could be cross socket as well.

2019-01-21 22:39:37

Fireduck

Also trying to have data used by a single thread until needed

2019-01-21 22:39:52

Fireduck

but I might be doing everything wrong
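
The approach described in the last few messages (keep data private to one thread, and touch shared structures as rarely as possible) can be sketched in a few lines. This is a generic illustration, not the miner's code: `LocalBatching` and its methods are hypothetical names, it boxes integers for brevity (which a GC-conscious version would avoid), and it does nothing to control which NUMA node the memory lands on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

final class LocalBatching {
    // Shared structure: touched once per batch rather than once per item.
    private final ConcurrentLinkedQueue<List<Integer>> published = new ConcurrentLinkedQueue<>();
    // Per-thread batch: no other thread ever sees it until it is published.
    private final ThreadLocal<List<Integer>> local = ThreadLocal.withInitial(ArrayList::new);
    private final int batchSize;

    LocalBatching(int batchSize) { this.batchSize = batchSize; }

    /** Adds to the calling thread's private batch; publishes when the batch is full. */
    void add(int value) {
        List<Integer> batch = local.get();
        batch.add(value);
        if (batch.size() >= batchSize) flush();
    }

    /** Publishes the calling thread's batch (if any) and starts a fresh one. */
    void flush() {
        List<Integer> batch = local.get();
        if (batch.isEmpty()) return;
        published.add(batch);
        local.set(new ArrayList<>());
    }

    int publishedBatches() { return published.size(); }

    long publishedItems() {
        long n = 0;
        for (List<Integer> b : published) n += b.size();
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        LocalBatching q = new LocalBatching(1000);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) q.add(i);
                q.flush(); // each thread must flush its own leftovers
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(q.publishedItems() + " items in " + q.publishedBatches() + " batches");
    }
}
```

The trade-off is latency: items are invisible to consumers until the owning thread fills or flushes its batch.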

2019-01-21 22:40:50

Fireduck

If you take a look at https://github.com/snowblossomcoin/snowblossom/blob/master/miner/src/surf/MagicQueue.java that might help

```
package snowblossom.miner.surf;

import java.util.concurrent.LinkedBlockingQueue;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.HashMap;
import java.util.LinkedList;

/**
 * Data optimization based on guesses about how NUMA works
 * and trying to keep things simple for the GC. So probably all wrong.
 */
public class MagicQueue
{
  /**
   * Collection of ByteBuffers for each bucket, ready to be read
   */
  private final LinkedList<ByteBuffer>[] global_buckets;
  private final int max_chunk_size;

  /**
   * Each thread accumulates data in this map before it is saved
   * to the global buckets.
   */
  private final ThreadLocal<Map<Integer, ByteBuffer> > local_buff;

  public MagicQueue(int max_chunk_size, int bucket_count)
  {
    this.max_chunk_size = max_chunk_size;
    global_buckets = new LinkedList[bucket_count];
    for(int i=0; i<bucket_count; i++)
    {
      global_buckets[i] = new LinkedList<>();
    }

    local_buff = new ThreadLocal<Map<Integer, ByteBuffer>>() {
      @Override
      protected Map<Integer,ByteBuffer> initialValue()
      {
        return new HashMap<Integer, ByteBuffer>(bucket_count*2+1, 0.5f);
      }
    };
  }

  /**
   * Returns a ByteBuffer that is ready to accept writes up to data_size
   * as needed. Might already have data in it. Can only be used in this thread.
   * Might not get saved to the global bucket until flush is called.
   */
  public ByteBuffer openWrite(int bucket, int data_size)
  {
    Map<Integer, ByteBuffer> local = local_buff.get();
    if (local.containsKey(bucket))
    {
      if (local.get(bucket).remaining() >= data_size) return local.get(bucket);

      // Not enough room left: hand the buffer to the global bucket.
      // writeToBucket takes the lock and either copies or enqueues it,
      // so the buffer must not be added to the list a second time here.
      writeToBucket(bucket, local.get(bucket));
    }

    local.put(bucket, ByteBuffer.allocate(max_chunk_size));
    return local.get(bucket);
  }

  /**
   * @param data A byte buffer open for writes
   */
  private void writeToBucket(int bucket, ByteBuffer data)
  {
    LinkedList<ByteBuffer> lst = global_buckets[bucket];
    synchronized(lst)
    {
      ByteBuffer last = lst.peekLast();
      if ((last != null) && (last.remaining() >= data.position()))
      {
        data.flip();
        last.put(data);
      }
      else
      {
        lst.add(data);
      }
    }
  }

  /**
   * Returns null or a ByteBuffer with position 0 and limit set to how much data is there,
   * ready for reading.
   */
  public ByteBuffer readBucket(int bucket)
  {
    LinkedList<ByteBuffer> lst = global_buckets[bucket];
    synchronized(lst)
    {
      ByteBuffer bb = lst.poll();
      if (bb == null) return null;

      bb.flip();
      return bb;
    }
  }

  public void flushFromLocal()
  {
    for(Map.Entry<Integer,ByteBuffer> me : local_buff.get().entrySet())
    {
      int b = me.getKey();
      ByteBuffer bb = me.getValue();
      writeToBucket(b, bb);
    }
    local_buff.get().clear();
  }
}
```
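
MagicQueue leans on the write-mode and read-mode conventions of `java.nio.ByteBuffer`: while a buffer is being filled, `position()` is the number of bytes written and `remaining()` is the free space; `flip()` turns it into a readable buffer. The sketch below isolates the "append if it fits" step using only standard `ByteBuffer` calls; `BufferModes` and `appendIfFits` are illustrative names, not part of the miner.

```java
import java.nio.ByteBuffer;

final class BufferModes {
    /**
     * Appends the contents of src to dst if there is room.
     * Both buffers are expected to be in write mode on entry.
     */
    static boolean appendIfFits(ByteBuffer dst, ByteBuffer src) {
        // In write mode, position() is bytes written so far and
        // remaining() is the free space up to the limit.
        if (dst.remaining() < src.position()) return false;
        src.flip();   // read mode: limit = bytes written, position = 0
        dst.put(src); // copy everything readable from src into dst
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocate(16);
        a.putInt(1).putInt(2);
        ByteBuffer b = ByteBuffer.allocate(16);
        b.putInt(3);

        boolean merged = appendIfFits(a, b);

        a.flip(); // switch a to read mode before consuming it
        StringBuilder sb = new StringBuilder();
        while (a.hasRemaining()) sb.append(a.getInt()).append(' ');
        System.out.println("merged=" + merged + " values=" + sb.toString().trim());
    }
}
```

Note that a buffer must be flipped exactly once before reading; flipping a buffer that is already in read mode truncates it to whatever has been consumed so far.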

2019-01-21 22:50:41

Rotonen

@Fireduck step 1, C

2019-01-21 22:52:06

Rotonen

@Fireduck or rethink as IPC across thread pool processes and use numactl
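
One way to read this suggestion is one worker process per NUMA node, each started under `numactl` so that its threads and memory stay on that node. The sketch below only builds the command lines; `NumaLauncher`, the node count and `worker.jar` are hypothetical, while `--cpunodebind` and `--membind` are standard `numactl` options. How the processes would then talk to each other (the IPC part) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NumaLauncher {
    /** Builds a command line that pins one worker process to a NUMA node via numactl. */
    static List<String> commandForNode(int node, List<String> workerCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("--cpunodebind=" + node); // run only on this node's CPUs
        cmd.add("--membind=" + node);     // allocate only from this node's memory
        cmd.addAll(workerCommand);
        return cmd;
    }

    public static void main(String[] args) {
        int nodes = 2; // assumed node count; a real launcher would detect this
        List<String> worker = Arrays.asList("java", "-jar", "worker.jar"); // hypothetical worker
        for (int node = 0; node < nodes; node++) {
            List<String> cmd = commandForNode(node, worker);
            System.out.println(String.join(" ", cmd));
            // To actually start it (Linux with numactl installed):
            // new ProcessBuilder(cmd).inheritIO().start();
        }
    }
}
```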

2019-01-21 22:53:00

Rotonen

@Fireduck otherwise you'll slam into the wall of the GC not yet being numa aware

2019-01-21 22:53:35

Rotonen

see JEP 345 and associated discussions

2019-01-21 22:55:12

Rotonen

on a brief literature review of the past 5 years, it seems someone in Glasgow has been doing something close enough to the byte queue you're trying to go for http://www.dcs.gla.ac.uk/~jsinger/pdfs/lcpc15.pdf

2019-01-21 22:56:03

Fireduck

ha, that does look like the same problem

2019-01-21 22:56:25

Fireduck

I am probably just prematurely optimizing anyways

2019-01-21 22:56:59

Rotonen

is there anything which *needs* to be reaped mining-time?

2019-01-21 22:57:13

Rotonen

just turn off the GC?
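
Whether anything needs to be reaped can be checked empirically before deciding: count collections before and after a steady-state loop. The sketch below uses the standard `GarbageCollectorMXBean` API; `GcCheck` and the toy workload are illustrative only. If a real mining loop shows no collections, a no-op collector such as Epsilon (JEP 318, JDK 11+, enabled with `-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`) becomes a plausible way to "turn off the GC", at the price of the JVM exiting once the heap is exhausted.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCheck {
    /** Total collections reported so far by all collectors in this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();

        // Toy workload that reuses one array instead of allocating per iteration.
        byte[] reused = new byte[1 << 20];
        long sum = 0;
        for (int i = 0; i < 1000; i++) {
            reused[i % reused.length] = (byte) i;
            sum += reused[i % reused.length];
        }

        long after = totalCollections();
        System.out.println("collections during loop: " + (after - before)
                + " (checksum " + sum + ")");
    }
}
```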

2019-01-21 22:58:09

Rotonen

or try go - nigh native gRPC, goroutines suit the problem well, windows is a first class citizen in the packaging ecosystem

2019-01-21 22:58:52

Rotonen

heh, there is a numa aware java library for byte buffer manipulation too :stuck_out_tongue: https://oss.sonatype.org/service/local/repositories/releases/archive/org/xerial/jnuma/0.1.3/jnuma-0.1.3-javadoc.jar/!/xerial/jnuma/Numa.html

2019-01-21 22:59:23

Rotonen

and that turns the GC off for the objects, yeah :smile:

2019-01-21 22:59:55

Fireduck

I can avoid the GC pretty well by just reusing a lot of bytebuffers

2019-01-21 22:59:57

Rotonen

on a boxes and arrows level that library should do everything you need, actually

2019-01-21 23:00:06

Rotonen

yes, but you cannot set per object policies
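
Reusing buffers, as described above, usually means some form of pool: buffers are handed out, cleared, and returned instead of being allocated per use. A minimal sketch follows, with hypothetical names (`BufferPool`, `acquire`, `release`) and heap buffers; as noted in the reply, this keeps allocation pressure down but gives no control over where each buffer lives in memory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

final class BufferPool {
    private final ArrayBlockingQueue<ByteBuffer> free;
    private final int bufferSize;

    BufferPool(int buffers, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) free.add(ByteBuffer.allocate(bufferSize));
    }

    /** Returns a cleared buffer, allocating a new one only if the pool is empty. */
    ByteBuffer acquire() {
        ByteBuffer bb = free.poll();
        if (bb == null) return ByteBuffer.allocate(bufferSize);
        bb.clear(); // reset position and limit; old bytes are simply overwritten later
        return bb;
    }

    /** Hands a buffer back; if the pool is already full it is dropped and left to the GC. */
    void release(ByteBuffer bb) {
        free.offer(bb);
    }

    int available() { return free.size(); }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4, 1024);
        for (int round = 0; round < 1000; round++) {
            ByteBuffer bb = pool.acquire();
            bb.putLong(round); // stand-in for real work
            pool.release(bb);
        }
        System.out.println("buffers still pooled: " + pool.available());
    }
}
```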

2019-01-21 23:00:31

Fireduck

At this point, I am not up to turning off GC

2019-01-21 23:00:48

Fireduck

I'd be more likely to implement all the mining in C++ rather than try to do that

2019-01-21 23:01:08

Rotonen

go or rust would be sexy

2019-01-21 23:01:19

Rotonen

grpc on rust looks painful, though

2019-01-21 23:01:35

Fireduck

go is terrible. rust is probably terrible.

2019-01-21 23:01:43

Fireduck

I have a strong dislike of "sexy" technologies

2019-01-21 23:01:56

Rotonen

i like go, 'if i can think it in bash pipes xargs parallel, i can think it in go'

2019-01-21 23:02:14

Rotonen

i guess that's about the aim of the whole thing too

2019-01-21 23:02:23

Rotonen

a generalized unixpipe emulator

2019-01-21 23:02:33

Fireduck

yeah, for me there are exploratory learn new tech projects

2019-01-21 23:02:37

Fireduck

and get shit done projects

2019-01-21 23:02:45

Fireduck

and they make very different choices

2019-01-21 23:03:09

Rotonen

i suppose you get paid for the first kind? :stuck_out_tongue:

2019-01-21 23:04:09

Fireduck

well, at work we use scala, so yeah

2019-01-21 23:04:26

Fireduck

I like the concept of functional programming, but in practice it makes me a little crazy

2019-01-21 23:04:37

Rotonen

but seriously, browse through this for the how and see about taking a stab at it yourself https://github.com/xerial/jnuma A Java library for accessing NUMA (Non Uniform Memory Access) API

2019-01-21 23:04:57

Rotonen

yeah, functional is fine, until you *need* a side effect

2019-01-21 23:05:11

Rotonen

which is about every program ever, which actually *did* something

2019-01-21 23:05:14

Fireduck

why write any code at all if you didn't want a side effect?

2019-01-21 23:05:19

Fireduck

right

2019-01-21 23:07:10

Rotonen

going through the tests seems a good starting point for diving into 'how is this thing done' https://github.com/xerial/jnuma/blob/develop/src/test/scala/xerial/jnuma/NumaTest.scala

```
/*
 * Copyright 2012 Taro L. Saito
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

//--------------------------------------
//
// NumaTest.scala
// Since: 2012/11/22 2:24 PM
//
//--------------------------------------

package xerial.jnuma

import util.Random
import java.nio.{ByteOrder, ByteBuffer}
import java.util
import java.io.{OutputStream, FileOutputStream}

/**
 * @author leo
 */
class NumaTest extends MySpec {

  "Numa" should {

    "report NUMA info" taggedAs ("report") in {
      val available = Numa.isAvailable
      val numNodes = Numa.numNodes()
      debug("numa is available: " + available)
      debug("num nodes: " + numNodes)
      for (i <- 0 until numNodes) {
        val n = Numa.nodeSize(i)
        val f = Numa.freeSize(i)
        debug("node %d - size:%,d free:%,d", i, n, f)
      }

      val nodes = (0 until numNodes)
      for (n1 <- nodes; n2 <- n1 until numNodes) {
        val d = Numa.distance(n1, n2)
        debug("distance %s - %s: %d", n1, n2, d)
      }

      def toBitString(b: Array[Long]) = {
        val s = for (i <- 0 until Numa.numCPUs()) yield {
          if ((b(i / 64) & (1L << (i % 64))) == 0) "0" else "1"
        }
        s.mkString
      }

      for (node <- nodes) {
        val cpuVector = Numa.nodeToCpus(node)
        debug("node %d -> cpus %s", node, toBitString(cpuVector))
      }

      val numCPUs = Runtime.getRuntime.availableProcessors();

      val affinity = (0 until numCPUs).par.map { cpu =>
        Numa.getAffinity()
      }
      debug("affinity: %s", affinity.map(toBitString(_)).mkString(", "))

      val preferred = (0 until numCPUs).par.map { cpu =>
        Numa.runOnNode(cpu % numNodes)
        Numa.setPreferred(cpu % numNodes)
        val n = Numa.getPreferredNode
        Numa.runOnAllNodes()
        n
      }
      debug("setting prefererd NUMA nodes: %s", preferred.mkString(", "))

      val s = (0 until numCPUs).par.map { cpu =>
        Numa.setAffinity((cpu + 1) % numCPUs)
        if (cpu % 2 == 0)
          (0 until Int.MaxValue / 10).foreach { i => }
        Numa.getAffinity()
      }
      debug("affinity after setting: %s", s.map(toBitString(_)).mkString(", "))

      val r = (0 until numCPUs).par.map { cpu =>
        Numa.resetAffinity()
        Numa.getAffinity()
      }
      debug("affinity after resetting: %s", r.map(toBitString(_)).mkString(", "))
    }

    "allocate local buffer" in {
      for (i <- 0 until 3) {
        val local = Numa.allocLocal(1024)
        Numa.free(local)
      }
    }

    "allocate buffer on nodes" in {
      val N = 100000

      def access(b: ByteBuffer) {
        val r = new Random(0)
        var i = 0
        val p = 1024
        val buf = new Array[Byte](p)
        while (i < N) {
          b.position(r.nextInt(b.capacity() / p) * p)
          b.get(buf)
          i += 1
        }
      }

      val bl = ByteBuffer.allocateDirect(8 * 1024 * 1024)
      val bj = ByteBuffer.allocate(8 * 1024 * 1024)
      val b0 = Numa.allocOnNode(8 * 1024 * 1024, 0)
      val b1 = Numa.allocOnNode(8 * 1024 * 1024, 1)
      val bi = Numa.allocInterleaved(8 * 1024 * 1024)

      time("numa random access", repeat = 10) {
        block("direct") { access(bl) }
        block("heap") { access(bj) }
        block("numa0") { access(b0) }
        block("numa1") { access(b1) }
        block("interleaved") { access(bi) }
      }

      Numa.free(b0)
      Numa.free(b1)
      Numa.free(bi)
    }

    def radixSort8(buf: ByteBuffer) = {
      val K = 256
      val N = buf.capacity()
      val pile = Array.ofDim[Int](K)
      // count frequencies
      buf.position(0)
      for (i <- 0 until N)
        pile(buf.get(i) + 128) += 1
      // count cumulates
      for (i <- 1 until K) {
        pile(i) += pile(i - 1)
      }

      def split {
        for (i <- 0 until N) {
          var e = buf.get(i)
          var toContinue = true
          while (toContinue) {
            val p = e + 128
            val pileIndex = pile(p) - 1
            pile(p) -= 1
            if (pileIndex < i)
              toContinue = false
            else {
              val tmp = buf.get(pileIndex)
              buf.put(pileIndex, e)
              e = tmp
            }
          }
          buf.put(i, e)
        }
      }
      split
    }

    def radixSort8_local(buf: ByteBuffer) = {
      val K = 256
      val N = buf.capacity() - (2 * 4 * K)
      val countOffset = buf.capacity() / 4
      val pileOffset = countOffset + K
      // count frequencies
      buf.position(0)
      for (i <- 0 until K) {
        buf.putInt(countOffset + i * 4, 0)
      }
      for (i <- 0 until buf.capacity()) {
        val ch = buf.get(i) + 128
        val prevCount = buf.getInt(countOffset + ch * 4)
        buf.putInt(countOffset + ch * 4, prevCount + 1)
      }
      // count cumulates
      for (i <- 0 until K) {
        val prev = if (i == 0) 0 else buf.getInt(countOffset + (i - 1) * 4)
        val current = buf.getInt(countOffset + i * 4)
        buf.putInt(pileOffset + i * 4, prev + current)
      }

      def split {
        for (i <- 0 until N) {
          var e = buf.get(i)
          var toContinue = true
          while (toContinue) {
            val p = e + 128
            val pileIndex = buf.getInt(pileOffset + p * 4) - 1
            buf.putInt(pileOffset + p * 4, pileIndex)
            if (pileIndex < i)
              toContinue = false
            else {
              val tmp = buf.get(pileIndex)
              buf.put(pileIndex, e)
              e = tmp
            }
          }
          buf.put(i, e)
        }
      }
      split
    }

    def radixSort8_array(b: Array[Byte]) = {
      val K = 256
      val N = b.length
      val count = new Array[Int](K + 1)
      // count frequencies
      for (i <- 0 until N) {
        count(b(i) + 128) += 1
      }
      // count cumulates
      for (i <- 1 to K)
        count(i) += count(i - 1)

      def split {
        val pile = new Array[Int](K + 1)
        Array.copy(count, 1, pile, 0, K)
        for (i <- 0 until N) {
          var e = b(i)
          var toContinue = true
          while (toContinue) {
            val p = e + 128
            val pileIndex = pile(p) - 1
            pile(p) -= 1
            if (pileIndex < i)
              toContinue = false
            else {
              val tmp = b(pileIndex)
              b(pileIndex) = e
              e = tmp
            }
          }
          b(i) = e
        }
      }
      split
    }

    "perform microbenchmark" taggedAs ("bench") in {
      val bufferSize = 4 * 1024 * 1024

      when("buffer size is %,d".format(bufferSize))

      val numaBufs = (for (i <- 0 until Numa.numNodes()) yield "numa%d".format(i) -> Numa.allocOnNode(bufferSize, i)) :+ "numa-i" -> Numa.allocInterleaved(bufferSize)
      //val bdirect = ByteBuffer.allocateDirect(bufferSize)
      val bheap = ByteBuffer.allocate(bufferSize).order(ByteOrder.nativeOrder());
      val bufs = numaBufs ++ Map("heap" -> bheap)

      // fill bytes
      def fillBytes(b: ByteBuffer) = {
        var i = 0
        b.clear()
        while (b.remaining() > 0) {
          b.put(i.toByte)
          i += 1
        }
      }

      def fillI…
```

2019-01-21 23:07:45

Fireduck

that is interesting

2019-01-21 23:09:08

Rotonen

api-wise that thing has everything i've seen you publicly cuss about not being in there or about how to tackle

2019-01-21 23:09:33

Rotonen

whether or not that's well done, that's something you need to test, benchmark, potentially modify

2019-01-21 23:09:41

Fireduck

right

2019-01-21 23:09:56

Rotonen

and it's apache 2.0 licensed