2019-01-21 05:09:45
@asdf Is that you?
Fireduck
2019-01-21 05:10:39
Ack
asdf
2019-01-21 05:12:00
I am working on a new miner, the threading is getting complicated
Fireduck
2019-01-21 05:12:21
it should close the gap between NVME and ram mining. Not entirely, but closer anyways.
Fireduck
2019-01-21 05:25:10
how much difference between NVME and RAM mining so far?
Humphrey
2019-01-21 05:25:57
Currently, pretty big. A good RAM miner would be about 4 MH/s. A solid NVME miner would be about 100kh/s.
Fireduck
2019-01-21 05:26:13
It should be noted that the RAM miner will probably cost at least $7k or so, and the NVME one could probably be about $400
Fireduck
2019-01-21 05:26:48
But with my new setup, in theory that same NVME miner rig could do maybe 700kh/s
Fireduck
2019-01-21 05:27:11
Assuming it has an ok chunk of ram to work with (like 32gb or so)
Fireduck
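To put the figures quoted above side by side, here is a small sketch that divides each quoted hash rate by the quoted rig cost. The class and method names are made up for illustration, and the inputs are only the rough estimates from the chat. The 700 kh/s figure is a projection, and the sketch assumes the $400 price still holds once the extra RAM is included, which the chat does not confirm.

```java
public class MinerCostSketch {
    /** Hash rate per dollar, in kh/s per USD. */
    static double khsPerDollar(double khs, double costUsd) {
        return khs / costUsd;
    }

    public static void main(String[] args) {
        // Rough figures quoted in the chat, not measurements.
        double ramRig = khsPerDollar(4000.0, 7000.0);       // ~4 MH/s for ~$7k
        double nvmeToday = khsPerDollar(100.0, 400.0);      // ~100 kh/s for ~$400
        double nvmeProjected = khsPerDollar(700.0, 400.0);  // projected, same rig (assumed same price)

        System.out.println("RAM rig:          " + ramRig + " kh/s per $");
        System.out.println("NVME today:       " + nvmeToday + " kh/s per $");
        System.out.println("NVME (projected): " + nvmeProjected + " kh/s per $");
    }
}
```

On these numbers the RAM rig comes out at roughly 0.57 kh/s per dollar, the current NVME rig at 0.25, and the projected NVME rig at 1.75, which is why closing the gap matters for cheap hardware.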
2019-01-21 05:27:58
but those are just numbers in a spreadsheet right now. I need to write a good bit of code with a fairly crafty thread model, without overwhelming the garbage collector or creating a NUMA nightmare
Fireduck
2019-01-21 05:28:00
so we shall see
Fireduck
2019-01-21 05:29:48
really a big gap
Humphrey
2019-01-21 05:31:18
I am new here, and figuring out how to get a start since I don't have RAM OR NVME in hand at the moment.
Humphrey
2019-01-21 05:31:42
You can run a node and buy a little SNOW on an exchange
Fireduck
2019-01-21 05:31:51
you don't need to mine
Fireduck
2019-01-21 05:32:13
but if you want to mine, if you have a computer with a free M.2 slot, you can get an NVME for not much money
Fireduck
2019-01-21 05:32:26
The Intel 760P line seems like the best bang for buck
Fireduck
2019-01-21 05:32:34
the Samsung 970 PRO drives are nice as well
Fireduck
2019-01-21 05:45:42
OK, thanks
Humphrey
2019-01-21 05:46:43
Roughly how much SNOW would I get if I buy an Intel 760P to mine?
Humphrey
2019-01-21 05:47:48
Assuming you get 56kh/s, use the calculator on http://snowblossom-explorer.org/
Fireduck
2019-01-21 05:54:12
Really not much: less than 0.5 SNOW/day even if I have 100kh/s, according to the calculator.
Humphrey
2019-01-21 05:54:23
yeah
Fireduck
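The chat does not show how the explorer's calculator works. A common way to estimate mining income is a proportional share of the network hash rate, sketched below. Every input except the 100 kh/s miner rate is a placeholder chosen for illustration (network rate, blocks per day, block reward), not an actual Snowblossom parameter. Check the explorer for real values.

```java
public class YieldSketch {
    /**
     * Expected coins per day under a simple proportional-share model:
     * (my share of the network hash rate) * (blocks per day) * (reward per block).
     */
    static double coinsPerDay(double myKhs, double networkKhs,
                              double blocksPerDay, double rewardPerBlock) {
        return (myKhs / networkKhs) * blocksPerDay * rewardPerBlock;
    }

    public static void main(String[] args) {
        double myKhs = 100.0;            // from the chat
        double networkKhs = 2_000_000.0; // PLACEHOLDER, not a real network figure
        double blocksPerDay = 144.0;     // PLACEHOLDER
        double rewardPerBlock = 50.0;    // PLACEHOLDER

        double estimate = coinsPerDay(myKhs, networkKhs, blocksPerDay, rewardPerBlock);
        System.out.println("Estimated coins/day: " + estimate);
    }
}
```

The model ignores variance, pool fees, and difficulty changes, so it is only useful as an order-of-magnitude check.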
2019-01-21 05:55:34
can I connect to a testnet to test my hardware performance?
Humphrey
2019-01-21 05:56:03
For mining? Not really. The testnet uses very small fields so you'll end up with it all in cache
Fireduck
2019-01-21 05:56:11
No reason to not test mining performance on mainnet
Fireduck
2019-01-21 05:56:35
OK, got it
Humphrey
2019-01-21 15:02:07
Anyone try NVME RAID 0 before?
Humphrey
2019-01-21 15:02:45
yes, do raid 1 until you cannot fit the field anymore
Rotonen
2019-01-21 15:03:14
Intel VROC would be interesting, AFAIK no one tried that yet
Rotonen
2019-01-21 15:06:16
how much improvement?
Humphrey
2019-01-21 17:24:48
nothing unexpected either way
Rotonen
2019-01-21 21:55:23
Hey everyone, it's been a while since I've been around here. Just saying hi! I see we're on a new field
offmenu
2019-01-21 22:01:58
have you guys settled on the logo yet?
offmenu
2019-01-21 22:02:10
i might throw something together if you're still open to submissions
offmenu
2019-01-21 22:09:18
Not really settled
Fireduck
2019-01-21 22:38:15
@Rotonen I need some sort of guide on NUMA programming
Fireduck
2019-01-21 22:38:35
I think you have an idea of where I am at, but I am just guessing on how to write code to work well in a NUMA setup
Fireduck
2019-01-21 22:39:12
Basically, trying to reduce the number of cross thread locks, since they could be cross socket as well.
Fireduck
2019-01-21 22:39:37
Also trying to have data used by a single thread until needed
Fireduck
2019-01-21 22:39:52
but I might be doing everything wrong
Fireduck
2019-01-21 22:40:50
If you take a look at https://github.com/snowblossomcoin/snowblossom/blob/master/miner/src/surf/MagicQueue.java that might help ```
package snowblossom.miner.surf;
import java.util.concurrent.LinkedBlockingQueue;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.HashMap;
import java.util.LinkedList;
/**
* Data optimization based on guesses about how NUMA works
* and trying to keep things simple for the GC. So probably all wrong.
*/
public class MagicQueue
{
/**
* Collection of ByteBuffers for each bucket, ready to be read
*/
private final LinkedList<ByteBuffer>[] global_buckets;
private final int max_chunk_size;
/**
* Each thread accumulates data in this map before it is saved
* to the global buckets.
*/
private final ThreadLocal<Map<Integer, ByteBuffer> > local_buff;
public MagicQueue(int max_chunk_size, int bucket_count)
{
this.max_chunk_size = max_chunk_size;
global_buckets = new LinkedList[bucket_count];
for(int i=0; i<bucket_count; i++)
{
global_buckets[i] = new LinkedList<>();
}
local_buff = new ThreadLocal<Map<Integer, ByteBuffer>>() {
@Override protected Map<Integer,ByteBuffer> initialValue() {
return new HashMap<Integer, ByteBuffer>(bucket_count*2+1, 0.5f);
}
};
}
/**
* Returns a ByteBuffer that is ready to accept writes of up to data_size bytes
* as needed. Might already have data in it. Can only be used in this thread.
* Might not get saved to the global bucket until flushFromLocal is called.
*/
public ByteBuffer openWrite(int bucket, int data_size)
{
Map<Integer, ByteBuffer> local = local_buff.get();
if (local.containsKey(bucket))
{
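if (local.get(bucket).remaining() >= data_size) return local.get(bucket);
// Buffer is full: publish it once. writeToBucket takes the bucket lock, so no
// second, unsynchronized add to global_buckets is needed (it would duplicate data).
writeToBucket(bucket, local.get(bucket));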
}
local.put(bucket, ByteBuffer.allocate(max_chunk_size));
return local.get(bucket);
}
/**
* @param data A byte buffer open for writes
*/
private void writeToBucket(int bucket, ByteBuffer data)
{
LinkedList<ByteBuffer> lst = global_buckets[bucket];
synchronized(lst)
{
ByteBuffer last = lst.peekLast();
if ((last != null) && (last.remaining() >= data.position()))
{
data.flip();
last.put(data);
}
else
{
lst.add(data);
}
}
}
/**
* Returns null or a ByteBuffer with position 0 and limit set to how much data is there,
* ready for reading.
*/
public ByteBuffer readBucket(int bucket)
{
LinkedList<ByteBuffer> lst = global_buckets[bucket];
synchronized(lst)
{
ByteBuffer bb = lst.poll();
if (bb == null) return null;
bb.flip();
return bb;
}
}
public void flushFromLocal()
{
for(Map.Entry<Integer,ByteBuffer> me : local_buff.get().entrySet())
{
int b = me.getKey();
ByteBuffer bb = me.getValue();
writeToBucket(b, bb);
}
local_buff.get().clear();
}
}
```
Fireduck
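The idea behind MagicQueue (stage data per thread, take a shared lock only when a chunk is full) can be shown in a stripped-down form. The sketch below is not the miner's code. `LocalBatchSketch` and all of its members are hypothetical names, it batches plain ints instead of ByteBuffers, and the atomic counter exists only so the demo can report how often the lock was taken.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

public class LocalBatchSketch {
    /** Per-thread staging area; only its owning thread ever touches it. */
    private static final class Batch {
        final int[] items;
        int count;
        Batch(int capacity) { items = new int[capacity]; }
    }

    private final ArrayDeque<int[]> shared = new ArrayDeque<>();
    private final AtomicLong lockCount = new AtomicLong(); // demo bookkeeping only
    private final int batchSize;
    private final ThreadLocal<Batch> local;

    LocalBatchSketch(int batchSize) {
        this.batchSize = batchSize;
        this.local = ThreadLocal.withInitial(() -> new Batch(batchSize));
    }

    /** Stage one value locally; touch the shared queue only when the batch is full. */
    void add(int value) {
        Batch b = local.get();
        b.items[b.count++] = value;
        if (b.count == batchSize) flush();
    }

    /** Publish whatever the calling thread has staged, with one lock acquisition. */
    void flush() {
        Batch b = local.get();
        if (b.count == 0) return;
        int[] full = Arrays.copyOf(b.items, b.count);
        synchronized (shared) { shared.add(full); }
        lockCount.incrementAndGet();
        b.count = 0;
    }

    /** Runs the demo and returns {items published, lock acquisitions}. */
    static long[] runDemo(int threads, int perThread, int batchSize) {
        LocalBatchSketch q = new LocalBatchSketch(batchSize);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) q.add(i);
                q.flush(); // publish the partial last batch, if any
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long items = 0;
        synchronized (q.shared) {
            for (int[] batch : q.shared) items += batch.length;
        }
        return new long[] { items, q.lockCount.get() };
    }

    public static void main(String[] args) {
        long[] r = runDemo(4, 10_000, 1_000);
        System.out.println("items=" + r[0] + " lockAcquisitions=" + r[1]);
    }
}
```

With 4 threads, 10,000 items each, and a batch size of 1,000, the shared lock is taken 40 times instead of 40,000. Whether that actually helps on a multi-socket machine is something only a benchmark on the target hardware can answer.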
2019-01-21 22:50:41
@Fireduck step 1, C
Rotonen
2019-01-21 22:52:06
@Fireduck or rethink as IPC across thread pool processes and use numactl
Rotonen
2019-01-21 22:53:00
@Fireduck otherwise you'll slam into the wall of the GC not yet being numa aware
Rotonen
2019-01-21 22:53:35
see JEP 345 and associated discussions
Rotonen
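The "one process per NUMA node under numactl" suggestion could look roughly like the following launcher sketch. The worker command (`miner-worker.jar`) and the node count of 2 are assumptions made for illustration. `--cpunodebind` and `--membind` are standard numactl options. The sketch only builds and prints the command lines. Starting them is left as a commented-out line, since it requires a Linux host with numactl installed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NumactlLaunchSketch {
    /** Wraps a worker command so its CPUs and memory are both pinned to one NUMA node. */
    static List<String> commandForNode(int node, List<String> workerCommand) {
        List<String> cmd = new ArrayList<>();
        cmd.add("numactl");
        cmd.add("--cpunodebind=" + node); // run only on this node's CPUs
        cmd.add("--membind=" + node);     // allocate only from this node's memory
        cmd.addAll(workerCommand);
        return cmd;
    }

    public static void main(String[] args) {
        int numaNodes = 2; // ASSUMPTION: a two-socket machine
        // HYPOTHETICAL worker; the real miner has no such jar.
        List<String> worker = Arrays.asList("java", "-jar", "miner-worker.jar");

        for (int node = 0; node < numaNodes; node++) {
            List<String> cmd = commandForNode(node, worker);
            System.out.println(String.join(" ", cmd));
            // To actually launch (Linux with numactl installed):
            // new ProcessBuilder(cmd).inheritIO().start();
        }
    }
}
```

Each worker JVM then has its heap and threads on a single node, so the garbage collector never has to reason about remote memory. The cost is that the workers need some IPC channel to receive work and report results.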
2019-01-21 22:55:12
from a brief literature overview of the past 5 years, it seems someone in Glasgow has been doing something close enough to the byte queue you're trying to go for
http://www.dcs.gla.ac.uk/~jsinger/pdfs/lcpc15.pdf
Rotonen
2019-01-21 22:56:03
ha, that does look like the same problem
Fireduck
2019-01-21 22:56:25
I am probably just prematurely optimizing anyways
Fireduck
2019-01-21 22:56:59
is there anything which *needs* to be reaped mining-time?
Rotonen
2019-01-21 22:57:13
just turn off the GC?
Rotonen
2019-01-21 22:58:09
or try go - nigh native gRPC, goroutines suit the problem well, windows is a first class citizen in the packaging ecosystem
Rotonen
2019-01-21 22:58:52
heh, there is a numa aware java library for byte buffer manipulation too :stuck_out_tongue:
https://oss.sonatype.org/service/local/repositories/releases/archive/org/xerial/jnuma/0.1.3/jnuma-0.1.3-javadoc.jar/!/xerial/jnuma/Numa.html
Rotonen
2019-01-21 22:59:23
and that turns the GC off for the objects, yeah :smile:
Rotonen
2019-01-21 22:59:55
I can avoid the GC pretty well by just reusing a lot of bytebuffers
Fireduck
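Reusing ByteBuffers to keep allocation pressure low could be done with a small pool like the one sketched here. This is an illustration of the general technique, not code from the miner. `BufferPoolSketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

public class BufferPoolSketch {
    private final ArrayBlockingQueue<ByteBuffer> free;
    private final int bufferSize;

    BufferPoolSketch(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(count);
        // Allocate everything up front so steady-state work creates no garbage.
        for (int i = 0; i < count; i++) {
            free.add(ByteBuffer.allocate(bufferSize));
        }
    }

    /** Returns an empty buffer, reusing a pooled one when available. */
    ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) {
            b = ByteBuffer.allocate(bufferSize); // pool exhausted: fall back to allocating
        }
        b.clear(); // position = 0, limit = capacity; old contents are simply overwritten
        return b;
    }

    /** Hands a buffer back; it is dropped (left to the GC) if the pool is already full. */
    void release(ByteBuffer b) {
        free.offer(b);
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(1, 1024);
        ByteBuffer first = pool.acquire();
        first.putInt(42);
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println("same object reused: " + (first == second));
        System.out.println("position after acquire: " + second.position());
    }
}
```

The caller must not touch a buffer after releasing it, since another thread may already be writing to it. The pool also does nothing about which NUMA node the memory lives on, which is the per-object placement limitation raised in the reply that follows.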
2019-01-21 22:59:57
on a boxes and arrows level that library should do everything you need, actually
Rotonen
2019-01-21 23:00:06
yes, but you cannot set per object policies
Rotonen
2019-01-21 23:00:31
At this point, I am not up to turning off GC
Fireduck
2019-01-21 23:00:48
I'd be more likely to implement all the mining in C++ rather than try to do that
Fireduck
2019-01-21 23:01:08
go or rust would be sexy
Rotonen
2019-01-21 23:01:19
grpc on rust looks painful, though
Rotonen
2019-01-21 23:01:35
go is terrible. rust is probably terrible.
Fireduck
2019-01-21 23:01:43
I have a strong dislike of "sexy" technologies
Fireduck
2019-01-21 23:01:56
i like go, 'if i can think it in bash pipes xargs parallel, i can think it in go'
Rotonen
2019-01-21 23:02:14
i guess that's about the aim of the whole thing too
Rotonen
2019-01-21 23:02:23
a generalized unixpipe emulator
Rotonen
2019-01-21 23:02:33
yeah, for me there are exploratory learn new tech projects
Fireduck
2019-01-21 23:02:37
and get shit done projects
Fireduck
2019-01-21 23:02:45
and they make very different choices
Fireduck
2019-01-21 23:03:09
i suppose you get paid for the first kind? :stuck_out_tongue:
Rotonen
2019-01-21 23:04:09
well, at work we use scala, so yeah
Fireduck
2019-01-21 23:04:26
I like the concept of functional programming, but in practice it makes me a little crazy
Fireduck
2019-01-21 23:04:37
but seriously, browse through this for the how and see about taking a stab at it yourself
https://github.com/xerial/jnuma A Java library for accessing NUMA (Non Uniform Memory Access) API
Rotonen
2019-01-21 23:04:57
yeah, functional is fine, until you *need* a side effect
Rotonen
2019-01-21 23:05:11
which is about every program ever, which actually *did* something
Rotonen
2019-01-21 23:05:14
why write any code at all if you didn't want a side effect?
Fireduck
2019-01-21 23:05:19
right
Fireduck
2019-01-21 23:07:10
going through the tests seems a good starting point for diving into 'how is this thing done' https://github.com/xerial/jnuma/blob/develop/src/test/scala/xerial/jnuma/NumaTest.scala ```
/*
* Copyright 2012 Taro L. Saito
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
//--------------------------------------
//
// NumaTest.scala
// Since: 2012/11/22 2:24 PM
//
//--------------------------------------
package xerial.jnuma
import util.Random
import java.nio.{ByteOrder, ByteBuffer}
import java.util
import java.io.{OutputStream, FileOutputStream}
/**
* @author leo
*/
class NumaTest extends MySpec {
"Numa" should {
"report NUMA info" taggedAs ("report") in {
val available = Numa.isAvailable
val numNodes = Numa.numNodes()
debug("numa is available: " + available)
debug("num nodes: " + numNodes)
for (i <- 0 until numNodes) {
val n = Numa.nodeSize(i)
val f = Numa.freeSize(i)
debug("node %d - size:%,d free:%,d", i, n, f)
}
val nodes = (0 until numNodes)
for (n1 <- nodes; n2 <- n1 until numNodes) {
val d = Numa.distance(n1, n2)
debug("distance %s - %s: %d", n1, n2, d)
}
def toBitString(b: Array[Long]) = {
val s = for (i <- 0 until Numa.numCPUs()) yield {
if ((b(i / 64) & (1L << (i % 64))) == 0) "0" else "1"
}
s.mkString
}
for (node <- nodes) {
val cpuVector = Numa.nodeToCpus(node)
debug("node %d -> cpus %s", node, toBitString(cpuVector))
}
val numCPUs = Runtime.getRuntime.availableProcessors();
val affinity = (0 until numCPUs).par.map {
cpu =>
Numa.getAffinity()
}
debug("affinity: %s", affinity.map(toBitString(_)).mkString(", "))
val preferred = (0 until numCPUs).par.map {
cpu =>
Numa.runOnNode(cpu % numNodes)
Numa.setPreferred(cpu % numNodes)
val n = Numa.getPreferredNode
Numa.runOnAllNodes()
n
}
debug("setting prefererd NUMA nodes: %s", preferred.mkString(", "))
val s = (0 until numCPUs).par.map {
cpu =>
Numa.setAffinity((cpu + 1) % numCPUs)
if (cpu % 2 == 0)
(0 until Int.MaxValue / 10).foreach {
i =>
}
Numa.getAffinity()
}
debug("affinity after setting: %s", s.map(toBitString(_)).mkString(", "))
val r = (0 until numCPUs).par.map {
cpu =>
Numa.resetAffinity()
Numa.getAffinity()
}
debug("affinity after resetting: %s", r.map(toBitString(_)).mkString(", "))
}
"allocate local buffer" in {
for (i <- 0 until 3) {
val local = Numa.allocLocal(1024)
Numa.free(local)
}
}
"allocate buffer on nodes" in {
val N = 100000
def access(b: ByteBuffer) {
val r = new Random(0)
var i = 0
val p = 1024
val buf = new Array[Byte](p)
while (i < N) {
b.position(r.nextInt(b.capacity() / p) * p)
b.get(buf)
i += 1
}
}
val bl = ByteBuffer.allocateDirect(8 * 1024 * 1024)
val bj = ByteBuffer.allocate(8 * 1024 * 1024)
val b0 = Numa.allocOnNode(8 * 1024 * 1024, 0)
val b1 = Numa.allocOnNode(8 * 1024 * 1024, 1)
val bi = Numa.allocInterleaved(8 * 1024 * 1024)
time("numa random access", repeat = 10) {
block("direct") {
access(bl)
}
block("heap") {
access(bj)
}
block("numa0") {
access(b0)
}
block("numa1") {
access(b1)
}
block("interleaved") {
access(bi)
}
}
Numa.free(b0)
Numa.free(b1)
Numa.free(bi)
}
def radixSort8(buf: ByteBuffer) = {
val K = 256
val N = buf.capacity()
val pile = Array.ofDim[Int](K)
// count frequencies
buf.position(0)
for (i <- 0 until N)
pile(buf.get(i) + 128) += 1
// count cumulates
for (i <- 1 until K) {
pile(i) += pile(i - 1)
}
def split {
for (i <- 0 until N) {
var e = buf.get(i)
var toContinue = true
while (toContinue) {
val p = e + 128
val pileIndex = pile(p) - 1
pile(p) -= 1
if (pileIndex < i)
toContinue = false
else {
val tmp = buf.get(pileIndex)
buf.put(pileIndex, e)
e = tmp
}
}
buf.put(i, e)
}
}
split
}
def radixSort8_local(buf: ByteBuffer) = {
val K = 256
val N = buf.capacity() - (2 * 4 * K)
val countOffset = buf.capacity() / 4
val pileOffset = countOffset + K
// count frequencies
buf.position(0)
for (i <- 0 until K) {
buf.putInt(countOffset + i * 4, 0)
}
for (i <- 0 until buf.capacity()) {
val ch = buf.get(i) + 128
val prevCount = buf.getInt(countOffset + ch * 4)
buf.putInt(countOffset + ch * 4, prevCount + 1)
}
// count cumulates
for (i <- 0 until K) {
val prev = if (i == 0) 0 else buf.getInt(countOffset + (i - 1) * 4)
val current = buf.getInt(countOffset + i * 4)
buf.putInt(pileOffset + i * 4, prev + current)
}
def split {
for (i <- 0 until N) {
var e = buf.get(i)
var toContinue = true
while (toContinue) {
val p = e + 128
val pileIndex = buf.getInt(pileOffset + p * 4) - 1
buf.putInt(pileOffset + p * 4, pileIndex)
if (pileIndex < i)
toContinue = false
else {
val tmp = buf.get(pileIndex)
buf.put(pileIndex, e)
e = tmp
}
}
buf.put(i, e)
}
}
split
}
def radixSort8_array(b: Array[Byte]) = {
val K = 256
val N = b.length
val count = new Array[Int](K + 1)
// count frequencies
for (i <- 0 until N) {
count(b(i) + 128) += 1
}
// count cumulates
for (i <- 1 to K)
count(i) += count(i - 1)
def split {
val pile = new Array[Int](K + 1)
Array.copy(count, 1, pile, 0, K)
for (i <- 0 until N) {
var e = b(i)
var toContinue = true
while (toContinue) {
val p = e + 128
val pileIndex = pile(p) - 1
pile(p) -= 1
if (pileIndex < i)
toContinue = false
else {
val tmp = b(pileIndex)
b(pileIndex) = e
e = tmp
}
}
b(i) = e
}
}
split
}
"perform microbenchmark" taggedAs ("bench") in {
val bufferSize = 4 * 1024 * 1024
when("buffer size is %,d".format(bufferSize))
val numaBufs = (for (i <- 0 until Numa.numNodes()) yield "numa%d".format(i) -> Numa.allocOnNode(bufferSize, i)) :+
"numa-i" -> Numa.allocInterleaved(bufferSize)
//val bdirect = ByteBuffer.allocateDirect(bufferSize)
val bheap = ByteBuffer.allocate(bufferSize).order(ByteOrder.nativeOrder());
val bufs = numaBufs ++ Map("heap" -> bheap)
// fill bytes
def fillBytes(b: ByteBuffer) = {
var i = 0
b.clear()
while (b.remaining() > 0) {
b.put(i.toByte)
i += 1
}
}
def fillI…
Rotonen
2019-01-21 23:07:45
that is interesting
Fireduck
2019-01-21 23:09:08
api-wise that thing has everything i've seen you publicly cuss about not being in there or about how to tackle
Rotonen
2019-01-21 23:09:33
whether or not that's well done, that's something you need to test, benchmark, potentially modify
Rotonen
2019-01-21 23:09:41
right
Fireduck
2019-01-21 23:09:56
and it's apache 2.0 licensed
Rotonen