2018-11-21 00:00:10
and `Binary search is a simple example that could benefit from explicit prefetching. The access pattern in a binary search looks pretty much random to the hardware prefetcher, so there is little chance that it will accurately predict what to fetch.` sounds relevant
Rotonen
2018-11-21 00:00:23
oh well, enough fun to have discovered a whole field of rabbit holes for now
Rotonen
2018-11-21 00:02:33
and this seems like almost snowblossom mining
https://stackoverflow.com/questions/7327994/prefetching-examples/50280085#50280085 Can anyone give an example or a link to an example which uses __builtin_prefetch in GCC (or just the asm instruction prefetcht0 in general) to gain a substantial performance advantage? In particula...
Rotonen
2018-11-21 00:02:49
`maybe surprisingly, the less CPU-bound task the bigger the speed-up: we are able to hide the latency almost completely, thus the speed-up is` sounds so very familiar
Rotonen
2018-11-21 00:03:57
now this is just funny with regard to how many of the advancements from the past 10 years get in the way of mining snowblossom :smile:
https://stackoverflow.com/a/45201673/1214697 Some CPU and compilers supply prefetch instructions. Eg: __builtin_prefetch in GCC Document. Although there is a comment in GCC's document, but it's too short to me. I want to know, in prantice, w...
Rotonen
2018-11-21 00:04:29
was not expecting power savings to be in the mess as well
`On recent Intel chips one reason you apparently might want to use prefetching is to avoid CPU power-saving features artificially limiting your achieved memory bandwidth.`
Rotonen
2018-11-21 00:04:48
yeah, java actually does all this stuff well, good luck to anyone implementing a miner in anything else :smile:
Rotonen
2018-11-21 00:17:24
but yeah, i guess these would be fun for arktika 8 channels per socket, 4 sockets, 100GE
https://www.anandtech.com/print/13620/huawei-server-efforts-hi1620-and-arms-big-server-core-ares Huawei Server Efforts: Hi1620 and Arm’s Big Server Core, Ares
Rotonen
2018-11-21 00:27:43
INFO: RPC Server: read_ops/s: 4610388.4 rpc_ops/s: 4502.3 network_bw: 70.3 MB/s
Fireduck
2018-11-21 00:27:46
that is more like it
Fireduck
2018-11-21 00:29:02
i'll believe a poolside 1h average :stuck_out_tongue:
Rotonen
2018-11-21 00:30:56
heh
Fireduck
2018-11-21 00:31:02
you should see about 1mh in an hour
Fireduck
2018-11-21 00:31:49
i'm seeing that as a spot rate currently, but that's fluctuating between 200k and 1M
Rotonen
2018-11-21 00:31:58
i think the pool tries to tell me things in the log way too often
Rotonen
2018-11-21 00:33:10
yeah, just had a `INFO: Mining rate: 0.000/sec - at this rate ∞ hours per block`
Rotonen
2018-11-21 00:33:23
maybe it could tell me the averages every 5min or so
Rotonen
2018-11-21 00:35:02
Heh yeah
Fireduck
2018-11-21 00:35:43
what's the relation of that 1M poolside figure and the 4M from the output you posted?
Rotonen
2018-11-21 00:36:48
Those are individual value reads
Fireduck
2018-11-21 00:37:02
So 6 of those is one hash attempt
Fireduck
2018-11-21 00:37:26
that'd land you at 800kH/s, though?
Rotonen
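The conversion in question is simple arithmetic. A minimal sketch in Python, assuming (per Fireduck's statement above) that one hash attempt costs 6 value reads; the function name is made up for illustration:

```python
READS_PER_HASH = 6  # from the chat: "6 of those is one hash attempt"

def hash_rate_from_reads(read_ops_per_s, reads_per_hash=READS_PER_HASH):
    """Convert value reads per second into hash attempts per second."""
    return read_ops_per_s / reads_per_hash

# read_ops/s taken from the "RPC Server" log line posted earlier
rate = hash_rate_from_reads(4610388.4)
print(f"{rate / 1000:.0f} kH/s")
```

That works out to a little under the 800kH/s estimate.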
2018-11-21 00:37:42
INFO: 1-min: 105.271K/s, 5-min: 105.735K/s, hour: 51.313K/s
Nov 20, 2018 4:37:28 PM snowblossom.miner.Arktika printStats
INFO: Layer 0: read_ops/s: 6995959.1 read_bw: 27328.0 MB/s
Nov 20, 2018 4:37:28 PM snowblossom.miner.Arktika printStats
INFO: Layer 1: read_ops/s: 0.0 read_bw: 0.0 MB/s
Nov 20, 2018 4:37:28 PM snowblossom.miner.Arktika printStats
INFO: RPC Server: read_ops/s: 6351366.5 rpc_ops/s: 6202.5 network_bw: 96.9 MB/s
Fireduck
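Dividing the figures in that printStats output against each other gives a rough feel for the I/O shape. This is a sketch only: it assumes the log's "MB" means 2^20 bytes (the chat does not say), and the helper name is hypothetical. The resulting sizes are inferences from the ratios, not documented Arktika values.

```python
MIB = 1024 * 1024  # assumption: the log's "MB" is 2**20 bytes

def bytes_per_op(bw_mb_per_s, ops_per_s):
    """Average bytes moved per operation, given bandwidth and op rate."""
    return bw_mb_per_s * MIB / ops_per_s

layer0_bytes = bytes_per_op(27328.0, 6995959.1)  # Layer 0: read_bw / read_ops
rpc_bytes = bytes_per_op(96.9, 6351366.5)        # RPC Server: network_bw / read_ops
reads_per_rpc = 6351366.5 / 6202.5               # read_ops / rpc_ops
hashes_per_s = 6351366.5 / 6                     # 6 reads per hash attempt

print(round(layer0_bytes), round(rpc_bytes), round(reads_per_rpc), round(hashes_per_s))
```

Under that assumption the numbers come out at roughly 4 KiB per Layer 0 read, about 16 bytes per value sent over the network, about 1024 reads batched per RPC, and a little over 1 MH/s worth of reads being served.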
2018-11-21 00:37:48
I was just getting warmed up
Fireduck
2018-11-21 00:38:08
that makes sense
Rotonen
2018-11-21 00:38:10
that is the r900 reading from ram and dishing out on network
Fireduck
2018-11-21 00:38:37
that's 8 channels it pulls off of on that side?
Rotonen
2018-11-21 00:38:38
using most of the CPU to do it, not sure that adding 10g will help
Fireduck
2018-11-21 00:38:57
I have no idea how many channels it has
Fireduck
2018-11-21 00:39:18
xeon is 4 channels per socket and that's a dual socket system?
Rotonen
2018-11-21 00:39:54
and as ryzen is 2 channels per socket and limited to single-socket systems, i'm a bit mystified as to its allure
Rotonen
2018-11-21 00:41:02
from my point of view you're taking only a 10% to 15% performance hit vs. local ram there with arktika
Rotonen
2018-11-21 00:41:22
4 socket, E7450
Fireduck
2018-11-21 00:41:42
that should land you around 3MH/s
Rotonen
2018-11-21 00:41:47
these CPUs seem to suck, I can only get them to mine at about 200kh/s directly
Fireduck
2018-11-21 00:41:56
if all ram is local and evenly distributed (and the reads too)
Rotonen
2018-11-21 00:42:25
that sounds more like hardware locality issues there
Rotonen
2018-11-21 00:43:01
like putting most of the field by dumb luck onto one numa node (and there asymmetrically across channels within that node too)
Rotonen
2018-11-21 00:43:47
the page cache has numa aware magic about it so i'm curious, if that R900 does have enough ram and the channels are evenly populated, what'd it pull 'off the disk' after a cat to /dev/null
Rotonen
2018-11-21 03:06:39
i have a server with 256gb DDR4 ram, Intel Xeon E5-1650 V3 getting 870kH/s by precaching in RAM. However, with 64 GB ram with precache it's only getting me 37kH/s... am I wrong to expect it to get more?
cryptovape
2018-11-21 03:07:40
If you have other machines on a lan with the 256gb one you can get more out of that setup
Fireduck
2018-11-21 03:09:09
oooo... would I install Arktika on the other computers and use a remote layer to the server's IP?
cryptovape
2018-11-21 03:09:23
yes
Rotonen
2018-11-21 03:10:20
Awesome, yeah the CPU is maxed out. What is the minimum RAM i would need for field 7 RAM mining?
cryptovape
2018-11-21 03:22:44
Ideally you would be able to get entire field in ram
Fireduck
2018-11-21 03:23:02
But doesn't need to all be on one machine
Fireduck
2018-11-21 03:25:27
Any reason to not use rj45 10gbe network?
Fireduck
2018-11-21 03:28:00
Cables will be a few meters at most
Fireduck
2018-11-21 03:38:32
the DDR4 machines are rentals remotely... but I did order 144gb DDR3 ram for my R610, so I may try if CPUs are maxed.
cryptovape
2018-11-21 03:42:39
Also currently have a CPU mining setup for BOINC (for Biblepay and ByteBall) with ~25 i7-3770s 4/8gb RAM, so may want to try to incorporate that somehow.
cryptovape
2018-11-21 03:45:08
How much do those rentals cost?
Fireduck
2018-11-21 03:47:24
got the 256gb for $129...
cryptovape
2018-11-21 03:48:15
$129 per month,day,hour?
Fireduck
2018-11-21 03:48:19
month
cryptovape
2018-11-21 03:48:28
Impressive
Fireduck
2018-11-21 03:52:29
i think the 64gb was $75, but not worth it at 37kH/s
cryptovape
2018-11-21 06:51:22
you can get a little more from that using arktika
Fireduck
2018-11-21 06:51:35
so that you have a separate queue for your memory misses
Fireduck
2018-11-21 07:01:00
wouldn't expect anything amazing
Fireduck
2018-11-21 09:36:16
renting 256GB one socket quad channel hardware for around 100 per month is indeed a thing, also available in europe from various vendors
Rotonen
2018-11-21 09:37:36
@Fireduck the windows penalties are not as bad as we thought, someone hit 1.1MH/s on an asymmetrically populated dual socket xeon on windows
Rotonen
2018-11-21 11:18:59
@cryptovape where are you renting the 256GB machine?
fydel
2018-11-21 16:02:23
@Rotonen shit speed:
Fireduck
2018-11-21 16:02:24
nerd@jet:/var/shm/snow/snowblossom.7$ cat snowblossom.7.snow.* | dd if=/dev/stdin of=/dev/null bs=32k
566180+0 records in
566180+0 records out
18552586240 bytes (19 GB, 17 GiB) copied, 23.8856 s, 777 MB/s
Fireduck
2018-11-21 16:02:58
nevermind, rsync was still running
Fireduck
2018-11-21 16:03:18
still terrible:
Fireduck
2018-11-21 16:03:19
nerd@jet:/var/shm/snow/snowblossom.7$ cat snowblossom.7.snow.* | dd if=/dev/stdin of=/dev/null bs=32k
814943+0 records in
814943+0 records out
26704052224 bytes (27 GB, 25 GiB) copied, 32.8304 s, 813 MB/s
Fireduck
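Both dd reports are internally consistent, which a quick check confirms. A small sketch in Python, relying on GNU dd printing rates in decimal megabytes (10^6 bytes); the function name is made up for illustration:

```python
def dd_rate_mb_s(total_bytes, seconds):
    """Throughput in decimal MB/s, the unit GNU dd prints."""
    return total_bytes / seconds / 1e6

first = dd_rate_mb_s(18552586240, 23.8856)   # run while rsync was still going
second = dd_rate_mb_s(26704052224, 32.8304)  # run after rsync finished

# record count times the 32k block size should equal the byte totals
assert 566180 * 32 * 1024 == 18552586240
assert 814943 * 32 * 1024 == 26704052224

print(round(first), round(second))
```

The two rates round to the 777 MB/s and 813 MB/s that dd printed.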
2018-11-21 16:04:42
anyways, since I learned that doing -Xms along with -Xmx makes it easier to fit things in ram I am less concerned about my inability to quickly read from /var/shm
Fireduck
2018-11-21 16:20:29
I've determined my old Dell has enough CPU capacity to spit out some more bits so I'm upgrading it to 10 gigabit
Fireduck
2018-11-21 16:43:19
@Fireduck the dd was not what i was asking about, but do use a 1G blocksize for speedups
Rotonen
2018-11-21 16:43:50
I just use dd since it gives me a nice report
Fireduck
2018-11-21 16:44:24
how’s the mining ’off the disk’ now that the file is in the page cache?
Rotonen
2018-11-21 16:45:50
and cat is probably the quickest way to cache the file, dd just has the sequential read benchmark aspect to it
Rotonen
2018-11-21 16:46:54
and i guess you sigint your way out at that point or sigusr1 for the interim reports as that’s only read a few tens of GB so far?
Rotonen
2018-11-21 16:47:13
This is shared memory filesystem so not sure if the page cache is even a thing
Fireduck
2018-11-21 16:47:30
it is not
Rotonen
2018-11-21 16:47:38
My benchmark mode shows it at 18gb/s
Fireduck
2018-11-21 16:48:02
Vs well over 100gb/s for jvm heap
Fireduck
2018-11-21 16:48:14
i’m curious as to what you get when you have not filled the ram with shm and have catted the field once
Rotonen
2018-11-21 16:48:46
So mine from SSD but use cat to load cache?
Fireduck
2018-11-21 16:48:51
yes
Rotonen
2018-11-21 16:48:53
I'll give it a shot
Fireduck
2018-11-21 16:49:41
should work from a spinny disk just as well too :P
Rotonen
2018-11-21 16:49:54
just slower to cache
Rotonen
2018-11-21 16:51:24
i like doing that as then you never hit the heap size nonsense
Rotonen
2018-11-21 17:06:30
16.7GB/s
Fireduck
2018-11-21 17:06:43
and I know that is cache because that SSD is terrible
Fireduck
2018-11-21 17:17:19
so about 1/6 vs. memfield?
Rotonen
2018-11-21 18:13:52
check your DM!
cryptovape
2018-11-21 18:20:41
On that system. Might have something to do with having 4 sockets.
Fireduck
2018-11-21 18:20:56
Maybe Java is doing some numa magic? Who knows.
Fireduck
2018-11-21 18:24:46
it is, but so should the page cache
Rotonen