and `Binary search is a simple example that could benefit from explicit prefetching. The access pattern in a binary search looks pretty much random to the hardware prefetcher, so there is little chance that it will accurately predict what to fetch.` sounds relevant
oh well, enough fun to have discovered a whole field of rabbit holes for now
and this seems like almost snowblossom mining https://stackoverflow.com/questions/7327994/prefetching-examples/50280085#50280085
`maybe surprisingly, the less CPU-bound task the bigger the speed-up: we are able to hide the latency almost completely, thus the speed-up is` sounds so very familiar
now this is just funny in regards to how much of the advancements from the past 10 years are in the way of mining snowblossom :smile: https://stackoverflow.com/a/45201673/1214697
was not expecting power savings to be in the mess as well `On recent Intel chips one reason you apparently might want to use prefetching is to avoid CPU power-saving features artificially limiting your achieved memory bandwidth.`
yeah, java does all this stuff actually well, good luck to anyone implementing a miner in anything else :smile:
but yeah, i guess these would be fun for arktika: 8 channels per socket, 4 sockets, 100GE https://www.anandtech.com/print/13620/huawei-server-efforts-hi1620-and-arms-big-server-core-ares (Huawei Server Efforts: Hi1620 and Arm’s Big Server Core, Ares)
INFO: RPC Server: read_ops/s: 4610388.4 rpc_ops/s: 4502.3 network_bw: 70.3 MB/s
that is more like it
i'll believe a poolside 1h average :stuck_out_tongue:
heh
you should see about 1mh in an hour
i'm seeing that as a spot rate currently, but that's fluctuating between 200k and 1M
i think the pool tries to tell me things in the log way too often
yeah, just had a `INFO: Mining rate: 0.000/sec - at this rate ∞ hours per block`
maybe it could tell me the averages every 5min or so
Heh yeah
what's the relation of that 1M poolside figure and the 4M from the output you posted?
Those are individual value reads
So 6 of those is one hash attempt
that'd land you at 800kH/s, though?
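A quick sanity check of that estimate, as a minimal sketch: it only assumes the "6 reads per hash attempt" figure given above and the `read_ops/s` value from the RPC Server log line. The function and constant names are made up for illustration and are not part of snowblossom.

```python
# Rough conversion from field read operations to hash attempts.
# Assumption (from the chat): one hash attempt costs 6 individual value reads.
READS_PER_HASH = 6


def hash_rate(read_ops_per_sec: float, reads_per_hash: int = READS_PER_HASH) -> float:
    """Return the implied hash attempts per second for a given read rate."""
    return read_ops_per_sec / reads_per_hash


# read_ops/s as printed in the "RPC Server" log line posted earlier
rpc_read_ops = 4610388.4
print(f"{hash_rate(rpc_read_ops) / 1000:.0f} kH/s")
```

That comes out to about 768 kH/s, so "800kH/s" is the right ballpark for the posted read rate.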
INFO: 1-min: 105.271K/s, 5-min: 105.735K/s, hour: 51.313K/s
Nov 20, 2018 4:37:28 PM snowblossom.miner.Arktika printStats
INFO: Layer 0: read_ops/s: 6995959.1 read_bw: 27328.0 MB/s
Nov 20, 2018 4:37:28 PM snowblossom.miner.Arktika printStats
INFO: Layer 1: read_ops/s: 0.0 read_bw: 0.0 MB/s
Nov 20, 2018 4:37:28 PM snowblossom.miner.Arktika printStats
INFO: RPC Server: read_ops/s: 6351366.5 rpc_ops/s: 6202.5 network_bw: 96.9 MB/s
I was just getting warmed up
that makes sense
that is the r900 reading from ram and dishing out on network
that's 8 channels it pulls off of on that side?
using most of the CPU to do it, so not sure that adding 10g will help
I have no idea how many channels it has
xeon is 4 channels per socket and that's a dual socket system?
and as ryzen is 2 channels per socket and limited to single-socket systems, i'm a bit mystified as to its allure
from my point of view you're taking only a 10% to 15% performance hit vs. local ram there with arktika
4 socket, E7450
that should land you around 3MH/s
these CPUs seem to suck, I can only get them to mine at about 200kh/s directly
if all ram is local and evenly distributed (and the reads too)
that sounds more like hardware locality issues there
like putting most of the field by dumb luck onto one numa node (and there asymmetrically across channels within that node too)
the page cache has numa aware magic about it so i'm curious, if that R900 does have enough ram and the channels are evenly populated, what'd it pull 'off the disk' after a cat to /dev/null
i have a server with 256gb DDR4 ram, Intel Xeon E5-1650 V3 getting 870kH/s by precaching in RAM. However, with 64 GB ram with precache it's only getting me 37kH/s... am I wrong to expect it to get more?
If you have other machines on a lan with the 256gb one you can get more out of that setup
oooo... would I install Arktika on the other computers and use a remote layer to the server's IP?
yes
Awesome, yeah the CPU is maxed out. What is the minimum RAM i would need for field 7 RAM mining?
Ideally you would be able to get entire field in ram
But doesn't need to all be on one machine
Any reason to not use rj45 10gbe network?
Cables will be a few meters at most
the DDR4 machines are rentals remotely... but I did order 144gb DDR3 ram for my R610, so I may try if CPUs are maxed.
Also currently have a CPU mining setup for BOINC (for Biblepay and ByteBall) with ~25 i7-3770s 4/8gb RAM, so may want to try to incorporate that somehow.
How much do those rentals cost?
got the 256gb for $129...
$129 per month,day,hour?
month
Impressive
i think the 64gb was $75, but not worth it at 37kH/s
you can get a little more from that using arktika
so that you have a separate queue for your memory misses
wouldn't expect anything amazing
renting 256GB one socket quad channel hardware for around 100 per month is indeed a thing, also available in europe from various vendors
@Fireduck the windows penalties are not as bad as thought of, someone hit 1.1MH/s on an asymmetrically populated dual socket xeon on windows
@cryptovape where are you renting the 256GB machine?
@Rotonen shit speed:
nerd@jet:/var/shm/snow/snowblossom.7$ cat snowblossom.7.snow.* | dd if=/dev/stdin of=/dev/null bs=32k
566180+0 records in
566180+0 records out
18552586240 bytes (19 GB, 17 GiB) copied, 23.8856 s, 777 MB/s
nevermind, rsync was still running
still terrible:
nerd@jet:/var/shm/snow/snowblossom.7$ cat snowblossom.7.snow.* | dd if=/dev/stdin of=/dev/null bs=32k
814943+0 records in
814943+0 records out
26704052224 bytes (27 GB, 25 GiB) copied, 32.8304 s, 813 MB/s
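For anyone wanting to cross-check the dd reports, here is a small sketch that recomputes the throughput from the byte counts and timings in the two runs. It assumes dd's MB/s means decimal megabytes (10^6 bytes), which is consistent with the "19 GB, 17 GiB" pair it prints; the helper name is made up for illustration.

```python
# Recompute dd's reported throughput from bytes copied and elapsed seconds.
# Assumption: dd's "MB/s" is decimal megabytes (10**6 bytes) per second.
BLOCK_SIZE = 32 * 1024  # bs=32k as passed to dd


def dd_throughput_mb_s(bytes_copied: int, seconds: float) -> float:
    """Return throughput in decimal MB/s."""
    return bytes_copied / seconds / 1e6


# (records, bytes, seconds) taken from the two dd runs above
runs = [
    (566180, 18552586240, 23.8856),
    (814943, 26704052224, 32.8304),
]

for records, bytes_copied, seconds in runs:
    # full 32k records times the block size should equal the byte count
    assert records * BLOCK_SIZE == bytes_copied
    print(f"{dd_throughput_mb_s(bytes_copied, seconds):.0f} MB/s")
```

Both runs land on the figures dd printed (777 MB/s and 813 MB/s), well under what the memory itself can deliver.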
anyways, since I learned that doing -Xms along with -Xmx makes it easier to fit things in ram I am less concerned about my inability to quickly read from /var/shm
I've determined my old Dell has enough CPU capacity to spit out some more bits so I'm upgrading it to 10 gigabit
@Fireduck the dd was not what i was asking about, but use a 1G blocksize for speedups
I just use dd since it gives me a nice report
how’s the mining ’off the disk’ now that the file is in the page cache?
and cat is probably the quickest way to cache the file, dd just has the sequential read benchmark aspect to it
and i guess you sigint your way out at that point or sigusr1 for the interim reports as that’s only read a few tens of GB so far?
This is shared memory filesystem so not sure if the page cache is even a thing
it is not
My benchmark mode shows it at 18gb/s
Vs well over 100gb/s for jvm heap
i’m curious as to what you get when you have not filled the ram with shm and have catted the field once
So mine from SSD but use cat to load cache?
yes
I'll give it a shot
should work from a spinny disk just as well too :P
just slower to cache
i like doing that as then you never hit the heap size nonsense
16.7GB/s
and I know that is cache because that SSD is terrible
so about 1/6 vs. memfield?
check your DM!
On that system. Might have something to do with having 4 sockets.
Maybe Java is doing some numa magic? Who knows.
it is, but so should the page cache