20110226, 00:02  #1 
P90 years forever!
Aug 2002
Yeehaw, FL
7677_{10} Posts 
Random bit in a word
I need some code that efficiently (on Intel this means relatively short chunk of code with no branches that are hard to predict) returns the index of a randomly selected set bit in a 64bit value.
Psuedo code: int64 t; cnt = POPCNT (t) i = random value between 0 and cnt while (i) use BSF to find a set bit, then clear that set bit return BSF(t) POPCNT is an Intel assembly code instruction to count the number of set bits in a register BSF is bitscanforward, it returns the index of the first set bit in a register Clever bittwiddling suggestions are welcome! Last fiddled with by Prime95 on 20110226 at 00:07 
20110226, 01:48  #2 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2·11·19·23 Posts 

20110226, 04:48  #3 
Jun 2003
5,179 Posts 

20110226, 05:12  #4 
P90 years forever!
Aug 2002
Yeehaw, FL
3^{2}×853 Posts 
I was thinking a binary search for the random nth set bit as in:
int result = 0; int64 t; cnt = POPCNT (t) i = random value between 0 and cnt cnt32 = POPCNT (t & 0xFFFFFFFF) t = t >> ((i > cnt32) ? 32 : 0) result = result + ((i > cnt32) ? 32 : 0) i = i  ((i > cnt32) ? cnt32 : 0) repeat for 16, 8, 4, 2 return result the compiler ought to be smart enough to generate conditional move instructions and thus the above is jumpless 
20110226, 07:21  #5 
"Gang aft agley"
Sep 2002
2·1,877 Posts 
Select the bit position (from the mostsignificant bit) with the given count (rank) might be close to what you want. It does its' own bit counting using a common fast method but retains the intermediate calculations and then uses those values to quickly locate the index of the r^{th} set bit. In the example it finds the position of the r^{th} set bit from the most significant end but it says that it can be adapted to find the bit from the least significant end.
Code:
The following 64bit code selects the position of the rth 1 bit when counting from the left. In other words if we start at the most significant bit and proceed to the right, counting the number of bits set to 1 until we reach the desired rank, r, then the position where we stop is returned. If the rank requested exceeds the count of bits set, then 64 is returned. The code may be modified for 32bit or counting from the right. uint64_t v; // Input value to find position with rank r. unsigned int r; // Input: bit's desired rank [164]. unsigned int s; // Output: Resulting position of bit with rank r [164] uint64_t a, b, c, d; // Intermediate temporaries for bit count. unsigned int t; // Bit count temporary. // Do a normal parallel bit count for a 64bit integer, // but store all intermediate steps. // a = (v & 0x5555...) + ((v >> 1) & 0x5555...); a = v  ((v >> 1) & ~0UL/3); // b = (a & 0x3333...) + ((a >> 2) & 0x3333...); b = (a & ~0UL/5) + ((a >> 2) & ~0UL/5); // c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...); c = (b + (b >> 4)) & ~0UL/0x11; // d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...); d = (c + (c >> 8)) & ~0UL/0x101; t = (d >> 32) + (d >> 48); // Now do branchless select! s = 64; // if (r > t) {s = 32; r = t;} s = ((t  r) & 256) >> 3; r = (t & ((t  r) >> 8)); t = (d >> (s  16)) & 0xff; // if (r > t) {s = 16; r = t;} s = ((t  r) & 256) >> 4; r = (t & ((t  r) >> 8)); t = (c >> (s  8)) & 0xf; // if (r > t) {s = 8; r = t;} s = ((t  r) & 256) >> 5; r = (t & ((t  r) >> 8)); t = (b >> (s  4)) & 0x7; // if (r > t) {s = 4; r = t;} s = ((t  r) & 256) >> 6; r = (t & ((t  r) >> 8)); t = (a >> (s  2)) & 0x3; // if (r > t) {s = 2; r = t;} s = ((t  r) & 256) >> 7; r = (t & ((t  r) >> 8)); t = (v >> (s  1)) & 0x1; // if (r > t) s; s = ((t  r) & 256) >> 8; s = 65  s; If branching is fast on your target CPU, consider uncommenting the ifstatements and commenting the lines that follow them. Juha Järvi sent this to me on November 21, 2009. 
20110226, 16:57  #6  
P90 years forever!
Aug 2002
Yeehaw, FL
1110111111101_{2} Posts 
Quote:


20110228, 19:14  #7 
∂^{2}ω=0
Sep 2002
República de California
2·13·449 Posts 
This is more in genericC terms, but perhaps worth investigating:
1. Create a pair of 256entry lookup tables: Table 1 is simply a bytewise lookuptable version of POPCNT: Take a byte and return #set bits in that byte; Table 2 is a bit more complex  it is a lookup table which encodes the following: For any bytevalued integer B in [0,255], given a target setbit index I in [0,POPCNT(B)], the tablelookup returns the bitindex of the set bit with respect to B. This is a bit trickier to set up, since different values of B have numbers of set bits ranging from 0 to 8, so one needs to consider whether the requested bitindex is legal for the bytevalued int in question. But assuming the caller will only ever request a bitindex value which is appropriate for the specific bytevalue in question, you could use a set of 256 24bit ints to encode this. Each 24bit table entry is divided into eight 3bit subfields containing up to eight bit indices for the bytevalue corresponding to the table entry. The low 3 bits are the bit index of the first set bit in B, etc. For greater efficiency it would probably be better to use 32bit ints 4bit indexoffset subfields, with one bit in each subfield going unused. Then, declaring this table like so: uint32 table2[256]; and after properly initializing it, the bitindex of a given set bit I (zerooffset, i.e.I=0 means the first set bit) in bytevalue B is simply equal to (table2[B] >> (4*I)) & 0xf 2. Loop over the 8 bytes in your 64bit input int X, and do 8 Table1 lookups to generate a sequence of partial sums based on the total #bits set in the accumulated bytes. This could be the basis for a fast binary search, but since are just 8 such partial the overhead for managing such a data structure might exceed any gains due to the better asymptotic opcount; 3. Given an input bitindex in [0, POPCNT(X)], use the partial sums you generated in step [2] to find which byte contains the bit in question, and then use Table 2 to give you the bit index within the target byte. 
20110228, 21:58  #8 
"Gang aft agley"
Sep 2002
7252_{8} Posts 
I'm a bit rusty in programming Ernst, so I apologize if I miss the points or end up blathering a bit...
It is easy to get population counts for small fields of the entire 64 bit value For single bit fields, POPCNT is the bit itself. For two bit POPCNTs, you add even bit positions to odd bit positions. Looking above, at the code snippet in my earlier message, it is just a case of shift, mask and add. The comments for the POPCNT variables are more explicit than the slightly optimized actual code. v = 1 bit POPCNTs (64 of them) a = 2 bit POPCNTs (32 of them) b = 4 bit POPCNTs (16 of them) c = 8 bit POPCNTs (8 of them) d = 16 bit POPCNTs (4 of them) Perhaps 'c' would be more efficient than 8 table lookups for step two that you describe. The rest of the code snippet is a bit roughish; to avoid conditional tests and jumps, for comparisons it does subtracts and depends on certain bits of the result. To adjust 's' it depends on bit 8 of an integer being 0 or 1 after a subtraction . To adjust 'r' it depends on several bits of the second byte of a subtraction being all 0 or all 1. In any case the fact that it works makes it so ugly that it is nice. The first 't' value looks wrong to me but I accept that I must be misunderstanding things a bit. After each test and adjustment, 't' is changed to depend on a different one of those POPCNT variables I mentioned above shifted by the partial result 's' and masked to the relevant bits for the next comparison. My tools didn't compile the masking constants correctly and I was forced to replace ~0UL with ~0ui64 Ross Last fiddled with by only_human on 20110228 at 22:43 Reason: Expanded thoughts. Added 'v' to POPCNT variable list 
20110301, 01:05  #9  
P90 years forever!
Aug 2002
Yeehaw, FL
1110111111101_{2} Posts 
Quote:
The code from Juha takes 11 operations per binary search level. We can replace 33 operations with a few operations and a lookup into a 1KB table. BTW, I think I can reduce the 11 operations per binary search level down to a more reasonable 6 or 7. 

20110301, 01:28  #10  
"Gang aft agley"
Sep 2002
7252_{8} Posts 
Quote:
I thought that more than one comparison could be done at a time by setting guard bits that would keep borrows from propagating too far but that is likely primitive thinking because this is probably perfect fodder for modern SSE type instructions. I would be curious to see your ideas on simplifying Juha's binary search if you do develop them. An ugly part of the code in my opinion is that, as written, in C is that the masks used after the subtractions wouldn't work correctly on machines that have a signed binary default integer representation. 

20110301, 01:57  #11 
P90 years forever!
Aug 2002
Yeehaw, FL
3^{2}·853 Posts 
Here is my modified version without Ernst's suggestion to eliminate the last 3 binary search levels. This is quite a bit faster. Also, it shouldn't be hard to remove the multiplication operations.
The original code was 25.5% of program execution time. The new code is 20.5% of total execution time. Code:
// Adapted from code by Juha J‰rvi found on Stanford's bit hacks web page uint64_t v = needs_work_mask; // Input value to find position with rank r. unsigned int r; // Input: bit's desired rank [164]. unsigned int s; // Output: Resulting position of bit with rank r [164] uint64_t a, b, c, d; // Intermediate temporaries for bit count. unsigned int t; // Bit count temporary. // Do a normal parallel bit count for a 64bit integer, // but store all intermediate steps. // a = (v & 0x5555...) + ((v >> 1) & 0x5555...); a = v  ((v >> 1) & ~0UL/3); // b = (a & 0x3333...) + ((a >> 2) & 0x3333...); b = (a & ~0UL/5) + ((a >> 2) & ~0UL/5); // c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...); c = (b + (b >> 4)) & ~0UL/0x11; // d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...); d = (c + (c >> 8)) & ~0UL/0x101; #ifdef OLD t = (d >> 32) + (d >> 48); // Now do branchless select! s = 64; // if (r > t) {s = 32; r = t;} s = ((t  r) & 256) >> 3; r = (t & ((t  r) >> 8)); t = (d >> (s  16)) & 0xff; // if (r > t) {s = 16; r = t;} s = ((t  r) & 256) >> 4; r = (t & ((t  r) >> 8)); t = (c >> (s  8)) & 0xf; // if (r > t) {s = 8; r = t;} s = ((t  r) & 256) >> 5; r = (t & ((t  r) >> 8)); t = (b >> (s  4)) & 0x7; // if (r > t) {s = 4; r = t;} s = ((t  r) & 256) >> 6; r = (t & ((t  r) >> 8)); t = (a >> (s  2)) & 0x3; // if (r > t) {s = 2; r = t;} s = ((t  r) & 256) >> 7; r = (t & ((t  r) >> 8)); t = (v >> (s  1)) & 0x1; // if (r > t) s; s = ((t  r) & 256) >> 8; //s = 65  s; //convert s to LSB 063 notation s; #else int pick; // s = 0; // if (r > popcnt32[s]) {s += 32; r = popcnt32[s];} t = (d + (d >> 16)) & 0xFF; pick = (r > t); s = pick << 5; r = pick * t; // if (r > popcnt16[s]) {s += 16; r = popcnt16[s];} t = (d >> s) & 0xff; pick = (r > t); s += pick << 4; r = pick * t; // if (r > popcnt8[s]) {s += 8; r = popcnt8[s];} t = (c >> s) & 0xf; pick = (r > t); s += pick << 3; r = pick * t; // if (r > popcnt4[s]) {s += 4; r = popcnt4[s];} t = (b >> s) & 0x7; pick = (r > t); s += pick << 2; r = pick * t; // if (r > popcnt2[s]) {s += 2; r = popcnt2[s];} t = (a >> s) & 0x3; pick = (r > t); s += pick << 1; r = pick * t; // if (r > popcnt1[s]) s++; t = (v >> s) & 0x1; pick = (r > t); s += pick; #endif 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
random comments, random questions and thread titles made for Google  jasong  Lounge  46  20170509 12:32 
About random number (random seed) in Msieve  Greenk12  Factoring  1  20081115 13:56 
My last name is a word. I had no idea.  jasong  jasong  3  20071005 18:02 
Word Problem  mfgoode  Lounge  11  20060401 16:16 
The only word  mfgoode  Puzzles  11  20040914 19:08 