After the initial setup of the U2F Zero (hint: there was none), I immediately wondered how I could enable it as a required second factor when logging into my Mac.

After a bit of browsing on Yubico's site and a couple of blogs, I didn't find a solid step-by-step tutorial on how to set this up that fully described the process and the different configurations. So here we are.

Note: These instructions apply to any U2F token, not just the U2F Zero or YubiKey devices.

If you're feeling dangerous (and I really do mean that), you can skip to the very end and just see which commands to run.

The U2F Zero is an open-source DIY hardware token (also available pre-built from Amazon) that supports U2F (Universal 2nd Factor) authentication. At $8.00, it's hard to beat the price, and similar products like the various YubiKeys aren't open source, which is a minor inconvenience for some and a dealbreaker for others.

You have the option of either purchasing it pre-assembled from Amazon or building it yourself. While I always enjoy soldering, the programmer alone is around $35, so unless you're going to build quite a few, you might be best off just buying them pre-built.

Note: I'm doing this on macOS Sierra (10.12), but this should work on any recent version of OS X.

Basically I am trying to achieve the following: **when logging in either after a restart or upon waking from sleep, I would like to require a U2F token as a second authentication factor, in addition to my password.**

The process is actually extremely simple, and boils down to:

- Install the `pam-u2f` module
- Generate the authorization mapping file for your key(s)
- Edit the system configuration files to enable U2F authentication

Installing the `pam-u2f` module is easiest with Homebrew, in which case a simple

```
brew install pam-u2f
```

will suffice.

If you're not using Homebrew on OS X (seriously?), you can build it from source following the instructions here. You will also need `autoconf`, `automake`, and friends, as well as `libu2f-host` and `libu2f-server`.

In either case, you will end up with a shared library called `pam_u2f.so`. If you installed via Homebrew, it will be located at `/usr/local/Cellar/pam-u2f/1.0.4/lib/pam/pam_u2f.so` (be sure to check the version, since the path will change).

To make the configuration step easier, you can symlink the Homebrew library or copy the library you built into `/usr/lib/pam/`.

```
cd /usr/lib/pam
sudo ln -s /usr/local/Cellar/pam-u2f/1.0.4/lib/pam/pam_u2f.so
# Or copy if you built it yourself
```

Note: You will get an `Operation not permitted` message on OS X El Capitan or macOS Sierra, since you must disable System Integrity Protection to have write access to anything in `/usr` other than `/usr/local`. You can disable SIP by booting into recovery mode (⌘ + R), opening a Terminal, and executing `csrutil disable`.

If you decide not to disable SIP, or just not to link or copy into `/usr/lib/pam`, you can also just provide the full path to the `pam_u2f.so` file in the configuration files later.

In order for the module to know which U2F token to accept the challenge response from and authenticate with, you must store some information about your token. Specifically, you must create what is called an Authorization Mapping File, which contains, for each user, the key handle and public key of each registered token.

More information on these files can be found here.

To generate this file, you can use the `pamu2fcfg` utility, which is installed with the `pam-u2f` recipe on Homebrew or can also be built manually.

Insert your U2F token and then open a Terminal and run

```
pamu2fcfg
```

Note: If you want to be explicit about the username for the token, you can add the `-u <username>` option.

If the device is inserted correctly, the colour of the light on the U2F Zero will change, prompting you to press the button to complete the challenge response. If everything worked you will see some output on the screen consisting of

```
<username>:<key handle>,<public key>\n
```

I'm going to place this in a file called `~/.config/u2f/keys`, but you can place it anywhere. To directly output to the file:

```
# Create the directory if it doesn't exist
mkdir -p ~/.config/u2f
# Use >> if you want to append, or a single > if you want to rewrite the file
pamu2fcfg > ~/.config/u2f/keys
```

You'll need to specify this location in the PAM configuration files, unless you place it in `~/.config/Yubico/u2f_keys`, which is the default search location.

To use multiple tokens under the same user on the same machine, you just need to concatenate the second token's key information to the mapping file.

With one user and two tokens, the mapping file should look like:

```
<username>:<key handle 1>,<public key 1>:<key handle 2>,<public key 2>
```

Thankfully, the PAM U2F config tool includes an option (`-n`) to leave off the username, so in the simple case of one user and two tokens you can just run

```
pamu2fcfg -n >> ~/.config/u2f/keys
```

with the second token inserted.

If, however, you wanted to add a second token for a **different user**, you would just add a separate line with the output straight from `pamu2fcfg`, no modification necessary.

For multiple users and/or tokens the mapping file could look like:

```
<username 1>:<key handle 1>,<public key 1>:<key handle 2>,<public key 2>
<username 2>:<key handle 3>,<public key 3>:<key handle 2>,<public key 2>
```

In that example, both users have a unique token (keys `1` and `3`) and also share a token (key `2`).

The PAM configuration files are located in `/etc/pam.d`.

You will see a list of configuration files for various authentication modules here. Of these files, the `screensaver` and `authorization` files are of particular interest to us.

Note: In these steps it is extremely important to use the `sufficient` keyword and not the `required` keyword when first configuring these. This gives you the opportunity to test it out and verify that you can correctly use your U2F token to log in.

Open up the `screensaver` file (with `sudo`, of course) and add a line consisting of:

```
auth sufficient <path to pam_u2f.so> authfile=<path to mapping file>
```

Using the full Homebrew path and placing my authorization mapping file under `~/.config/u2f/keys`, my line looks like:

```
auth sufficient /usr/local/Cellar/pam-u2f/1.0.4/lib/pam/pam_u2f.so authfile=/Users/joe/.config/u2f/keys
```

Note: You cannot perform shell expansions (e.g. `~`) here, so give the full path.

If you symlinked or installed directly into `/usr/lib/pam`, and if you created your mapping file under `~/.config/Yubico/u2f_keys`, you can simply use:

```
auth sufficient pam_u2f.so
```

Then save and exit. Make sure you're configured to require a password immediately after going to sleep, and then tell your Mac to go to sleep.

Upon waking, insert your U2F token and then type your password. After entering the password, it should pause and the light on your U2F Zero should change colours. Et voilà! Press the button to complete the challenge response and you should be successfully logged in with your U2F as a second factor!

You can now perform the next step by adding the same line to the `authorization` file, and then log out and back in to test.

Since we chose `sufficient` for the test, if you don't have your token inserted when you type your password, it will log you in without asking for it. To *require* your token in addition to your password for true 2FA, simply change the `sufficient` keyword to `required` in the `authorization` and `screensaver` files.

Note: It is a very good idea to first try the `required` configuration with the `screensaver` file only, since if something goes wrong, you can still authenticate via the normal login screen without your token.

As mentioned, be careful with requiring the token. If you lose your token, you may be totally locked out of your Mac. One solution to this is to buy two tokens and add both of them to your authorization mapping file. Then store that second token away for safekeeping.

If you *really* truly don't think you need to read about what it is that you're doing, you can achieve all of this by just running the following commands:

```
# Install PAM U2F Module
brew install pam-u2f
# Generate a new authorization mapping file
mkdir -p ~/.config/u2f
# >>> Insert U2F Token
pamu2fcfg > ~/.config/u2f/keys
# >>> Press button on U2F token
# >>> Remove token
# >>> Insert second U2F token (if you have one)
pamu2fcfg -n >> ~/.config/u2f/keys
# >>> Press button on second U2F token
# >>> Remove token
# ... Repeat for each additional token
# Edit PAM configuration files
sudo $EDITOR /etc/pam.d/screensaver
### Add a line with:
# auth required <path to pam_u2f.so> authfile=<path to mapping file>
sudo $EDITOR /etc/pam.d/authorization
### Add a line with:
# auth required <path to pam_u2f.so> authfile=<path to mapping file>
```

Converted to Markdown by Joe Gibson (@gibsjose) joseph.gibson@nasa.gov

Edits and Table of Contents by Jeroen de Bruijn (@vidavidorra) vidavidorra@gmail.com

Check out the Markdown source or fork the repository on GitHub at https://github.com/gibsjose/BitHacks


Sean Anderson's wonderful compilation of Bit Twiddling Hacks, converted to a more readable Markdown version and accompanying PDF.

See the original document for license information.

- About the operation counting methodology
- Compute the sign of an integer
- Detect if two integers have opposite signs
- Compute the integer absolute value (abs) without branching
- Compute the minimum (min) or maximum (max) of two integers without branching
- Determining if an integer is a power of 2
- Sign extending
- Conditionally set or clear bits without branching
- Conditionally negate a value without branching
- Merge bits from two values according to a mask
- Counting bits set
- Counting bits set, naive way
- Counting bits set by lookup table
- Counting bits set, Brian Kernighan's way
- Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
- Counting bits set, in parallel
- Count bits set (rank) from the most-significant bit up to a given position
- Select the bit position (from the most-significant bit) with the given count (rank)

- Computing parity (1 if an odd number of bits set, 0 otherwise)
- Swapping Values
- Reversing bit sequences
- Reverse bits the obvious way
- Reverse bits in word by lookup table
- Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)
- Reverse the bits in a byte with 4 operations (64-bit multiply, no division)
- Reverse the bits in a byte with 7 operations (no 64-bit, only 32)
- Reverse an N-bit quantity in parallel with 5 * lg(N) operations

- Modulus division (aka computing remainders)
- Finding integer log base 2 of an integer (aka the position of the highest bit set)
- Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)
- Find the integer log base 2 of an integer with an 64-bit IEEE float
- Find the log base 2 of an integer with a lookup table
- Find the log base 2 of an N-bit integer in O(lg(N)) operations
- Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

- Find integer log base 10 of an integer
- Find integer log base 10 of an integer the obvious way
- Find integer log base 2 of a 32-bit IEEE float
- Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)
- Counting consecutive trailing zero bits (or finding bit indices)
- Count the consecutive zero bits (trailing) on the right linearly
- Count the consecutive zero bits (trailing) on the right in parallel
- Count the consecutive zero bits (trailing) on the right by binary search
- Count the consecutive zero bits (trailing) on the right by casting to a float
- Count the consecutive zero bits (trailing) on the right with modulus division and lookup
- Count the consecutive zero bits (trailing) on the right with multiply and lookup

- Round up to the next highest power of 2 by float casting
- Round up to the next highest power of 2
- Interleaving bits (aka computing Morton Numbers)
- Testing for ranges of bytes in a word (and counting occurrences found)
- Compute the lexicographically next bit permutation

When totaling the number of operations for algorithms here, any C operator is counted as one operation. Intermediate assignments, which need not be written to RAM, are not counted. Of course, this operation counting approach only serves as an approximation of the actual number of machine instructions and CPU time. All operations are assumed to take the same amount of time, which is not true in reality, but CPUs have been heading increasingly in this direction over time. There are many nuances that determine how fast a system will run a given sample of code, such as cache sizes, memory bandwidths, instruction sets, etc. In the end, benchmarking is the best way to determine whether one method is really faster than another, so consider the techniques below as possibilities to test on your target architecture.

```
int v; // we want to find the sign of v
int sign; // the result goes here
// CHAR_BIT is the number of bits per byte (normally 8).
sign = -(v < 0); // if v < 0 then -1, else 0.
// or, to avoid branching on CPUs with flag registers (IA32):
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// or, for one less instruction (but not portable):
sign = v >> (sizeof(int) * CHAR_BIT - 1);
```

The last expression above evaluates to `sign = v >> 31` for 32-bit integers. This is one operation faster than the obvious way, `sign = -(v < 0)`. This trick works because when signed integers are shifted right, the value of the far left bit is copied to the other bits. The far left bit is `1` when the value is negative and `0` otherwise; all `1` bits gives `-1`. Unfortunately, this behavior is architecture-specific.

Alternatively, if you prefer the result be either `-1` or `+1`, then use:

```
sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then -1, else +1
```

On the other hand, if you prefer the result be either `-1`, `0`, or `+1`, then use:

```
sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
// Or, for more speed but less portability:
sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); // -1, 0, or +1
// Or, for portability, brevity, and (perhaps) speed:
sign = (v > 0) - (v < 0); // -1, 0, or +1
```

If instead you want to know if something is non-negative, resulting in `+1` or else `0`, then use:

```
sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); // if v < 0 then 0, else 1
```

**Notes:**

*On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work.*

*For greater portability, Toby Speight suggested on September 28, 2005 that CHAR_BIT be used here and throughout rather than assuming bytes were 8 bits long.*

*Angus recommended the more portable versions above, involving casting, on March 4, 2006.*

*Rohit Garg suggested the version for non-negative integers on September 12, 2009.*

```
int x, y; // input values to compare signs
bool f = ((x ^ y) < 0); // true iff x and y have opposite signs
```

**Notes:**

*Manfred Weis suggested I add this entry on November 26, 2009.*

```
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> (sizeof(int) * CHAR_BIT - 1);
r = (v + mask) ^ mask;
// Patented variation:
r = (v ^ mask) - mask;
```

Some CPUs don't have an integer absolute value instruction (or the compiler fails to use it). On machines where branching is expensive, the above expression can be faster than the obvious approach, `r = (v < 0) ? -(unsigned)v : v`, even though the number of operations is the same.

**Notes:**

*On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does not require values to be represented as two's complement, so it may not work for that reason as well (on a diminishingly small number of old machines that still use one's complement).*

*On March 14, 2004, Keith H. Duggar sent me the patented variation above; it is superior to the one I initially came up with, r=(+1|(v>>(sizeof(int)*CHAR_BIT-1)))*v, because a multiply is not used. Unfortunately, this method has been patented in the USA on June 6, 2000 by Vladimir Yu Volkonsky and assigned to Sun Microsystems.*

*On August 13, 2006, Yuriy Kaminskiy told me that the patent is likely invalid because the method was published well before the patent was even filed, such as in How to Optimize for the Pentium Processor by Agner Fog, dated November 9, 1996. Yuriy also mentioned that this document was translated to Russian in 1997, which Vladimir could have read. Moreover, the Internet Archive also has an old link to it.*

*On January 30, 2007, Peter Kankowski shared with me an abs version he discovered that was inspired by Microsoft's Visual C++ compiler output. It is featured here as the primary solution.*

*On December 6, 2007, Hai Jin complained that the result was signed, so when computing the abs of the most negative value, it was still negative.*

*On April 15, 2008 Andrew Shapira pointed out that the obvious approach could overflow, as it lacked an (unsigned) cast then; for maximum portability he suggested (v < 0) ? (1 + ((unsigned)(-1-v))) : (unsigned)v. But citing the ISO C99 spec on July 9, 2008, Vincent Lefèvre convinced me to remove it because even on non-2s-complement machines -(unsigned)v will do the right thing. The evaluation of -(unsigned)v first converts the negative value of v to an unsigned by adding 2**N, yielding a 2s complement representation of v's value that I'll call U. Then, U is negated, giving the desired result, -U = 0 - U = 2**N - U = 2**N - (v+2**N) = -v = abs(v).*

```
int x; // we want to find the minimum of x and y
int y;
int r; // the result goes here
r = y ^ ((x ^ y) & -(x < y)); // min(x, y)
```

**Notes:**

*On some rare machines where branching is very expensive and no condition move instructions exist, the above expression might be faster than the obvious approach, `r = (x < y) ? x : y`, even though it involves two more instructions. (Typically, the obvious approach is best, though.) It works because if `x < y`, then `-(x < y)` will be all ones, so `r = y ^ (x ^ y) & ~0 = y ^ x ^ y = x`. Otherwise, if `x >= y`, then `-(x < y)` will be all zeros, so `r = y ^ ((x ^ y) & 0) = y`. On some machines, evaluating `(x < y)` as `0` or `1` requires a branch instruction, so there may be no advantage.*

```
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
```

If you know that `INT_MIN <= x - y <= INT_MAX`, then you can use the following, which are faster because `(x - y)` only needs to be evaluated once.

```
r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
```

**Notes:**

*Note that the 1989 ANSI C specification doesn't specify the result of signed right-shift, so these aren't portable. If exceptions are thrown on overflows, then the values of x and y should be unsigned or cast to unsigned for the subtractions to avoid unnecessarily throwing an exception, however the right-shift needs a signed operand to produce all one bits when negative, so cast to signed there.*

*On March 7, 2003, Angus Duggan pointed out the right-shift portability issue.*

*On May 3, 2005, Randal E. Bryant alerted me to the need for the precondition, INT_MIN <= x - y <= INT_MAX, and suggested the non-quick and dirty version as a fix. Both of these issues concern only the quick and dirty version.*

*Nigel Horspoon observed on July 6, 2005 that gcc produced the same code on a Pentium as the obvious solution because of how it evaluates (x < y).*

*On July 9, 2008 Vincent Lefèvre pointed out the potential for overflow exceptions with subtractions in r = y + ((x - y) & -(x < y)), which was the previous version.*

*Timothy B. Terriberry suggested using xor rather than add and subtract to avoid casting and the risk of overflows on June 2, 2009.*

```
unsigned int v; // we want to see if v is a power of 2
bool f; // the result goes here
f = (v & (v - 1)) == 0;
```

Note that `0` is incorrectly considered a power of 2 here. To remedy this, use:

```
f = v && !(v & (v - 1));
```

Sign extension is automatic for built-in types, such as chars and ints. But suppose you have a signed two's complement number, `x`, that is stored using only `b` bits. Moreover, suppose you want to convert `x` to an int, which has more than `b` bits. A simple copy will work if `x` is positive, but if negative, the sign must be extended. For example, if we have only 4 bits to store a number, then -3 is represented as `1101` in binary. If we have 8 bits, then -3 is `11111101`. The most-significant bit of the 4-bit representation is replicated sinistrally to fill in the destination when we convert to a representation with more bits; this is sign extending. In C, sign extension from a constant bit-width is trivial, since bit fields may be specified in structs or unions. For example, to convert from 5 bits to a full integer:

```
int x; // convert this from using 5 bits to a full int
int r; // resulting sign extended number goes here
struct {signed int x:5;} s;
r = s.x = x;
```

The following is a C++ template function that uses the same language feature to convert from B bits in one operation (though the compiler is generating more, of course).

```
template <typename T, unsigned B>
inline T signextend(const T x)
{
struct {T x:B;} s;
return s.x = x;
}
int r = signextend<signed int,5>(x); // sign extend 5 bit number x to r
```

**Notes:**

*John Byrd caught a typo in the code (attributed to html formatting) on May 2, 2005.*

*On March 4, 2006, Pat Wood pointed out that the ANSI C standard requires that the bitfield have the keyword "signed" to be signed; otherwise, the sign is undefined.*

Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, `b`, in which it is represented. (Or we could be programming in a language like Java, which lacks bitfields.)

```
unsigned b; // number of bits representing the number in x
int x; // sign extend this b-bit number to r
int r; // resulting sign-extended number
int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed
x = x & ((1U << b) - 1); // (Skip this if bits in x above position b are already zero.)
r = (x ^ m) - m;
```

The code above requires four operations, but when the bitwidth is a constant rather than variable, it requires only two fast operations, assuming the upper bits are already zeroes.

A slightly faster but less portable method that doesn't depend on the bits in x above position b being zero is:

```
int const m = CHAR_BIT * sizeof(x) - b;
r = (x << m) >> m;
```

**Notes:**

*Sean A. Irvine suggested that I add sign extension methods to this page on June 13, 2004, and he provided m = (1 << (b - 1)) - 1; r = -(x & ~m) | x; as a starting point from which I optimized to get m = 1U << (b - 1); r = -(x & m) | x.*

*But then on May 11, 2007, Shay Green suggested the version above, which requires one less operation than mine.*

*Vipin Sharma suggested I add a step to deal with situations where x had possible ones in bits other than the b bits we wanted to sign-extend on Oct. 15, 2008.*

*On December 31, 2009 Chris Pirazzi suggested I add the faster version, which requires two operations for constant bit-widths and three for variable widths.*

The following may be slow on some machines, due to the effort required for multiplication and division. This version is 4 operations. If you know that your initial bit-width, `b`, is greater than 1, you might do this type of sign extension in 3 operations by using `r = (x * multipliers[b]) / multipliers[b]`, which requires only one array lookup.

```
unsigned b; // number of bits representing the number in x
int x; // sign extend this b-bit number to r
int r; // resulting sign-extended number
#define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte
static int const multipliers[] =
{
0, M(1), M(2), M(3), M(4), M(5), M(6), M(7),
M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15),
M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
M(32)
}; // (add more if using more than 64 bits)
static int const divisors[] =
{
1, ~M(1), M(2), M(3), M(4), M(5), M(6), M(7),
M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15),
M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23),
M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31),
M(32)
}; // (add more for 64 bits)
#undef M
r = (x * multipliers[b]) / divisors[b];
```

The following variation is not portable, but on architectures that employ an arithmetic right-shift, maintaining the sign, it should be fast.

```
const int s = -b; // OR: sizeof(x) * CHAR_BIT - b;
r = (x << s) >> s;
```

**Notes:**

*Randal E. Bryant pointed out a bug on May 3, 2005 in an earlier version (that used `multipliers[]` for `divisors[]`), where it failed on the case of `x=1` and `b=1`.*

```
bool f; // conditional flag
unsigned int m; // the bit mask
unsigned int w; // the word to modify: if (f) w |= m; else w &= ~m;
w ^= (-f ^ w) & m;
// OR, for superscalar CPUs:
w = (w & ~m) | (-f & m);
```

**Notes:**

*On some architectures, the lack of branching can more than make up for what appears to be twice as many operations. For instance, informal speed tests on an AMD Athlon™ XP 2100+ indicated it was 5-10% faster. An Intel Core 2 Duo ran the superscalar version about 16% faster than the first.*

*Glenn Slayden informed me of the first expression on December 11, 2003.*

*Marco Yu shared the superscalar version with me on April 3, 2007 and alerted me to a typo 2 days later.*

If you need to negate only when a flag is false, then use the following to avoid branching:

```
bool fDontNegate; // Flag indicating we should not negate v.
int v; // Input value to negate if fDontNegate is false.
int r; // result = fDontNegate ? v : -v;
r = (fDontNegate ^ (fDontNegate - 1)) * v;
```

If you need to negate only when a flag is true, then use this:

```
bool fNegate; // Flag indicating if we should negate v.
int v; // Input value to negate if fNegate is true.
int r; // result = fNegate ? -v : v;
r = (v ^ -fNegate) + fNegate;
```

**Notes:**

*Avraham Plotnitzky suggested I add the first version on June 2, 2009. Motivated to avoid the multiply, I came up with the second version on June 8, 2009.*

*Alfonso De Gregorio pointed out that some parens were missing on November 26, 2009, and received a bug bounty.*

```
unsigned int a; // value to merge in non-masked bits
unsigned int b; // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r; // result of (a & ~mask) | (b & mask) goes here
r = a ^ ((a ^ b) & mask);
```

**Notes:**

*This shaves one operation from the obvious way of combining two sets of bits according to a bit mask. If the mask is a constant, then there may be no advantage. Ron Jeffery sent this to me on February 9, 2006.*

```
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; v >>= 1)
{
c += v & 1;
}
```

The naive approach requires one iteration per bit, until no more bits are set. So on a 32-bit word with only the high bit set, it will go through 32 iterations.

```
static const unsigned char BitsSetTable256[256] =
{
# define B2(n) n, n+1, n+1, n+2
# define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2)
# define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
B6(0), B6(1), B6(1), B6(2)
};
unsigned int v; // count the number of bits set in 32-bit value v
unsigned int c; // c is the total bits set in v
// Option 1:
c = BitsSetTable256[v & 0xff] +
BitsSetTable256[(v >> 8) & 0xff] +
BitsSetTable256[(v >> 16) & 0xff] +
BitsSetTable256[v >> 24];
// Option 2:
unsigned char * p = (unsigned char *) &v;
c = BitsSetTable256[p[0]] +
BitsSetTable256[p[1]] +
BitsSetTable256[p[2]] +
BitsSetTable256[p[3]];
// To initially generate the table algorithmically:
BitsSetTable256[0] = 0;
for (int i = 0; i < 256; i++)
{
BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
}
```

**Notes:**

*On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.*

```
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
```

**Notes:**

*Brian Kernighan's method goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.*

*Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"*

```
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
// option 1, for at most 14-bit values in v:
c = (v * 0x200040008001ULL & 0x111111111111111ULL) % 0xf;
// option 2, for at most 24-bit values in v:
c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
// option 3, for at most 32-bit values in v:
c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
```

This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3 operations; the second option takes 10; and the third option takes 15.

**Notes:**

*Rich Schroeppel originally created a 9-bit version, similar to option 1; see the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972. His method was the inspiration for the variants above, devised by Sean Anderson.*

*Randal E. Bryant offered a couple bug fixes on May 3, 2005.*

*Bruce Dawson tweaked what had been a 12-bit version and made it suitable for 14 bits using the same number of operations on February 1, 2007.*

```
unsigned int v; // count bits set in this (32-bit value)
unsigned int c; // store the total here
static const int S[] = {1, 2, 4, 8, 16}; // Magic Binary Numbers
static const int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF};
c = v - ((v >> 1) & B[0]);
c = ((c >> S[1]) & B[1]) + (c & B[1]);
c = ((c >> S[2]) + c) & B[2];
c = ((c >> S[3]) + c) & B[3];
c = ((c >> S[4]) + c) & B[4];
```

The B array, expressed as binary, is:

```
B[0] = 0x55555555 = 01010101 01010101 01010101 01010101
B[1] = 0x33333333 = 00110011 00110011 00110011 00110011
B[2] = 0x0F0F0F0F = 00001111 00001111 00001111 00001111
B[3] = 0x00FF00FF = 00000000 11111111 00000000 11111111
B[4] = 0x0000FFFF = 00000000 00000000 11111111 11111111
```

We can adjust the method for larger integer sizes by continuing with the patterns for the Binary Magic Numbers, `B` and `S`. If there are `k` bits, then we need the arrays `S` and `B` to be `ceil(lg(k))` elements long, and we must compute the same number of expressions for `c` as `S` or `B` are long. For a 32-bit `v`, 16 operations are used.
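Following that pattern for 64-bit integers gives `ceil(lg(64)) = 6` steps. The sketch below is our own extension (the name `count_bits64` is not from the original text), with the `B` constants continued in the obvious way:

```c
#include <stdint.h>

// 64-bit extension of the parallel counting method, following the
// same patterns for the Binary Magic Numbers B and S (6 steps).
static uint64_t count_bits64(uint64_t v)
{
    static const uint64_t B[] = {
        0x5555555555555555ULL, 0x3333333333333333ULL, 0x0F0F0F0F0F0F0F0FULL,
        0x00FF00FF00FF00FFULL, 0x0000FFFF0000FFFFULL, 0x00000000FFFFFFFFULL
    };
    uint64_t c = v - ((v >> 1) & B[0]); // count pairs
    c = ((c >> 2) & B[1]) + (c & B[1]); // count nibbles
    c = ((c >> 4) + c) & B[2];          // count bytes
    c = ((c >> 8) + c) & B[3];          // count 16-bit groups
    c = ((c >> 16) + c) & B[4];         // count 32-bit halves
    c = ((c >> 32) + c) & B[5];         // final total
    return c;
}
```

For example, `count_bits64(0x0123456789ABCDEFULL)` yields 32, since each nibble 0 through F appears once.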

The best method for counting bits in a 32-bit integer `v` is the following:

```
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
```

The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but avoids the memory and potential cache misses of a table. It is a hybrid between the purely parallel method above and the earlier methods using multiplies (in the section on counting bits with 64-bit instructions), though it doesn't use 64-bit instructions. The counts of bits set in the bytes are done in parallel, and the sum total of the bits set in the bytes is computed by multiplying by `0x1010101` and shifting right 24 bits.
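Packaged as a self-contained function (the wrapper name `popcount32` is ours, not from the original), the three lines above read:

```c
#include <stdint.h>

// 12-operation bit count: parallel byte counts, then a multiply
// to sum the four byte counts into the top byte.
static unsigned int popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
    return (((v + (v >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24; // count
}
```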

A generalization of the best bit counting method to integers of bit-widths up to 128 (parameterized by type `T`) is this:

```
v = v - ((v >> 1) & (T)~(T)0/3); // temp
v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3); // temp
v = (v + (v >> 4)) & (T)~(T)0/255*15; // temp
c = (T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * CHAR_BIT; // count
```

**Notes:**

*See Ian Ashdown's nice newsgroup post for more information on counting the number of bits set (also known as sideways addition).*

*The best bit counting method was brought to my attention on October 5, 2005 by Andrew Shapira; he found it in pages 187-188 of Software Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors.*

*Charlie Gordon suggested a way to shave off one operation from the purely parallel version on December 14, 2005, and Don Clugston trimmed three more from it on December 30, 2005.*

*I made a typo with Don's suggestion that Eric Cole spotted on January 8, 2006.*

*Eric later suggested the arbitrary bit-width generalization to the best method on November 17, 2006.*

*On April 5, 2007, Al Williams observed that I had a line of dead code at the top of the first method.*

The following finds the rank of a bit, meaning it returns the sum of bits that are set to `1` from the most-significant bit down to the bit at the given position.

```
uint64_t v; // Compute the rank (bits set) in v from the MSB to pos.
unsigned int pos; // Bit position to count bits up to.
uint64_t r; // Resulting rank of bit at pos goes here.
// Shift out bits after given position.
r = v >> (sizeof(v) * CHAR_BIT - pos);
// Count set bits in parallel.
// r = (r & 0x5555...) + ((r >> 1) & 0x5555...);
r = r - ((r >> 1) & ~0UL/3);
// r = (r & 0x3333...) + ((r >> 2) & 0x3333...);
r = (r & ~0UL/5) + ((r >> 2) & ~0UL/5);
// r = (r & 0x0f0f...) + ((r >> 4) & 0x0f0f...);
r = (r + (r >> 4)) & ~0UL/17;
// r = r % 255;
r = (r * (~0UL/255)) >> ((sizeof(v) - 1) * CHAR_BIT);
```

**Notes:**

*Juha Järvi sent this to me on November 21, 2009 as an inverse operation to computing the bit position with the given rank, which follows.*

The following 64-bit code selects the position of the `r`th `1` bit when counting from the left. In other words, if we start at the most significant bit and proceed to the right, counting the number of bits set to `1` until we reach the desired rank, `r`, then the position where we stop is returned. If the rank requested exceeds the count of bits set, then 64 is returned. The code may be modified for 32-bit or counting from the right.

```
uint64_t v; // Input value to find position with rank r.
unsigned int r; // Input: bit's desired rank [1-64].
unsigned int s; // Output: Resulting position of bit with rank r [1-64]
uint64_t a, b, c, d; // Intermediate temporaries for bit count.
unsigned int t; // Bit count temporary.
// Do a normal parallel bit count for a 64-bit integer,
// but store all intermediate steps.
// a = (v & 0x5555...) + ((v >> 1) & 0x5555...);
a = v - ((v >> 1) & ~0UL/3);
// b = (a & 0x3333...) + ((a >> 2) & 0x3333...);
b = (a & ~0UL/5) + ((a >> 2) & ~0UL/5);
// c = (b & 0x0f0f...) + ((b >> 4) & 0x0f0f...);
c = (b + (b >> 4)) & ~0UL/0x11;
// d = (c & 0x00ff...) + ((c >> 8) & 0x00ff...);
d = (c + (c >> 8)) & ~0UL/0x101;
t = (d >> 32) + (d >> 48);
// Now do branchless select!
s = 64;
// if (r > t) {s -= 32; r -= t;}
s -= ((t - r) & 256) >> 3; r -= (t & ((t - r) >> 8));
t = (d >> (s - 16)) & 0xff;
// if (r > t) {s -= 16; r -= t;}
s -= ((t - r) & 256) >> 4; r -= (t & ((t - r) >> 8));
t = (c >> (s - 8)) & 0xf;
// if (r > t) {s -= 8; r -= t;}
s -= ((t - r) & 256) >> 5; r -= (t & ((t - r) >> 8));
t = (b >> (s - 4)) & 0x7;
// if (r > t) {s -= 4; r -= t;}
s -= ((t - r) & 256) >> 6; r -= (t & ((t - r) >> 8));
t = (a >> (s - 2)) & 0x3;
// if (r > t) {s -= 2; r -= t;}
s -= ((t - r) & 256) >> 7; r -= (t & ((t - r) >> 8));
t = (v >> (s - 1)) & 0x1;
// if (r > t) s--;
s -= ((t - r) & 256) >> 8;
s = 65 - s;
```

**If branching is fast on your target CPU, consider uncommenting the if-statements and commenting the lines that follow them.**

**Notes:**

*Juha Järvi sent this to me on November 21, 2009.*

```
unsigned int v; // word value to compute the parity of
bool parity = false; // parity will be the parity of v
while (v)
{
parity = !parity;
v = v & (v - 1);
}
```

The above code uses an approach like Brian Kernighan's bit counting, above. The time it takes is proportional to the number of bits set.

```
static const bool ParityTable256[256] =
{
# define P2(n) n, n^1, n^1, n
# define P4(n) P2(n), P2(n^1), P2(n^1), P2(n)
# define P6(n) P4(n), P4(n^1), P4(n^1), P4(n)
P6(0), P6(1), P6(1), P6(0)
};
unsigned char b; // byte value to compute the parity of
bool parity = ParityTable256[b];
// OR, for 32-bit words:
unsigned int v;
v ^= v >> 16;
v ^= v >> 8;
bool parity = ParityTable256[v & 0xff];
// Variation:
unsigned char * p = (unsigned char *) &v;
parity = ParityTable256[p[0] ^ p[1] ^ p[2] ^ p[3]];
```

**Notes:**

*Randal E. Bryant encouraged the addition of the (admittedly) obvious last variation with variable p on May 3, 2005.*

*Bruce Rawles found a typo in an instance of the table variable's name on September 27, 2005, and he received a $10 bug bounty.*

*On October 9, 2006, Fabrice Bellard suggested the 32-bit variations above, which require only one table lookup; the previous version had four lookups (one per byte) and was slower.*

*On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.*

```
unsigned char b; // byte value to compute the parity of
bool parity = (((b * 0x0101010101010101ULL) & 0x8040201008040201ULL) % 0x1FF) & 1;
```

The method above takes around 4 operations, but only works on bytes.

The following method computes the parity of the 32-bit value in only 8 operations using a multiply.

```
unsigned int v; // 32-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x11111111U) * 0x11111111U;
return (v >> 28) & 1;
```

For 64-bit words, 8 operations are still enough.

```
unsigned long long v; // 64-bit word
v ^= v >> 1;
v ^= v >> 2;
v = (v & 0x1111111111111111UL) * 0x1111111111111111UL;
return (v >> 60) & 1;
```

**Notes:**

*Andrew Shapira came up with this and sent it to me on Sept. 2, 2007.*

```
unsigned int v; // word value to compute the parity of
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v &= 0xf;
return (0x6996 >> v) & 1;
```

The method above takes around 9 operations, and works for 32-bit words. It may be optimized to work just on bytes in 5 operations by removing the two lines immediately following `unsigned int v;`. The method first shifts and XORs the eight nibbles of the 32-bit value together, leaving the result in the lowest nibble of `v`. Next, the binary number `0110 1001 1001 0110` (`0x6996` in hex) is shifted to the right by the value represented in the lowest nibble of `v`. This number is like a miniature 16-bit parity-table indexed by the low four bits in `v`. The result has the parity of `v` in bit `1`, which is masked and returned.

**Notes:**

*Thanks to Mathew Hendry for pointing out the shift-lookup idea at the end on Dec. 15, 2002. That optimization shaves two operations off using only shifting and XORing to find the parity.*

```
#define SWAP(a, b) ((&(a) == &(b)) || (((a) -= (b)), ((b) += (a)), ((a) = (b) - (a))))
```

This swaps the values of `a` and `b` without using a temporary variable. The initial check for `a` and `b` being the same location in memory may be omitted when you know this can't happen. (The compiler may omit it anyway as an optimization.) If you enable overflow exceptions, then pass unsigned values so an exception isn't thrown. The XOR method that follows may be slightly faster on some machines. Don't use this with floating-point numbers (unless you operate on their raw integer representations).

**Notes:**

*Sanjeev Sivasankaran suggested I add this on June 12, 2007.*

*Vincent Lefèvre pointed out the potential for overflow exceptions on July 9, 2008*

```
#define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))
```

This is an old trick to exchange the values of the variables `a` and `b` without using extra space for a temporary variable.

**Notes:**

*On January 20, 2005, Iain A. Fleming pointed out that the macro above doesn't work when you swap with the same memory location, such as SWAP(a[i], a[j]) with i == j. So if that may occur, consider defining the macro as (((a) == (b)) || (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b)))).*

*On July 14, 2009, Hallvard Furuseth suggested that on some machines, (((a) ^ (b)) && ((b) ^= (a) ^= (b), (a) ^= (b))) might be faster, since the (a) ^ (b) expression is reused.*

```
unsigned int i, j; // positions of bit sequences to swap
unsigned int n; // number of consecutive bits in each sequence
unsigned int b; // bits to swap reside in b
unsigned int r; // bit-swapped result goes here
unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
r = b ^ ((x << i) | (x << j));
```

As an example of swapping ranges of bits, suppose we have `b = 00101111` (expressed in binary) and we want to swap the `n = 3` consecutive bits starting at `i = 1` (the second bit from the right) with the 3 consecutive bits starting at `j = 5`; the result would be `r = 11100011` (binary).

This method of swapping is similar to the general purpose XOR swap trick, but intended for operating on individual bits. The variable `x` stores the result of XORing the pairs of bit values we want to swap, and then the bits are set to the result of themselves XORed with `x`. Of course, the result is undefined if the sequences overlap.
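The worked example above can be checked directly. The function wrapper below is ours (the name `swap_bit_ranges` is not from the original), but the two lines inside are exactly the technique shown:

```c
// Swap the n-bit fields of b starting at bit positions i and j
// (fields must not overlap).
static unsigned int swap_bit_ranges(unsigned int b, unsigned int i,
                                    unsigned int j, unsigned int n)
{
    unsigned int x = ((b >> i) ^ (b >> j)) & ((1U << n) - 1); // XOR temporary
    return b ^ ((x << i) | (x << j));
}
```

With `b = 0x2F` (binary `00101111`), `i = 1`, `j = 5`, `n = 3`, this returns `0xE3` (binary `11100011`); swapping again restores the original value.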

**Notes:**

*On July 14, 2009 Hallvard Furuseth suggested that I change the 1 << n to 1U << n because the value was being assigned to an unsigned and to avoid shifting into a sign bit.*

```
unsigned int v; // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end
for (v >>= 1; v; v >>= 1)
{
r <<= 1;
r |= v & 1;
s--;
}
r <<= s; // shift when v's highest bits are zero
```

**Notes:**

*On October 15, 2004, Michael Hoisie pointed out a bug in the original version.*

*Randal E. Bryant suggested removing an extra operation on May 3, 2005.*

*Behdad Esfahbod suggested a slight change that eliminated one iteration of the loop on May 18, 2005.*

*Then, on February 6, 2007, Liyong Zhou suggested a better version that loops while v is not 0, so rather than iterating over all bits it stops early.*

```
static const unsigned char BitReverseTable256[256] =
{
# define R2(n) n, n + 2*64, n + 1*64, n + 3*64
# define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
# define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
R6(0), R6(2), R6(1), R6(3)
};
unsigned int v; // reverse 32-bit value, 8 bits at time
unsigned int c; // c will get v reversed
// Option 1:
c = (BitReverseTable256[v & 0xff] << 24) |
(BitReverseTable256[(v >> 8) & 0xff] << 16) |
(BitReverseTable256[(v >> 16) & 0xff] << 8) |
(BitReverseTable256[(v >> 24) & 0xff]);
// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]];
q[2] = BitReverseTable256[p[1]];
q[1] = BitReverseTable256[p[2]];
q[0] = BitReverseTable256[p[3]];
```

The first method takes about 17 operations, and the second takes about 12, assuming your CPU can load and store bytes easily.

**Notes:**

*On July 14, 2009 Hallvard Furuseth suggested the macro compacted table.*

```
unsigned char b; // reverse this (8-bit) byte
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
```

The multiply operation creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value. The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit group of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set. The last step, which involves modulus division by `2^10 - 1`, has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like OR operations.

**Notes:**

*This method was attributed to Rich Schroeppel in the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972.*

```
unsigned char b; // reverse this byte
b = ((b * 0x80200802ULL) & 0x0884422110ULL) * 0x0101010101ULL >> 32;
```

The following shows the flow of the bit values with the boolean variables `a, b, c, d, e, f, g,` and `h`, which comprise an 8-bit byte. Notice how the first multiply fans out the bit pattern to multiple copies, while the last multiply combines them in the fifth byte from the right.

```
abcd efgh (-> hgfe dcba)
* 1000 0000 0010 0000 0000 1000 0000 0010 (0x80200802)
-------------------------------------------------------------------------------------------------
0abc defg h00a bcde fgh0 0abc defg h00a bcde fgh0
& 0000 1000 1000 0100 0100 0010 0010 0001 0001 0000 (0x0884422110)
-------------------------------------------------------------------------------------------------
0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
* 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 (0x0101010101)
-------------------------------------------------------------------------------------------------
0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
0000 d000 h000 0c00 0g00 00b0 00f0 000a 000e 0000
-------------------------------------------------------------------------------------------------
0000 d000 h000 dc00 hg00 dcb0 hgf0 dcba hgfe dcba hgfe 0cba 0gfe 00ba 00fe 000a 000e 0000
>> 32
-------------------------------------------------------------------------------------------------
0000 d000 h000 dc00 hg00 dcb0 hgf0 dcba hgfe dcba
& 1111 1111
-------------------------------------------------------------------------------------------------
hgfe dcba
```

Note that the last two steps can be combined on some processors because the registers can be accessed as bytes; just multiply so that a register stores the upper 32 bits of the result and then take the low byte. Thus, it may take only 6 operations.

**Notes:**

*Devised by Sean Anderson, July 13, 2001.*

```
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
```

Make sure you assign or cast the result to an unsigned char to remove garbage in the higher bits.

**Notes:**

*Devised by Sean Anderson, July 13, 2001.*

*Typo spotted and correction supplied by Mike Keith, January 3, 2002.*

```
unsigned int v; // 32-bit word to reverse bit order
// swap odd and even bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16 ) | ( v << 16);
```

The following variation is also `O(lg(N))`; however, it requires more operations to reverse `v`. Its virtue is in taking slightly less memory by computing the constants on the fly.

```
unsigned int s = sizeof(v) * CHAR_BIT; // bit size; must be power of 2
unsigned int mask = ~0;
while ((s >>= 1) > 0)
{
mask ^= (mask << s);
v = ((v >> s) & mask) | ((v << s) & ~mask);
}
```

These methods above are best suited to situations where `N` is large. If you use the above with 64-bit ints (or larger), then you need to add more lines (following the pattern); otherwise only the lower 32 bits will be reversed and the result will be in the lower 32 bits.
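Continuing the pattern for 64-bit values adds one more swap stage. The sketch below is our own extension (the name `reverse64` is not from the original), with each mask widened in the obvious way:

```c
#include <stdint.h>

// 64-bit bit reversal: same swap ladder as the 32-bit version,
// with one extra stage to swap the two 32-bit halves.
static uint64_t reverse64(uint64_t v)
{
    v = ((v >> 1)  & 0x5555555555555555ULL) | ((v & 0x5555555555555555ULL) << 1);
    v = ((v >> 2)  & 0x3333333333333333ULL) | ((v & 0x3333333333333333ULL) << 2);
    v = ((v >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((v & 0x0F0F0F0F0F0F0F0FULL) << 4);
    v = ((v >> 8)  & 0x00FF00FF00FF00FFULL) | ((v & 0x00FF00FF00FF00FFULL) << 8);
    v = ((v >> 16) & 0x0000FFFF0000FFFFULL) | ((v & 0x0000FFFF0000FFFFULL) << 16);
    v = (v >> 32) | (v << 32); // swap 4-byte halves
    return v;
}
```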

**Notes:**

*See Dr. Dobb's Journal 1983, Edwin Freed's article on Binary Magic Numbers for more information.*

*The second variation was suggested by Ken Raeburn on September 13, 2005.*

*Veldmeijer mentioned that the first version could do without ANDS in the last line on March 19, 2006.*

```
const unsigned int n; // numerator
const unsigned int s;
const unsigned int d = 1U << s; // So d will be one of: 1, 2, 4, 8, 16, 32, ...
unsigned int m; // m will be n % d
m = n & (d - 1);
```

Most programmers learn this trick early, but it was included for the sake of completeness.

```
unsigned int n; // numerator
const unsigned int s; // s > 0
const unsigned int d = (1 << s) - 1; // so d is either 1, 3, 7, 15, 31, ...).
unsigned int m; // n % d goes here.
for (m = n; n > d; n = m)
{
for (m = 0; n; n >>= s)
{
m += n & d;
}
}
// Now m is a value from 0 to d, but since with modulus division
// we want m to be 0 when it is d.
m = m == d ? 0 : m;
```

This method of modulus division by an integer that is one less than a power of 2 takes at most `5 + (4 + 5 * ceil(N / s)) * ceil(lg(N / s))` operations, where `N` is the number of bits in the numerator. In other words, it takes at most `O(N * lg(N))` time.

**Notes:**

*Devised by Sean Anderson, August 15, 2001.*

*Before Sean A. Irvine corrected me on June 17, 2004, I mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end.*

*Michael Miller spotted a typo in the code April 25, 2005.*

```
// The following is for a word size of 32 bits!
static const unsigned int M[] =
{
0x00000000, 0x55555555, 0x33333333, 0xc71c71c7,
0x0f0f0f0f, 0xc1f07c1f, 0x3f03f03f, 0xf01fc07f,
0x00ff00ff, 0x07fc01ff, 0x3ff003ff, 0xffc007ff,
0xff000fff, 0xfc001fff, 0xf0003fff, 0xc0007fff,
0x0000ffff, 0x0001ffff, 0x0003ffff, 0x0007ffff,
0x000fffff, 0x001fffff, 0x003fffff, 0x007fffff,
0x00ffffff, 0x01ffffff, 0x03ffffff, 0x07ffffff,
0x0fffffff, 0x1fffffff, 0x3fffffff, 0x7fffffff
};
static const unsigned int Q[][6] =
{
{ 0, 0, 0, 0, 0, 0}, {16, 8, 4, 2, 1, 1}, {16, 8, 4, 2, 2, 2},
{15, 6, 3, 3, 3, 3}, {16, 8, 4, 4, 4, 4}, {15, 5, 5, 5, 5, 5},
{12, 6, 6, 6 , 6, 6}, {14, 7, 7, 7, 7, 7}, {16, 8, 8, 8, 8, 8},
{ 9, 9, 9, 9, 9, 9}, {10, 10, 10, 10, 10, 10}, {11, 11, 11, 11, 11, 11},
{12, 12, 12, 12, 12, 12}, {13, 13, 13, 13, 13, 13}, {14, 14, 14, 14, 14, 14},
{15, 15, 15, 15, 15, 15}, {16, 16, 16, 16, 16, 16}, {17, 17, 17, 17, 17, 17},
{18, 18, 18, 18, 18, 18}, {19, 19, 19, 19, 19, 19}, {20, 20, 20, 20, 20, 20},
{21, 21, 21, 21, 21, 21}, {22, 22, 22, 22, 22, 22}, {23, 23, 23, 23, 23, 23},
{24, 24, 24, 24, 24, 24}, {25, 25, 25, 25, 25, 25}, {26, 26, 26, 26, 26, 26},
{27, 27, 27, 27, 27, 27}, {28, 28, 28, 28, 28, 28}, {29, 29, 29, 29, 29, 29},
{30, 30, 30, 30, 30, 30}, {31, 31, 31, 31, 31, 31}
};
static const unsigned int R[][6] =
{
{0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x00000003, 0x00000001, 0x00000001},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x00000003, 0x00000003, 0x00000003},
{0x00007fff, 0x0000003f, 0x00000007, 0x00000007, 0x00000007, 0x00000007},
{0x0000ffff, 0x000000ff, 0x0000000f, 0x0000000f, 0x0000000f, 0x0000000f},
{0x00007fff, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f, 0x0000001f},
{0x00000fff, 0x0000003f, 0x0000003f, 0x0000003f, 0x0000003f, 0x0000003f},
{0x00003fff, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f, 0x0000007f},
{0x0000ffff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff, 0x000000ff},
{0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff, 0x000001ff},
{0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff, 0x000003ff},
{0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff, 0x000007ff},
{0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff, 0x00000fff},
{0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff, 0x00001fff},
{0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff, 0x00003fff},
{0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff, 0x00007fff},
{0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff},
{0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff, 0x0001ffff},
{0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff, 0x0003ffff},
{0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff, 0x0007ffff},
{0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff, 0x000fffff},
{0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff, 0x001fffff},
{0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff, 0x003fffff},
{0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff, 0x007fffff},
{0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff, 0x00ffffff},
{0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff, 0x01ffffff},
{0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff, 0x03ffffff},
{0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff, 0x07ffffff},
{0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff, 0x0fffffff},
{0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff, 0x1fffffff},
{0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff, 0x3fffffff},
{0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff}
};
unsigned int n; // numerator
const unsigned int s; // s > 0
const unsigned int d = (1 << s) - 1; // so d is either 1, 3, 7, 15, 31, ...).
unsigned int m; // n % d goes here.
m = (n & M[s]) + ((n >> s) & M[s]);
for (const unsigned int * q = &Q[s][0], * r = &R[s][0]; m > d; q++, r++)
{
m = (m >> *q) + (m & *r);
}
m = m == d ? 0 : m; // OR, less portably: m = m & -((signed)(m - d) >> s);
```

This method of finding modulus division by an integer that is one less than a power of 2 takes at most `O(lg(N))` time, where `N` is the number of bits in the numerator (32 bits, for the code above). The number of operations is at most `12 + 9 * ceil(lg(N))`. The tables may be removed if you know the denominator at compile time; just extract the few relevant entries and unroll the loop. It may be easily extended to more bits.

It finds the result by summing the values in base `(1 << s)` in parallel. First every other base `(1 << s)` value is added to the previous one. Imagine that the result is written on a piece of paper. Cut the paper in half, so that half the values are on each cut piece. Align the values and sum them onto a new piece of paper. Repeat by cutting this paper in half (which will be a quarter of the size of the previous one) and summing, until you cannot cut further. After performing `lg(N/s/2)` cuts, we cut no more; just continue to add the values and put the result onto a new piece of paper as before, while there are at least two s-bit values.

**Notes:**

*Devised by Sean Anderson, August 20, 2001.*

*A typo was spotted by Randy E. Bryant on May 3, 2005 (after pasting the code, I had later added "unsinged" to a variable declaration).*

*As in the previous hack, I mistakenly commented that we could alternatively assign m = ((m + 1) & d) - 1; at the end, and Don Knuth corrected me on April 19, 2006 and suggested m = m & -((signed)(m - d) >> s).*

*On June 18, 2009 Sean Irvine proposed a change that used ((n >> s) & M[s]) instead of ((n & ~M[s]) >> s), which typically requires fewer operations because the M[s] constant is already loaded.*

```
unsigned int v; // 32-bit word to find the log base 2 of
unsigned int r = 0; // r will be lg(v)
while (v >>= 1) // unroll for more speed...
{
r++;
}
```

The log base 2 of an integer is the same as the position of the highest bit set (or most significant bit set, MSB). The following log base 2 methods are faster than this one.

```
int v; // 32-bit integer to find the log base 2 of
int r; // result of log_2(v) goes here
union { unsigned int u[2]; double d; } t; // temp
t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = v;
t.d -= 4503599627370496.0;
r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;
```

The code above loads a 64-bit (IEEE-754 floating-point) double with a 32-bit integer (with no padding bits) by storing the integer in the mantissa while the exponent is set to `2^52`. From this newly minted double, `2^52` (expressed as a double) is subtracted, which sets the resulting exponent to the log base 2 of the input value, `v`. All that is left is shifting the exponent bits into position (20 bits right) and subtracting the bias, `0x3FF` (which is `1023` decimal). This technique only takes 5 operations, but many CPUs are slow at manipulating doubles, and the endianness of the architecture must be accommodated.

**Notes:**

*Eric Cole sent me this on January 15, 2006.*

*Evan Felix pointed out a typo on April 4, 2006.*

*Vincent Lefèvre told me on July 9, 2008 to change the endian check to use the float's endian, which could differ from the integer's endian.*

```
static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
unsigned int v; // 32-bit word to find the log of
unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
```

The lookup table method takes only about 7 operations to find the log of a 32-bit value. If extended for 64-bit quantities, it would take roughly 9 operations. Another operation can be trimmed off by using four tables, with the possible additions incorporated into each. Using int table elements may be faster, depending on your architecture.

The code above is tuned to uniformly distributed output values. If your inputs are evenly distributed across all 32-bit values, then consider using the following:

```
if (tt = v >> 24)
{
r = 24 + LogTable256[tt];
}
else if (tt = v >> 16)
{
r = 16 + LogTable256[tt];
}
else if (tt = v >> 8)
{
r = 8 + LogTable256[tt];
}
else
{
r = LogTable256[v];
}
```

To initially generate the log table algorithmically:

```
LogTable256[0] = LogTable256[1] = 0;
for (int i = 2; i < 256; i++)
{
LogTable256[i] = 1 + LogTable256[i / 2];
}
LogTable256[0] = -1; // if you want log(0) to return -1
```

**Notes:**

*Behdad Esfahbod and I shaved off a fraction of an operation (on average) on May 18, 2005.*

*Yet another fraction of an operation was removed on November 14, 2006 by Emanuel Hoogeveen.*

*The variation that is tuned to evenly distributed input values was suggested by David A. Butterfield on September 19, 2008.*

*Venkat Reddy told me on January 5, 2009 that log(0) should return -1 to indicate an error, so I changed the first entry in the table to that.*

```
unsigned int v; // 32-bit value to find the log2 of
const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
const unsigned int S[] = {1, 2, 4, 8, 16};
int i;
register unsigned int r = 0; // result of log2(v) will go here
for (i = 4; i >= 0; i--) // unroll for speed...
{
if (v & b[i])
{
v >>= S[i];
r |= S[i];
}
}
// OR (IF YOUR CPU BRANCHES SLOWLY):
unsigned int v; // 32-bit value to find the log2 of
register unsigned int r; // result of log2(v) will go here
register unsigned int shift;
r = (v > 0xFFFF) << 4; v >>= r;
shift = (v > 0xFF ) << 3; v >>= shift; r |= shift;
shift = (v > 0xF ) << 2; v >>= shift; r |= shift;
shift = (v > 0x3 ) << 1; v >>= shift; r |= shift;
r |= (v >> 1);
// OR (IF YOU KNOW v IS A POWER OF 2):
unsigned int v; // 32-bit value to find the log2 of
static const unsigned int b[] = {0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0,
0xFF00FF00, 0xFFFF0000};
register unsigned int r = (v & b[0]) != 0;
for (i = 4; i > 0; i--) // unroll for speed...
{
r |= ((v & b[i]) != 0) << i;
}
```

Of course, to extend the code to find the log of a 33- to 64-bit number, we would append another element, `0xFFFFFFFF00000000`, to `b`, append `32` to `S`, and loop from `5` to `0`. This method is much slower than the earlier table-lookup version, but if you don't want a big table or your architecture is slow to access memory, it's a good choice. The second variation involves slightly more operations, but it may be faster on machines with high branch costs (e.g. PowerPC).
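The same extension applies to the branch-free second variation: one extra compare-and-shift step of the same shape handles the upper 32 bits. This sketch is our own (the name `log2_64` is not from the original):

```c
#include <stdint.h>

// Branch-free log base 2 of a 64-bit value (v must be non-zero),
// extending the 32-bit variation with one more halving step.
static unsigned int log2_64(uint64_t v)
{
    unsigned int r, shift;
    r     = (v > 0xFFFFFFFF) << 5; v >>= r;
    shift = (v > 0xFFFF)     << 4; v >>= shift; r |= shift;
    shift = (v > 0xFF)       << 3; v >>= shift; r |= shift;
    shift = (v > 0xF)        << 2; v >>= shift; r |= shift;
    shift = (v > 0x3)        << 1; v >>= shift; r |= shift;
    r |= (unsigned int)(v >> 1);
    return r;
}
```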

**Notes:**

*The second version was sent to me by Eric Cole on January 7, 2006.*

*Andrew Shapira subsequently trimmed a few operations off of it and sent me his variation (above) on Sept. 1, 2007.*

*The third variation was suggested to me by John Owens on April 24, 2002; it's faster, but it is only suitable when the input is known to be a power of 2.*

*On May 25, 2003, Ken Raeburn suggested improving the general case by using smaller numbers for b[], which load faster on some architectures (for instance if the word size is 16 bits, then only one load instruction may be needed). These values work for the general version, but not for the special-case version below it, where v is a power of 2;*

*Glenn Slayden brought this oversight to my attention on December 12, 2003.*

```
uint32_t v; // find the log base 2 of 32-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1; // first round down to one less than a power of 2
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
```

The code above computes the log base 2 of a 32-bit integer with a small table lookup and multiply. It requires only 13 operations, compared to (up to) 20 for the previous method. The purely table-based method requires the fewest operations, but this offers a reasonable compromise between table size and speed.

If you know that `v` is a power of 2, then you only need the following:

```
static const int MultiplyDeBruijnBitPosition2[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];
```

**Notes:**

*Eric Cole devised this January 8, 2006 after reading about the entry below to round up to a power of 2 and the method below for computing the number of trailing bits with a multiply and lookup using a DeBruijn sequence.*

*On December 10, 2009, Mark Dickinson shaved off a couple operations by requiring v be rounded up to one less than the next power of 2 rather than the power of 2.*

```
unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of
int r; // result goes here
int t; // temporary
static unsigned int const PowersOf10[] =
{1, 10, 100, 1000, 10000, 100000,
1000000, 10000000, 100000000, 1000000000};
t = (IntegerLogBase2(v) + 1) * 1233 >> 12; // (use a lg2 method from above)
r = t - (v < PowersOf10[t]);
```

The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By the relationship `log10(v) = log2(v) / log2(10)`, we need to multiply it by `1/log2(10)`, which is approximately `1233/4096`, or `1233` followed by a right shift of `12`. Adding one is needed because the `IntegerLogBase2` rounds down. Finally, since the value `t` is only an approximation that may be off by one, the exact value is found by subtracting the result of `v < PowersOf10[t]`.

This method takes 6 more operations than `IntegerLogBase2`. It may be sped up (on machines with fast memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is computed for `t` (that is, pre-add, -multiply, and -shift). Doing so would require a total of only 9 operations to find the log base 10, assuming 4 tables were used (one for each byte of `v`).

**Notes:**

*Eric Cole suggested I add a version of this on January 7, 2006.*

```
unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of
int r; // result goes here
r = (v >= 1000000000) ? 9 : (v >= 100000000) ? 8 : (v >= 10000000) ? 7 :
(v >= 1000000) ? 6 : (v >= 100000) ? 5 : (v >= 10000) ? 4 :
(v >= 1000) ? 3 : (v >= 100) ? 2 : (v >= 10) ? 1 : 0;
```

This method works well when the input is uniformly distributed over 32-bit values because 76% of the inputs are caught by the first compare, 21% are caught by the second compare, 2% are caught by the third, and so on (chopping the remaining down by 90% with each comparison). As a result, less than 2.6 operations are needed on average.

**Notes:**

*On April 18, 2007, Emanuel Hoogeveen suggested a variation on this where the conditions used divisions, which were not as fast as simple comparisons.*

```
const float v; // find int(log2(v)), where v > 0.0 && finite(v) && isnormal(v)
int c; // 32-bit int c gets the result;
c = *(const int *) &v; // OR, for portability: memcpy(&c, &v, sizeof c);
c = (c >> 23) - 127;
```

The above is fast, but IEEE 754-compliant architectures utilize subnormal (also called denormal) floating point numbers. These have the exponent bits set to zero (signifying `pow(2,-127)`), and the mantissa is not normalized, so it contains leading zeros and thus the `log2` must be computed from the mantissa. To accommodate subnormal numbers, use the following:

```
const float v; // find int(log2(v)), where v > 0.0 && finite(v)
int c; // 32-bit int c gets the result;
int x = *(const int *) &v; // OR, for portability: memcpy(&x, &v, sizeof x);
c = x >> 23;
if (c)
{
c -= 127;
}
else
{ // subnormal, so recompute using mantissa: c = intlog2(x) - 149;
register unsigned int t; // temporary
// Note that LogTable256 was defined earlier
if (t = x >> 16)
{
c = LogTable256[t] - 133;
}
else
{
c = (t = x >> 8) ? LogTable256[t] - 141 : LogTable256[x] - 149;
}
}
```

**Notes:**

*On June 20, 2004, Sean A. Irvine suggested that I include code to handle subnormal numbers.*

*On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 specified undefined behavior for the common type punning idiom *(int *)&, though it has worked on 99.9% of C compilers. He proposed using memcpy for maximum portability or a union with a float and an int for better code generation than memcpy on some compilers.*

```
const int r;
const float v; // find int(log2(pow((double) v, 1. / pow(2, r)))),
// where isnormal(v) and v > 0
int c; // 32-bit int c gets the result;
c = *(const int *) &v; // OR, for portability: memcpy(&c, &v, sizeof c);
c = ((((c - 0x3f800000) >> r) + 0x3f800000) >> 23) - 127;
```

So, if `r` is `0`, for example, we have `c = int(log2((double) v))`. If `r` is `1`, then we have `c = int(log2(sqrt((double) v)))`. If `r` is `2`, then we have `c = int(log2(pow((double) v, 1./4)))`.

**Notes:**

*On June 11, 2005, Falk Hüffner pointed out that ISO C99 6.5/7 left the type punning idiom `*(int *)&` undefined, and he suggested using `memcpy`.*

```
unsigned int v; // input to count trailing zero bits
int c; // output: c will count v's trailing zero bits,
// so if v is 1101000 (base 2), then c will be 3
if (v)
{
v = (v ^ (v - 1)) >> 1; // Set v's trailing 0s to 1s and zero rest
for (c = 0; v; c++)
{
v >>= 1;
}
}
else
{
c = CHAR_BIT * sizeof(v);
}
```

The average number of trailing zero bits in a (uniformly distributed) random binary number is one, so this `O(trailing zeros)` solution isn't that bad compared to the faster methods below.

**Notes:**

*Jim Cole suggested I add a linear-time method for counting the trailing zeros on August 15, 2007.*

*On October 22, 2007, Jason Cunningham pointed out that I had neglected to paste the unsigned modifier for v.*

```
unsigned int v; // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -signed(v);
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;
```

Here, we are basically doing the same operations as finding the log base 2 in parallel, but we first isolate the lowest 1 bit, and then proceed with `c` starting at the maximum and decreasing. The number of operations is at most `3 * lg(N) + 4`, roughly, for `N` bit words.

**Notes:**

*Bill Burdick suggested an optimization, reducing the time from 4 * lg(N) on February 4, 2011.*

```
unsigned int v; // 32-bit word input to count zero bits on right
unsigned int c; // c will be the number of zero bits on the right,
// so if v is 1101000 (base 2), then c will be 3
// NOTE: if 0 == v, then c = 31.
if (v & 0x1)
{
// special case for odd v (assumed to happen half of the time)
c = 0;
}
else
{
c = 1;
if ((v & 0xffff) == 0)
{
v >>= 16;
c += 16;
}
if ((v & 0xff) == 0)
{
v >>= 8;
c += 8;
}
if ((v & 0xf) == 0)
{
v >>= 4;
c += 4;
}
if ((v & 0x3) == 0)
{
v >>= 2;
c += 2;
}
c -= v & 0x1;
}
```

The code above is similar to the previous method, but it computes the number of trailing zeros by accumulating `c` in a manner akin to binary search. In the first step, it checks if the bottom 16 bits of `v` are zeros, and if so, shifts `v` right 16 bits and adds 16 to `c`, which reduces the number of bits in `v` to consider by half. Each of the subsequent conditional steps likewise halves the number of bits until there is only 1.

This method is faster than the last one (by about 33%) because the bodies of the if statements are executed less often.

**Notes:**

*Matt Whitlock suggested this on January 25, 2006.*

*Andrew Shapira shaved a couple operations off on Sept. 5, 2007 (by setting c=1 and unconditionally subtracting at the end).*

```
unsigned int v; // find the number of trailing zeros in v
int r; // the result goes here
float f = (float)(v & -v); // cast the least significant bit in v to a float
r = (*(uint32_t *)&f >> 23) - 0x7f;
```

Although this only takes about 6 operations, the time to convert an integer to a float can be high on some machines. The exponent of the 32-bit IEEE floating point representation is shifted down, and the bias is subtracted to give the position of the least significant 1 bit set in `v`. If `v` is zero, then the result is `-127`.

```
unsigned int v; // find the number of trailing zeros in v
int r; // put the result in r
static const int Mod37BitPosition[] = // map a bit value mod 37 to its position
{
32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
20, 8, 19, 18
};
r = Mod37BitPosition[(-v & v) % 37];
```

The code above finds the number of zeros that are trailing on the right, so binary `0100` would produce `2`. It makes use of the fact that the first 32 bit position values are relatively prime with `37`, so performing a modulus division with `37` gives a unique number from `0` to `36` for each. These numbers may then be mapped to the number of zeros using a small lookup table. It uses only 4 operations, however indexing into a table and performing modulus division may make it unsuitable for some situations. I came up with this independently and then searched for a subsequence of the table values, and found it was invented earlier by Reiser, according to Hacker's Delight.

```
unsigned int v; // find the number of trailing zeros in 32-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
```

Converting bit vectors to indices of set bits is an example use for this. It requires one more operation than the earlier one involving modulus division, but the multiply may be faster. The expression `(v & -v)` extracts the least significant 1 bit from `v`. The constant `0x077CB531UL` is a de Bruijn sequence, which produces a unique pattern of bits into the high 5 bits for each possible bit position that it is multiplied against. When there are no bits set, it returns `0`. More information can be found by reading the paper Using de Bruijn Sequences to Index a 1 in a Computer Word by Charles E. Leiserson, Harald Prokop, and Keith H. Randall.

**Notes:**

*On October 8, 2005 Andrew Shapira suggested I add this.*

*Dustin Spicuzza asked me on April 14, 2009 to cast the result of the multiply to a 32-bit type so it would work when compiled with 64-bit ints.*

```
unsigned int const v; // Round this 32-bit value to the next highest power of 2
unsigned int r; // Put the result here. (So v=3 -> r=4; v=8 -> r=8)
if (v > 1)
{
float f = (float)v;
unsigned int const t = 1U << ((*(unsigned int *)&f >> 23) - 0x7f);
r = t << (t < v);
}
else
{
r = 1;
}
```

The code above uses 8 operations, but works on all `v <= (1<<31)`.

A quick and dirty version, for the domain `1 < v < (1<<25)`:

```
float f = (float)(v - 1);
r = 1U << ((*(unsigned int*)(&f) >> 23) - 126);
```

Although the quick and dirty version only uses around 6 operations, it is roughly three times slower than the technique below (which involves 12 operations) when benchmarked on an Athlon™ XP 2100+ CPU. Some CPUs will fare better with it, though.

**Notes:**

*On September 27, 2005 Andi Smithers suggested I include a technique for casting to floats to find the lg of a number for rounding up to a power of 2. Similar to the quick and dirty version here, his version worked with values less than (1<<25), due to mantissa rounding, but it used one more operation.*

```
unsigned int v; // compute the next highest power of 2 of 32-bit v
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;
```

In 12 operations, this code computes the next highest power of 2 for a 32-bit integer. The result may be expressed by the formula `1U << (lg(v - 1) + 1)`. Note that in the edge case where `v` is `0`, it returns `0`, which isn't a power of 2; you might append the expression `v += (v == 0)` to remedy this if it matters. It would be faster by 2 operations to use the formula and the log base 2 method that uses a lookup table, but in some situations, lookup tables are not suitable, so the above code may be best. (On an Athlon™ XP 2100+ I've found the above shift-left and then OR code is as fast as using a single BSR assembly language instruction, which scans in reverse to find the highest set bit.) It works by copying the highest set bit to all of the lower bits, and then adding one, which results in carries that set all of the lower bits to 0 and one bit beyond the highest set bit to 1. If the original number was a power of 2, then the decrement will reduce it to one less, so that we round up to the same original value.

**Notes:**

*You might alternatively compute the next higher power of 2 in only 8 or 9 operations using a lookup table for floor(lg(v)) and then evaluating 1<<(1+floor(lg(v))); Atul Divekar suggested I mention this on September 5, 2010.*

*Devised by Sean Anderson, September 14, 2001.*

*Pete Hart pointed me to a couple newsgroup posts by him and William Lewis in February of 1997, where they arrive at the same algorithm.*

```
unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z = 0; // z gets the resulting Morton Number.
for (int i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
{
z |= (x & 1U << i) << i | (y & 1U << i) << (i + 1);
}
```

Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer coordinates, so `x` and `y` are combined into a single number that can be compared easily and has the property that a number is usually close to another if their `x` and `y` values are close.

```
static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};
unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.
z = MortonTable256[y >> 8] << 17 |
MortonTable256[x >> 8] << 16 |
MortonTable256[y & 0xFF] << 1 |
MortonTable256[x & 0xFF];
```

For more speed, use an additional table with values that are `MortonTable256` pre-shifted one bit to the left. This second table could then be used for the `y` lookups, thus reducing the operations by two, but almost doubling the memory required. Extending this same idea, four tables could be used, with two of them pre-shifted by 16 to the left of the previous two, so that we would only need 11 operations total.

In 11 operations, this version interleaves bits of two bytes (rather than shorts, as in the other versions), but many of the operations are 64-bit multiplies so it isn't appropriate for all machines. The input parameters, `x` and `y`, should be less than `256`.

```
unsigned char x; // Interleave bits of (8-bit) x and y, so that all of the
unsigned char y; // bits of x are in the even positions and y in the odd;
unsigned short z; // z gets the resulting 16-bit Morton Number.
z = ((x * 0x0101010101010101ULL & 0x8040201008040201ULL) *
0x0102040810204081ULL >> 49) & 0x5555 |
((y * 0x0101010101010101ULL & 0x8040201008040201ULL) *
0x0102040810204081ULL >> 48) & 0xAAAA;
```

**Notes:**

*Holger Bettag was inspired to suggest this technique on October 10, 2004 after reading the multiply-based bit reversals here.*

```
static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF};
static const unsigned int S[] = {1, 2, 4, 8};
unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.
// x and y must initially be less than 65536.
x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);
```

```
// Fewer operations:
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
```

The code above may be useful when doing a fast string copy in which a word is copied at a time; it uses 5 operations. On the other hand, testing for a `null` byte in the obvious ways (which follow) takes at least 7 operations (when counted in the most sparing way), and at most 12.

```
// More operations:
bool hasNoZeroByte = ((v & 0xff) && (v & 0xff00) && (v & 0xff0000) && (v & 0xff000000))
// OR:
unsigned char * p = (unsigned char *) &v;
bool hasNoZeroByte = *p && *(p + 1) && *(p + 2) && *(p + 3);
```

The code at the beginning of this section (labeled "Fewer operations") works by first zeroing the high bits of the 4 bytes in the word. Subsequently, it adds a number that will result in an overflow to the high bit of a byte if any of the low bits were initially set. Next the high bits of the original word are ORed with these values; thus, the high bit of a byte is set **iff** any bit in the byte was set. Finally, we determine if any of these high bits are zero by ORing with ones everywhere except the high bits and inverting the result. Extending to 64 bits is trivial; simply increase the constants to be `0x7F7F7F7F7F7F7F7F`.

For an additional improvement, a fast pretest that requires only 4 operations may be performed to determine if the word may have a zero byte. The test also returns true if the high byte is `0x80`, so there are occasional false positives, but the slower and more reliable version above may then be used on candidates for an overall increase in speed with correct output.

```
bool hasZeroByte = ((v + 0x7efefeff) ^ ~v) & 0x81010100;
if (hasZeroByte) // or may just have 0x80 in the high byte
{
hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
}
```

There is yet a faster method: use `hasless(v, 1)`, which is defined below; it works in 4 operations and requires no subsequent verification. It simplifies to:

```
#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
```

The subexpression `(v - 0x01010101UL)` evaluates to a high bit set in any byte whenever the corresponding byte in `v` is zero or greater than `0x80`. The sub-expression `~v & 0x80808080UL` evaluates to high bits set in bytes where the byte of `v` doesn't have its high bit set (so the byte was less than `0x80`). Finally, by ANDing these two sub-expressions the result is the high bits set where the bytes in `v` were zero, since the high bits set due to a value greater than `0x80` in the first sub-expression are masked off by the second.

**Notes:**

*Paul Messmer suggested the fast pretest improvement on October 2, 2004.*

*Juha Järvi later suggested hasless(v, 1) on April 6, 2005, which he found on Paul Hsieh's Assembly Lab; previously it was written in a newsgroup post on April 27, 1987 by Alan Mycroft.*

We may want to know if any byte in a word has a specific value. To do so, we can XOR the value to test with a word that has been filled with the byte values in which we're interested. Because XORing a value with itself results in a zero byte and nonzero otherwise, we can pass the result to `haszero`.

```
#define hasvalue(x,n) (haszero((x) ^ (~0UL/255 * (n))))
```

**Notes:**

*Stephen M Bennet suggested this on December 13, 2009 after reading the entry for haszero.*

Test if a word `x` contains an unsigned byte with `value < n`. Specifically for `n=1`, it can be used to find a 0-byte by examining one long at a time, or any byte by XORing `x` with a mask first. Uses 4 arithmetic/logical operations when `n` is constant.

Requirements: `x>=0`; `0<=n<=128`
```
#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)
```

To count the number of bytes in `x` that are less than `n` in 7 operations, use:

```
#define countless(x,n) (((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
```

**Notes:**

*Juha Järvi sent this clever technique to me on April 6, 2005.*

*The countless macro was added by Sean Anderson on April 10, 2005, inspired by Juha's countmore, below.*

Test if a word `x` contains an unsigned byte with `value > n`. Uses 3 arithmetic/logical operations when `n` is constant.

Requirements: `x>=0`; `0<=n<=127`

```
#define hasmore(x,n) (((x)+~0UL/255*(127-(n))|(x))&~0UL/255*128)
```

To count the number of bytes in `x` that are more than `n` in 6 operations, use:

```
#define countmore(x,n) (((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
```

**Notes:**

*The macro hasmore was suggested by Juha Järvi on April 6, 2005, and he added countmore on April 8, 2005.*

When `m < n`, this technique tests if a word `x` contains an unsigned byte `value`, such that `m < value < n`. It uses 7 arithmetic/logical operations when `n` and `m` are constant.

Note: Bytes that equal `n` can be reported by `likelyhasbetween` as false positives, so this should be checked by character if a certain result is needed.

Requirements: `x>=0`; `0<=m<=127`; `0<=n<=128`

```
#define likelyhasbetween(x,m,n) ((((x)-~0UL/255*(n))&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
```

This technique would be suitable for a fast pretest. A variation that takes one more operation (8 total for constant `m` and `n`) but provides the exact answer is:

```
#define hasbetween(x,m,n) ((~0UL/255*(127+(n))-((x)&~0UL/255*127)&~(x)&((x)&~0UL/255*127)+~0UL/255*(127-(m)))&~0UL/255*128)
```

To count the number of bytes in `x` that are between `m` and `n` (exclusive) in 10 operations, use:

```
#define countbetween(x,m,n) (hasbetween(x,m,n)/128%255)
```

**Notes:**

*Juha Järvi suggested likelyhasbetween on April 6, 2005.*

*From there, Sean Anderson created hasbetween and countbetween on April 10, 2005.*

Suppose we have a pattern of `N` bits set to `1` in an integer and we want the next permutation of `N` `1` bits in a lexicographical sense. For example, if `N` is `3` and the bit pattern is `00010011`, the next patterns would be `00010101`, `00010110`, `00011001`, `00011010`, `00011100`, `00100011`, and so forth. The following is a fast way to compute the next permutation.

```
unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits
unsigned int t = v | (v - 1); // t gets v's least significant 0 bits set to 1
// Next set to 1 the most significant bit to change,
// set to 0 the least significant ones, and add the necessary 1 bits.
w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
```

The `__builtin_ctz(v)` GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros. If you are using Microsoft compilers for x86, the intrinsic is `_BitScanForward`. These both emit a `bsf` instruction, but equivalents may be available for other architectures. If not, then consider using one of the methods for counting the consecutive zero bits mentioned earlier.

Here is another version that tends to be slower because of its division operator, but it does not require counting the trailing zeros.

```
unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);
```

**Notes:**

*Thanks to Dario Sneidermanis of Argentina, who provided this on November 28, 2009.*

`bcrypt` is a great utility for encrypting files with the `blowfish` cipher, which was designed by the legendary cryptographer Bruce Schneier. One of the reasons that `bcrypt` is so secure is the internal hashing scheme it uses.

In a typical hash function, such as MD5 or SHA-1/2, there is only one loop iteration of the hash function. This is because MD5 and the SHA family of hashes were designed to be *fast*, and they are.

Hashing with MD5 or SHA is still incredibly useful for certain tasks, like generating checksums, caching, and other tasks that need to happen quickly, but **neither MD5 nor SHA are resistant to brute-force password cracking**.

Why? Precisely because of how they were designed. Password crackers *love* fast hashing algorithms, because they can generate millions or even billions of guesses per second while brute forcing. Users also love fast hashing algorithms, because they don't have to wait for 20 seconds on the signup page for their plaintext password to be hashed. They love it, that is, until the database of weak hashes is inevitably leaked.

Both MD5 and SHA are considered to be 'cryptographically secure' hash functions, but MD5 has known vulnerabilities to the birthday attack, and just being 'cryptographically secure' doesn't mean you should use them for hashing secure information.

In addition, neither MD5 nor SHA implement secure salting on their own without the developer stumbling through their own implementation and failing 90% of the time, usually by unintentionally reducing the overall keyspace, using the same salt for everyone (this is actually called 'peppering', but the actual use-case is different) or using bad salts such as usernames. Given all these issues with MD5 and SHA, it is clear to see why they fall short of the task when it comes to hashing secure data such as passwords.

**TL;DR**: **Don't ever use MD5 or SHA to hash passwords!**

So how does `bcrypt` do a better job? When hashing a string of plaintext, `bcrypt` runs the main loop over a specified number of loop iterations. The number of iterations is defined by a `count` variable, which is given to the `bcrypt` hash function. The function will complete a number of iterations defined by:

$$iterations = 2^{count}$$

This can be simulated for other hash functions, like SHA-256, but this, again, is **way too easy to screw up**. A better idea is to stick with the crypto mantra: *don't roll your own crypto*.

Of course, like anything, you can still screw up implementing bcrypt, like Ashley Madison developers did, but if you use a library, chances are you'll do it correctly.

In any case, what `bcrypt` achieves by running through, say, 4096 iterations is that the time it takes to brute-force hashes using tools like John the Ripper or Hashcat is increased **super-duper-significantly**.

Like, instead of cracking at **2.2 billion** guesses per second, you could be cracking at **6 guesses per second**.

Again, you could just run over 4096 iterations of SHA-256, but you will almost certainly screw it up:

```
hash = sha256(salt + password);
for (int i = 0; i < 4096; i++) {
    hash = sha256(hash);
}
```

**Wrong, wrong, wrong.** Depending on the length of your input and the number of iterations, you just increased the number of potential collisions dramatically, which is not good.

You could feed the input back into each iteration, like this:

```
hash = sha256(salt + password);
for (int i = 0; i < 4096; i++) {
    hash = sha256(salt + password + hash);
}
```

which is the correct thing to do, but really, **just use bcrypt**.

**Side note:** the PBKDF2 algorithm uses this scheme, but it is not as resistant to ASIC- or GPU-based brute-force attacks as `bcrypt` or `scrypt`.

**Fun fact:** on OS X your passwords are now stored in a Plist file on a per-user basis. By default, OS X will use PBKDF2 with 25125 rounds of SHA-256 and 42 digits of base-64 encoded salt. You can see your password hash with:

```
sudo defaults read /var/db/dslocal/nodes/Default/users/username.plist ShadowHashData|tr -dc 0-9a-f|xxd -r -p|plutil -convert xml1 - -o -
```

So, after all this, you want to use `bcrypt` to check out what all the fuss is about. Well, that's easy, since it is already installed on Linux (most distros actually use `bcrypt` to hash your login password), and it's super easy to grab for OS X:

```
brew install bcrypt
```

You can now easily encrypt and decrypt files and the like with the bcrypt command line utility. Nice!

But... what if you just want to generate a hash, or mess around with using different count values?

Well, that's why I wrote `bcrypt-hash` (https://github.com/gibsjose/bcrypt-hash).

Install it with:

```
brew tap gibsjose/crypto
brew install bcrypt-hash
```

Now we can check out a couple of use-cases:

```
# Example: Hash the plaintext 't3rr1bl3_p4$$w0rd' with a cost factor of 12
bcrypt-hash -c 12 't3rr1bl3_p4$$w0rd'
```

Which generates:

```
$2y$12$UzKl7mitlZJt52PAMemYmeb9YUC9XhvX6DlbtbaVtdqI32TCPPCj6
```

Or you can omit the cost factor in favour of the default:

```
# Example: Hash the plaintext 'Look! Here is some plaintext...' with the default cost factor of 10
bcrypt-hash 'Look! Here is some plaintext...'
```

```
$2y$10$k8pe9htFbLrJD/EjOE3In.RPOFpPz2WZ44lwQVt8RJRmUgXNnfnSC
```

Finally, you can also check a plaintext against a hash you have generated or have otherwise obtained:

```
# Example: Check the plaintext 'test' against a correct hash
bcrypt-hash check 'test' '$2y$10$5ixGI4bAKbWI4bdlzbXi9uqaOrysHRuqbBLP4N8HhgPL6c5yIuS2a'
```

Which will give:

```
Verified
```

- **Don't ever use MD5 or SHA for hashing passwords**
- **Use bcrypt**
- **Use a library**, like PHP 5.5+'s built-in `password_hash()` and `crypt()` functions
- **Install `bcrypt-hash`** if you want to mess with generating or verifying standalone `bcrypt` hashes from the command-line

Et voilà, here are the scripts in both `fish` and `bash`:

`fish`

```
# `sizeof`: get the size of a file/directory in human readable format
function sizeof
    if test -d $argv
        dirsize $argv
    else
        filesize $argv
    end
end

function filesize
    ls -lah $argv | awk '{print $5}'
end

function dirsize
    du -sh $argv | awk '{print $1}'
end
```

Just add this to your `config.fish`, or even better, as a function in your fish functions directory at `~/.config/fish/functions/sizeof.fish`.

`bash`

```
# `sizeof`: get the size of a file/directory in human readable format
function sizeof {
    if [[ -d $1 ]]; then
        dirsize $1
    else
        filesize $1
    fi
}

function filesize {
    ls -lah $1 | awk '{print $5}'
}

function dirsize {
    du -sh $1 | awk '{print $1}'
}
```

And add this either to your `.bashrc` or your `.bash_functions` file.

Other free Ghost themes can be found at their marketplace.

This tutorial will follow the creation of my custom theme, Boo, from start to finish.

Ghost already has some good documentation on the file structure of themes here, so you should read their overview carefully before moving on.

Basic Theme File Structure (from Ghost documentation):

```
├── /assets
│   ├── /css
│   │   └── screen.css
│   ├── /fonts
│   ├── /images
│   └── /js
├── default.hbs
├── index.hbs [required]
├── post.hbs [required]
└── package.json [will be required from 0.6]
```

For my theme, Boo (did anyone catch the Super Mario reference yet?), I want to accomplish the following things:

- Centre-position a large logo above the blog name on the main page
- Play with the fonts
- Change the colour of post title hyperlinks on the main page and hyperlinks within posts
- Play with different background patterns
- Add a `favicon`
- Add links to my GitHub, personal website, and Twitter, and a short bio, at the bottom of the page
- Add a 'share on ...' button for Reddit
- Add Disqus comments to the blog
- Add syntax highlighting and change the colour scheme of code snippets
- (Maybe) mess with the navigation

To actually update what is displayed to the users in the browser, you will have to **restart Ghost after each change to the theme files.**

Restart Ghost with:

```
sudo service ghost restart
```

For reference and comparison, here's the main page (`index.html`) of the blog using the default `Casper` theme:

`package.json`

Before I begin editing, I will change the `name` and `version` attributes in the `package.json` file, and add additional fields:

```
{
    "name": "Boo",
    "version": "0.1.0",
    "author": {
        "name": "Joe Gibson",
        "email": "gibsjose@mail.gvsu.edu",
        "url": "http://www.gibsjose.com"
    },
    "description": "Boo - A clean and simple Ghost blog theme",
    "keywords": "boo, ghost, casper, blog, theme, gibsjose",
    "homepage": "www.github.com/gibsjose/Boo",
    "license": "MIT"
}
```

More info on the `package.json` file and the possible tags can be found here.
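A stray comma or missing quote in `package.json` is easy to miss, so before restarting Ghost you can sanity-check the syntax with Python's stdlib JSON tool (assuming `python3` is on your PATH and you're in the theme directory):

```shell
# Validate package.json syntax; json.tool exits non-zero on a parse error
python3 -m json.tool package.json > /dev/null \
    && echo "package.json: valid JSON" \
    || echo "package.json: syntax error"
```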

As silly as it sounds, this is the main reason I wanted to modify the theme in the first place, beyond simply enjoying it as an educational exercise.

I will begin by editing the `index.hbs` file, by inserting the image code right above the lines:

```
{{! Insert the image code right here...}}
<h1 class="page-title">{{@blog.title}}</h1>
<h2 class="page-description">{{@blog.description}}</h2>
```

I'm using the `@blog.logo` handle so I can easily change the logo everywhere by just changing the logo on the Ghost general settings page.

To insert the logo, I added the following snippet:

```
{{! Large logo centred above the title}}
{{#if @blog.logo}}
    <a class="blog-logo-large" href="{{@blog.url}}"><img src="{{@blog.logo}}" alt="{{@blog.title}}" /></a>
{{/if}}
```

**Note** that I used `class="blog-logo-large"`, instead of `blog-logo`.

This means I will have to create a new entry in the `assets/css/screen.css` file:

```
/* Large logo centred above blog title */
.blog-logo-large {
    display: block;
    background: none !important; /* Makes sure there is never a background */
    border: none !important;     /* Makes sure there is never a border */
}

.blog-logo-large img {
    -webkit-box-sizing: border-box;
    -moz-box-sizing: border-box;
    box-sizing: border-box;
    display: block;
    margin-left: auto;
    margin-right: auto;
    height: 180px;
    padding: 1px 0 5px 0;
    width: auto;
}
```

The key attributes here are under `.blog-logo-large img`:

- `margin-left: auto;` and `margin-right: auto;` - These set the left and right margins to be equal, centring the image
- `height: 180px;` - This sets the height of the logo, and thus its size

To maintain the responsiveness when changing screen size, I also added classes under the following entries:

```
@media only screen and (max-width: 900px) {
    ...
    .blog-logo-large img {
        padding: 16px 0;
    }
    ...
}
```

and

```
@media only screen and (max-width: 500px) {
    ...
    .blog-logo-large {
        padding: 20px;
    }
    .blog-logo-large img {
        height: 100px;
    }
    ...
}
```

Additionally, I removed the small logo on the left of the main page, and increased its size on the post pages, by simply removing the following lines in the `main-nav` class of the `index.hbs` header:

```
{{#if @blog.logo}}
<a class="blog-logo" ...
{{/if}}
```

and modifying the `height` attribute of the `.blog-logo img` class:

```
...
height: 64px;
...
```

The end result:

The `Casper` theme uses the Google Fonts API, which is a great way to use custom fonts on your site without any hassle, although it does add to the load time of the page significantly more than using local fonts.

You can browse Google Fonts with their interactive font search, and preview the fonts with any text you want.

For my page, I'm going to use the following fonts:

- Main Page Title: `Righteous`
- Main Page Description: `EB Garamond`
- Post Titles: `Raleway`
- Post Body: `EB Garamond`
- Everything Else: `Raleway`

Using the Google Fonts page, you can browse through the fonts and just click `Add to Collection` for each font you would like. Then click on `Use` on the bottom right corner, and you can select each of the font weights you want, as well as additional character sets (*i.e.* `latin-ext`, `cyrillic`, *etc.*).

After selecting which fonts you want, Google automatically generates an HTML snippet to copy to the `head` in `default.hbs`:

```
<link href='http://fonts.googleapis.com/css?family=Raleway:400,700|Righteous|EB+Garamond&subset=latin,latin-ext,cyrillic' rel='stylesheet' type='text/css'>
```

I put this right below the `{{! Styles'n'Scripts }}` comment in `default.hbs`.

Next, to actually use the fonts, I will edit the `assets/css/screen.css` stylesheet. To specify the font, I use the `font-family` attribute, and the `font-weight` attribute to specify (surprise) the font weight.

First I will change the font for post body and headings:

In `body`:

```
...
font-family: 'EB Garamond', serif;
font-weight: 400;
...
```

In `h1, h2, h3, h4, h5, h6`:

```
...
font-family: 'Raleway', sans-serif;
font-weight: 700;
...
```

Changing the hyperlink colours is easy as well. I just changed the links to a blue colour when hovered. In `assets/css/screen.css`:

```
a {
    color: #4A4A4A;
    transition: color 0.3s ease;
}

a:hover {
    color: #3b7dff;
}
```

My goal for the background/colours is to change the colour of the `main-header` when there is no cover in the Ghost settings, change the title font colour to yellow, and change the colour of the description to light grey.

In `assets/css/screen.css`:

```
.no-cover.main-header {
    min-height: 160px;
    max-height: 40vh;
    background: #404040; /* Dark grey */
}

.no-cover .page-title {
    color: rgba(255, 255, 56, 1); /* Yellow */
}

.no-cover .page-description {
    color: rgba(140, 140, 140, 1); /* Light grey */
}
```

I also changed the colour of the logo to match the darker background, but that is just changing the image and uploading it again via the Ghost settings panel.

If you have a cover defined in the Ghost settings, none of these changes will matter, and you will instead be editing the normal `.page-title` `color` attribute, instead of the `.no-cover .page-title` attribute.

This logo is also displayed in the top left corner of every page, but this presents an issue: The main page has a dark background but the post pages are white, and my logo happens to be yellow.

I tried adding a drop-shadow, adding a background to the image, and a few other things, but I ultimately decided to create two new images in a new folder under `assets`: `assets/img/post-logo.png` and `assets/img/post-logo-hover.png`. Then I used a bit of JavaScript to display the darker logo by default and switch to the yellow logo on hover for an extra bit of pizzazz.

In `post.hbs`:

```
{{#if @blog.logo}}
    <a class="blog-logo" href="{{@blog.url}}">
        <img src="{{asset "img/post-logo.png"}}" onmouseover="this.src='{{asset "img/post-logo-hover.png"}}'" onmouseout="this.src='{{asset "img/post-logo.png"}}'" alt="{{@blog.title}}" />
    </a>
{{/if}}
```

Notice that I used the `{{asset "path/to/image"}}` built-in helper, rather than a link directly to the file. This makes the theme easier to reuse and modify later, although this method of using an image that is **not** governed by the `@blog-...` helper will be an issue if someone wants to use this theme out of the box and doesn't have the knowledge or ability to put their own post page icons in the `assets/img/` folder.

At this point, my blog main page looks like this:

Pretty nice!

`favicon`

By default, a Ghost blog will use the standard Ghost `favicon`. To use your own, you just need to add it to the `assets` folder and use the built-in `{{asset}}` helper.

In `default.hbs`, change:

```
<link rel="shortcut icon" href="{{asset "favicon.ico"}}">
```

to

```
<link rel="shortcut icon" href="{{asset "/favicon.ico"}}">
```

Note the addition of the leading `/`.

Then simply place your `favicon` under `assets/favicon.ico`.

The footer that appears under each post is found in, you guessed it: `post.hbs`. Most of the information it displays comes from variables defined in your Ghost dashboard, by clicking on `Users` and then your username.

However, I still want to add a link to my GitHub and Twitter, and also change some of the hyperlinks from my blog to my personal website.

Adding links to GitHub and Twitter is easy... just use `<a href=...>`.

But what if I want to have the fancy icons? Well, you can either use something like Font Awesome, or roll your own icon font with IcoMoon. I chose the latter, and added the following files generated by IcoMoon to the `assets/fonts` folder:

- `boo-icons.eot`
- `boo-icons.svg`
- `boo-icons.ttf`
- `boo-icons.woff`

Then, I added the following to the `assets/css/screen.css` file, under the font section:

```
/* Boo */
@font-face {
    font-family: "boo-icons";
    src: url("../fonts/boo-icons.eot?v=1");
    src: url("../fonts/boo-icons.eot?v=1#iefix") format("embedded-opentype"),
         url("../fonts/boo-icons.woff?v=1") format("woff"),
         url("../fonts/boo-icons.ttf?v=1") format("truetype"),
         url("../fonts/boo-icons.svg?v=1#icons") format("svg");
    font-weight: normal;
    font-style: normal;
}

/* Apply these base styles to all icons */
[class^="icon-"]:before, [class*=" icon-"]:before {
    font-family: "boo-icons", "Open Sans", sans-serif;
    speak: none;
    font-style: normal;
    font-weight: normal;
    font-variant: normal;
    text-transform: none;
    line-height: 1;
    text-decoration: none !important;
    -webkit-font-smoothing: antialiased;
    -moz-osx-font-smoothing: grayscale;
}

.boo-icon-github:before {
    content: "\e600";
}

.boo-icon-twitter:before {
    content: "\e601";
}

.boo-icon-reddit:before {
    content: "\e602";
}
```

and the following further down, under the `.post-footer` attributes, to change the colours when hovering:

```
.post-footer .share .boo-icon-reddit:hover {
    color: #FF4500;
}

.post-footer .author .boo-icon-github:hover {
    color: #6CC644;
}

.post-footer .author .boo-icon-twitter:hover {
    color: #55acee;
}
```

To choose the colours, I used BrandColors, which gives you the hex value for the colour set of almost any brand.

**NOTE**: I was unable to get the font icons to work properly as of right now, so I am just using the text for `GitHub` and `Twitter`, but I will edit this as soon as I get it working.

Ghost has a great tutorial on this here, and there's really nothing else I would add, other than that if you have downloaded my `Boo` theme, you will have to modify `post.hbs` and `default.hbs` to use your own Disqus username.

Also, I added a slim vertical bar between the date and comment items in the loop, by adding the following to `assets/css/screen.css`:

```
.post-comment-count {
    display: inline-block;
    margin-left: 8px;
    padding-left: 12px;
    border-left: #d5dbde 1px solid;
    font-size: 1.3rem;
    white-space: nowrap;
}
```

and then using the new class in `partials/loop.hbs`:

```
{{! Adding comment counts }}
<a class="post-comment-count" href="{{url}}#disqus_thread">Comments</a>
```

Another blog post has a nice tutorial on how to add syntax highlighting using `prism.js`.

Getting `prism.js` to work is super simple. Just head over to PrismJS's download page, configure the JavaScript and CSS files, and download them. You will end up with `prism.js` and `prism.css`. Now simply place them in the `assets/js/` and `assets/css/` folders, respectively.

Then edit `default.hbs`:

Add the following line under `{{! Styles'n'Scripts }}`:

```
<link rel="stylesheet" type="text/css" href="{{asset "css/prism.css"}}" />
```

and the following line just *above* `{{! The main JavaScript file for Boo }}`:

```
<script type="text/javascript" src="{{asset "js/prism.js"}}"></script>
```

Over the course of editing the theme, I made a number of small stylistic changes, such as changing colours of buttons, adding small animations, making external links open in new tabs instead of in the same page, etc. By now, you should be able to make any stylistic changes you want for your own theme.

Oh, and the code to open external links in new tabs is:

```
<script type="text/javascript">
    $(document).ready(function() {
        $('a[href^="http://"]').attr('target', '_blank');
        $('a[href^="https://"]').attr('target', '_blank');
    });
</script>
```

You can put this code in the `Code Injection` section of the Ghost admin panel.

They have managed to put together a plethora of outstanding web tutorials and guides, explaining everything from the very basics to low-level topics.

This is a list of my favourite tutorials and guides from DigitalOcean, listed by topic, in no particular order.

- [**Series**] Building for Production: Web Applications
- 5 Common Server Setups for your Web Application
- 5 Ways to Improve your Production Web Application Server Setup
- An Introduction to HAProxy and Load Balancing Concepts

- Initial Server Setup with Ubuntu 14.04
- Additional Recommended Steps for New Ubuntu 14.04 Servers
- How to Add Swap on Ubuntu 14.04

- [**Series**] An Introduction to Managing DNS

- [**Series**] Introduction to Git: Installation, Usage, and Branches
- How to Use Git Hooks to Automate Development and Deployment Tasks
- How to Set Up a Private Git Server on a VPS
- How to Set Up GitLab as your Very Own Private GitHub Clone
- Git is Simpler Than You Think
- Git Magic (Stanford)
- `gittutorial` man page
- Understanding Git Conceptually
- Git Basics
- Atlassian Git Tutorials
- `git` - The Simple Guide
- Git Immersion Lab

- 7 Security Measures to Protect your Servers
- An Introduction to Securing your Linux VPS
- How to Mitigate DDoS Attacks Against your Website with CloudFlare
- How to Set Up a Firewall with UFW on Ubuntu 14.04
- How to Set Up a Firewall Using IPTables on Ubuntu 14.04
- How to Install and Configure OSSEC Security Notifications on Ubuntu 14.04
- How to Protect your Linux Server Against the GHOST Vulnerability
- How to Protect your Linux Server Against the Shellshock Bash Vulnerability
- How to Protect your Linux Server Against the Heartbleed OpenSSL Vulnerability

- SSH Essentials: Working with SSH Servers, Clients, and Keys
- How to Set Up SSH Keys
- How to Use SSH Keys with DigitalOcean Droplets
- How to Configure SSH Key-Based Authentication on a Linux Server
- Understanding the SSH Encryption and Connection Process

- [**Series**] Introduction to Nginx and LEMP on Ubuntu 14.04
- Understanding Nginx HTTP Proxying, Load Balancing, Buffering, and Caching
- Apache (LAMP) vs Nginx (LEMP): Practical Considerations

- How to Install Node.js on an Ubuntu 14.04 Server
- How to Set Up a Node.js Application for Production on Ubuntu 14.04

- A Basic MySQL Tutorial
- How to Create a New User and Grant Permissions in MySQL
- How to Import and Export Databases and Reset a Root Password in MySQL

I cover setting up Ghost in a later post.

DigitalOcean offers 'droplets': Lightweight SSD-based machines, with a number of both one-click and in-depth configurations and options. You can pretty much do anything with a droplet, from creating a blog (*hey there*), running a LAMP/LEMP web server, rolling your own VPN, and everything in between.

Plus, their documentation and tutorials are mind-bogglingly awesome and cover nearly every topic you could imagine when setting up any kind of server, from the very basics of DNS to how to properly use `git` branching models.

I'll go through the steps to create a DigitalOcean droplet step-by-step, although in reality it is a pretty foolproof process.

I'll also cover setting up a hostname to address the droplet with the amazing (and **free**) service yDNS.

My DigitalOcean droplet tech specs:

- 512 MB of RAM
- 1 CPU
- 20 GB of SSD storage
- Ubuntu 14.04 x64

Note: You will configure these when you set up your droplet.

This one is pretty straightforward... Create an account at DigitalOcean and fill in your credit card info. The most affordable plan is $5, so that's the one I chose - I'm currently broke.

Now just click on Create Droplet and fill in your desired `hostname`. Note that this does **not** have to be the same as your domain name.

Next, select the plan you want. You can find help on choosing the right server and plan here.

For the region selection you should choose the region physically closest to you, unless you want the machine to be located in a specific place for some reason.

Now select the desired distribution. I chose 64-bit `Ubuntu 14.04`. There are a number of distributions to choose from, and you can select the version and architecture (32-bit or 64-bit) for each one.

You can also select the `Applications` tab and select from a myriad of one-click applications. Most of them will run fine on the $5 plan, but some (like a GitLab server) require the $10 plan. If you want to run a Ghost blog, for example, you can either select the base `Ubuntu 14.04 x64` image from the `Distributions` tab, or you can choose an image with it pre-installed by selecting the `Applications` tab and selecting `Ghost`, which is what I ultimately did.

Configure any additional settings:

I left them all unchecked.

**You should be using SSH to log into your droplet.** You can add a key now, or add one later under the `Security` item in your account settings.

Aaaand if everything looks good, go ahead and click `Create Droplet` and you're done with the creation step!

After about a minute you will be taken to a page that looks something like this:

Copy the IP address in the top left corner. This is the address you will use to SSH into your droplet and to assign a host name (next) or domain name to it.

I always like to create an intermediary hostname for my servers, rather than just using the IP address, even though I'm eventually going to point a final domain name to it.

There is a truly awesome and **free** service called yDNS that makes this incredibly easy to do.

Just create an account through yDNS and then click on `Create Host`.

Then enter your details. The hostname followed by the domain will be the actual name you will use to refer to your server address. The actual domain name has no impact on anything, so just choose your favourite.

In this example I used my hostname, `microamps`, the domain I chose, `ydns.eu`, and a fake IP, `12.12.12.12`.

Enter in your info, and change the TTL if you want. The TTL is just how long the DNS packet is considered valid for when cached. Browsers will cache DNS packets so they don't constantly check for changing records every time you request content from the same host.

Click `Create Host`, and then you can see your hosts on the main page:

Make sure the type `A` record (IPv4) matches your configuration for the hostname and IP address.

Note: A great thing about yDNS is that they have a bash updater script that you can run as a `cronjob` to always keep the IP up-to-date. This is especially nice when, like me, you run a Raspberry Pi on your home network, whose IP constantly changes.
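For reference, the crontab entry for such an updater might look like this (the script path is hypothetical; point it at wherever you saved yDNS's updater script, via `crontab -e`):

```
# m h dom mon dow command — run the yDNS updater every 15 minutes
*/15 * * * * /home/you/ydns-updater.sh >/dev/null 2>&1
```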

I can now refer to my server (via `ssh` and in a web browser) as:

```
microamps.ydns.eu
```

There's not much extra for me to write here. I basically followed the following guides by DigitalOcean, which cover most everything. Additional packages I installed are discussed next.

`git`

As Dennis from *It's Always Sunny in Philadelphia* so eloquently put it:

'You got got. Guys like us - we don't get got. We go get.'

No, that has absolutely nothing to do with `git`, but here's how to install and configure it:

```
sudo apt-get update
sudo apt-get install git
```

```
git config --global user.name "Your Name"
git config --global user.email "your@email.com"
```
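You can confirm the settings took by running `git config` in read mode; these commands simply echo back the values set above:

```shell
# Read back the global identity you just configured
git config --global user.name    # prints the name set above
git config --global user.email   # prints the email set above

# Or list the whole global config at once
git config --global --list
```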

`emacs`

Super simple (like most things with `apt-get`...):

```
sudo apt-get update
sudo apt-get install emacs
```
