joshuago’s algorithms Bookmarks
The core of STXXL is an implementation of the C++ Standard Template Library (STL) for external-memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that fit only on disk.
The Boehm GC is able to function without any cooperation from the compiler or the runtime environment. In C, the only adjustment one needs to make is to redirect calls to stdlib’s malloc()/free() to equivalent ones supplied by the Boehm GC.
The most common index structure for in-memory databases is the T-tree. IBM solidDB instead uses a trie (or prefix tree), which was originally created for text searching but turns out to be perfect for in-memory indexing.
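To make the prefix-tree idea concrete, here is a minimal sketch of a trie used as a key-to-value index. It is an illustration only, not solidDB's implementation; the class and method names (`Trie`, `insert`, `get`, `keys_with_prefix`) are hypothetical.

```python
class TrieNode:
    """One node per key character; children are keyed by the next character."""

    def __init__(self):
        self.children = {}
        self.value = None
        self.has_value = False  # distinguishes "no key ends here" from a stored None


class Trie:
    """Illustrative prefix-tree index (hypothetical API, not solidDB's)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        # Walk down the tree, creating nodes for characters not seen before.
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
        node.has_value = True

    def get(self, key, default=None):
        # Lookup cost depends on the key length, not on the number of keys stored.
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.has_value else default

    def keys_with_prefix(self, prefix):
        # Descend to the node for the prefix, then collect every key below it.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, key = stack.pop()
            if current.has_value:
                results.append(key)
            for ch, child in current.children.items():
                stack.append((child, key + ch))
        return sorted(results)
```

A lookup touches one node per character of the key, which is part of why the structure suits data that already lives in memory.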
Spatial indexing is increasingly important as more and more data and applications are geospatially enabled. Efficiently querying geospatial data, however, is a considerable challenge: because the data is two-dimensional (or sometimes higher-dimensional), you can't use standard indexing techniques to query on position. Spatial indexes solve this through a variety of techniques.
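One such technique, shown here only as a rough sketch, is to map each two-dimensional point onto a one-dimensional key with a Z-order (Morton) curve, so that an ordinary sorted index can store it. The function names and the 16-bit grid resolution are assumptions made for this example, and the linked article may cover different methods.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton code.

    Bits of x land on even positions, bits of y on odd positions.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_key(lon, lat, bits=16):
    """Map a longitude/latitude pair to a single sortable integer key."""
    cells = 1 << bits
    # Scale each coordinate onto an integer grid, clamping the upper edge.
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave_bits(x, y, bits)
```

Points that are close together often receive nearby keys, but not always: two neighbours on opposite sides of a large cell boundary can end up far apart in key order, so a real query usually has to scan several key ranges.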
A feast for the mind. Excellent for review, exploration, and inspiration.
Use bcrypt because it's slow as hell. It introduces a work factor which affects how expensive the hash function will be, and can keep up with Moore's law.
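The work-factor idea can be sketched with the Python standard library alone. Note that this example does not use bcrypt: it substitutes PBKDF2 from `hashlib` and treats a hypothetical `cost` parameter as the base-2 logarithm of the iteration count, so each increment doubles the hashing time, much like raising bcrypt's cost. The function names and the default cost are assumptions for illustration.

```python
import hashlib
import hmac
import os


def hash_password(password, cost=12):
    """Hash a password with 2**cost PBKDF2 iterations (stand-in for bcrypt)."""
    salt = os.urandom(16)
    iterations = 1 << cost  # raising cost by one doubles the work
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the cost with the hash so it can be raised later for new passwords.
    return cost, salt, digest


def verify_password(password, record):
    """Recompute the hash with the stored cost and salt, then compare."""
    cost, salt, expected = record
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1 << cost)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```

Because the cost is stored next to each hash, it can be increased as hardware gets faster without invalidating existing records.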
We programmers need all the help we can get, and we should never assume otherwise. Joshua Bloch of Google walks through a binary search implementation to discuss a bug that went undetected for years.
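The bug in question is the midpoint computation `(low + high) / 2`, which overflows a 32-bit signed integer once the sum exceeds 2^31 - 1. Python integers do not overflow, so the sketch below simulates 32-bit wraparound with a helper (`to_int32`, a name invented for this example) to show the failure, and then gives the usual fix.

```python
def to_int32(n):
    """Wrap an integer to a signed 32-bit value, as fixed-width arithmetic would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def midpoint_naive_int32(low, high):
    """Simulate (low + high) / 2 on 32-bit ints: the sum wraps before dividing."""
    total = to_int32(low + high)
    # Integer division that truncates toward zero, as in Java or C.
    return total // 2 if total >= 0 else -((-total) // 2)


def midpoint_safe(low, high):
    """Overflow-free midpoint: the difference never exceeds the range size."""
    return low + (high - low) // 2


def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if it is absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = midpoint_safe(low, high)
        if items[mid] < target:
            low = mid + 1
        elif items[mid] > target:
            high = mid - 1
        else:
            return mid
    return -1
```

The naive form only fails on arrays with roughly a billion elements or more, which helps explain how it went unnoticed for so long.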
The best computer science papers from various top-tier conferences.
A good list of books to refresh, replace, or supplement a core computer science education.
The original author of GNU grep explains why it is so fast.