Public Member Functions | |
IndexIVF_HNSW_Grouping (size_t dim, size_t ncentroids, size_t bytes_per_code, size_t nbits_per_idx, size_t nsubcentroids) | |
void | add_group (size_t group_idx, size_t group_size, const float *x, const idx_t *ids) |
void | search (size_t k, const float *x, float *distances, long *labels) |
void | write (const char *path_index) |
Write index to the path. | |
void | read (const char *path_index) |
Read index from the path. | |
void | train_pq (size_t n, const float *x) |
void | compute_inter_centroid_dists () |
Compute distances between the group centroid and its <subc> nearest neighbors in the HNSW graph. | |
Public Member Functions inherited from ivfhnsw::IndexIVF_HNSW | |
IndexIVF_HNSW (size_t dim, size_t ncentroids, size_t bytes_per_code, size_t nbits_per_idx) | |
void | build_quantizer (const char *path_data, const char *path_info, const char *path_edges, size_t M=16, size_t efConstruction=500) |
void | assign (size_t n, const float *x, idx_t *labels, size_t k=1) |
virtual void | add_batch (size_t n, const float *x, const idx_t *xids, const idx_t *precomputed_idx=nullptr) |
void | compute_centroid_norms () |
Compute norms of the HNSW vertices. | |
void | rotate_quantizer () |
For correct search using OPQ encoding rotate points in the coarse quantizer. | |
Public Attributes | |
size_t | nsubc |
Number of sub-centroids per group. | |
bool | do_pruning |
Turn on/off pruning. | |
std::vector< std::vector< idx_t > > | nn_centroid_idxs |
Indices of the <nsubc> nearest centroids for each centroid. | |
std::vector< std::vector< idx_t > > | subgroup_sizes |
Sizes of sub-groups for each group. | |
std::vector< float > | alphas |
Coefficients that determine the location of sub-centroids. | |
Public Attributes inherited from ivfhnsw::IndexIVF_HNSW | |
size_t | d |
Vector dimension. | |
size_t | nc |
Number of centroids. | |
size_t | code_size |
Code size per vector in bytes. | |
hnswlib::HierarchicalNSW * | quantizer |
Quantizer that maps vectors to inverted lists (HNSW [Y.Malkov]) | |
faiss::ProductQuantizer * | pq |
Produces the residual codes. | |
faiss::ProductQuantizer * | norm_pq |
Produces the norm codes of reconstructed base vectors. | |
faiss::LinearTransform * | opq_matrix |
Rotation matrix for OPQ encoding. | |
bool | do_opq |
Turn on/off OPQ encoding. | |
size_t | nprobe |
Number of probes at search time. | |
size_t | max_codes |
Max number of codes to visit to do a query. | |
std::vector< std::vector< idx_t > > | ids |
Inverted lists for indexes. | |
std::vector< std::vector < uint8_t > > | codes |
PQ codes of residuals. | |
std::vector< std::vector < uint8_t > > | norm_codes |
PQ codes of norms of reconstructed base vectors. | |
Protected Attributes | |
std::vector< float > | query_centroid_dists |
Distances to the coarse centroids. Used for distance computation between a query and base points. | |
std::vector< std::vector< float > > | inter_centroid_dists |
Distances between coarse centroids and their sub-centroids. | |
Protected Attributes inherited from ivfhnsw::IndexIVF_HNSW | |
std::vector< float > | norms |
L2 square norms of reconstructed base vectors. | |
std::vector< float > | centroid_norms |
L2 square norms of coarse centroids. | |
std::vector< float > | precomputed_table |
Size pq.M * pq.ksub. | |
Additional Inherited Members | |
Public Types inherited from ivfhnsw::IndexIVF_HNSW | |
typedef uint32_t | idx_t |
all indices are this type | |
Protected Member Functions inherited from ivfhnsw::IndexIVF_HNSW | |
float | pq_L2sqr (const uint8_t *code) |
L2 sqr distance function for PQ codes. | |
void ivfhnsw::IndexIVF_HNSW_Grouping::add_group | ( | size_t | group_idx, |
size_t | group_size, | ||
const float * | x, | ||
const idx_t * | ids | ||
) |
Add <group_size> vectors of dimension <d> from the <group_idx>-th group to the index.
group_idx | index of the group |
group_size | number of base vectors in the group |
x | base vectors to add (size: group_size * d) |
ids | ids to store for the vectors (size: groups_size) |
|
virtual |
Search procedure
During the IVF-HNSW-PQ + Grouping search we compute
d = || x - y_S - y_R ||^2
where x is the query vector, y_S the coarse sub-centroid, y_R the refined PQ centroid. The expression can be decomposed as:
d = (1 - α) * (|| x - y_C ||^2 - || y_C ||^2) + α * (|| x - y_N ||^2 - || y_N ||^2) + || y_S + y_R ||^2 - 2 * (x|y_R)
term 1 term 2 term 3 term 4
We use the following decomposition:
Norms of centroids are precomputed and saved without compression, as their memory consumption is negligible. If it is necessary, the norms can be added to the term 3 and compressed to byte together. We do not think that it will lead to considerable decrease in accuracy.
Since y_R defined by a product quantizer, it is split across sub-vectors and stored separately for each sub-vector.
Reimplemented from ivfhnsw::IndexIVF_HNSW.
|
virtual |
Train product quantizers
n | number of training vectors of dimension d |
x | learn vectors, size n * d |
Reimplemented from ivfhnsw::IndexIVF_HNSW.