Vector Database Watermarking

Jan 10, 2025

Zhiwen Ren

Qiyi Yao

Wei Fan

Jing Qiu

Weiming Zhang

Nenghai Yu


Abstract

Vector databases support machine learning tasks through Approximate Nearest Neighbour (ANN) queries, which makes them highly valuable digital assets. However, they also face security threats such as unauthorized replication. Watermarking, which embeds hidden information in the data, can be used for ownership authentication. This paper introduces a watermarking scheme specifically designed for vector databases. The scheme consists of four steps: generating identifiers, grouping, cryptographic mapping, and modification. Because watermark embedding modifies certain vectors, it may degrade ANN query results. Further investigation reveals that in the widely used Hierarchical Navigable Small World (HNSW) indexing structure for vector databases, heuristic edge selection and pruning strategies leave some vectors with few edges or none at all. These vectors exhibit significantly lower query frequencies than others, so modifying them has less impact on query results. Based on this observation, we propose the Transparent Vector Priority (TVP) watermarking scheme, which prioritizes embedding the watermark in these low-query-frequency “transparent” vectors to minimize the impact of watermark embedding on query results. Experimental results show that, compared with the most effective and relevant existing watermarking schemes, the TVP scheme reduces the number of missed and false queries by approximately 75%.
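The four steps and the transparent-vector priority can be illustrated with a minimal sketch. This is not the paper's algorithm, only one plausible reading of the abstract under stated assumptions: the identifier is taken from the first half of each vector's coordinates, the group and the carrier coordinate are derived with HMAC-SHA256, the bit is stored in the parity of a coordinate quantized at a step of 1e-4, and the per-vector edge counts (`degrees`) are supplied by the caller rather than read from a real HNSW index. All function names and the `SCALE` constant are hypothetical.

```python
import hashlib
import hmac

SCALE = 10_000  # assumed quantization: the bit lives in the 1e-4 digit


def _prf(key: bytes, label: str, ident: str) -> int:
    """Keyed pseudo-random integer derived with HMAC-SHA256."""
    msg = f"{label}|{ident}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "big")


def _carrier(vec, key, n_bits):
    """Steps 1-3: identifier, group, and keyed choice of carrier coordinate."""
    half = len(vec) // 2
    # Step 1: identifier from the first half, which is never modified.
    ident = ",".join(repr(x) for x in vec[:half])
    # Step 2: grouping, i.e. which watermark bit this vector carries.
    group = _prf(key, "group", ident) % n_bits
    # Step 3: cryptographic mapping to one coordinate in the second half.
    pos = half + _prf(key, "pos", ident) % (len(vec) - half)
    return group, pos


def select_transparent(degrees, budget):
    """TVP-style priority: indices of the `budget` vectors with the fewest edges."""
    return sorted(range(len(degrees)), key=lambda i: degrees[i])[:budget]


def embed(vectors, key, bits, chosen):
    """Step 4: force the parity of the quantized carrier to the group's bit."""
    marked = [list(v) for v in vectors]
    for i in chosen:
        group, pos = _carrier(marked[i], key, len(bits))
        q = round(marked[i][pos] * SCALE)
        if q % 2 != bits[group]:
            q += 1
        marked[i][pos] = q / SCALE
    return marked


def extract(vectors, key, n_bits, chosen):
    """Majority vote over the carrier parities in each group."""
    votes = [[0, 0] for _ in range(n_bits)]
    for i in chosen:
        group, pos = _carrier(vectors[i], key, n_bits)
        votes[group][round(vectors[i][pos] * SCALE) % 2] += 1
    return [int(ones > zeros) for zeros, ones in votes]
```

In this sketch, `select_transparent` stands in for the priority rule: only the lowest-degree vectors are touched, and every other vector is returned unchanged. Extraction is run over the same selected indices; how a detector would recover that selection from a suspect database is outside the scope of the sketch.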

Type

Preprint

Publication

Submitted, The Thirty-Ninth Annual Conference on Neural Information Processing Systems

Last updated on Jan 10, 2025

Watermarking

Authors

Qiyi Yao

Ph.D. Candidate
