Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical deployment of diffusion models. One effective countermeasure is watermarking the generated images. Existing methods primarily focus on ensuring that watermark embedding does not degrade model performance, but they often overlook critical challenges of real-world deployment, such as the complexity of watermark key management, user-defined generation parameters, and the difficulty of verification by arbitrary third parties. To address these issues, we propose Gaussian Shading++, a diffusion model watermarking method tailored for real-world deployment. We introduce a double-channel design that leverages pseudorandom error-correcting codes to encode the random seed required for watermark pseudorandomization, achieving performance-lossless watermarking under a fixed watermark key and overcoming key management challenges. Additionally, we model the distortions introduced during generation and inversion as an additive white Gaussian noise channel and employ a novel soft-decision decoding strategy during extraction, ensuring strong robustness even when generation parameters vary. To enable third-party verification, we incorporate public-key signatures, which provide a degree of resistance against forgery attacks even when model inversion capabilities are fully disclosed. Extensive experiments demonstrate that Gaussian Shading++ not only remains performance-lossless but also outperforms existing methods in robustness, making it a more practical solution for real-world deployment.
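The core idea of soft-decision decoding under an AWGN channel model can be sketched with a toy example. Note this uses a plain repetition code rather than the paper's pseudorandom error-correcting codes, and all names and parameters below are hypothetical:

```python
import numpy as np

def encode_repetition(bits, r=16):
    """Map each watermark bit to r BPSK symbols: 0 -> +1, 1 -> -1."""
    return 1.0 - 2.0 * np.repeat(np.asarray(bits, dtype=float), r)

def soft_decode_repetition(received, r=16):
    """Soft-decision decoding: sum the raw noisy observations per bit
    (the sufficient statistic for BPSK over AWGN) and threshold the sum,
    instead of hard-slicing each symbol individually first."""
    sums = received.reshape(-1, r).sum(axis=1)
    return (sums < 0).astype(int)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=64)          # hypothetical watermark payload
tx = encode_repetition(bits)
rx = tx + rng.normal(0.0, 1.5, size=tx.shape)  # generation/inversion noise as AWGN
decoded = soft_decode_repetition(rx)
print((decoded == bits).mean())
```

Summing observations before thresholding is what makes the decision "soft": a few badly distorted symbols are outvoted by the remaining reliable ones rather than each being decoded in isolation.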
May 16, 2025
Recent provably secure linguistic steganography (PSLS) methods rely on mainstream autoregressive language models (ARMs) to address a historically challenging task: disguising covert communication as ``innocuous'' natural language communication. However, because ARMs generate text sequentially, any alteration to the stegotext produced by ARM-based PSLS methods causes serious error propagation, rendering existing methods unusable under active tampering attacks. To address this, we propose a robust provably secure linguistic steganography scheme built on diffusion language models (DMs). Unlike ARMs, DMs generate text in a partially parallel manner, allowing us to identify robust positions for steganographic embedding that can be combined with error-correcting codes. Furthermore, we introduce error-correction strategies, namely pseudo-random error correction and neighborhood search correction, during steganographic extraction. Theoretical proofs and experimental results demonstrate that our method is both secure and robust: it resists token ambiguity in stegotext segmentation and, to some extent, withstands token-level insertion, deletion, and substitution attacks.
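The interplay of robust embedding positions and error-correcting codes can be illustrated with a minimal sketch. This is not the paper's scheme: it uses pseudo-randomly chosen positions and a simple repetition code with majority-vote extraction, and every name and parameter is hypothetical:

```python
import random

def embed(payload_bits, n_positions, r=5, seed=42):
    """Spread each payload bit over r pseudo-randomly chosen embedding
    positions (a stand-in for the 'robust' positions a diffusion LM
    can regenerate stably), using an r-fold repetition code."""
    rng = random.Random(seed)
    positions = rng.sample(range(n_positions), len(payload_bits) * r)
    carrier = [0] * n_positions
    for i, bit in enumerate(payload_bits):
        for pos in positions[i * r:(i + 1) * r]:
            carrier[pos] = bit
    return carrier, positions

def extract(carrier, positions, k, r=5):
    """Majority-vote decoding: a few substituted positions are outvoted."""
    out = []
    for i in range(k):
        votes = [carrier[p] for p in positions[i * r:(i + 1) * r]]
        out.append(1 if sum(votes) > r // 2 else 0)
    return out

payload = [1, 0, 1, 1, 0, 0, 1, 0]
carrier, positions = embed(payload, n_positions=200)
carrier[positions[0]] ^= 1  # substitution attack on one embedding position
print(extract(carrier, positions, len(payload)))  # → [1, 0, 1, 1, 0, 0, 1, 0]
```

Because each bit is recovered from several independent positions, a single tampered token flips at most one vote and the payload survives intact.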
May 1, 2025
As an important technique for covert communication, steganography has developed greatly in pursuit of the secrecy of concealment suggested by Shannon. The most widely used scheme, adaptive steganography, consists of two phases: distortion calculation and adaptive steganographic coding. Conventionally, adaptive steganography assumes a noise-free lossless channel between the sender and the receiver. In real-world applications, however, the stego (cover media with secret messages embedded) undergoes various lossy operations and is modified during transmission, leading to extraction errors at the receiver. In this paper, robust adaptive steganographic coding is considered. We formalize the problem as finding the maximum embedding rate of a special communication system with a normalized adaptive distortion function and a non-stationary memoryless sequence of discrete channels such that the secret message can be communicated without error. We establish the theoretical rate-distortion bound of this problem, a significant step forward in the study of robust steganography. By modeling the noisy channels encountered in real-world applications as non-stationary discrete memoryless channels (DMCs), the bound can be used to evaluate existing robust methods and serves as the ultimate goal for the design of new practical robust adaptive steganographic coding algorithms.
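A toy version of such a rate bound can be computed directly. The sketch below models each cover element's transmission as its own binary symmetric channel and averages the per-channel capacities 1 - H(p); this ignores the adaptive distortion constraint and is only a simplified stand-in for the paper's non-stationary DMC setting, with hypothetical flip probabilities:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_rate_bound(flip_probs):
    """Toy upper bound on the error-free embedding rate when element i
    passes through a binary symmetric channel with flip probability p_i:
    average the per-channel capacities 1 - H(p_i)."""
    return sum(1.0 - h2(p) for p in flip_probs) / len(flip_probs)

# Hypothetical per-element flip probabilities under a lossy operation
# such as recompression:
print(round(bsc_rate_bound([0.01, 0.05, 0.11, 0.2]), 4))  # → 0.6027
```

The non-stationarity of the real channels is reflected in the fact that each element gets its own flip probability rather than a single shared one.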
Apr 8, 2025
Cloud infrastructures have become increasingly popular among distributed applications due to their efficient failure recovery, global reach, mobility, and service integration. Highest Random Weight (HRW) is a Consistent Hashing (CH) scheme that offers indefinite scalability while maintaining consistency, lookup, minimal dispersal, and load balancing in cloud environments. However, current versions of HRW and other CH algorithms incur high memory costs, exchange excessive messages for TCP connections, and sacrifice performance in large scalable distributed systems due to extensive rehashing or O(w) comparisons among w working nodes. This paper proposes a new weight function for the HRW scheme, termed the Multiplication Modulo-based Highest Random Weight (MM-HRW) scheme. In MM-HRW, the entire key range is divided into several non-overlapping ranges, each corresponding to a working node. By applying binary search over the ranges, the working node of a key can be located with O(log w) comparisons. We present the corresponding algorithms, implementations, and mathematical validation to support the experimental findings. The results demonstrate that MM-HRW consistently achieves an optimal memory footprint, validating its scalability across various scenarios, and that for large cluster sizes it achieves the highest lookup rate while maintaining O(log w) comparisons in all scenarios.
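The O(log w) lookup over non-overlapping ranges can be sketched as follows. This is a simplified illustration with an equal-width partition of the key space, not MM-HRW's actual multiplication-modulo weight function; the key-space size and node count are hypothetical:

```python
import bisect

KEY_SPACE = 2**32  # hypothetical key-space size

def build_ranges(num_nodes):
    """Partition the key space into num_nodes non-overlapping contiguous
    ranges; range i is served by working node i. Returns the sorted
    list of range start keys."""
    step = KEY_SPACE // num_nodes
    return [i * step for i in range(num_nodes)]

def lookup(range_starts, key):
    """Locate the owning node with O(log w) comparisons via binary
    search over the range starts, instead of evaluating all w HRW
    weights for every lookup."""
    return bisect.bisect_right(range_starts, key % KEY_SPACE) - 1

starts = build_ranges(16)
print(lookup(starts, 123456789))  # → 0
```

Because the ranges are contiguous and sorted, the index returned by `bisect_right` minus one is exactly the range containing the key, so lookup cost grows logarithmically rather than linearly in the number of working nodes.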
Mar 22, 2025
In this paper, we consider using polar codes for lossy source coding of a symmetric Discrete Memoryless Source (DMS) with a time-varying distortion measure, instead of the conventional fixed single-letter distortion measure. First, the lossy source coding problem is formally stated and its rate-distortion bound is developed. Second, guided by this theoretical result, a polar-code-based scheme is proposed. The code construction step takes as input the non-stationary sequence of test channels between the source sequence and the recovered sequence. A Successive Cancellation (SC) encoder then performs the actual encoding of the source sequence. We derive the optimal characterizations of the test channels, with which the proposed method performs very close to the rate-distortion bound. We further prove that, given a sufficient condition of fast source polarization, the proposed method under a randomized SC encoder achieves the rate-distortion bound as the source length N tends to infinity.
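The polarization phenomenon underlying such code construction can be illustrated with the standard erasure-channel recursion. This is only a fixed-channel surrogate (a single BEC instead of the paper's non-stationary test-channel sequence); the erasure probability and depth below are hypothetical:

```python
def polarize(eps, n_levels):
    """One-step polar transform for a BEC with erasure probability e:
    the 'minus' synthetic channel degrades to 2e - e^2, the 'plus'
    channel improves to e^2. Recursing n_levels times yields the
    N = 2**n_levels synthetic channels."""
    channels = [eps]
    for _ in range(n_levels):
        nxt = []
        for e in channels:
            nxt.append(2 * e - e * e)  # degraded branch
            nxt.append(e * e)          # upgraded branch
        channels = nxt
    return channels

chans = polarize(0.5, 10)  # N = 1024 synthetic channels
good = sum(1 for e in chans if e < 1e-3)        # nearly noiseless
bad = sum(1 for e in chans if e > 1 - 1e-3)     # nearly useless
print(good, bad, len(chans))
```

Code construction then amounts to sorting the synthetic channels by reliability and assigning information (or, in the source coding dual, frozen) positions accordingly; in the paper's setting the recursion would start from a different test channel at each position.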
Jan 14, 2025
Vector databases support machine learning tasks with Approximate Nearest Neighbour (ANN) query functionality, making them highly valuable digital assets. However, they also face security threats such as unauthorized replication. By embedding covert information, watermarking technology can be used for ownership authentication. This paper introduces a watermarking scheme specifically designed for vector databases, consisting of four steps: identifier generation, grouping, cryptographic mapping, and modification. Since watermark embedding requires modifying certain vectors, it may degrade ANN query results. Further investigation reveals that in the widely used Hierarchical Navigable Small World (HNSW) index, heuristic edge selection and pruning strategies leave some vectors with few edges or even none at all. These vectors are queried significantly less frequently than others, so modifying them has less impact on query results. Based on this observation, we propose the Transparent Vector Priority (TVP) watermarking scheme, which prioritizes embedding the watermark in these low-query-frequency “transparent” vectors to minimize the impact of watermark embedding on query results. Experimental results show that, compared to the current most effective related watermarking schemes, TVP reduces the number of missed and false queries by approximately 75%.
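The selection of low-impact embedding targets can be sketched as a degree ranking over the index graph. This covers only the selection step (the identifier/grouping/mapping/modification steps are omitted), and the graph below is a hypothetical toy adjacency structure, not output of a real HNSW build:

```python
def transparent_vector_priority(adjacency, num_marks):
    """Rank vectors by their edge count in the index graph and return
    the num_marks lowest-degree ones: sparsely connected 'transparent'
    vectors are rarely traversed during ANN search, so modifying them
    perturbs query results the least. Ties break by vector id."""
    degrees = {v: len(nbrs) for v, nbrs in adjacency.items()}
    return sorted(degrees, key=lambda v: (degrees[v], v))[:num_marks]

# Hypothetical base-layer adjacency lists of a small graph:
graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3, 4],
    3: [0, 2],
    4: [2],
    5: [],        # pruned down to zero edges
}
print(transparent_vector_priority(graph, 2))  # → [5, 4]
```

Vectors 5 and 4 are the least-connected nodes, so under this heuristic they would receive the watermark modifications first.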
Jan 10, 2025