AI and the Law – Copyright Traps for Large Language Models – This new tool can tell you whether AI has stolen your work

August 2, 2024

pIXELsHAM.com

https://github.com/computationalprivacy/copyright-traps

Copyright traps (see Meeus et al. (ICML 2024)) are unique, synthetically generated sequences that have been included in the training dataset of CroissantLLM. This dataset allows for the evaluation of Membership Inference Attacks (MIAs) with CroissantLLM as the target model, where the goal is to infer whether a given trap sequence was included in or excluded from the training data.
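To make the evaluation setup concrete, here is a minimal sketch of how an MIA could be scored once an attacker has one score per trap sequence. It assumes a simple loss-based baseline (lower loss on the target model suggests membership), which is a common attack but not necessarily the method used in the paper. The function names and the toy numbers are hypothetical; only the label convention (0 = non-member, 1 = member) comes from the dataset description.

```python
from typing import Sequence


def loss_to_score(losses: Sequence[float]) -> list[float]:
    """Turn per-sequence losses into membership scores.

    Assumption: a lower loss means the model has more likely seen the
    sequence, so the score is simply the negated loss.
    """
    return [-loss for loss in losses]


def mia_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of membership scores, computed by pairwise comparison.

    This is the probability that a randomly chosen member (label=1)
    scores higher than a randomly chosen non-member (label=0); ties
    count as half a win. 0.5 corresponds to random guessing.
    """
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")

    wins = 0.0
    for m in members:
        for n in non_members:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(members) * len(non_members))


if __name__ == "__main__":
    # Toy values for illustration only, not real measurements.
    toy_losses = [1.2, 3.4, 0.9, 3.1]
    toy_labels = [1, 0, 1, 0]
    print(mia_auc(loss_to_score(toy_losses), toy_labels))
```

In practice the losses would come from running each trap sequence through the target model; that step is left out here because it depends on the model-loading code you use.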

This dataset contains non-member (label=0) and member (label=1) trap sequences, which were generated with the accompanying code by sampling text from LLaMA-2 7B while controlling for sequence length and perplexity. The dataset is divided into splits named seq_len_{XX}_n_rep_{YY}, where XX={25, 50, 100} is the sequence length in tokens and YY={10, 100, 1000} is the number of repetitions of each member sequence. Each split also records the ‘perplexity bucket’ of every trap sequence; the original paper showed that higher-perplexity sequences tend to be more vulnerable.
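As a small illustration of the naming scheme, the sketch below enumerates every split name implied by the seq_len_{XX}_n_rep_{YY} pattern and the values listed above. The helper names are hypothetical, and the code only builds strings; it does not load any data.

```python
from itertools import product

# Values taken from the dataset description above.
SEQ_LENS = (25, 50, 100)    # sequence length in tokens (XX)
N_REPS = (10, 100, 1000)    # repetitions of member sequences (YY)


def split_name(seq_len: int, n_rep: int) -> str:
    """Build one split name following seq_len_{XX}_n_rep_{YY}."""
    return f"seq_len_{seq_len}_n_rep_{n_rep}"


def all_split_names() -> list[str]:
    """Every combination of sequence length and repetition count."""
    return [split_name(s, r) for s, r in product(SEQ_LENS, N_REPS)]


if __name__ == "__main__":
    for name in all_split_names():
        print(name)
```

With three sequence lengths and three repetition counts, this yields nine split names.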

Note that for a fixed sequence length, every split contains the same set of non-member sequences (n_rep=0), regardless of the number of repetitions. Additional non-members, generated in exactly the same way, are also provided; these may be required by MIA methodologies that make additional assumptions about the attacker.
