LATEST POSTS
-
AI and the Law – Copyright Traps for Large Language Models – This new tool can tell you whether AI has stolen your work
https://github.com/computationalprivacy/copyright-traps
Copyright traps (see Meeus et al. (ICML 2024)) are unique, synthetically generated sequences that have been included in the training dataset of CroissantLLM. This dataset allows for the evaluation of Membership Inference Attacks (MIAs) using CroissantLLM as the target model, where the goal is to infer whether a given trap sequence was included in or excluded from the training data.
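As a rough illustration of how such an attack is usually scored, the sketch below computes the AUC of an attack from per-sequence scores and membership labels (1 for member, 0 for non-member, matching the labels used by the dataset). The function name mia_auc and the example scores are made up for illustration; in practice the score might be something like the target model's negative loss on each trap sequence, which is an assumption here and not a method taken from the paper.

```python
def mia_auc(scores, labels):
    """Area under the ROC curve for a membership inference attack.

    Computed as the probability that a randomly chosen member gets a
    higher score than a randomly chosen non-member (ties count as half).
    """
    members = [s for s, y in zip(scores, labels) if y == 1]
    non_members = [s for s, y in zip(scores, labels) if y == 0]
    if not members or not non_members:
        raise ValueError("need at least one member and one non-member")

    wins = 0.0
    for m in members:
        for n in non_members:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(members) * len(non_members))


# Hypothetical attack scores: higher means "more likely seen in training".
example_scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.85]
example_labels = [1, 1, 1, 0, 0, 0]
print(mia_auc(example_scores, example_labels))
```

An AUC close to 0.5 means the attack cannot tell members from non-members; values closer to 1.0 mean the trap sequences are easier to detect.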
This dataset contains non-member (label=0) and member (label=1) trap sequences, which have been generated using this code and by sampling text from LLaMA-2 7B while controlling for sequence length and perplexity. The dataset contains splits named seq_len_{XX}_n_rep_{YY}, where XX={25,50,100} is the sequence length in tokens and YY={10, 100, 1000} is the number of repetitions for member sequences. Each dataset also contains the ‘perplexity bucket’ for each trap sequence; the original paper showed that higher-perplexity sequences tend to be more vulnerable.
Note that for a fixed sequence length, and across the various numbers of repetitions, each split contains the same set of non-member sequences (n_rep=0). Additional non-members generated in exactly the same way are also provided here, which might be required for MIA methodologies that make additional assumptions about the attacker.
-
Neuralink rival Synchron’s brain implant now lets people control Apple’s Vision Pro with their minds
Synchron is building a brain-computer interface, or a BCI, designed to help patients with paralysis operate technology like smartphones and computers with their minds.
-
Canva acquires Leonardo.ai
https://techcrunch.com/2024/07/29/canva-acquires-leonardo-ai-to-boost-its-generative-ai-efforts
The financial terms of the deal weren’t disclosed, but Canva co-founder and chief product officer Cameron Adams said it’s a mix of cash and stock. All of Leonardo.ai’s 120 employees will be joining Canva, including the executive team.
“Leonardo will continue to run independently of Canva with a focus on rapid innovation, research and development, now backed by Canva’s resources,” Adams told TechCrunch. “We’ll keep offering all of Leonardo’s existing tools and solutions. This acquisition aims to help Leonardo develop its platform and deepen their user growth with our investment, including by expanding their API business and investing in foundational model R&D.”
-
Autodesk acquires Wonder Dynamics
This strategic move supports Autodesk’s goal to democratize creative tools and foster innovation in the media and entertainment industry. Terms of the deal were not disclosed.
-
Free fonts
https://fontlibrary.org
https://fontsource.org
Open-source fonts packaged into individual NPM packages for self-hosting in web applications. Self-hosting fonts can significantly improve website performance, remain version-locked, work offline, and offer more privacy.
https://www.awwwards.com/awwwards/collections/free-fonts
http://www.fontspace.com/popular/fonts
https://www.urbanfonts.com/free-fonts.htm
http://www.1001fonts.com/poster-fonts.html
How to use @font-face in CSS
The @font-face rule allows custom fonts to be loaded on a webpage: https://css-tricks.com/snippets/css/using-font-face-in-css/
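To show what such a rule typically looks like, here is a small Python sketch that renders an @font-face block as text. The helper name font_face_rule, the family name "MyWebFont" and the file paths are placeholders invented for this example; the descriptors used (font-family, src, font-weight, font-style, font-display) are standard CSS.

```python
def font_face_rule(family, sources, weight="normal", style="normal", display="swap"):
    """Render a CSS @font-face rule.

    sources is a list of (url, format) pairs, listed in order of preference.
    """
    src = ",\n       ".join(f'url("{url}") format("{fmt}")' for url, fmt in sources)
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f"  src: {src};\n"
        f"  font-weight: {weight};\n"
        f"  font-style: {style};\n"
        f"  font-display: {display};\n"
        "}"
    )


# Placeholder font name and paths, for illustration only.
css = font_face_rule(
    "MyWebFont",
    [("fonts/mywebfont.woff2", "woff2"), ("fonts/mywebfont.woff", "woff")],
)
print(css)
```

Once the rule is in a stylesheet, the font can be referenced by its family name, for example with font-family: "MyWebFont", sans-serif; on the elements that should use it.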
FEATURED POSTS
-
How to paint boardgame miniatures
Steps:
- soap wash cleaning
- primer
- base-coat layer (black/white)
- detailing
- washing aka shade (could be done after highlighting)
- highlights aka dry brushing (could be done after washing)
- varnish (gloss/satin/matte)
-
AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability
“Simon Willison created a Datasette browser to explore WebVid-10M, one of the two datasets used to train the video generation model, and quickly learned that all 10.7 million video clips were scraped from Shutterstock, watermarks and all.”
“In addition to the Shutterstock clips, Meta also used 10 million video clips from this 100M video dataset from Microsoft Research Asia. It’s not mentioned on their GitHub, but if you dig into the paper, you learn that every clip came from over 3 million YouTube videos.”
“It’s become standard practice for technology companies working with AI to commercially use datasets and models collected and trained by non-commercial research entities like universities or non-profits.”
“Like with the artists, photographers, and other creators found in the 2.3 billion images that trained Stable Diffusion, I can’t help but wonder how the creators of those 3 million YouTube videos feel about Meta using their work to train their new model.”
-
Anders Langlands – Render Color Spaces
https://www.colour-science.org/anders-langlands/
This page compares images rendered in Arnold using spectral rendering and different sets of colourspace primaries: Rec.709, Rec.2020, ACES and DCI-P3. The SPD data for the GretagMacbeth Color Checker are the measurements of Noboru Ohta, taken from Mansencal, Mauderer and Parsons (2014) colour-science.org.
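For readers unfamiliar with what a set of primaries defines, the following sketch derives the RGB-to-XYZ matrix of a colourspace from the xy chromaticities of its primaries and white point. It is a generic textbook construction, not the method used on the linked page or inside Arnold. Only the Rec.709 and Rec.2020 values (with a D65 white point) are included; ACES and DCI-P3 are left out here, and all function names are made up for this example.

```python
def xy_to_xyz(x, y):
    """Convert an xy chromaticity to XYZ with luminance Y = 1."""
    return (x / y, 1.0, (1.0 - x - y) / y)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 linear system m @ s = b using Cramer's rule."""
    d = det3(m)
    result = []
    for col in range(3):
        replaced = [[b[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        result.append(det3(replaced) / d)
    return result


def rgb_to_xyz_matrix(primaries, white):
    """Build the linear RGB -> XYZ matrix for the given primaries and white."""
    cols = [xy_to_xyz(x, y) for x, y in primaries]
    # Rows are X, Y, Z; columns are the unscaled red, green, blue primaries.
    p = [[cols[c][r] for c in range(3)] for r in range(3)]
    # Scale each primary so that RGB = (1, 1, 1) maps to the white point.
    s = solve3(p, xy_to_xyz(*white))
    return [[p[r][c] * s[c] for c in range(3)] for r in range(3)]


D65 = (0.3127, 0.3290)
REC709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

m709 = rgb_to_xyz_matrix(REC709, D65)
m2020 = rgb_to_xyz_matrix(REC2020, D65)
print("Rec.709 luminance weights:", [round(v, 4) for v in m709[1]])
print("Rec.2020 luminance weights:", [round(v, 4) for v in m2020[1]])
```

The middle row of each matrix gives the luminance weights of the red, green and blue channels, which is one simple way to see that the same RGB triplet means a different colour in each space.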