Threat model: a rogue employee at a cloud AI provider such as OpenAI who can read LM API requests that may include email data (or, equivalently, a hacker with access to the provider's logs).
Task: Given a dataset X and a fraud dataset Z, improve the data efficiency of fine-tuning a model using a keyed transformation y with key k such that: (a) a model fine-tuned on y(X) reaches accuracy comparable to one fine-tuned on X, and (b) an observer who sees y(X) but does not hold k learns little about X.
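A hedged formalization of these two conditions (a sketch only; the base model M, accuracy function Acc, slack \epsilon, and adversary \mathcal{A} are notation introduced here for illustration, not from the original):

```latex
% Utility: fine-tuning base model M on the transformed data loses at most \epsilon accuracy.
\mathrm{Acc}\bigl(\mathrm{FineTune}(M,\, y_k(X))\bigr) \;\ge\; \mathrm{Acc}\bigl(\mathrm{FineTune}(M,\, X)\bigr) - \epsilon

% Privacy: for any efficient adversary \mathcal{A} without the key k, seeing y_k(X)
% gives only negligible advantage over seeing nothing (\bot) in recovering X.
\Pr\bigl[\mathcal{A}(y_k(X)) \to X\bigr] \;\le\; \Pr\bigl[\mathcal{A}(\bot) \to X\bigr] + \mathrm{negl}
```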
A baseline example for y is a substitution cipher in token-vocabulary space, where the key k is the permutation of the vocabulary. If we completely retrain a model on y(X), it should reach the same accuracy as one trained on X, and y(X) is hard for a human to read. But (a) it is vulnerable to statistical attack, since a fixed substitution preserves token frequencies and frequency analysis applies, and (b) training on y(X) may be data-inefficient, since a pretrained model's token embeddings no longer line up with the permuted vocabulary.
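To make the baseline concrete, here is a minimal sketch of such a token-space substitution cipher (the vocabulary size, seed, and token IDs are illustrative assumptions; any tokenizer would do):

```python
import random

def make_key(vocab_size: int, seed: int) -> list[int]:
    """Key k: a random permutation of the token vocabulary."""
    rng = random.Random(seed)
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return perm

def encrypt(token_ids: list[int], key: list[int]) -> list[int]:
    """y(X): map each plaintext token ID through the permutation."""
    return [key[t] for t in token_ids]

def decrypt(token_ids: list[int], key: list[int]) -> list[int]:
    """Invert the permutation to recover the original token IDs."""
    inverse = [0] * len(key)
    for plain, cipher in enumerate(key):
        inverse[cipher] = plain
    return [inverse[t] for t in token_ids]

# Example usage (GPT-2-sized vocab, for illustration only):
key = make_key(vocab_size=50_257, seed=42)
ciphertext = encrypt([464, 3290, 318, 922], key)
assert decrypt(ciphertext, key) == [464, 3290, 318, 922]

# Note: a fixed permutation preserves token frequencies and co-occurrence
# statistics, which is exactly what makes it vulnerable to frequency analysis.
```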
Is there a better such y? And what is the data efficiency of training on y(X)?
If we can do this, personal data becomes safer to use with deployed AI models (preserving its economic value to the individual), and cloud logs become less of a honeypot for hackers.
*Partial credit may also be awarded to multiple submissions and to the open-source dependencies of the submissions.