Publications
2024
- SafeGen’24: Unlearning in- vs. out-of-distribution data in LLMs under gradient-based methods. Teodora Baluta, Pascal Lamblin, Daniel Tarlow, Fabian Pedregosa, and Gintare Karolina Dziugaite. In NeurIPS Safe Generative AI Workshop, 2024.
Machine unlearning aims to remove the influence of selected training examples from a learned model. Despite the increasing attention to this problem, it remains an open research question how to evaluate unlearning in large language models (LLMs), and which properties of the data to be unlearned critically affect the quality and efficiency of unlearning. This work formalizes a metric to evaluate unlearning quality in generative models, and uses it to assess the trade-offs between unlearning quality and performance. We demonstrate that unlearning out-of-distribution examples requires more unlearning steps but presents a better trade-off overall. For in-distribution examples, however, we observe a rapid decay in performance as unlearning progresses. We further evaluate how an example's memorization and difficulty affect unlearning under a classical gradient ascent-based approach.
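For context, the classical gradient-ascent approach mentioned above can be sketched in a few lines. The snippet below is only an illustration under assumed names (model, forget_loader, and the hyperparameters are hypothetical), not the experimental code used in the paper: it simply takes optimizer steps that ascend, rather than descend, on the forget-set loss.

```python
# Minimal sketch of gradient-ascent unlearning (assumed setup, not the paper's code):
# `model` is any PyTorch classifier and `forget_loader` yields the examples to be unlearned.
import torch
import torch.nn.functional as F

def unlearn_by_gradient_ascent(model, forget_loader, steps=100, lr=1e-5):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    batches = iter(forget_loader)
    for _ in range(steps):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(forget_loader)
            x, y = next(batches)
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        (-loss).backward()   # ascend on the forget-set loss instead of descending
        opt.step()
    return model
```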
- PhD Thesis: Rigorous Security Analysis of Machine Learning Systems. Teodora Baluta. National University of Singapore, 2024.
Machine learning systems have drastically transformed the technological landscape. However, both conceptual and algorithmic gaps remain that hinder refutability of security claims in machine learning. This thesis puts forward foundations for rigorous security analysis of ML systems: We give formal security definitions, abstractions and sound procedures (with guarantees). On a conceptual front, we propose three novel queries to evaluate security and privacy properties of neural networks: counting queries, causal queries and decisional convergence queries. On an algorithmic front, we develop sound algorithms to answer each category of queries, which draw upon propositional logic, hypothesis testing, causal reasoning and linear algebraic techniques. We demonstrate the utility of these queries in several applications such as robustness, privacy and trojaning. We show that our queries can refute or support new and existing claims about the behavior of neural networks in the presence of adversaries, providing concrete advances towards refutability in ML security.
2023
- CCS’23: Unforgeability in Stochastic Gradient Descent. Teodora Baluta, Ivica Nikolić, Racchit Jain, Divesh Aggarwal, and Prateek Saxena. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2023.
Stochastic Gradient Descent (SGD) is a popular training algorithm and a cornerstone of modern machine learning systems. Several security applications benefit from determining whether SGD executions are forgeable, i.e., whether the model parameters seen at a given step are obtainable by more than one distinct set of data samples. In this paper, we present the first attempt at proving the impossibility of such forgery. We furnish a set of conditions, efficiently checkable on concrete checkpoints seen during training runs, under which checkpoints are provably unforgeable at that step. Our experiments show that the conditions are somewhat mild and hence always satisfied at the checkpoints sampled in our experiments. Our results contrast sharply with prior findings: we show that checkpoints we find to be provably unforgeable have been deemed forgeable using the same methodology and experimental setup suggested in prior work. This discrepancy arises from unspecified subtleties in the definitions. We experimentally confirm that the distinction matters, i.e., small errors amplify during training to produce a significant, observable difference in the final trained models. We hope our results serve as a cautionary note on the role of algebraic precision in forgery definitions and related security arguments.
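To see why algebraic precision matters in the forgery definition, consider one SGD step from the same checkpoint under two candidate mini-batches. The sketch below is illustrative only (grad_fn and the batches are hypothetical; this is not the paper's proof technique): requiring bit-identical parameters versus allowing a small tolerance can give different answers, and small per-step errors can amplify over subsequent training steps.

```python
# Illustrative sketch: exact vs. approximate "forgery" of a single SGD step.
# `grad_fn(w, x)` is a hypothetical per-example gradient oracle returning a numpy array.
import numpy as np

def sgd_step(w, grad_fn, batch, lr=0.1):
    return w - lr * np.mean([grad_fn(w, x) for x in batch], axis=0)

def forges_exactly(w, grad_fn, batch_a, batch_b, lr=0.1):
    # Exact forgery: both batches yield bit-identical next parameters.
    return np.array_equal(sgd_step(w, grad_fn, batch_a, lr),
                          sgd_step(w, grad_fn, batch_b, lr))

def forges_approximately(w, grad_fn, batch_a, batch_b, lr=0.1, tol=1e-6):
    # Relaxed forgery: agreement only up to a tolerance, which can change the verdict.
    return np.allclose(sgd_step(w, grad_fn, batch_a, lr),
                       sgd_step(w, grad_fn, batch_b, lr), atol=tol)
```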
- OOPSLA’23: User-Customizable Transpilation of Scripting Languages. Proceedings of the ACM on Programming Languages (OOPSLA), 2023.
A transpiler converts code from one programming language to another. Many practical uses of transpilers require the user to be able to guide or customize the program produced from a given input program. This customizability is important for satisfying many application-specific goals for the produced code, such as performance, readability, ease of exposition or maintainability, compatibility with external environments or analysis tools, and so on. Conventional transpilers are deterministic rule-driven systems, typically written without offering per-user or per-program customizability. Recent advances in transpilers based on neural networks offer some customizability to users, e.g., through interactive prompts, but it remains difficult to precisely control the output they produce. Both conventional and neural transpilation also suffer from the "last mile" problem: they produce correct code on average, i.e., on most parts of a given program, but not necessarily for all parts of it. We propose a new transpilation approach that offers fine-grained customizability and reusability of transpilation rules created by others, without burdening the user to understand the global semantics of the given source program. Our approach is mostly automatic and incremental, i.e., it constructs the translation rules needed to transpile the given program piece by piece, as per the user's guidance. Users can rely on existing transpilation rules to translate most of the program correctly while focusing their effort locally, only on the parts that are incorrect or need customization. This improves the correctness of the end result. We implement the transpiler in a tool called DuoGlot, which translates Python programs to JavaScript, and evaluate it on the popular GeeksForGeeks benchmarks. DuoGlot achieves 90% translation accuracy, outperforming all existing translators (both handcrafted and neural-based), while producing readable code. We evaluate DuoGlot on two additional benchmarks containing more challenging and longer programs, and similarly observe improved accuracy compared to the other transpilers.
- SAT’23: Explaining SAT Solving Using Causal Reasoning. Jiong Yang, Arijit Shaw, Teodora Baluta, Mate Soos, and Kuldeep S. Meel. In International Conference on Theory and Applications of Satisfiability Testing (SAT), 2023.
The past three decades have witnessed notable success in designing efficient SAT solvers, with modern solvers capable of solving industrial benchmarks containing millions of variables in just a few seconds. This success owes to the widely used CDCL algorithm, which still lacks a comprehensive theoretical understanding. Furthermore, it has been observed that CDCL solvers still struggle with specific classes of benchmarks comprising only hundreds of variables, in contrast to their widespread use in real-world applications. Consequently, there is an urgent need to uncover the inner workings of these seemingly weak yet powerful black boxes. In this paper, we present a first step towards this goal by introducing an approach called CausalSAT, which employs causal reasoning to gain insights into the functioning of modern SAT solvers. CausalSAT first generates observational data from the execution of SAT solvers and learns a structured graph representing the causal relationships between the components of a SAT solver. Subsequently, given a query such as whether a clause with low literal block distance (LBD) has higher utility, CausalSAT calculates the causal effect of LBD on clause utility and provides an answer to the question. We use CausalSAT to quantitatively verify hypotheses previously regarded as "rules of thumb" or empirical findings, such as the query above. Moreover, CausalSAT can address previously unexplored questions, such as which branching heuristic leads to greater clause utility, in order to study the relationship between branching and clause management. Experimental evaluations on practical benchmarks demonstrate that CausalSAT effectively fits the data, verifies four "rules of thumb", and provides answers to three questions closely related to implementing modern solvers.
2022
- CCS’22: Membership Inference Attacks and Generalization: A Causal Perspective. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022.
Membership inference (MI) attacks highlight a privacy weakness in present stochastic training methods for neural networks. It is not well understood, however, why they arise. Are they a natural consequence of imperfect generalization alone? Which underlying causes should we address during training to mitigate these attacks? Towards answering such questions, we propose the first approach to explain MI attacks and their connection to generalization based on principled causal reasoning. We offer causal graphs that quantitatively explain the observed MI attack performance achieved for 6 attack variants. We refute several prior non-quantitative hypotheses that over-simplify or over-estimate the influence of underlying causes, thereby failing to capture the complex interplay between several factors. Our causal models also show a new connection between generalization and MI attacks via their shared causal factors. Our causal models have high predictive power (0.90), i.e., their analytical predictions frequently match observations in unseen experiments, which makes analysis via them a pragmatic alternative.
- CCS’22: LPGNet: Link Private Graph Networks for Node Classification. Aashish Kolluri, Teodora Baluta, Bryan Hooi, and Prateek Saxena. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022.
Classification tasks on labeled graph-structured data have many important applications ranging from social recommendation to financial modeling. Deep neural networks are increasingly being used for node classification on graphs, wherein nodes with similar features have to be given the same label. Graph convolutional networks (GCNs) are one such widely studied neural network architecture that perform well on this task. However, powerful link-stealing attacks on GCNs have recently shown that even with black-box access to the trained model, inferring which links (or edges) are present in the training graph is practical. In this paper, we present a new neural network architecture called LPGNet for training on graphs with privacy-sensitive edges. LPGNet provides differential privacy (DP) guarantees for edges using a novel design for how graph edge structure is used during training. We empirically show that LPGNet models often lie in the sweet spot between providing privacy and utility: They can offer better utility than "trivially" private architectures which use no edge information (e.g., vanilla MLPs) and better resilience against existing link-stealing attacks than vanilla GCNs which use the full edge structure. LPGNet also offers consistently better privacy-utility tradeoffs than DPGCN, which is the state-of-the-art mechanism for retrofitting differential privacy into conventional GCNs, in most of our evaluated datasets.
2021
- ICSE’21: Scalable quantitative verification for deep neural networks. Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, and Prateek Saxena. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021.
Despite the functional success of deep neural networks (DNNs), their trustworthiness remains a crucial open challenge. To address this challenge, both testing and verification techniques have been proposed. But these existing techniques provide either scalability to large networks or formal guarantees, not both. In this paper, we propose a scalable quantitative verification framework for deep neural networks, i.e., a test-driven approach that comes with formal guarantees that a desired probabilistic property is satisfied. Our technique performs enough tests until soundness of a formal probabilistic property can be proven. It can be used to certify properties of both deterministic and randomized DNNs. We implement our approach in a tool called PROVERO and apply it in the context of certifying adversarial robustness of DNNs. In this context, we first introduce a new attack-agnostic measure of robustness, which offers an alternative to the purely attack-based methodology of evaluating robustness reported today. Second, PROVERO provides certificates of robustness for large DNNs, where existing state-of-the-art verification tools fail to produce conclusive results. Our work paves the way for verifying properties of distributions captured by real-world deep neural networks, with provable guarantees, even where testers only have black-box access to the neural network.
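As a rough illustration of the test-driven flavor of such certification (not PROVERO's actual algorithm), the sketch below samples perturbations around an input and certifies that the violation probability is below a threshold only when a Hoeffding-style confidence interval clears it; predict, x0, label, and eps are hypothetical names.

```python
# Minimal sketch of test-driven certification with a probabilistic guarantee
# (assumed setup, not PROVERO). Certify "violation probability < theta" only
# when the empirical rate plus the confidence radius stays below theta.
import math
import numpy as np

def certify_robustness(predict, x0, label, eps, theta=0.01, delta=1e-3):
    # Hoeffding: n >= ln(2/delta) / (2 t^2) samples put the empirical rate
    # within t = theta/2 of the true violation probability w.p. >= 1 - delta.
    t = theta / 2
    n = math.ceil(math.log(2 / delta) / (2 * t * t))
    violations = sum(
        predict(x0 + np.random.uniform(-eps, eps, size=x0.shape)) != label
        for _ in range(n)
    )
    rate = violations / n
    certified = (rate + t) < theta   # true probability < theta with confidence 1 - delta
    return rate, certified
```

The fixed sample size here is deliberately conservative; a test that keeps sampling only until the evidence suffices, as the abstract describes, generally needs fewer samples.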
- FSE’21: SynGuar: guaranteeing generalization in programming by example. In Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2021.
Programming by Example (PBE) is a program synthesis paradigm in which the synthesizer creates a program that matches a set of given examples. In many applications of such synthesis (e.g., program repair or reverse engineering), the goal is to reconstruct a program that is close to a specific target program, not merely to produce some program that satisfies the seen examples. In such settings, we wish the synthesized program to generalize well, i.e., to have as few errors as possible on the unobserved examples capturing the target function's behavior. In this paper, we propose the first framework (called SynGuar) for PBE synthesizers that guarantees low generalization error with high probability. Our main contribution is a procedure to dynamically calculate how many additional examples suffice to theoretically guarantee generalization. We show how our techniques can be used in two well-known synthesis approaches, PROSE and STUN (synthesis through unification), for common string-manipulation program benchmarks. We find that often a few hundred examples suffice to provably bound the generalization error below 5% with high (≥98%) probability on these benchmarks. Further, we confirm this empirically: SynGuar significantly improves the accuracy of existing synthesizers in generating the right target programs. With fewer, arbitrarily chosen examples, the same baseline synthesizers (without SynGuar) overfit and lose accuracy.
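For intuition about why a bounded number of examples can suffice, the sketch below applies the textbook finite-hypothesis PAC bound. This is not SynGuar's procedure, which dynamically computes a tighter, program-specific sample size; the example only shows that even huge candidate-program spaces need surprisingly few i.i.d. examples.

```python
# Back-of-the-envelope sample-complexity sketch (assumed setup, not SynGuar itself):
# for a finite space of candidate programs consistent with the examples, the classic
# PAC bound gives how many i.i.d. examples suffice for low generalization error.
import math

def examples_needed(hypothesis_space_size, epsilon=0.05, delta=0.02):
    # m >= (ln|H| + ln(1/delta)) / epsilon  =>  error <= epsilon w.p. >= 1 - delta
    return math.ceil((math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon)

print(examples_needed(1e30))   # even ~10^30 consistent candidates need only 1460 examples
```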
- CCS’21: Private hierarchical clustering in federated networks. Aashish Kolluri, Teodora Baluta, and Prateek Saxena. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2021.
Analyzing structural properties of social networks, such as identifying their clusters or finding their most central nodes, has many applications. However, these applications are not supported by federated social networks that allow users to store their social links locally on their end devices. In the federated regime, users want access to personalized services while also keeping their social links private. In this paper, we take a step towards enabling analytics on federated networks with differential privacy guarantees about protecting the user links or contacts in the network. Specifically, we present the first work to compute hierarchical cluster trees using local differential privacy. Our algorithms for computing them are novel and come with theoretical bounds on the quality of the trees learned. The private hierarchical cluster trees enable a service provider to query the community structure around a user at various granularities without the users having to share their raw contacts with the provider. We demonstrate the utility of such queries by redesigning the state-of-the-art social recommendation algorithms for the federated setup. Our recommendation algorithms significantly outperform the baselines which do not use social contacts and are on par with the non-private algorithms that use contacts.
2019
- CCS’19: Quantitative verification of neural networks and its security applications. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2019.
Neural networks are increasingly employed in safety-critical domains. This has prompted interest in verifying or certifying logically encoded properties of neural networks. Prior work has largely focused on checking existential properties, wherein the goal is to check whether there exists any input that violates a given property of interest. However, neural network training is a stochastic process, and many questions arising in their analysis require probabilistic and quantitative reasoning, i.e., estimating how many inputs satisfy a given property. To this end, our paper proposes a novel and principled framework for quantitative verification of logical properties specified over neural networks. Our framework is the first to provide PAC-style soundness guarantees, in that its quantitative estimates are within a controllable and bounded error from the true count. We instantiate our algorithmic framework by building a prototype tool called NPAQ that enables checking rich properties over binarized neural networks. We show how emerging security analyses can utilize our framework in 3 concrete point applications: quantifying robustness to adversarial inputs, efficacy of trojan attacks, and fairness/bias of given neural networks.
- NDSS’19: One Engine To Serve ’em All: Inferring Taint Rules Without Architectural Semantics. Zheng Leong Chua, Yanhao Wang, Teodora Baluta, Prateek Saxena, Zhenkai Liang, and Purui Su. In Network and Distributed System Security (NDSS) Symposium, 2019.
Dynamic binary taint analysis has wide applications in the security analysis of commercial-off-the-shelf (COTS) binaries. One of the key challenges in dynamic binary analysis is to specify the taint rules that capture how taint information propagates for each instruction on an architecture. Most existing solutions specify taint rules using a deductive approach, summarizing the rules manually after analyzing the instruction semantics. Intuitively, taint propagation reflects how an instruction's input affects its output, and thus can be observed from instruction executions. In this work, we propose an inductive method for taint propagation and develop a universal taint tracking engine that is architecture-agnostic. Our taint engine, TAINTINDUCE, can learn taint rules with minimal architectural knowledge by observing the execution behavior of instructions. To measure its correctness and guide taint rule generation, we define the precise notion of soundness for bit-level taint tracking in this novel setup. In our evaluation, we show that TAINTINDUCE automatically learns rules for 4 widely used architectures: x86, x64, AArch64, and MIPS-I. It can detect vulnerabilities for 24 CVEs in 15 applications on both Linux and Windows over millions of instructions, and is comparable with mature existing tools (TEMU, libdft, Triton). TAINTINDUCE can be used as a stand-alone taint engine or to complement existing taint engines for unhandled instructions. Further, it can be used as a cross-referencing tool to uncover bugs in taint engines, emulation implementations, and ISA documentation.
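The inductive idea can be illustrated with a toy bit-flipping loop (assumed setup, not TAINTINDUCE itself): run an instruction on a seed state, flip one input bit at a time, and record which output bits change; execute is a hypothetical oracle over integer bit-vectors.

```python
# Illustrative sketch of learning a bit-level taint rule by observation
# (assumed setup, not TAINTINDUCE). `execute(state)` is a hypothetical oracle that
# runs one instruction on an input state (an int bit-vector) and returns the output state.
def infer_taint_rule(execute, input_bits, output_bits, seed_states):
    rule = {i: set() for i in range(input_bits)}
    for state in seed_states:
        base_out = execute(state)
        for i in range(input_bits):
            diff = base_out ^ execute(state ^ (1 << i))
            for j in range(output_bits):
                if (diff >> j) & 1:
                    rule[i].add(j)   # flipping input bit i changed output bit j
    # More seed states expose more dependencies; too few may miss data-dependent
    # behavior, so the learned rule is refined as more executions are observed.
    return rule
```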
2017
- WSC’17: Modeling the effects of insider threats on cybersecurity of complex systems. Teodora Baluta, Lavanya Ramapantulu, Yong Meng Teo, and Ee-Chien Chang. In 2017 Winter Simulation Conference (WSC), 2017.