Poster Information
A Controls-Oriented Approach for Data-Driven System Identification Towards Enhanced Manufacturing Design and Control of Solid-State Batteries
Solid-state batteries are a promising next-generation technology to meet the ever-growing demand for higher-performing and safer batteries with higher energy density, reduced flammability, and faster charging capabilities. However, the material interfaces are prone to severe and rapid degradation in the form of void formation and dendritic growth, resulting in performance losses or failure of the cell. Typically, battery health management for new battery technologies entails physics-based modeling of the electrochemical processes, which can be computationally complex and ill-suited to real-time health monitoring. Additionally, even if the computational cost of simulating battery health degradation in high fidelity is accepted, statistical variation from manufacturing processes still introduces model inaccuracies in application. Approaching the problem with a controls methodology provides an avenue for real-time monitoring and evaluation of solid-state battery health at the material interfaces. This project introduces a controls-oriented methodology for data-driven system identification and design evaluation for solid-state batteries. The complex electrochemical processes governed by partial differential equations are reduced to a linear parameter-varying model defined by ordinary differential equations. The reduced-order, data-driven model opens the door to computationally efficient real-time estimation and prediction, as well as closed-loop feedback control of solid-state interface degradation.
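As a schematic illustration of the kind of reduced-order model involved (a sketch only; the specific structure and scheduling variables identified in this project may differ), a linear parameter-varying model takes the form

```latex
\dot{x}(t) = A\big(\rho(t)\big)\,x(t) + B\big(\rho(t)\big)\,u(t), \qquad
y(t) = C\big(\rho(t)\big)\,x(t),
```

where x is a reduced interface state, u the applied charging input, y the measured cell response, and \rho(t) a measurable scheduling signal (for instance, state of charge or temperature); the matrices A, B, and C are fit from data.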
A Mean-field Analysis of Deep ResNet and Beyond
Despite the widespread empirical success of ResNet, the generalization properties of deep ResNet are rarely explored beyond the lazy training regime. In this work, we investigate scaled ResNet in the limit of infinitely deep and wide neural networks, in which the gradient flow is described by a partial differential equation in the large-network limit, i.e., the mean-field regime. To derive generalization bounds in this setting, our analysis necessitates a shift from the conventional time-invariant Gram matrix employed in the lazy training regime to a time-variant, distribution-dependent version. To this end, we provide a global lower bound on the minimum eigenvalue of the Gram matrix under the mean-field regime. In addition, to keep the dynamics of the Kullback-Leibler (KL) divergence tractable, we establish the linear convergence of the empirical error and derive an upper bound on the KL divergence over the parameter distribution. Finally, we establish a uniform-convergence generalization bound via Rademacher complexity. Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime and contribute to advancing the understanding of the fundamental properties of deep neural networks.
A Simulation Preorder for Koopman-like Lifted Control Systems
We introduce a simulation preorder among lifted systems, a generalization of finite-dimensional Koopman approximations (also known as approximate immersions) to systems with inputs. It is proved that this simulation relation implies the containment of both the open- and closed-loop behaviors. Optimization-based sufficient conditions are derived to verify the simulation relation in two special cases: i) a nonlinear (unlifted) system and an affine lifted system, and ii) two affine lifted systems. Numerical examples demonstrate the approach using backward reachable sets.
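As a schematic example of the objects being related (a sketch under generic assumptions, not the paper's exact definitions), an affine lifted approximation of a nonlinear control system $x^{+} = f(x,u)$ can be written as

```latex
z^{+} = A z + B u + d, \qquad \hat{x} = C z, \qquad z_0 = \psi(x_0),
```

where \psi is the lifting (approximate immersion) map and z the lifted state; the simulation preorder then relates the input-output behaviors of such lifted systems to those of the original system.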
Agreement to Optimality: Methods for Distributed Decision Making in Cyber-Physical Networks
In this poster, I will present new algorithms for solving distributed decision-making problems. The developed algorithms address different practical aspects of distributed decision-making, namely handling constraints, imperfect communication, and finite-time feasibility guarantees. The algorithms are truly distributed, allowing for distributed synthesis.
An Observer-based Resilient Control Strategy for Leader-Follower Multi-Agent Systems under False-Data-Injection Attacks
In this study, we develop an observer-based resilient control strategy tailored for leader-follower multi-agent systems (MASs). Our approach specifically focuses on the system's vulnerability to false-data-injection (FDI) cyberattacks that falsify the sensors of each follower. Since the operation and functionality of MASs depend strongly on their inter-agent communication structure, MASs have an intrinsic vulnerability to FDI cyberattacks. The impact of an attack can easily propagate through the entire MAS dynamics, leading to performance degradation in consensus tasks (e.g., formation control and velocity matching). To tackle this security issue, we propose an observer-based resilient control strategy that can effectively counteract the impact of FDI cyberattacks. This method utilizes an FDI attack estimator designed using the Lyapunov stability criterion and its associated linear matrix inequalities (LMIs). Finally, we present an illustrative example of a leader-follower MAS to demonstrate the effectiveness of our proposed approach.
An Optimization-Based Planner with B-spline Parameterized Continuous-Time Reference Signals
For the cascaded planning and control modules used in robot navigation, the frequency gap between the planner and controller has received limited attention. In this study, we introduce a novel B-spline parameterized optimization-based planner (BSPOP) designed to address this frequency gap under the limited onboard computational power of robots. The proposed planner generates continuous-time control inputs that low-level controllers running at arbitrary frequencies can track. Furthermore, when the control action sets are convex, BSPOP uses the convex hull property to automatically constrain the continuous-time control inputs within the convex set. Consequently, compared with discrete-time optimization-based planners, BSPOP reduces the number of decision variables and inequality constraints, which improves computational efficiency as a byproduct. Simulation results demonstrate that our approach achieves planning performance comparable to high-frequency baseline optimization-based planners while demanding less computational power. Both simulation and experimental results show that the proposed method outperforms baseline planners running at the same frequency.
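A minimal sketch of the convex hull property that the abstract refers to (illustrative only; the variable names and clamped-knot construction below are assumptions, not the planner's implementation, and SciPy is assumed to be available): a B-spline-parameterized control trajectory never leaves the convex hull of its control points, so box constraints on the control points imply box constraints on the continuous-time input.

```python
import numpy as np
from scipy.interpolate import BSpline

# A clamped cubic B-spline parameterizes a continuous-time control input u(t).
# By the convex hull property, u(t) stays inside the convex hull of its
# control points, so box constraints on the (few) control points carry over
# to the whole continuous-time signal.
degree = 3
ctrl_pts = np.array([0.0, 0.8, 1.0, 0.6, -0.2, 0.1])   # decision variables
n = len(ctrl_pts)
# Clamped uniform knot vector on [0, 1]: length must be n + degree + 1.
knots = np.concatenate((np.zeros(degree),
                        np.linspace(0.0, 1.0, n - degree + 1),
                        np.ones(degree)))
u = BSpline(knots, ctrl_pts, degree)

t = np.linspace(0.0, 1.0, 500)
samples = u(t)
# The sampled control never leaves [min, max] of the control points.
assert samples.min() >= ctrl_pts.min() - 1e-9
assert samples.max() <= ctrl_pts.max() + 1e-9
print(f"u(t) in [{samples.min():.3f}, {samples.max():.3f}], "
      f"control points in [{ctrl_pts.min():.3f}, {ctrl_pts.max():.3f}]")
```

In an optimization-based planner this means the control points can serve as the decision variables while the resulting continuous-time input remains feasible for a low-level controller sampling at any rate.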
Bayesian Learning in Mean-field Games with Observational Noise
There are many settings in which agents learn from observing the actions of other agents. Bayesian observational learning models provide a framework for studying such situations and have been well studied in settings where agents sequentially choose Bayes-optimal actions by learning from the actions of previous agents. Here, we consider such observational learning in a mean-field game setting, in which agents repeatedly choose actions over time to maximize an infinite-horizon discounted payoff. This payoff depends on the underlying mean-field population state, which agents do not know and over which they hold only a common prior belief. At the end of each time step, agents observe a common signal, an imperfect observation of the mean-field action profile played in that time step, and use it to update their beliefs. We give a sequential decomposition of this game that enables one to characterize its Markov perfect equilibria. We then focus on a particular subclass of these games that can be viewed as a mix of coordination/anti-coordination players. Using the sequential decomposition, we characterize the impact of varying the observation quality on the outcome of the game and show that this can exhibit non-monotonic behavior: in many instances, poorer observations lead to better expected total discounted payoffs.
Benchmarking in the Era of Learning-Based Control and Toward a Foundational Control Transformer
In the era of learning-based control, the explosive surge of data demands rethinking our traditional model-based foundations. To thoroughly evaluate learning-based control algorithms in terms of convergence, stability, robustness, and scalability, we present controlgym, an open-source benchmark with 46 large-scale control problems. Using controlgym as a data generator, we further explore the Decision Transformer (DT) architecture as a general framework for closed-loop nonlinear output-feedback control. Specifically, we frame the control task as "next-token prediction" of the current optimal action conditioned on past observations, actions, and rewards, eliminating the explicit state estimation step. We then leverage pre-trained language models from the GPT series to initialize DT and subsequently train it for control tasks using low-rank adaptation. Our preliminary controlgym experiments suggest DT's capability to capture the parameter-agnostic structures intrinsic to control tasks. DT exhibits zero-shot generalization abilities for completely new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data. These findings underscore DT's potential as a foundational controller for general control applications. For more details, refer to https://arxiv.org/abs/2311.18736 and https://arxiv.org/abs/2404.02407.
Capabilities of Large Language Models in Control Engineering
In this work, we explore the capabilities of state-of-the-art large language models (LLMs) such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control problems. Controls provides an interesting case study for LLM reasoning due to its combination of mathematical theory and engineering design. We introduce ControlBench, a benchmark dataset tailored to reflect the breadth, depth, and complexity of classical control design. We use this dataset to study and evaluate the problem-solving abilities of these LLMs in the context of control engineering. We present evaluations conducted by a panel of human experts, providing insights into the accuracy, reasoning, and explanatory prowess of LLMs in control engineering. Our analysis reveals the strengths and limitations of each LLM in the context of classical control, and our results imply that Claude 3 Opus has become the state-of-the-art LLM for solving undergraduate control problems. Our study serves as an initial step towards the broader goal of employing artificial general intelligence in control engineering.
Causal Discovery Based Network Identification of Static and Dynamic Electronic Circuits
This study shows that many electronic circuits can be modeled as structural equation models (SEMs) or linear dynamical influence models (LDIMs), which suggests that the network structure of a circuit can be reconstructed from voltage measurement data at its nodes. This poster provides reconstruction examples based on high-fidelity simulation and data collection in Cadence. Moreover, the study shows a reconstruction example for an actual circuit implemented on a PCB.
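A toy sketch of the idea for the static (SEM) case, on synthetic data (this is not the Cadence/PCB pipeline; under these simplifying assumptions, regressing each node on the others only recovers an undirected conditional-dependence structure, and practical LDIM/SEM reconstruction uses more careful procedures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy static SEM for a 4-node circuit: V = A V + e  =>  V = (I - A)^{-1} e.
# A[i, j] != 0 means node j directly influences node i.
A = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.6, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.4, 0.0, 0.3, 0.0]])
n = A.shape[0]
E = rng.normal(size=(5000, n))                  # independent source noise
V = E @ np.linalg.inv(np.eye(n) - A).T          # simulated node voltages

# Regress each node's voltage on all other nodes and threshold coefficients.
est = np.zeros((n, n))
for i in range(n):
    others = [j for j in range(n) if j != i]
    coef, *_ = np.linalg.lstsq(V[:, others], V[:, i], rcond=None)
    est[i, others] = coef
# Nonzero pattern estimates the conditional-dependence (moral) graph.
recovered = np.abs(est) > 0.2
print(recovered.astype(int))
```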
Collaborative Safety for Multi-agent Systems
The safe coordination of multi-agent systems presents a complex and dynamic research frontier, encompassing various objectives such as ensuring group coherence while navigating obstacles and avoiding collisions between agents. Expanding upon our prior work in distributed collaborative control for networked dynamic systems, we introduce an algorithm tailored for the formation control of multi-agent systems, considering individual agent dynamics, induced formation dynamics, and local neighborhood information within a predefined sensing radius for each agent. Our approach prioritizes individual agent safety through iterative communication rounds among neighbors, enforcing safety conditions derived from high-order control barrier functions to mitigate potentially hazardous control actions within the cooperative framework. Emphasizing explainable AI principles, our method provides transparent insights into decision-making processes via model-based methods and intentional design of individual agent safety constraints, enhancing the interpretability and trustworthiness of multi-agent system behavior.
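For reference, the safety conditions enforced in each communication round build on control barrier function constraints; for a control-affine agent $\dot{x} = f(x) + g(x)u$ with safe set $\{x : h(x) \ge 0\}$, the first-order version of such a condition reads (the poster uses high-order generalizations of this):

```latex
\sup_{u \in \mathcal{U}} \Big[ L_f h(x) + L_g h(x)\,u \Big] \;\ge\; -\alpha\big(h(x)\big),
```

where $\alpha$ is a class-$\mathcal{K}$ function and $L_f h$, $L_g h$ denote Lie derivatives of $h$.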
Competitive Perimeter Defense in Tree Environments
We consider a perimeter defense problem in a rooted full tree graph environment in which a single defending vehicle seeks to defend a set of specified vertices, termed the perimeter, from mobile intruders that enter the environment through the tree's leaves. We adopt the technique of competitive analysis to characterize the performance of an online algorithm for the defending vehicle. We first derive fundamental limits on the performance of any online algorithm relative to that of an optimal offline algorithm. Specifically, we give three fundamental conditions, in terms of the environment parameters, for finite, 2, and 3/2 competitive ratios. We then design and analyze three classes of online algorithms that are provably finitely competitive under varying environmental parameter regimes. Finally, we give a numerical visualization of these regimes to better show the comparative strengths and weaknesses of each algorithm.
Enabling Centralized Fleet Management for Resilient Intra-Factory Logistics
The extraordinary challenges posed by the COVID-19 pandemic have exposed inadequacies in the readiness of the industrial sector to handle logistical and operational uncertainties. In this context, the resilience of intra-factory logistics to such unpredictable events can be significantly enhanced by replacing assembly lines with material handling robot fleets. To ensure uninterrupted operation in the face of real-world uncertainties, which can include blocked passageways, changes in pickup and delivery times, and robot malfunctions, the robot fleet manager is tasked with adapting the task allocation and routing of the robots in the fleet. This research addresses the challenge of enabling centralized fleet management (CFM) that can produce near-optimal policies in real time by leveraging global information and the resilience of the entire fleet. Real-time implementation of CFM is extremely challenging due to the computational demands of the underlying NP-hard task-assignment problem. However, intra-factory logistics necessarily involve a nominally defined material handling requirement that depends on the specific product being manufactured. This provides a unique opportunity to capitalize on pre-calculated costs associated with the task-assignment decisions of the nominal problem. To achieve this, a Monte Carlo Tree Search (MCTS) algorithm is used to first build a nominal search tree, populating cost estimates of task-assignment decisions while exploring the decision space. When the nominal requirement is perturbed, the proposed algorithm tests solutions based on their nominal costs, first identifying promising regions of the perturbed search space and then rapidly updating the costs throughout the nominal search tree. This results in an anytime CFM algorithm that, for any computational budget, produces solutions with a lower optimality gap than restarting the computation each time a perturbation is realized. The contribution of this research lies in the re-utilization of the MCTS search tree topology and cost estimates as prior knowledge when the problem parameters are affected by a perturbation. By adapting the cost re-utilization strategy based on the revealed shift in nominal solution costs, the proposed method accounts for the possibility that the updated optimum may be significantly altered by the perturbation. The algorithm is benchmarked on the Li & Lim dataset, which replicates intra-factory logistics by requiring capacitated agents to complete material handling tasks with pickup-delivery constraints and time windows. Its performance is compared with a decentralized approach commonly applied to such problems for real-time tractability, and also with a CFM that does not make use of the learned information from the nominal search space.
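For context, MCTS tree policies of this kind typically select actions during tree traversal with an upper-confidence (UCT) rule of the form below (a standard formula quoted for orientation, not necessarily the exact variant used here; the proposed method additionally reuses the nominal tree's cost estimates when the problem is perturbed):

```latex
a^{\star} = \arg\max_{a} \left[ \bar{Q}(s,a) + c \sqrt{\frac{\ln N(s)}{N(s,a)}} \right],
```

where $\bar{Q}(s,a)$ is the average estimated value of assignment decision $a$ at tree node $s$, $N(\cdot)$ are visit counts, and $c$ trades off exploration and exploitation.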
Enforcing Safety Constraints in Reinforcement Learning with Various Observations under Unknown Environments
We present a series of works focusing on integrating safety measures with reinforcement learning (RL) to ensure system stability and agent safety in various complex environments. The first work presents a framework that combines RL with formal verification through a bilevel optimization problem, improving safety and stability certificates in cyber-physical systems. The second paper proposes a novel RL approach using generative model-based soft barrier functions to enforce hard safety constraints in stochastic environments, outperforming conventional CMDP methods. The third paper introduces a pixel-observation safe RL algorithm that uses latent barrier-like functions to encode state-wise safety constraints efficiently in high-dimensional environments, significantly reducing safety violations and enhancing safety convergence in comparison to existing methods. Together, these studies advance the field of safe RL by developing innovative methods that effectively balance reward maximization and safety in diverse and challenging settings.
From Distributed Coverage to Multi-agent Target Tracking
In this study, we tackle the challenge of deploying mobile agents tasked with the visual tracking of an unpredictable mobile intruder within a polygonal environment. Initially, we introduce an efficient algorithm to partition the environment into a set of regions, or tiles, to guarantee distributed coverage. We also develop a strategy for a line guard to effectively track the mobile intruder's movements. Building on this, we extend our investigation to the generation of critical points, termed 'kernel points,' to facilitate complete environmental coverage and present a polygon partitioning method informed by these points. Through the lens of a pursuit-evasion game, we analyze the dynamics around corners and propose adaptable control laws for a free guard to maintain a line of sight with the intruder. Finally, we explore variations of these control laws, including capture and search strategies, and through simulations, we showcase the enhancements in the team's visual tracking capabilities. These strategies collectively form a comprehensive framework for resource-constrained deployment in surveillance operations.
Game-Theoretic Iterative Learning Control for Bidirectionally Coupled Human-Automation Systems
Systems in the future will need to be flexible, allowing for dynamic changes in team configurations to enable effective use of limited resources. A key enabler for such a system is the development of intelligent human-robot teams, in which the robotic system iteratively learns human interactions based on specific control actions from the autonomous system. The ability to dynamically learn and predict human behaviors will allow the robot to adjust its actions in response to a change in its human counterpart. To facilitate this learning framework, we must develop methods to learn models of human behavior that serve as the mechanism for predicting human-robot interactions within this complex system. This work seeks to apply a game-theoretic framework within an iterative learning control architecture to derive models of human behavior for improved human-robot interactions that can adapt over various contexts and team compositions. The feasibility of framing human-robot interactions within a game-theoretic, iterative learning control framework is being studied through repetitively induced tasks in a manufacturing-like setting. An estimate of the human's decision-making strategy will be derived from the data and used to predict subsequent behaviors in response to control decisions by the robot. These predictions will be compared against the observed human behavior. Next, a method will be developed to estimate a human's response to a novel scenario, which can then be used to compute a corresponding robot control input. This method will leverage a library of previously learned interactions based on the task and team composition. Lastly, the impact of uncertainty in our understanding of the human response on predicting interactions and determining an optimal control decision for the robotic system will be studied. In particular, the influence of fatigue, a slow degradation in the human's performance and thus a change in the behavioral model over time, will be explored. The outcomes of this proposed research will be a framework for improved human-robot interactions that is applicable to other coupled human-robot systems.
How Much Reserve Fuel: Quantifying the Maximal Energy Cost of System Disturbances
Motivated by the design question of how much additional fuel is needed to complete a task in an uncertain environment, this paper introduces metrics to quantify the maximal additional energy used by a control system in the presence of bounded disturbances when compared to a nominal, disturbance-free system. In particular, we consider the task of finite-time stabilization for a linear time-invariant system. We first derive the nominal energy required to achieve this task in a disturbance-free system, and then the worst-case energy over all feasible disturbances. The latter leads to an optimal control problem with a least-squares solution and then to an infinite-dimensional optimization problem for which we derive an upper bound on the solution. The comparison of these energies is accomplished using additive and multiplicative metrics, and we derive analytical bounds on these metrics. Simulation examples on an ADMIRE fighter jet model demonstrate the practicability of these metrics and their variation with the task hardness, a combination of the distance of the initial condition from the origin and the task completion time.
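For orientation, the nominal (disturbance-free) energy referred to above follows from the classical minimum-energy control that drives $\dot{x} = Ax + Bu$ from $x(0) = x_0$ to the origin at time $T$ (a standard construction; the poster's metrics compare the worst-case disturbed energy against this quantity):

```latex
W(T) = \int_0^T e^{A\tau} B B^{\top} e^{A^{\top}\tau}\, d\tau, \qquad
u^{\star}(t) = -B^{\top} e^{A^{\top}(T-t)}\, W(T)^{-1} e^{AT} x_0,
\qquad
E_{\mathrm{nom}} = \int_0^T \|u^{\star}(t)\|^2\, dt
                 = x_0^{\top} e^{A^{\top}T} W(T)^{-1} e^{AT} x_0 .
```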
Human vs. Generative AI in Content Creation Competition: Symbiosis or Conflict?
The advent of generative AI (GenAI) technology is having a transformative impact on the content creation landscape, offering alternative ways to produce diverse, high-quality content across media and thereby reshaping online ecosystems, while also raising concerns about market over-saturation and the potential marginalization of human creativity. Our work introduces a competition model, generalized from the Tullock contest, to analyze the tension between human creators and GenAI. Our theory and simulations suggest that, despite these challenges, a stable equilibrium between human and AI-generated content is possible. Our work contributes to understanding the competitive dynamics of the content creation industry, offering insights into the future interplay between human creativity and technological advancements in GenAI.
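For reference, in the classical Tullock contest that the model generalizes, a creator exerting effort $x_i$ captures a share of the prize (here, audience attention) given by the standard contest success function (the generalized model in this work modifies this form; the symbols below follow the textbook statement):

```latex
p_i(x_1,\dots,x_n) = \frac{x_i^{r}}{\sum_{j=1}^{n} x_j^{r}},
```

with $r > 0$ the discriminatory power of the contest.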
Incorporating Causal Knowledge into Shapley Values: A Maximum Entropy Approach
Artificial intelligence models have demonstrated remarkable performance across various domains, leading to their widespread use in many aspects of our lives. However, concerns persist regarding their reliability, applicability, and safety in complex real-world scenarios. Traditional scientific and engineering practice prefers simpler models whose few parameters can be easily interpreted and understood, yet AI models often possess parameter spaces spanning thousands to billions of dimensions, rendering them inscrutable to human understanding. Consequently, explaining the decision-making processes of these complex models is a first step toward their dependable deployment in critical applications. This has motivated a new field of research called eXplainable AI (XAI), which aims to explain AI-made decisions. A prominent XAI tool draws inspiration from Shapley values, a game-theoretic concept addressing the fair division of payoff among players in cooperative games. In this context, the model's outcome is the payoff, while its features represent the players in the game. Despite satisfying several seemingly desirable properties, Shapley values have been criticized for providing explanations that deviate from existing causal knowledge of the phenomenon. We reinterpret Shapley values as the maximum entropy solution to the outcome division problem when no prior causal knowledge is available and introduce a methodology to integrate existing causal knowledge into the explanations while adhering to the maximum entropy principle. Specifically, our approach assumes that prior causal knowledge can be represented by a class of causal graphs, on which do-calculus can be independently applied. We calculate the Shapley values using the interventional distribution for each of these graphs and average them using the maximum entropy distribution.
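A minimal sketch of the classical (non-causal) Shapley computation that this work reinterprets and extends (the function and variable names below are illustrative; the causal variant described above replaces the value function v(S) with interventional quantities obtained via do-calculus on each candidate graph and averages them under the maximum entropy distribution):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by subset enumeration.

    `value` maps a frozenset of players to the payoff v(S); enumeration is
    exponential in the number of players, so this only illustrates the formula.
    """
    n = len(players)
    phi = {}
    for i in players:
        rest = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(rest, k):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value(S | {i}) - value(S))
        phi[i] = total
    return phi

# Toy 3-feature "model outcome": additive contributions plus one interaction.
contrib = {"x1": 2.0, "x2": 1.0, "x3": 0.5}
def v(S):
    bonus = 1.0 if {"x1", "x2"} <= S else 0.0   # interaction term
    return sum(contrib[p] for p in S) + bonus

print(shapley_values(list(contrib), v))
```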
Learning True Objectives: Linear Algebraic Characterizations of Identifiability in Inverse Reinforcement Learning
Inverse reinforcement learning (IRL) has emerged as a powerful paradigm for extracting expert skills from observed behavior, with applications ranging from autonomous systems to human-robot interaction. However, the identifiability issue within IRL poses a significant challenge, as multiple reward functions can explain the same observed behavior. This paper provides a linear algebraic characterization of several identifiability notions for an entropy-regularized finite-horizon Markov decision process (MDP). Moreover, our approach allows for the seamless integration of prior knowledge, in the form of featurized reward functions, to enhance the identifiability of IRL problems.
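For reference, in the entropy-regularized finite-horizon MDP underlying this analysis, the expert's policy is the soft-optimal policy obtained from the soft Bellman recursion (a standard formulation with unit regularization weight; identifiability then asks which rewards $r$ induce the same policy):

```latex
Q_t(s,a) = r(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[V_{t+1}(s')\big], \qquad
V_t(s) = \log \sum_{a} \exp Q_t(s,a), \qquad
\pi_t(a \mid s) = \exp\big(Q_t(s,a) - V_t(s)\big),
```

with $V_{H+1} \equiv 0$ at the horizon $H$.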
Market Impacts of Pooling Intermittent Spectrum
Temporal sharing of spectrum, as in the CBRS system, provides wireless service providers (SPs) with spectrum that is only intermittently available. This intermittency can decrease the value of the spectrum to an SP. In this paper, we consider a setting in which an SP can pool multiple intermittent bands of spectrum with independent availability. We find that pooling can achieve higher spectrum efficiency, in terms of the congestion incurred by users, than using a single intermittent band with the same total bandwidth. We show that this efficiency gain can be achieved with a relatively small pool of bands and that it quickly converges to the optimal case as the number of bands increases. We also observe that pooled intermittency has a smaller impact on bids when spectrum is auctioned.
Market Models of Security Investments with Shared Spectrum
Security is a critical concern in shared spectrum environments, where attacks can degrade service and influence market interactions between competing service providers (SPs). This paper examines market models of shared spectrum defense, focusing on the strategic interplay between SPs and attackers within licensed shared bands. We develop models to study SPs' investments in security against revenue-minimizing attackers, i.e., attackers that account for the underlying competition between SPs and seek to minimize the revenue the SPs obtain in this market. We further consider both competitive and collaborative settings. In the competitive setting, we characterize equilibrium investment outcomes, while in the collaborative case, we consider the investments made by a central planner. Our numerical results show that the competitive setting can at times lead to overinvestment.
Markovian Search with Socially Aware Constraints
We study a general class of constrained sequential search problems for selecting multiple candidates from a pool that belongs to different societal groups. We focus on search processes under ex-ante constraints primarily motivated by inducing societally desirable outcomes, such as attaining demographic parity among different groups, achieving diversity through quotas, or subsidizing disadvantaged groups within a budget. We start with a canonical search model, known as the Pandora's box model (Weitzman, 1979), under a single affine constraint on the probabilities of selection and inspection of each candidate. We show that the optimal policy for such a constrained problem retains the index-based structure of the optimal policy for the unconstrained one but potentially randomizes between two policies that are dual-based adjustments of the unconstrained problem; thus, they are easy to compute and economically interpretable. Building on these insights, we consider the richer class of search processes, such as search with rejection and multistage search, that can be modeled by joint Markov scheduling (JMS) (Dumitriu et al., 2003; Gittins, 1979). Imposing general affine and convex ex-ante constraints, we give a primal-dual algorithm to find a near-feasible and near-optimal policy. This algorithm, too, randomizes over index-based policies, this time over a polynomial number of policies whose indices are dual-based adjustments to the Gittins indices of the unconstrained JMS. Our algorithmic developments, while involving many intricacies, rely on a simple yet powerful observation: there exists a relaxation of the Lagrange dual function of these constrained optimization problems that admits index-based policies akin to those of the original unconstrained problems. Using a numerical study, we investigate the implications of imposing various constraints, the price of imposing them in terms of utilitarian loss, and whether they induce their intended societally desirable outcomes.
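For reference, the index underlying the unconstrained Pandora's box policy is Weitzman's reservation value $\sigma_i$, defined for a box with inspection cost $c_i$ and value distribution $v_i$ by (the constrained policies above randomize over dual-based adjustments of such indices):

```latex
c_i = \mathbb{E}\big[(v_i - \sigma_i)^{+}\big].
```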
Minimax Estimation of Controlled Markov Chains on Continuous State Space
A controlled Markov chain (CMC) is a paired process consisting of a Markovian state and a non-Markovian control. The control is a random variable that selects a transition kernel, and the state then transitions according to that kernel. The recent popularity of model-based offline reinforcement learning has made learning this transition kernel (a.k.a. the "model") an important open question. This work aims to address it through the lens of an adaptive, non-parametric estimator. In particular, we pose the estimator as the solution to a constrained minimax optimization problem and explore its finite-sample risk bounds.
Modeling Haptic Interaction through Collaborative Tracking Tasks
Human performance in manual tracking tasks has been shown to improve when participants work together in teams of two (as a dyad). By sharing the same control interface, the haptic interaction between the dyad pair offers an additional channel of sensory feedback not present when performing the task alone. A control-theoretic model describing the performance improvement of a dyad relative to the performance of the same two agents acting alone had previously been proposed but lacked experimental support. This model includes separate visual tracking and haptic interaction controllers for each participant in a given dyad. To validate this model, we report results from an experiment in which 30 human participants performed a pursuit tracking task using a dual joystick interface. In the first three trials, participants performed the task alone; in another three trials, participants performed with their joysticks rigidly linked to those of another participant performing the same task. Position and force measurements at each joystick were collected. Parametric modeling using an iterative least-squares fitting algorithm suggests that each participant used proportional control in response to haptic feedback and proportional-derivative control in response to visual feedback. Continued analysis and model validation will be used to inform haptic shared control strategies between humans and automation.
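A minimal sketch of the kind of least-squares gain fit involved, on synthetic data (the control-law structure, signals, and gain values below are assumptions for illustration, not the study's identified model): assuming a participant's force response is u = Kp·e + Kd·ė plus noise, the gains can be estimated by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic tracking error and "measured" joystick force generated from an
# assumed proportional-derivative law, then refit by least squares.
dt = 0.01
t = np.arange(0.0, 30.0, dt)
e = np.sin(0.5 * t) + 0.1 * rng.normal(size=t.size)      # tracking error
de = np.gradient(e, dt)                                  # error derivative
u = 2.0 * e + 0.3 * de + 0.05 * rng.normal(size=t.size)  # force signal

X = np.column_stack((e, de))
(kp_hat, kd_hat), *_ = np.linalg.lstsq(X, u, rcond=None)
print(f"estimated Kp = {kp_hat:.2f}, Kd = {kd_hat:.2f}")  # close to 2.0 and 0.3
```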
Multi-layer Default Risk Contagion in Inter-banking Networks
Default risk spreading processes in inter-banking networks are commonly viewed as contagion processes, with inter-bank loans as a direct spreading channel and overlapping investment portfolios as an indirect channel [1]. In this paper, we propose a multi-layer network default risk contagion model that incorporates an additional panic contagion in the network of depositors as a novel augmentation of previous models. Doing so directly captures many depositors simultaneously issuing withdrawal requests, i.e., a bank run, and showcases the impact of the level of connectedness of both the inter-banking network and the depositor network on the spreading processes. We calibrate the multi-layer model with post-pandemic data, taking into consideration macroeconomic factors such as fluctuations in interest rates and asset bubbles. We apply classic system identification methods to model the relationships between federal interest rates and the prices of bonds, stocks, and cryptocurrencies, and formulate an optimal control problem to mitigate the default risk contagion with respect to liquidity ratio requirements in the seasonal stress tests. Finally, we present long-term simulation results with various sets of panic model parameters and liquidity ratio requirements.
Multi-Microgrids: Fundamental Tradeoffs between Storage and Power Line Capacities
In recent years, the proliferation of renewable energy resources (RESs) has advanced the traditional grid to a modern grid with interconnected clusters of energy producers and consumers, known as microgrids (MGs). Renewable-powered MGs are envisioned to operate self-sufficiently, independent of the main grid, to improve the reliability of the entire system in case of catastrophic grid failures. However, despite the benefits, renewable penetration poses a major challenge to the reliable operation of MGs, as RESs are inherently uncertain. In this context, battery energy sources (BESs) provide an attractive option to counter both generation and demand uncertainties, with the ability to store excess energy when supply exceeds demand and to provide energy when demand exceeds generation. A key challenge here is to determine the capacity of the BESs to be installed in the MGs. To this end, we explore policies on when to charge the BESs and when to use them to support loads so that the BESs remain within their operating range, avoiding depletion of charge on one hand and remaining within acceptable margins of maximum charge on the other. In particular, we propose a stochastic control framework to determine BES capacity under renewable power fluctuation, characterized by Brownian motion, ensuring that during the operational phase the BESs' operating range is respected with pre-specified high probability. We extend the analysis of the single-MG system to a two-MG system and present a theoretical framework to determine the optimal power transfer policy between the MGs and the optimal BES capacity for each MG. We further present a cost-benefit analysis of the system considering the trade-offs between BES and transmission line installation and maintenance costs.
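Schematically (with simplified notation; the full formulation also includes charging policies, inter-MG power transfer, and cost terms), the sizing problem constrains a Brownian-motion-driven storage state to remain in its operating range with high probability:

```latex
dS(t) = \big(g(t) - d(t)\big)\,dt + \sigma\, dW(t), \qquad
\Pr\Big( S_{\min} \le S(t) \le \bar{S}, \ \ \forall\, t \in [0,T] \Big) \ge 1 - \epsilon,
```

where $S$ is the stored energy, $g$ and $d$ denote renewable generation and demand, $W$ is a standard Brownian motion, and the capacity $\bar{S}$ is the design variable.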
Non-Parametric Compressed Learning of Dynamical Systems
The mature field of systems theory has enabled the success of model-based decision-making. Model identification typically requires fitting parametric models to data from interactions with the environment. In this talk, I will discuss an operator-theoretic approach to learning compressed representations of nonlinear dynamics from data with provable guarantees. We first build an analytically tractable representation of system dynamics via the conditional mean embedding operator associated with a reproducing kernel Hilbert space (RKHS). Then, we allow selective loss in the representation of that operator to control model complexity. I will discuss sample complexity guarantees for learning such operators in centralized and decentralized settings and present applications to reinforcement learning, power system transient stability analysis, and uncertainty propagation.
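For reference, the conditional mean embedding (CME) operator referred to above and its standard empirical estimate take the form (standard RKHS notation; the "selective loss" described in the talk can be thought of as a controlled approximation of this estimate):

```latex
\mu_{Y \mid X=x} = \mathcal{C}_{YX}\, \mathcal{C}_{XX}^{-1}\, \phi(x), \qquad
\widehat{\mu}_{Y \mid X=x} = \sum_{i=1}^{n} \beta_i(x)\, \phi(y_i), \quad
\beta(x) = (K_X + n\lambda I)^{-1} k_X(x),
```

where $K_X$ is the kernel Gram matrix of the inputs, $k_X(x)$ the vector of kernel evaluations at $x$, and $\lambda$ a regularization parameter.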
On the Effect of Bounded Rationality in Electricity Markets
The Nash equilibrium is a common solution concept for capturing strategic interaction in electricity market analysis. However, it relies on the fundamental but impractical assumption that all market participants are fully rational, which implies unlimited computational resources and cognitive abilities. To address this limitation, level-k reasoning has been proposed and studied to model boundedly rational behavior. In this paper, we consider a Cournot competition in electricity markets with two suppliers, both following level-k reasoning. One is a self-interested firm and the other serves as a benevolent social planner. First, we observe that the optimal strategy of the social planner is to reason at a particular rationality level; being either less or more rational may reduce social welfare. Then, we investigate the effect of bounded rationality on social welfare and find that it can deviate significantly from that at the Nash equilibrium. Finally, we characterize optimal, mean-maximizing, and max-min strategies for the benevolent social planner when it has access to different information. Numerical experiments further demonstrate and validate our findings.
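A toy sketch of level-k reasoning in a linear Cournot duopoly (illustrative only; the demand, cost, and level-0 anchor below are assumptions, not the paper's market model):

```python
# Inverse demand p = a - b*(q1 + q2), identical marginal cost c.
# A level-k player best responds to a level-(k-1) opponent; the level-0
# anchor is set to the monopoly quantity here purely for illustration.
a, b, c = 100.0, 1.0, 10.0

def best_response(q_other):
    return max((a - c - b * q_other) / (2.0 * b), 0.0)

q_nash = (a - c) / (3.0 * b)        # Nash equilibrium quantity, for reference
q = (a - c) / (2.0 * b)             # level-0 anchor (assumption)
for k in range(1, 6):
    q = best_response(q)
    print(f"level-{k} quantity: {q:.2f}  (Nash: {q_nash:.2f})")
```

The level-k quantities oscillate around and converge toward the Nash quantity, illustrating how market outcomes depend on the assumed rationality level.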
On the Hardness of Learning to Stabilize Linear Systems
In this paper, we study the statistical hardness of learning to stabilize linear time-invariant systems. Hardness is measured by the number of samples required to achieve a learning task with a given probability. Prior work (Tsiamis et al., 2022) shows that there exist system classes that are hard to learn to stabilize, with the core reason being the hardness of identification. Here we present a class of systems that can be easy to identify, thanks to a non-degenerate noise process that excites all modes, but for which the sample complexity of stabilization still increases exponentially with the system dimension. We tie this result to the hardness of co-stabilizability for this class of systems using ideas from robust control.
Online Matching with Cancellation Costs
We study the online resource allocation problem with overbooking and cancellation costs, also known as the buyback setting. To model this problem, we consider a variation of the classic edge-weighted online matching problem in which the decision maker can reclaim any fraction of any offline resource that is pre-allocated to an earlier online vertex; however, in doing so the decision maker not only loses the previously allocated edge weight but also pays a non-negative constant factor f of this edge weight as an extra penalty. Parameterizing the problem by the buyback factor f, our main result is obtaining optimal competitive algorithms for all possible values of f through a novel primal-dual family of algorithms. We establish the optimality of our results by proving separate lower bounds for the small and large buyback-factor regimes and showing how our primal-dual algorithm exactly matches these lower bounds by appropriately tuning a parameter as a function of f. We further study lower and upper bounds on the competitive ratio in variants of this model, such as matching with deterministic integral allocations or a single resource with different demand sizes.
Online Monitoring of Pattern Dimensions for Microscale Additive Manufacturing Systems
The fabrication of microscale devices using additive manufacturing techniques, such as electrohydrodynamic jet printing, is becoming increasingly popular due to cost efficiency and greater flexibility in design patterns and materials. However, the fast temporal dynamics and high spatial resolutions associated with certain micro-additive manufacturing technologies have posed a barrier to achieving online performance analysis, drastically limiting the production yields of printed devices. In an effort to improve device throughput while ensuring quality performance, this work presents a methodology for real-time monitoring of printed film thickness using digital image processing techniques. Measurement accuracy was shown to vary with process parameters, such as line-to-line pitch. These efforts lay the foundation for implementing control strategies that track and ensure desired film dimensions to achieve optimal device performance.
Rationality of Learning Algorithms in Repeated Normal-Form Games
Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether agents have a strong incentive to adopt an alternative learning algorithm that yields them greater individual utility. We capture such incentives through an algorithm's 'rationality ratio', which is the ratio of the highest payoff an agent can obtain by deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be c-rational if its rationality ratio is at most c irrespective of the game. We first establish that popular learning algorithms such as fictitious play and regret matching are not c-rational for any constant c greater than or equal to 1. We then propose and analyze two algorithms that are provably 1-rational under mild assumptions and that have the same convergence properties as (a generalized version of) fictitious play and regret matching, respectively, if all agents follow them. Finally, we show that if the assumption of perfect monitoring is not satisfied, there are games for which c-rational algorithms do not exist.
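In symbols, the rationality ratio described above can be written, loosely, as

```latex
\mathrm{RR}(\mathcal{A}) \;=\; \sup_{G} \; \max_{i} \;
\frac{\max_{\mathcal{A}'} U_i^{G}\big(\mathcal{A}', \mathcal{A}_{-i}\big)}{U_i^{G}\big(\mathcal{A}, \mathcal{A}_{-i}\big)},
```

where $U_i^{G}$ denotes agent $i$'s long-run payoff in game $G$; an algorithm is c-rational if $\mathrm{RR}(\mathcal{A}) \le c$. (The precise definition in the work may differ in details such as payoff normalization.)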
Robust Incentive Design for Non-Myopic Followers
In incentive design, a decision-maker, called the leader, aims to induce desired behaviors in one or more agents, called the followers, by influencing their payoffs. Our work studies the setting of a non-myopic follower, who makes sequential decisions and plans by maximizing the cumulative reward, and a leader who can modify the reward of the follower. While algorithms exist for solving the incentive design problem, they rely on several restrictive assumptions: 1) when the best response is non-unique, the follower breaks ties in favor of the leader; 2) the leader knows perfectly how the modified reward is perceived by the follower; and 3) the follower is fully rational. Motivated by the need to remove these assumptions, we study the problem of robust incentive design, where the goal is to obtain a robust strategy for the leader that achieves nearly optimal performance when these assumptions do not hold. We show that such a robust strategy exists under mild conditions and can be numerically computed using mixed-integer linear programming.
Robust Stackelberg Equilibrium
This paper provides a systematic study of the robust Stackelberg equilibrium (RSE), which naturally generalizes the widely adopted solution concept of the strong Stackelberg equilibrium (SSE). The RSE accounts for any follower response that is suboptimal up to a given tolerance in Stackelberg games and is adopted to improve the robustness of the leader's strategy. While a few variants of robust Stackelberg equilibria have been considered in previous literature, the RSE solution concept we consider is importantly different: in some sense, it relaxes previously studied robust Stackelberg strategies and is applicable to much broader sources of uncertainty. We provide a thorough investigation of several fundamental properties of the RSE, including its utility guarantees, algorithmics, and learnability. We first show that the RSE we define always exists and is thus well-defined. We then characterize how the leader's utility in the RSE changes with the robustness level considered. On the algorithmic side, we show that, in sharp contrast to the tractability of computing an SSE, it is NP-hard to obtain a fully polynomial-time approximation scheme (FPTAS) for any constant robustness level. Nevertheless, we develop a quasi-polynomial-time approximation scheme (QPTAS) for the RSE. Finally, we examine the learnability of the RSE in a natural learning scenario, where neither player's utilities are known in advance, and provide almost tight sample complexity results on learning the RSE. As a corollary of this result, we also obtain an algorithm for learning SSE, which strictly improves a key result of Bai et al. in terms of both utility guarantee and computational efficiency.