> Motivation: Distributed machine learning across the Internet is critical for training models on private and sensitive data. However, slow wide-area networks pose a bottleneck for model synchronization. Strato solves this problem by building a software-defined network optimized for the traffic patterns of inter-DC ML training.
> Exceptional Performance: Leveraging Rust's zero-cost abstractions and ownership model, Strato achieves multi-Gbps throughput on mid-tier cloud vCPUs. Performance is further boosted by coroutines and async I/O on Rust's tokio runtime. Packet processing is scheduled fairly across coroutines to meet the needs of distributed communication algorithms such as RingAllReduce.
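The fairness requirement can be illustrated with a minimal, conceptual sketch: each stream gets its own coroutine and yields after every packet, so no stream starves its peers. The sketch below is written in Python asyncio for brevity (the actual Strato data path uses Rust and tokio), and the stream names and packet contents are placeholders.

```python
import asyncio

async def forward_stream(name: str, packets: list, out_queue: asyncio.Queue):
    """Per-stream coroutine: forward one packet, then yield so peers get a turn."""
    for pkt in packets:
        await out_queue.put((name, pkt))
        # Yield to the scheduler after every packet so no single stream
        # (e.g. one RingAllReduce neighbor) can starve the others.
        await asyncio.sleep(0)

async def main():
    out_queue: asyncio.Queue = asyncio.Queue()
    streams = {f"ring-peer-{i}": [b"x" * 1400] * 3 for i in range(3)}
    await asyncio.gather(
        *(forward_stream(name, pkts, out_queue) for name, pkts in streams.items())
    )
    # With per-packet yielding, the output interleaves the streams roughly
    # round-robin instead of draining one stream before moving to the next.
    while not out_queue.empty():
        name, pkt = out_queue.get_nowait()
        print(name, len(pkt))

asyncio.run(main())
```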
> Dynamic multi-path routing: To bypass the slow WAN bottleneck, Strato employs a real-time multi-path routing algorithm that dynamically selects the best path for each stream based on current network conditions. The optimization runs on a centralized controller that can be easily extended via a database-oriented API.
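As a concrete illustration of per-stream path selection, the controller can score each candidate path by its bottleneck bandwidth and pick the widest one. The sketch below is a simplified stand-in for Strato's actual controller logic; the topology, link measurements, and scoring rule are all assumed for illustration.

```python
# Illustrative controller-side path selection: score each candidate path
# by its bottleneck (minimum) available bandwidth and pick the widest one.
# The links and measurements below are made-up placeholders.

def select_path(candidate_paths, link_bw_mbps):
    """Return the candidate path with the largest bottleneck bandwidth."""
    def bottleneck(path):
        # A path is a list of directed links; its rate is limited by its slowest link.
        return min(link_bw_mbps[link] for link in path)
    return max(candidate_paths, key=bottleneck)

link_bw_mbps = {
    ("dc-a", "dc-b"): 200,   # direct WAN link, currently congested
    ("dc-a", "relay"): 900,
    ("relay", "dc-b"): 850,
}
paths = [
    [("dc-a", "dc-b")],                      # direct path
    [("dc-a", "relay"), ("relay", "dc-b")],  # detour through a relay site
]
print(select_path(paths, link_bw_mbps))      # the relay path wins here
```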
> Seamless Integration with ML Frameworks: Strato is carefully engineered with NCCL and OpenMPI in mind, and works out of the box with high-level distributed ML frameworks from PyTorch Distributed to HuggingFace Accelerate and DeepSpeed.
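Because Strato sits below the collective-communication libraries, existing training scripts should need no code changes. The snippet below is ordinary PyTorch Distributed code, shown only to make that point; the backend choice and launcher environment variables are standard PyTorch assumptions, not Strato-specific settings.

```python
import torch
import torch.distributed as dist

# Ordinary PyTorch Distributed all-reduce; nothing here is Strato-specific,
# which is the point: Strato routes the traffic transparently underneath.
# Rank/world-size environment variables are expected to be set by a launcher
# such as torchrun.
def main():
    dist.init_process_group(backend="gloo")   # "nccl" on GPU nodes
    t = torch.ones(4) * (dist.get_rank() + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # model-synchronization style collective
    print(f"rank {dist.get_rank()}: {t.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```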
> Motivation: Network traffic engineering (TE) aims to optimize the allocation of network resources to each data flow. Computing exact optimal solutions is time-consuming for large networks, which opens up the opportunity for an ML-based approach that approximates the optimal solution.
> Core Challenges Solved: TE problems are formulated as constrained optimization problems. However, training a neural network to approximate the optimal solution while respecting the constraints is inherently difficult. DeepTE solves this with a novel mechanism, Gradient Descent-based Feasibility Projection, which guarantees that the solution satisfies the constraints within a finite number of iterations.
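The projection idea can be sketched as follows: starting from the network's predicted allocation, take gradient steps that reduce the total constraint violation until the allocation is (approximately) feasible. This is a minimal sketch assuming linear capacity constraints A x ≤ b and a fixed step size; DeepTE's actual projection mechanism may differ in its details.

```python
import numpy as np

def feasibility_projection(x, A, b, lr=0.1, tol=1e-6, max_iters=1000):
    """Nudge a predicted allocation x toward the feasible set {x : A @ x <= b}
    by gradient descent on the squared constraint violation.
    Illustrative sketch only; DeepTE's exact projection may differ."""
    for _ in range(max_iters):
        violation = np.maximum(A @ x - b, 0.0)      # per-constraint overshoot
        if violation.max() <= tol:
            break                                    # (approximately) feasible
        # gradient of 0.5 * ||max(Ax - b, 0)||^2 is A^T @ max(Ax - b, 0)
        x = x - lr * (A.T @ violation)
        x = np.maximum(x, 0.0)                       # keep allocations non-negative
    return x

# Toy example: two flows sharing one link of capacity 1.0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_pred = np.array([0.8, 0.7])                        # NN output, slightly infeasible
print(feasibility_projection(x_pred, A, b))           # projected back near the capacity limit
```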
> Efficient Model Design: DeepTE's core deep neural network combines a state-of-the-art GroupRevGNN, which captures the relationships between nodes and flows in large network topologies, with parallel fully connected layers that approximate the optimal solution from the global network state. The model is trained with unsupervised learning and reinforcement learning, and is designed with parallelism in mind to accelerate both training and inference.
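To make the structure concrete, the skeleton below shows a graph encoder followed by per-flow fully connected heads applied in parallel. It is a structural sketch only: the plain linear encoder stands in for GroupRevGNN, and all dimensions, layer counts, and toy inputs are illustrative rather than DeepTE's actual configuration.

```python
import torch
import torch.nn as nn

class DeepTESketch(nn.Module):
    """Structural sketch only: a placeholder graph encoder standing in for
    GroupRevGNN, followed by per-flow MLP heads applied in parallel."""
    def __init__(self, node_dim=16, hidden=64):
        super().__init__()
        self.encode = nn.Linear(node_dim, hidden)        # placeholder for GroupRevGNN
        self.flow_head = nn.Sequential(                   # shared weights, applied to
            nn.Linear(2 * hidden, hidden), nn.ReLU(),     # every flow in parallel
            nn.Linear(hidden, 1), nn.Sigmoid(),           # per-flow rate / split ratio
        )

    def forward(self, node_feats, flow_endpoints):
        h = torch.relu(self.encode(node_feats))           # (num_nodes, hidden)
        src, dst = flow_endpoints[:, 0], flow_endpoints[:, 1]
        flow_repr = torch.cat([h[src], h[dst]], dim=-1)   # (num_flows, 2 * hidden)
        return self.flow_head(flow_repr).squeeze(-1)      # one allocation per flow

# Toy forward pass: 5 nodes, 3 flows given as (src, dst) index pairs.
model = DeepTESketch()
alloc = model(torch.randn(5, 16), torch.tensor([[0, 1], [2, 4], [1, 3]]))
print(alloc.shape)  # torch.Size([3])
```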
> Performance Achieved: DeepTE achieves over 95% accuracy in predicting the optimal solution for TE problems with 1000+ nodes and 10000+ flows. Not only does this outperform state-of-the-art works in accuracy, it also runs in under 100 ms, an over 100x improvement in speed.
> Motivation: ML models can be vulnerable to a wide range of adversarial attack algorithms. However, there is no standard benchmarking tool for comparing the performance of these attack algorithms. DeepPen solves this problem by providing a web application where researchers can upload their attack algorithms and compare their performance against other algorithms.
> High-level Design: Four core modules: Frontend, Backend, Storage, and Sandbox. The Frontend presents a simple, clean interface for the user to interact with. Storage persists results and metrics per user. The Backend defines the communication interface between the user (Frontend) and the other subsystems via a REST API. The Sandbox subsystem handles all aspects of neural network code execution.
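Below is a hypothetical sketch of what the Backend's REST surface could look like, using Flask for illustration; the endpoint paths, payload fields, and in-memory storage are assumptions for the example, not DeepPen's actual API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
EXPERIMENTS = {}  # in-memory stand-in for the Storage module

@app.post("/experiments")
def submit_experiment():
    """Accept an uploaded attack script plus its configuration,
    then hand it off to a Sandbox worker (queueing omitted here)."""
    payload = request.get_json()
    exp_id = str(len(EXPERIMENTS) + 1)
    EXPERIMENTS[exp_id] = {"status": "queued", "config": payload}
    return jsonify({"experiment_id": exp_id}), 202

@app.get("/experiments/<exp_id>")
def get_experiment(exp_id):
    """Return the experiment's status and, once finished, its attack metrics."""
    exp = EXPERIMENTS.get(exp_id)
    if exp is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(exp)

if __name__ == "__main__":
    app.run(port=8000)
```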
> Frontend: Frontend screenshot 1. Interactive code editor for users to upload their attack algorithms. Users can also choose the environment settings used to run their scripts.
> Frontend: Frontend screenshot 2. Experiment configuration modal allowing the user to specify the parameters of the experiment, including the size of the models to be attacked and the test data to be used.
> Frontend: Frontend screenshot 3. Data inspector for the results of the experiment. Users can view each individual adversarial data sample as a bitmap. Samples colored red indicate that the attack was successful; green otherwise.
> Frontend: Frontend screenshot 4. Comprehensive dashboard for comparing how different attack algorithms perform against each other. Users can also view the performance of each individual attack algorithm.
> Sandbox: Overview of Sandbox. The Sandbox is a Docker container responsible for executing the user's attack algorithm, collecting the results of the attack, and returning them to the user. For every request, a new virtual environment is created so that the user's code is executed in isolation.
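A simplified sketch of the per-request isolation step: create a throwaway virtual environment, run the user's script inside it, and capture its output. File paths, the timeout, and the Linux-style venv layout (matching the Docker container) are illustrative assumptions, not DeepPen's exact implementation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_fresh_venv(user_script: str, timeout_s: int = 300) -> str:
    """Create a throwaway virtual environment, run the user's attack script
    inside it, and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        venv_dir = Path(workdir) / "venv"
        script_path = Path(workdir) / "attack.py"
        script_path.write_text(user_script)

        # A fresh interpreter environment per request, so runs cannot pollute
        # each other's installed packages or global state.
        subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)
        venv_python = venv_dir / "bin" / "python"  # Linux layout inside the container

        result = subprocess.run(
            [str(venv_python), str(script_path)],
            capture_output=True, text=True, timeout=timeout_s, check=True,
        )
        return result.stdout

print(run_in_fresh_venv("print('attack success rate: 0.42')"))
```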
> Sandbox: Horizontal scalability of Sandbox. The Sandbox is designed to be horizontally scalable, allowing DeepPen to handle a large number of concurrent requests. The number of Sandbox instances can be scaled up or down based on the current load.
> Sandbox: Sequence diagram for processing one request. The user's code is executed in a virtual environment, and the results of the attack are then returned to the user.
> Motivation: CloudFormation is a service that allows users to provision and manage AWS resources through declarative templates. It is a highly complex service with many features, and it is important to track how users are using these features to inform future product development.
> Core Challenges Solved: CloudFormation is a highly complex service. It is challenging to decide where to insert metrics-collection logic that captures business metrics accurately while minimally impacting the performance of critical datapaths. Moreover, the collected data needs to be integrated with existing BI pipelines, which is challenging due to the large number of existing pipelines.
> Project Responsibilities: As an intern, I was given the rare opportunity to lead the design and implementation of the entire project. I was responsible for drafting design proposals, leading internal design meetings, designing the schema for the new data, integrating it with existing BI pipelines, and finally deploying the pipelines to 21 regions across the world.