About me
This is a page not in the main menu.
Published:
This is a writeup on the feasibility of using TBSM for edge applications.
TBSM has two stages: first, a DLRM produces a vector for each action in a time series. Second, those per-step outputs are used as "embeddings" by a second, DLRM-like model. While the top and bottom parts differ (especially with regard to normalization), both rely exclusively on dot products between "embeddings" and MLPs for their computation.
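As a rough sketch of that structure (my own toy code, not the reference TBSM implementation; the module names, dimensions, and tiny MLP are placeholders):

```python
import torch
import torch.nn as nn

class DotInteractionMLP(nn.Module):
    """Toy stand-in for one DLRM-like block: pairwise dot products
    between a set of embedding vectors, followed by an MLP."""
    def __init__(self, num_vectors, out_dim):
        super().__init__()
        num_pairs = num_vectors * (num_vectors - 1) // 2
        self.mlp = nn.Sequential(nn.Linear(num_pairs, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, vectors):                 # (batch, num_vectors, dim)
        dots = torch.bmm(vectors, vectors.transpose(1, 2))
        i, j = torch.triu_indices(vectors.size(1), vectors.size(1), offset=1)
        return self.mlp(dots[:, i, j])          # keep each unique dot product

# Bottom block: one output vector per action in the time series.
bottom = DotInteractionMLP(num_vectors=4, out_dim=16)
# Top block: treats those per-step outputs as "embeddings" for a second pass.
top = DotInteractionMLP(num_vectors=8, out_dim=1)

batch, steps = 2, 8
series = torch.randn(batch, steps, 4, 16)       # 4 feature embeddings per action
per_step = torch.stack([bottom(series[:, t]) for t in range(steps)], dim=1)
score = top(per_step)                           # (batch, 1) prediction
```

In both blocks the only operations are embedding dot products and small MLPs, which is what makes the compute profile attractive for edge hardware.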
The model is well suited to edge applications when the workload has relatively few data points per example, such as the Taobao dataset.
Published:
Like most things in machine learning, trends in NLP move fast. Transformers are not even three years old, and they are already ubiquitous. With them came a paradigm shift in how models are trained. Instead of training models to do a task from scratch, common practice is now to start from an expensive pretrained model downloaded from the internet. That model has been trained on a self-supervised language-modeling task to learn the language itself. This "pretrain-finetune" pipeline allows larger models to perform exceptionally well on small datasets where they would normally overfit.
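For concreteness, here is what that pipeline typically looks like in code, sketched with the Hugging Face transformers library; the checkpoint name, tiny dataset, and hyperparameters are illustrative placeholders rather than anything specific to this post:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from a model someone else pretrained with a self-supervised
# language-modeling objective, rather than from random weights.
name = "bert-base-uncased"                       # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Fine-tune on a (tiny, made-up) labeled dataset for the downstream task.
texts = ["great movie", "terrible movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                               # a few gradient steps
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```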
In May (or last century in pandemic-ML research time), GPT-3 drew headlines for its ability to generate text. Less talked about was the paradigm shift the paper advocated: a move to few-shot learning. I want to examine the workload implications of this change.
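The contrast with fine-tuning is that few-shot learning involves no gradient updates at all: the labeled examples live in the prompt at inference time. A toy illustration of the prompt format (the task and examples are made up):

```python
# Few-shot "learning" happens purely at inference time: the labeled
# examples are placed in the prompt, and the pretrained model is never updated.
prompt = (
    "Classify the sentiment of each review.\n"
    "Review: The plot dragged on forever. Sentiment: negative\n"
    "Review: I laughed the whole way through. Sentiment: positive\n"
    "Review: A beautiful, moving film. Sentiment:"
)
# The same frozen language model then completes the prompt, e.g.
# completion = language_model.generate(prompt)   # hypothetical API
```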
Published in Association of Consumer Research, 2018
We present a two-stage model of consumer brand choice using behavioral measures of both brand memory and preference. This model outperforms standard models accounting for preferences alone in predicting memory-based choices, and also sheds new light on the mechanism by which brand memory is translated into purchase behavior.
Download here
Published in Association of Consumer Research, 2018
This paper used word2vec models to see if we could predict which brands (Coke, Pepsi) people think of when given a category (soft drinks). It turns out that the dot product between the word2vec vectors for the brand and the category correlates with recall rate, following a roughly power-law relation.
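To illustrate the measure itself (not the paper's exact pipeline or data), the brand-category score is just a dot product of two word vectors; a minimal sketch using gensim's pretrained Google News word2vec vectors, with the query words as placeholders:

```python
import numpy as np
import gensim.downloader as api

# Load pretrained word2vec vectors (Google News, 300-d).
kv = api.load("word2vec-google-news-300")

def brand_category_score(brand, category):
    """Dot product between the brand's and the category's word vectors --
    the quantity that roughly tracks recall rate."""
    return float(np.dot(kv[brand], kv[category]))

# Illustrative query; entries are case-sensitive, and a missing word
# raises a KeyError.
print(brand_category_score("Pepsi", "soda"))
```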
Download here
Published in Proceedings of Machine Learning and Systems, 2020
Modern neural networks are increasingly bottlenecked by the limited capacity of on-device GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural networks with fixed memory. However, these heuristics assume uniform cost per layer and only consider simple linear chain architectures, limiting their usability. In this paper, we formalize the problem of trading-off computation time and memory requirements for DNN training as the tensor rematerialization optimization problem. We develop a new system to optimally solve the problem in reasonable times (under an hour) using off-the-shelf MILP solvers.
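To give a flavor of the approach (a drastically simplified toy, not the paper's actual rematerialization MILP, which also models the execution schedule), here is a small integer program that chooses which activations on a linear chain to keep in memory versus recompute, written with the PuLP interface to an off-the-shelf solver; the costs and memory budget are made up:

```python
import pulp

# Toy linear chain: per-layer activation memory and recomputation cost.
mem = [4, 2, 6, 3, 5]        # arbitrary memory units
cost = [1, 3, 2, 4, 1]        # time to recompute the activation if dropped
budget = 12                   # memory available for stored activations

prob = pulp.LpProblem("toy_rematerialization", pulp.LpMinimize)
keep = [pulp.LpVariable(f"keep_{i}", cat="Binary") for i in range(len(mem))]

# Minimize total recomputation time for activations we choose not to keep.
prob += pulp.lpSum(cost[i] * (1 - keep[i]) for i in range(len(mem)))
# Stored activations must fit within the memory budget.
prob += pulp.lpSum(mem[i] * keep[i] for i in range(len(mem))) <= budget

prob.solve()
print([int(keep[i].value()) for i in range(len(mem))])
```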
Download here
Published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users that lack the resources to pretrain must use existing models with lower performance. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data. Taken together, HPT provides a simple framework for obtaining better pretrained representations with less computational resources.
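At its core, the idea comes down to how the pretraining run is initialized; a minimal sketch (using a torchvision ImageNet checkpoint as a hypothetical "existing pretrained model", not HPT's actual recipe):

```python
import torch
import torchvision.models as models

# Instead of random initialization, start self-supervised pretraining from
# an existing pretrained encoder (here, an ImageNet-pretrained ResNet-50).
encoder = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
encoder.fc = torch.nn.Identity()   # SSL methods attach their own projection head

# The usual self-supervised objective (e.g. a contrastive loss) then
# continues pretraining `encoder` on the target-domain images as normal.
```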
Download here
Undergraduate course, UC Berkeley, 2022
I taught CS 70 for a long time at Berkeley, both as course staff and through CSM. The reason I never switched to a different course is that I felt 70 gave me the greatest opportunity to help students. The difficulty of the course, paired with the fact that it was usually the last course students took for the GPA cutoff, meant that students were quite stressed.