Urban Research Corps
The position below is currently available in the Urban Research Corps. Please submit your materials via the Google Form linked below.
Deadline to apply: October 4, 2021
The Mansueto Institute is conducting a project that uses data science methods to analyze social and economic outcomes in American households. The project involves analyzing, on a monthly basis, survey results that track household experiences across a range of topics, including housing security, spending habits, and health patterns. The role involves maintaining a Python-based pipeline, deploying it on a cloud platform, and creating visualizations.
The role is up to 15 hours per week, starting in Fall Quarter and running through Winter 2022, with the possibility of extension.
- Fluency in Python and/or R.
- Seeking a graduate degree (e.g., Master’s or PhD).
- Experience developing pipelines that involve data ingestion, standardization, and transformation.
- Experience with structured, semi-structured, and/or unstructured data formats (e.g., CSV, JSON, Simple Features, raster/GeoTIFF, and/or XML).
- Knowledge of software development practices, such as version control, package management and virtual environments (e.g., pyenv, Poetry, Pipenv, conda, Docker), unit testing, and code review.
- Experience with Linux-based systems and command line tools such as Bash.
- Eagerness to constantly learn, ask questions, share knowledge, and teach others.
- Empathetic and self-aware mindset; demonstrated mutual respect, trust, and willingness to assist team members.
- Ability to work in a dynamic environment where requirements and solutions evolve through collaboration.
- Ability to articulate technical barriers and proactively solicit help from team members.
- Ability to work independently on the design and development of research methods.
- Ability to scope the feasibility of unstructured research tasks; self-delegate and create personalized work plans; identify, evaluate, and apply existing solutions to avoid “reinventing the wheel.”
- Capable of autodidactic learning from academic articles, self-guided tutorials, Stack Overflow, and package documentation.
- A Bachelor’s degree in a technical subject area is not required, but some formal training in math, computer science, statistics, economics, physics, engineering, or social science is strongly preferred.
- Significant experience in Python, R, and SQL.
- Experience with ETL and data ingestion through web scrapes, S3/FTP sync, and/or APIs.
- Experience writing clean and readable documentation for code repositories and datasets.
- Experience working in the tech industry, applied research organizations, as part of a cross-functional software development team, or working with a principal investigator.
- Familiarity with PyTorch, TensorFlow, Keras, TensorFlow Probability, and/or PyMC3. Practical understanding of probability and statistics, including Bayesian inference.
- Familiarity with workflow management platforms such as Apache Airflow, dbt (data build tool), or large-scale batch and streaming processing frameworks like Apache Beam or Google Cloud Dataflow.
- Familiarity with Google Cloud Platform, Amazon Web Services, Microsoft Azure, and/or Midway HPC / SLURM.
- Familiarity with cloud deployment frameworks (e.g., Docker, Google Cloud Deployment Manager, AWS CloudFormation, Azure Resource Manager, Serverless Framework, Terraform, Heroku).
Class Level Eligibility:
All current graduate students.
How to Apply
Applicants must complete a short online questionnaire, submit their resume, and provide a link to a previous project that includes a code sample. The interview process involves a 30-minute conversation and a short take-home technical assessment.
To apply, please submit the following materials:
- Google Form Q&A
- PDF Resume
- Example of previous project work that includes code