Max's Musings

DC >> NYC to learn data science

I presented the following content on Massive-Scale Entity Resolution (ER) Using Spark + Graph at the 2019 Spark + AI Summit. Check out the video and slides on the conference website here.

TL;DR: Apache Cassandra is a NoSQL database with flexible deployment options that’s highly performant (especially for writes), scalable, fault-tolerant, and proven in production. Common use cases include IoT, messaging, and fraud detection. You probably shouldn’t use Cassandra if you have a small dataset, have highly transactional data, or need to do...

I was lucky enough to attend Spark Summit East 2017 on February 8-9. I had to brave the 12” of snow that the blizzard Nico brought to Boston, but overall I learned a lot about the strategic direction of the Apache Spark open source project and ecosystem. In this post I’ll fill you in...

I just graduated from the Spring 2016 NYC Metis Data Science Bootcamp (DS7 cohort), so I figured it would be a great opportunity to reflect on the experience. If you’re interested in learning more about Metis (i.e., you’re researching, applying, or preparing to attend), this post will provide you with...