Getting Accustomed to Claude Code
For context, I am a first-year PhD student working in applied machine learning. I was recently given access to Claude Code (on the Max plan), have been using it for the past two months, and wanted to share my two cents on how I think about working with these kinds of systems.
What I am finding is that, for the most part, Claude Code (CC) has let me produce vertically framed research projects with remarkable productivity and speed. By vertically framed, I mean projects that require expertise in multiple domains; think end-to-end systems.
For instance, I was working on a project that required me to build a user-facing app to demo and test my modeling work. As is typical of the software industry, there is a myriad of frameworks, each with its inevitable version updates. Since it had been quite a while since I last built an app, CC was incredibly useful for integrating my machine learning model into scaffolding that supported the full end-to-end workflow. So what CC really helped with was taking a partially usable project and bringing it home as an end-to-end product.
That said, this speed comes with tradeoffs. I have found CC to be quite addictive: I could come up with ideas and test them almost immediately. That sounds great at first, but it very quickly became limiting. The scaffolding CC builds is extremely complex; CC tends to organize modules and data structures in ways that are deeply unintuitive to humans and frankly too difficult to trace. Because of this complexity, it becomes nearly impossible to implement anything by hand.
At this stage, the code that is supposedly “mine” becomes a black box: the input is just the text I write to CC, and the output is whatever results the code produces. I’ve found this to be dangerous on many fronts, the most obvious being data preprocessing. I’ve caught CC red-handed many times leaking data across the train/test split. Such simple yet crucial mistakes then propagate through the rest of the pipeline. It is also worth noting that the scaffolding generated by CC is not the only black box; so is CC itself.
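To make the train/test leakage concrete, here is a minimal, hypothetical sketch in Python. It is not taken from any real pipeline: the functions `standardize`, `split`, `leaky_preprocess`, and `safe_preprocess` and the toy data are made up for illustration, and it assumes the leaked preprocessing step is a simple mean/standard-deviation standardization. The leaky version computes the scaling statistics on the full dataset before splitting, so held-out values shape how the training data is scaled; the safe version splits first and fits the statistics on the training split alone.

```python
import statistics

# Hypothetical helpers for illustration only; not from any real pipeline.


def standardize(values, mean, std):
    """Scale values to zero mean / unit variance using the given statistics."""
    return [(v - mean) / std for v in values]


def split(values, n_test):
    """Hold out the last n_test items (n_test > 0) as the test set.

    No shuffling, so the example is deterministic.
    """
    return values[:-n_test], values[-n_test:]


def leaky_preprocess(values, n_test):
    # Leaky: mean/std are computed on ALL values, so the held-out test
    # items influence how the training items are scaled.
    mean, std = statistics.mean(values), statistics.pstdev(values)
    return split(standardize(values, mean, std), n_test)


def safe_preprocess(values, n_test):
    # Safe: split first, fit the scaling statistics on the training split
    # only, then apply those same training statistics to the test split.
    train, test = split(values, n_test)
    mean, std = statistics.mean(train), statistics.pstdev(train)
    return standardize(train, mean, std), standardize(test, mean, std)


if __name__ == "__main__":
    # Toy data: the two held-out values are much larger than the rest.
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 200.0]
    leaky_train, _ = leaky_preprocess(data, n_test=2)
    safe_train, _ = safe_preprocess(data, n_test=2)
    # With leakage, the training features are no longer centred, because
    # the large test values pulled the fitted mean upward.
    print(f"leaky train mean: {statistics.mean(leaky_train):.3f}")
    print(f"safe train mean:  {statistics.mean(safe_train):.3f}")
```

The split here simply holds out the last items to keep the example deterministic; a real pipeline would typically shuffle (or split by time) first. Libraries such as scikit-learn help guard against this mistake by fitting transformers on the training data only, for example inside a `Pipeline`.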
I think finding the middle ground between manual implementation and automation with CC is one of the most pressing questions at the moment, and one that is especially important to study in academic settings. Publishing a paper means the authors take responsibility for all of its contents. If the authors themselves do not understand the inner workings of the black box CC generated, that sets a very bad example.
On a more personal note, I have found that CC acts like a drug and erodes my ability to actually learn new concepts. Ultimately, there is enormous value in learning important concepts on your own, implementing and studying them from the ground up while using AI as a supplementary tool rather than an oracle. Doing so builds intuition. Of course, learning calculus or physics in high school hasn’t directly translated into the research I do today, but it has shaped the way I approach problems, and much of that approach rests on intuition accumulated from solving problems across many domains.
One might be tempted to simply defer to CC entirely. But unlike a true oracle, CC is not always correct, and without well-developed intuition, its mistakes go unnoticed. Intuition is precisely what allows us to recognize when an automated agentic system like CC is going off the rails, and to catch it before it does damage.
Given how new all of these systems are, I think academia and industry alike face a steep learning curve. I am making this a public blog post to encourage other academics to share their experiences as well, so that we can begin building a reservoir of working knowledge about how to use tools like CC effectively and safely.