A useful tip from “Extreme Productivity” by Robert Pozen, which I’m still reading.
Desktop research, more commonly known today as simply googling, is the act of collecting information about a particular topic or domain, with a broad or generic question in mind, or even without any question at all.
One way to do this is to simply keep googling related search terms, noting down facts and statistics, and drilling down on a specific term until it’s milked - sort of like a depth-first search over information. After we are satisfied with all the information we’ve collected, we then try to piece together a story. But is this a good approach?
Pozen argues that “although extensive research might seem a logical first step, it’s actually very inefficient.” This is because “there are literally thousands of facts that could be relevant to any project; do you really want to collect them all?”
Instead, Pozen suggests that we first form “tentative conclusions” by thinking hard about our problem. “After a day or so of gathering relevant information, write down your tentative conclusions for the project. These will allow you to more quickly engage in analysis - rather than description - by providing a focus for your subsequent research.”
If you think about it, this makes a lot of sense and sounds more productive. It’s also a pretty scientific way of doing things - scientists don’t endeavour to run every experiment they can possibly imagine and then collect all the data. Instead, they formulate hypotheses and design experiments to refute their claims. If their claims survive refutation, they become published, reliable facts - for now.
And this applies to more than just desktop research:
Think of how you first learned a programming language outside of a school setting. Odds are you “hacked” your way to your current level of proficiency by relying on Google and Stack Overflow. You probably had to write some production or project code, tried an idea, it didn’t work, googled, and tried again and again - rather than going through “Basic data structures in Python” and working up from there. I am definitely guilty of this: when I first started learning Spark and writing PySpark code, I tried multiple times to start from some tutorial and slowly work my way up toward proficiency. Each time I failed, either because the material got boring or I lost patience. It just doesn’t work.
Which is the more efficient way of doing exploratory data analysis - plotting every single plot and collecting every summary statistic there is for your data, or coming up with several hypotheses and then visualising and testing them? “My guess is that age is not normally distributed in this dataset - let’s see.”
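To make that concrete, here’s a minimal Python sketch of the hypothesis-first style: one guess, one test, instead of a wall of plots. The DataFrame and its “age” column are hypothetical stand-ins for your data, and SciPy’s normaltest is just one way to check the guess.

```python
# Hypothesis-driven EDA: state a guess ("age is not normally distributed"),
# then test that one thing directly instead of plotting everything.
import numpy as np
import pandas as pd
from scipy import stats

# Stand-in data; in practice this would be pd.read_csv(...) on your dataset.
rng = np.random.default_rng(42)
df = pd.DataFrame({"age": rng.gamma(shape=2.0, scale=15.0, size=1_000)})

# D'Agostino-Pearson test: the null hypothesis is that the sample is normal.
stat, p_value = stats.normaltest(df["age"])
print(f"normaltest statistic={stat:.2f}, p-value={p_value:.4f}")

if p_value < 0.05:
    print("Reject normality - the guess holds up, age looks non-normal.")
else:
    print("No evidence against normality - revise the tentative conclusion.")
```

One focused test like this either confirms the tentative conclusion or tells you exactly which assumption to revise next - which is the whole point of writing the conclusion down first.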