“Making sense of most of the world’s data.” Hollywood sometimes starts with a movie title and then “fills in the details”, so why not start a startup with a tagline? I grant that it’s a bit grandiose, but I think it gives a sense of the ambition of the Altaridey platform: it is meant to make sense of most of the data the world collects. By way of describing the purpose of this work, I want to paint a very specific picture: you have a social scientist sitting somewhere in the dimly lit warrens of American academia, embarking on a career-making study of some very important economic indicator. Not just any study, but one that seeks to build a predictive model substantially explaining all of the indicator’s variation (as career-making studies tend to do).
It is still very much a fact today that this sort of predictive model begins life as a thought experiment that goes something like: “what factors could plausibly affect the magnitude of signal A that I care about?” If the scientist knows what he’s doing, he’ll brainstorm for a while, test and reject a few (dozen) theories, and eventually come up with a list of factors that explain 90% of the variation in A. But it is the remaining 10% that points up the fundamental flaw of this approach: a list made in this way is biased from the outset. And, crucially, it is a bias that tends to segregate scientific domains and make it unlikely that strong cross-domain relationships are found or pursued. If I’m a demographer looking at obesity patterns, I’m not going to use weather as an explanation because (A) it wouldn’t occur to me to look for causation in weather, and (B) it would be considered quirky to use it as an explanation, and the last thing demographers want is to be called quirky. What we need is some regimentation in how relationships between data are found and ranked. And this is precisely the challenge that Altaridey addresses: it detects both cross-domain and intra-domain relationships between data series, and, by making the selection of factors an objective process, it will make the generation of predictive models a far more structured enterprise. The implication of such a tool is that if I can substantially parameterize the variation in a signal, then I can predict the signal, and that is [gulp] the ultimate goal.
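To make “found and ranked” a little more concrete, here is a minimal sketch of what objective factor selection could look like at its very simplest. It is an illustration only and not a description of how Altaridey works: the function names, the toy numbers, and the choice of Pearson correlation as the score are all my assumptions for the example. Pearson correlation only catches linear relationships, and a high score is a hint of a link, not proof of causation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_candidates(target, candidates):
    """Score every candidate series against the target and sort by
    strength of correlation, strongest first, whatever its domain."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy, made-up numbers (hypothetical, for illustration only).
obesity = [1, 2, 3, 4, 5]
candidates = {
    "income": [5, 3, 4, 1, 2],
    "weather": [2, 4, 6, 8, 10],
    "noise": [3, 1, 4, 1, 5],
}

ranking = rank_candidates(obesity, candidates)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

The point of the sketch is that the weather series gets scored on exactly the same footing as the income series: nobody had to think of it first, and nobody had to worry about looking quirky.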
As I blog about Altaridey over the following weeks and months, a lot of details will be fleshed out and clarified, so I won’t attempt to squeeze too much into this inaugural entry. But in the interest of establishing bona fides, I want to discuss a very important question. What (my mother asks) is the difference between Altaridey and this and this? And that’s a fair question because those are bigger enterprises with missions seemingly analogous if not identical, but also, like, “budgets” and “employees”, and I’m just one enthusiast sitting in my home office, writing code, blogging. By way of an answer to that question, let me offer an analogy. There’s Facebook and there’s email. Both allow people to communicate on subjects of their choosing, and both enable privacy (in the case of email, by dint of being a person-to-person communication medium). I propose that the salient distinction is one of structure: every form of communication between Facebookers, and Facebookers themselves, are well described in some structured database somewhere. Facebook knows where to put every kind of communication that it allows, and every bit of it. Email, on the other hand, is far less constrained: structurally, there is no prescription for how to use it, as long as you satisfy a low-level protocol. To complete the analogy, those other data services are email and I’m Facebook. Data flows in and is structured in a very specific way; only certain kinds of data are accepted for processing, and only processing that furthers the goal of building connections between data series is performed. Big data services more generally have a broader mission and purport to help you solve a wider array of problems. But by making the mission broader, they don’t permit the kind of tailoring of resources that a narrow mission allows. A spork is more versatile than a spoon, but you still want a spoon if you’re eating soup.
Altaridey is a world-class single-purpose tool: a real-time data analysis application that can eventually encompass all of the data domains, generate all the linkages worth knowing about, and find the last 10% of causation. And unlike Facebook, there is practically no network effect to an application like this: data series from most domains are freely available, and all one needs is the patience to load the data into this exquisitely tailored tool.