The Mathematical Shape of Big Science Data [@QuantaMagazine for Simons Foundation]
Today’s big data is noisy, unstructured, and dynamic rather than static. It may also be corrupted or incomplete. “We think of data as being comprised of vectors – a string of numbers and coordinates,” said Jesse Johnson, a mathematician at Oklahoma State University. But data from Twitter or Facebook, or the trial archives of the Old Bailey, look nothing like that, which means researchers need new mathematical tools in order to glean useful information from the data sets. “Either you need a more sophisticated way to translate it into vectors, or you need to come up with a more generalized way of analyzing it,” Johnson said.