The PANDA Project is a data warehousing solution for newsrooms. It provides a place for journalists to securely archive data, search it and share it within their organization.
The PANDA board is made up of Brian Boyer and Joe Germuska of the Chicago Tribune, joined by Ryan Pitts of The Spokesman-Review. The lead developer is Christopher Groskopf. Additional source code, bug reports and feedback have been contributed by a community of users.
Development has been funded by a 2011 Knight News Challenge Grant and will continue for one year (ending in early September, 2012).
Grant funds were directed to Investigative Reporters and Editors (IRE), which is facilitating the grant and providing additional infrastructure support.
PANDA A Newsroom Data Appliance.
It’s a recursive acronym.
PANDAs eat data.
No, PANDA is intended to be an internal tool. We provide instructions describing how to secure your data so only users in your organization can see your data. This is implemented both using firewalls and user credentials.
No, PANDA is a place to store and search data, but not a way of publishing it.
No, PANDA does not produce graphics or interactives.
Sort of, but PANDA is self-hosted, open source and designed specifically for newsrooms.
As safe as we can make it, though the safety of your data is far more dependent on backups, server stability, etc. than on choices made while developing PANDA.
PANDA is released under the MIT License, one of the most permissive of all open source licenses. It can be freely used by commercial and non-commercial entities alike.
You do. We offer instructions for hosting on Amazon’s EC2 service or hosting on your own servers. In order not to create a sustainability problem when the grant ends, PANDA is not available as a service.
It depends on how powerful a server you need, but an EC2 “small” instance, together with storage and bandwidth, will cost you $70-100 a month. This “small” size is our default and probably enough for many small-to-medium size organizations.
Very small organizations can also try running PANDA on an EC2 “micro”, at a cost of $15-30 per month, but this is infrequently tested and not likely to perform well for more than a handful of users.
It depends on how zealous you are about security. A PANDA in a properly secured EC2 environment (i.e. firewalled for your organization and with SSL configured) is a pretty secure beast. However, as with any hosted platform, there is no technical way to guarantee an employee of Amazon isn’t snooping.
Maybe. If you will be putting highly sensitive data in your PANDA (data so sensitive you are concerned it may be subpoenaed), then you should not host on Amazon. As with any third-party service, a legal claim to your data could be made against the provider, rather than against you, depriving you of the right to have your lawyers defend against it.
PANDA requires Ubuntu 12.04. This is a Long-Term Support release of Ubuntu, meaning it will be supported with patches by Canonical for five years.
Support for other platforms is unlikely, but not totally out of the question.
Yes. We would love to make PANDA more modular, but its complex array of dependencies makes this very difficult and we would prefer to spend our grant funds developing features and ensuring it’s a stable product.
Obviously nothing is actually stopping you from installing other stuff on the same server. Just don’t do it.
Very likely! If it can run Ubuntu 12.04 it can probably run PANDA. We don’t have “minimum requirements”, but the specs of an EC2 small are:
Any PC manufactured in the last five years should easily exceed these specifications.
Yes, see our API documentation.
Only if you choose to make your PANDA API public, which we strongly discourage. PANDA is not designed to support many concurrent users, nor is the data structured in a manner suitable for most user-facing applications. If you want to use PANDA to publish data, we suggest writing a script to shadow tables into a SQL database. This will be more stable and secure, both for your application and for your PANDA.
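As a rough illustration of what such a shadowing script might look like, here is a minimal sketch in Python using the standard library's sqlite3 module. It assumes you have already pulled the column names and rows out of your PANDA (for example through the API or a CSV export); that fetching step is not shown. The function names, the sample data and the choice of SQLite are all assumptions made for illustration, not part of PANDA itself.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def shadow_table(conn, table_name, columns, rows):
    """Replace table_name in the database with a fresh copy of rows.

    Every column is stored as TEXT, since this sketch assumes the
    exported values arrive as strings. Returns the number of rows stored.
    """
    table = quote_identifier(table_name)
    column_defs = ", ".join(quote_identifier(c) + " TEXT" for c in columns)
    placeholders = ", ".join("?" for _ in columns)

    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS " + table)
        conn.execute("CREATE TABLE " + table + " (" + column_defs + ")")
        conn.executemany(
            "INSERT INTO " + table + " VALUES (" + placeholders + ")", rows
        )
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]


if __name__ == "__main__":
    # Hypothetical sample data standing in for rows exported from PANDA.
    columns = ["name", "agency", "salary"]
    rows = [("Jane Doe", "Parks", "52000"), ("John Roe", "Transit", "61000")]
    conn = sqlite3.connect(":memory:")
    print(shadow_table(conn, "salaries", columns, rows))
```

Dropping and recreating the table on each run keeps the shadow copy simple to reason about: the public database always reflects the most recent export, and your application never talks to PANDA directly. A production script would likely target whatever SQL database your application already uses and assign proper column types.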
The linchpin technologies used by PANDA are Python, Django, and Solr. For a more complete list, see our Architecture choices wiki page.