API Documentation

The PANDA application is built on top of a REST API that can be used to power custom applications or import/export data in novel ways.

The PANDA API follows the conventions of Tastypie except in important cases where doing so would create unacceptable limitations. If this documentation seems incomplete, refer to Tastypie’s page on Interacting with the API to become familiar with the common idiom.

Note

You will probably want to try these URLs in your browser. In order to make them work you’ll need to use the format, email, and api_key query string parameters. For example, to authenticate as the default administrative user that comes with PANDA, append the following query string to any URL described on this page:

?format=json&email=panda@pandaproject.net&api_key=edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b

Unless otherwise specified, all endpoints that return lists support the limit and offset parameters for pagination. Pagination information is contained in the embedded meta object within the response.
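As a sketch of how authentication and pagination combine, the following Python (standard library only, reusing the default credentials above; the host and resource names are whatever your instance exposes) builds an authenticated, paginated list URL:

```python
from urllib.parse import urlencode

# Hypothetical local PANDA instance; substitute your own host and credentials.
BASE = "http://localhost:8000/api/1.0"

def list_url(resource, limit=20, offset=0,
             email="panda@pandaproject.net",
             api_key="edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b"):
    """Build an authenticated, paginated list URL for a PANDA resource."""
    params = urlencode({
        "format": "json",
        "email": email,
        "api_key": api_key,
        "limit": limit,
        "offset": offset,
    })
    return "%s/%s/?%s" % (BASE, resource, params)

print(list_url("dataset", limit=10, offset=20))
```

Each response’s embedded meta object reports how far through the results you are, so a loop can keep incrementing offset until everything has been fetched.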

Users

User objects can be queried to retrieve information about PANDA users. Passwords and API keys are not included in responses.

Warning

If accessing the API with normal user credentials you will only be allowed to fetch/list users and to update your own data. Superusers can update any user, as well as delete existing users and create new ones.

Example User object:

{
    date_joined: "2011-11-04T00:00:00",
    email: "panda@pandaproject.net",
    first_name: "Redd",
    id: "1",
    is_active: true,
    last_login: "2011-11-04T00:00:00",
    last_name: "",
    resource_uri: "/api/1.0/user/1/"
}

Schema

http://localhost:8000/api/1.0/user/schema/

List

http://localhost:8000/api/1.0/user/

Fetch

http://localhost:8000/api/1.0/user/[id]/

Create

To create a new user, POST a JSON document containing at least the email property to http://localhost:8000/api/1.0/user/. Other properties such as first_name and last_name may also be set. If a password property is specified it will be set on the new user, but it will not be included in the response. If password is omitted the user will need to set a password before they can log in (not yet implemented).
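A minimal sketch of such a request using only the Python standard library (the payload values are hypothetical, and the query string reuses the default credentials from the top of this page):

```python
import json
from urllib import request

# Hypothetical new-user payload; only "email" is required.
payload = {
    "email": "reporter@example.com",
    "first_name": "Jane",
    "last_name": "Doe",
    "password": "secret",  # optional; never echoed back in the response
}

req = request.Request(
    "http://localhost:8000/api/1.0/user/?format=json"
    "&email=panda@pandaproject.net"
    "&api_key=edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = request.urlopen(req)  # requires a running PANDA instance
```

Remember that only superusers may create new users; with normal credentials this request will be rejected.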

Tasks

The Task API allows you to access data about import, export and reindexing processes running on PANDA. This data is read-only.

Example Task object:

{
    end: "2011-12-12T15:11:25",
    id: "1",
    message: "Import complete",
    resource_uri: "/api/1.0/task/1/",
    start: "2011-12-12T15:11:25",
    status: "SUCCESS",
    task_name: "panda.tasks.import.csv",
    traceback: null
}

Schema

http://localhost:8000/api/1.0/task/schema/

List

http://localhost:8000/api/1.0/task/

List filtered by status

List tasks that are PENDING (queued, but have not yet started processing):

http://localhost:8000/api/1.0/task/?status=PENDING

Note

Possible task statuses are PENDING, STARTED, SUCCESS, and FAILURE.

List filtered by date

List tasks that ended on October 31st, 2011:

http://localhost:8000/api/1.0/task/?end__year=2011&end__month=10&end__day=31
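Since these filters are ordinary query string parameters using Django-style field lookups, they are easy to compose programmatically. A small sketch, assuming the same local instance:

```python
from urllib.parse import urlencode

# Compose Django-style field lookups into a task-list filter URL.
def task_filter_url(**lookups):
    return "http://localhost:8000/api/1.0/task/?" + urlencode(lookups)

print(task_filter_url(status="PENDING"))
print(task_filter_url(end__year=2011, end__month=10, end__day=31))
```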

Fetch

http://localhost:8000/api/1.0/task/[id]/

Data Uploads

Due to limitations in upload file-handling, it is not possible to create Uploads via the normal API. Instead, data files should be uploaded to http://localhost:8000/data_upload/, either as form data or as an AJAX request. Examples of how to upload files with curl are at the end of this section.

Example DataUpload object:

{
    columns: [
        "id",
        "first_name",
        "last_name",
        "employer"
    ],
    creation_date: "2012-02-08T17:50:09",
    creator: {
        date_joined: "2011-11-04T00:00:00",
        email: "user@pandaproject.net",
        first_name: "User",
        id: "2",
        is_active: true,
        last_login: "2012-02-08T22:45:28",
        last_name: "",
        resource_uri: "/api/1.0/user/2/"
    },
    data_type: "csv",
    dataset: "/api/1.0/dataset/contributors/",
    dialect: {
        delimiter: ",",
        doublequote: false,
        lineterminator: "\r\n",
        quotechar: "\"",
        quoting: 0,
        skipinitialspace: false
    },
    encoding: "utf-8",
    filename: "contributors.csv",
    guessed_types: ["int", "unicode", "unicode", "unicode"],
    id: "1",
    imported: true,
    original_filename: "contributors.csv",
    resource_uri: "/api/1.0/data_upload/1/",
    sample_data: [
        [
            "1",
            "Brian",
            "Boyer",
            "Chicago Tribune"
        ],
        [
            "2",
            "Joseph",
            "Germuska",
            "Chicago Tribune"
        ],
        [
            "3",
            "Ryan",
            "Pitts",
            "The Spokesman-Review"
        ],
        [
            "4",
            "Christopher",
            "Groskopf",
            "PANDA Project"
        ]
    ],
    size: 168
}

Schema

http://localhost:8000/api/1.0/data_upload/schema/

List

http://localhost:8000/api/1.0/data_upload/

Fetch

http://localhost:8000/api/1.0/data_upload/[id]/

Download original file

http://localhost:8000/api/1.0/data_upload/[id]/download/

Upload as form-data

When accessing PANDA via curl, your email and API key can be specified with the headers PANDA_EMAIL and PANDA_API_KEY, respectively:

curl -H "PANDA_EMAIL: panda@pandaproject.net" -H "PANDA_API_KEY: edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b" \
-F file=@test.csv http://localhost:8000/data_upload/

Upload via AJAX

curl -H "PANDA_EMAIL: panda@pandaproject.net" -H "PANDA_API_KEY: edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b" \
--data-binary @test.csv -H "X-Requested-With:XMLHttpRequest" http://localhost:8000/data_upload/?qqfile=test.csv

Note

When using either upload method you may specify the character encoding of the file by passing it as a parameter, e.g. ?encoding=latin1

Categories

Categories are referenced by slug, rather than by integer id (though they do have one).

Example Category object:

{
    dataset_count: 2,
    id: "1",
    name: "Crime",
    resource_uri: "/api/1.0/category/crime/",
    slug: "crime"
}

Schema

http://localhost:8000/api/1.0/category/schema/

List

When queried as a list, a “fake” category named “Uncategorized” will also be returned. This category includes the count of all Datasets not in any other category. Its slug is uncategorized and its id is 0, but it can only be accessed as part of the list.

http://localhost:8000/api/1.0/category/

Fetch

http://localhost:8000/api/1.0/category/[slug]/

Datasets

Dataset is the core object in PANDA and by far the most complicated. It contains several embedded objects describing the columns of the dataset, the user that created it, the related uploads, etc. It also contains information about the history of the dataset and whether or not it is currently locked (unable to be modified). Datasets are referenced by slug, rather than by integer id (though they do have one).

Example Dataset object:

{
    categories: [ ],
    column_schema: [
        {
            indexed: false,
            indexed_name: null,
            max: null,
            min: null,
            name: "first_name",
            type: "unicode"
        },
        {
            indexed: false,
            indexed_name: null,
            max: null,
            min: null,
            name: "last_name",
            type: "unicode"
        },
        {
            indexed: false,
            indexed_name: null,
            max: null,
            min: null,
            name: "employer",
            type: "unicode"
        }
    ],
    creation_date: "2012-02-08T17:50:11",
    creator: {
        date_joined: "2011-11-04T00:00:00",
        email: "user@pandaproject.net",
        first_name: "User",
        id: "2",
        is_active: true,
        last_login: "2012-02-08T22:45:28",
        last_name: "",
        resource_uri: "/api/1.0/user/2/"
    },
    current_task: {
        creator: "/api/1.0/user/2/",
        end: "2012-02-08T17:50:12",
        id: "1",
        message: "Import complete",
        resource_uri: "/api/1.0/task/1/",
        start: "2012-02-08T17:50:12",
        status: "SUCCESS",
        task_name: "panda.tasks.import.csv",
        traceback: null
    },
    data_uploads: [
        {
            columns: [
                "first_name",
                "last_name",
                "employer"
            ],
            creation_date: "2012-02-08T17:50:09",
            creator: {
                date_joined: "2011-11-04T00:00:00",
                email: "user@pandaproject.net",
                first_name: "User",
                id: "2",
                is_active: true,
                last_login: "2012-02-08T22:45:28",
                last_name: "",
                resource_uri: "/api/1.0/user/2/"
            },
            data_type: "csv",
            dataset: "/api/1.0/dataset/contributors/",
            dialect: {
                delimiter: ",",
                doublequote: false,
                lineterminator: "\r\n",
                quotechar: "\"",
                quoting: 0,
                skipinitialspace: false
            },
            encoding: "utf-8",
            filename: "contributors.csv",
            id: "1",
            imported: true,
            original_filename: "contributors.csv",
            resource_uri: "/api/1.0/data_upload/1/",
            sample_data: [
                [
                    "Brian",
                    "Boyer",
                    "Chicago Tribune"
                ],
                [
                    "Joseph",
                    "Germuska",
                    "Chicago Tribune"
                ],
                [
                    "Ryan",
                    "Pitts",
                    "The Spokesman-Review"
                ],
                [
                    "Christopher",
                    "Groskopf",
                    "PANDA Project"
                ]
            ],
            size: 168
        }
    ],
    description: "",
    id: "1",
    initial_upload: "/api/1.0/data_upload/1/",
    last_modification: null,
    last_modified: null,
    last_modified_by: null,
    locked: false,
    locked_at: "2012-03-29T14:28:02",
    name: "contributors",
    related_uploads: [ ],
    resource_uri: "/api/1.0/dataset/contributors/",
    row_count: 4,
    sample_data: [
        [
            "Brian",
            "Boyer",
            "Chicago Tribune"
        ],
        [
            "Joseph",
            "Germuska",
            "Chicago Tribune"
        ],
        [
            "Ryan",
            "Pitts",
            "The Spokesman-Review"
        ],
        [
            "Christopher",
            "Groskopf",
            "PANDA Project"
        ]
    ],
    slug: "contributors"
}

Schema

http://localhost:8000/api/1.0/dataset/schema/

List

http://localhost:8000/api/1.0/dataset/

List filtered by category

http://localhost:8000/api/1.0/dataset/?category=[slug]

List filtered by user

A shortcut is provided for listing datasets created by a specific user. Simply pass the creator_email parameter. Note that this parameter cannot be combined with a search query or other filter.

http://localhost:8000/api/1.0/dataset/?creator_email=[email]

Search for datasets

The Dataset list endpoint also provides full-text search over datasets’ metadata via the q parameter.

Note

By default, search results are complete Dataset objects; however, it is frequently useful to return simplified objects for rendering lists, etc. These simplified objects do not contain the embedded task object, upload objects, or sample data. To return simplified objects, just add simple=true to the query.

http://localhost:8000/api/1.0/dataset/?q=[query]

Fetch

http://localhost:8000/api/1.0/dataset/[slug]/

Create

To create a new Dataset, POST a JSON document containing at least a name property to /api/1.0/dataset/. Other properties such as description may also be included.

If data has already been uploaded for this dataset, you may also specify the data_upload property as either an embedded Upload object, or a URI to an existing DataUpload (for example, /api/1.0/data_upload/17/).

If you are creating a Dataset specifically to be updated via the API, you will want to specify columns at creation time. You can do this by providing a columns query string parameter containing a comma-separated list of column names, such as ?columns=foo,bar,baz. You may also specify a column_types parameter, which is an array of types for the columns, such as column_types=int,unicode,bool. Lastly, if you want PANDA to automatically index typed columns for data added to this dataset, you can pass a typed_columns parameter indicating which columns should be indexed, such as typed_columns=true,false,true.
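Putting those pieces together, this sketch (standard library only; the dataset name and column layout are invented for illustration) builds the creation URL and body for a dataset that will be populated entirely through the API:

```python
import json
from urllib.parse import urlencode

# Default credentials from the top of this page.
auth = {
    "format": "json",
    "email": "panda@pandaproject.net",
    "api_key": "edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b",
}
# Column layout is declared in the query string, not the JSON body.
params = dict(auth,
    columns="foo,bar,baz",
    column_types="int,unicode,bool",
    typed_columns="true,false,true",
)
url = "http://localhost:8000/api/1.0/dataset/?" + urlencode(params)
body = json.dumps({"name": "My API-driven dataset",
                   "description": "Rows will be added via the Data API."})
# POST `body` to `url` with Content-Type: application/json
```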

Import

Begin an import task. Any data previously imported for this dataset will be lost. Returns the original dataset, which will include the id of the new import task:

http://localhost:8000/api/1.0/dataset/[slug]/import/[data-upload-id]/

Export

Exporting a dataset is an asynchronous operation. To initiate an export you simply need to make a GET request. The requesting user will be emailed when the export is complete:

http://localhost:8000/api/1.0/dataset/[slug]/export/

Reindex

Reindexing allows you to add (or remove) typed columns from the dataset. You initiate a reindex with a GET request and can supply column_types and typed_columns fields in the same format as documented above in the section on creating a Dataset.

http://localhost:8000/api/1.0/dataset/[slug]/reindex/

Data

Data objects are referenced by a unicode external_id property, specified at the time they are created. This property must be unique within a given Dataset, but does not need to be unique globally. Data objects are accessible at per-dataset endpoints (e.g. /api/1.0/dataset/[slug]/data/). There is also a cross-dataset Data search endpoint at /api/1.0/data/; however, this endpoint can only be used for search, not for create, update, or delete. (See below for more.)

Warning

The external_id property of a Data object is the only way it can be accessed through the API. In order to work with Data via the API you must include this property at the time you create it. By default this property is null and the Data cannot be accessed except via search.

An example Data object with an external_id:

{
    "data": [
        "1",
        "Brian",
        "Boyer",
        "Chicago Tribune"
    ],
    "dataset": "/api/1.0/dataset/contributors/",
    "external_id": "1",
    "resource_uri": "/api/1.0/dataset/contributors/data/1/"
}

An example Data object without an external_id, note that it also has no resource_uri:

{
    "data": [
        "1",
        "Brian",
        "Boyer",
        "Chicago Tribune"
    ],
    "dataset": "/api/1.0/dataset/contributors/",
    "external_id": null,
    "resource_uri": null
}

Warning

You cannot add, update, or delete data in a locked dataset. An error will be returned if you attempt to do so.

Schema

There is no schema endpoint for Data.

List

When listing data, PANDA will return a simplified Dataset object with an embedded meta object and an embedded objects array containing Data objects. The added Dataset metadata is purely for convenience when building user interfaces.

http://localhost:8000/api/1.0/dataset/[slug]/data/

Fetch

To fetch a single Data object from a given Dataset:

http://localhost:8000/api/1.0/dataset/[slug]/data/[external_id]/

Create and update

Because Data is stored in Solr (rather than a SQL database), there is no functional difference between create and update. In either case, any existing Data with the same external_id will be overwritten when the new Data is created. Accordingly, requests may be either POSTed to the list endpoint or PUT to the detail endpoint.

An example POST:

{
    "data": [
        "column A value",
        "column B value",
        "column C value"
    ],
    "external_id": "id_value"
}

This object would be POSTed to:

http://localhost:8000/api/1.0/dataset/[slug]/data/
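For instance, the object above could be sent with the Python standard library like so (the slug contributors and the credentials are the running examples from this page; a live PANDA instance is required to actually execute the request):

```python
import json
from urllib import request

# Hypothetical row for a dataset with slug "contributors".
row = {"data": ["column A value", "column B value", "column C value"],
       "external_id": "id_value"}

req = request.Request(
    "http://localhost:8000/api/1.0/dataset/contributors/data/"
    "?format=json&email=panda@pandaproject.net"
    "&api_key=edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b",
    data=json.dumps(row).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req)  # requires a running PANDA instance
```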

An example PUT:

{
    "data": [
        "new column A value",
        "new column B value",
        "new column C value"
    ]
}

This object would be PUT to:

http://localhost:8000/api/1.0/dataset/[slug]/data/id_value/

Bulk create and update

To create or update objects in bulk you may PUT an array of objects to the list endpoint. Any object with a matching external_id will be deleted and then new objects will be created. The body of the request should be formatted like:

{
    "objects": [
        {
            "data": [
                "column A value",
                "column B value",
                "column C value"
            ],
            "external_id": "1"
        },
        {
            "data": [
                "column A value",
                "column B value",
                "column C value"
            ],
            "external_id": "2"
        }
    ]
}
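A small sketch of building that envelope from existing rows (the row values are placeholders):

```python
import json

# Wrap an iterable of (external_id, row) pairs in the "objects" envelope
# expected by the bulk endpoint.
rows = {
    "1": ["column A value", "column B value", "column C value"],
    "2": ["column A value", "column B value", "column C value"],
}
body = json.dumps({
    "objects": [{"data": data, "external_id": ext_id}
                for ext_id, data in sorted(rows.items())]
})
# PUT `body` to http://localhost:8000/api/1.0/dataset/[slug]/data/
```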

Delete

To delete an object, send a DELETE request to its detail URL. The body of the request should be empty.

Delete all data from a dataset

In addition to deleting individual objects, it is possible to delete all objects within a dataset by sending a DELETE request to the root per-dataset data endpoint. The body of the request should be empty.

http://localhost:8000/api/1.0/dataset/[slug]/data/
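As a sketch (the slug contributors is hypothetical; this operation is destructive and requires a running instance to execute):

```python
from urllib import request

# Empty-bodied DELETE that clears every row from the "contributors"
# dataset. Irreversible, so double-check the slug before running it.
req = request.Request(
    "http://localhost:8000/api/1.0/dataset/contributors/data/"
    "?format=json&email=panda@pandaproject.net"
    "&api_key=edfe6c5ffd1be4d3bf22f69188ac6bc0fc04c84b",
    method="DELETE",
)
# request.urlopen(req)  # requires a running PANDA instance
```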