FAQ | Open Data Cube

Frequently Asked Questions

QUESTION:

Can the Data Cube be accessed from R/C++/IDL/etc.?

ANSWER:

This is not directly supported at present: the Data Cube is a Python-based API. The underlying technology that manages data access is PostgreSQL, so in theory the functionality could be ported to any language that can interact with the database. Another option is to shell out from those languages, access the data through the Python API, and pass the result back to the calling program.
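
The shell-out option can be sketched as a small Python script that another language launches as a subprocess, sending a query as JSON on standard input and reading a JSON summary from standard output. This is only a minimal sketch, not an official ODC tool: the function names, the app name and the JSON layout are assumptions made for illustration. Only `datacube.Datacube` and its `load` method come from the Python API, and the result is assumed to be an xarray Dataset.

```python
import json
import sys


def load_with_odc(query):
    """Load data through the ODC Python API (needs datacube and an index)."""
    import datacube  # imported lazily so the helpers below work without it

    dc = datacube.Datacube(app="shell-out-example")  # hypothetical app name
    return dc.load(**query)


def summarise(data):
    """Reduce a loaded xarray Dataset to a JSON-friendly summary."""
    return {
        "variables": sorted(str(name) for name in data.data_vars),
        "sizes": {str(dim): int(size) for dim, size in data.sizes.items()},
    }


def handle_request(text, loader=load_with_odc, summariser=summarise):
    """Parse a JSON query, run it, and return a JSON response string."""
    query = json.loads(text)
    # JSON has no tuples, so turn range arguments back into (min, max) pairs
    for key in ("x", "y", "lat", "lon", "time"):
        if key in query:
            query[key] = tuple(query[key])
    return json.dumps(summariser(loader(query)))


def main():
    """Entry point for use as a command-line filter."""
    print(handle_request(sys.stdin.read()))
```

With a call to `main()` added at the bottom of the script, a program written in R, C++ or IDL could start it as a subprocess, write the query to its standard input and parse the JSON it prints. Returning the pixel data itself would need an agreed exchange format, such as a file written to disk, which is left out here.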

QUESTION:

Does the Data Cube support xyz projection?

ANSWER:

Yes. The Data Cube supports, or can support with minimal changes, any projection that rasterio can read or write.

QUESTION:

I want to store more metadata that isn't mentioned in the documentation. Is this possible?

ANSWER:

This process is fully customizable. Users can configure exactly what metadata they want to capture for each dataset - we use the default for simplicity's sake.

QUESTION:

Does ingestion handle preprocessing or does data need to be processed before ingestion?

ANSWER:

The ingestion process is simply a reprojection and resampling process for existing data. Data should be preprocessed before ingestion.

QUESTION:

How flexible are the spatial grid definitions in ODC?

ANSWER:

An ODC grid specification can be in any projection supported by GDAL. Individual files can remain in their native projection (such as Landsat scenes in various UTM projections). Any spatial grid can be used to specify an output projection and resolution when retrieving data from ODC.

QUESTION:

What are the currently predefined spatial grids in ODC?

ANSWER:
DEA has created its own grid definition, using 100 km square tiles in Australian Albers Equal Area (EPSG:3577). Our CEOS colleagues have used WGS84 (EPSG:4326), and I believe other ODC operators have created a variety of other grid specifications that are specific to their areas of interest.

QUESTION:
Can data on the standard LANDSAT-WRS and MODIS MODLAND Sinusoidal grids be ingested into the ODC directly, without transforming it to a standard ODC grid?

ANSWER:
Yes, and our preference is to work with satellite datasets in their native grid projections.

I’d note that there is an unfortunate and persistent misunderstanding regarding ODC that we are working to address: ODC does NOT require source data to be ingested into a tile grid, only indexed. In earlier versions of ODC, only tiled, ingested data was available via the “load” methods, and indexing was a precursor step. Ingestion is now an optional step that should only be taken by experienced users who understand how to optimize their data for their execution environment and workloads.

QUESTION:
How could one use ODC to merge LANDSAT and SENTINEL-2A/2B?

ANSWER:
First, determine a target resolution and projection. Second, either reproject/warp/resample at load time or ingest your data to your defined pixel resolution and projection. This is likely to be an iterative process, so I wouldn’t recommend ingesting your data until you understand what impact your resampling technique has on it.
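
The load-time route can be sketched as follows: build one set of `load` arguments per sensor, sharing the same target projection, resolution and resampling method. Treat this as an illustration under stated assumptions rather than a recommended recipe: the product names `ls8_example` and `s2_example` are hypothetical, and the choice of EPSG:3577, 30 m pixels and bilinear resampling is arbitrary. The keyword names `output_crs`, `resolution` and `resampling` are arguments of `Datacube.load`.

```python
def common_grid_queries(products, lat, lon, time, crs="EPSG:3577",
                        resolution=(-30, 30), resampling="bilinear"):
    """Build one set of load() keyword arguments per product, on one grid."""
    shared = {
        "lat": lat,
        "lon": lon,
        "time": time,
        "output_crs": crs,          # target projection for every product
        "resolution": resolution,   # (y, x) pixel size in CRS units
        "resampling": resampling,   # method used when warping to the grid
    }
    return {product: dict(shared, product=product) for product in products}


def load_all(queries):
    """Load every product onto the shared grid (needs datacube and an index)."""
    import datacube  # imported lazily; not needed to build the queries

    dc = datacube.Datacube(app="merge-example")  # hypothetical app name
    return {name: dc.load(**query) for name, query in queries.items()}


# Hypothetical product names and an arbitrary area and period
queries = common_grid_queries(
    ["ls8_example", "s2_example"],
    lat=(-35.5, -35.0),
    lon=(149.0, 149.5),
    time=("2018-01-01", "2018-03-31"),
)
```

Because the grid is only a set of load arguments at this stage, it is cheap to rerun with a different `resampling` value and compare the results before committing to ingestion.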

QUESTION:

Is there a GDAL interface to the ODC?

ANSWER:
Generally speaking, ODC cannot index data that is not readable by GDAL, and any ingested data should remain readable by GDAL. In that sense, the data should always be in a GDAL-supported format. It is, however, possible to create and load data from files that are not GDAL compliant - CSIRO has created an extension to ODC for use with AWS S3 that creates serialized chunks of data that are not accessible except via an ODC API. It is conceivable that you may wish to use a GDAL interface to interact with the ODC API rather than individual datasets - in which case we can help write one.

QUESTION:

How are different temporal instances of the same data set handled in ODC? Consider a set of LANDSAT-8 images for one particular WRS path/row. Are they stored in ODC as separate images? How can they be related in ODC queries?

ANSWER:
Every dataset is indexed, and when a query is executed the results are aggregated and returned to the user. Multiple scenes from a single pass are grouped into a single solar day. Where two contiguous scenes from a single pass overlap north-south, as is typical, the default is for the northern pixels to be returned - this can be controlled by the user. Datasets can be queried temporally, spatially or via their WRS path/row if that has been captured in the dataset metadata. A query can be performed in ODC specifying an area of interest and a time range, without having to know about a particular satellite's path/row system. The returned data is provided as a 3-dimensional array, with labelled coordinates for the time axis and the spatial x/y or lat/lon axes.
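
The idea of grouping scenes by solar day can be illustrated without a database. The sketch below approximates local solar time by shifting each UTC acquisition time by one hour per 15 degrees of longitude and then groups scenes by the resulting date. It illustrates the concept and is not ODC's own implementation; the scene names, times and longitudes are made up. In ODC itself this grouping is requested with `group_by="solar_day"` when calling `load`.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def solar_day(utc_time, longitude):
    """Approximate local solar date: 15 degrees of longitude per hour."""
    return (utc_time + timedelta(hours=longitude / 15.0)).date()


def group_by_solar_day(scenes):
    """Group (name, utc_time, longitude) tuples by approximate solar day."""
    groups = defaultdict(list)
    for name, utc_time, longitude in scenes:
        groups[solar_day(utc_time, longitude)].append(name)
    return dict(groups)


# Made-up scenes: two contiguous scenes from one pass, one from a later pass
scenes = [
    ("scene_north", datetime(2018, 1, 1, 23, 50, 10), 149.0),
    ("scene_south", datetime(2018, 1, 1, 23, 50, 34), 149.2),
    ("next_pass", datetime(2018, 1, 17, 23, 50, 12), 149.1),
]
groups = group_by_solar_day(scenes)
```

The two scenes acquired seconds apart fall on the same solar day and would share one time slice, even though their UTC date differs from the local date at the scene.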

QUESTION:

Could one store a time-first, space-later grid in ODC? This is a grid where all time instances of one measure are stored together. Consider a MODIS MODLAND Sinusoidal grid where each tile would hold all time instances of a measure (e.g. NDVI). Such a tile would be a single TIFF/netCDF file with 468 "bands", one for each time instance of the 16-day MOD09 product for 18 years (2001-2018).

ANSWER:

Yes. In general, any GDAL raster format is supported, whether it is oriented to maximize performance over time, space or attribute/band/parameter.
DEA has tools to restructure the underlying storage from spatial tiles to time-stacked NetCDF files, with a tunable chunking configuration. We typically run this once a full year's worth of data has been collected. This is done transparently without impact on the user.

QUESTION:

Is it possible to use the ODC metadata interface (run in PostgreSQL) to index existing analysis-ready data sets? Suppose one has a lot of data already stored in Amazon AWS which is analysis-ready. How is this data ingested in ODC?

ANSWER:
Not only is this possible, but this is the strategic vision for DEA’s delivery architecture using ODC. Datasets from S3 can be indexed individually. However, we are currently working with Radiant Earth both to publish our datasets in AWS S3 with STAC metadata and to extend ODC to be able to index datasets using STAC. Using STAC’s principles of static, lightweight representation of spatial metadata would mean large collections could be indexed quickly, with a minimum of GET requests.

QUESTION:

What are possible spatio-temporal queries available in the ODC metadata server? Have you designed a spatio-temporal query language?

ANSWER:

Spatio-temporal queries are the basis for most algorithms and demonstrate the true power of the ODC. Queries are currently restricted to creating Python query objects and passing them to the ODC API and Database Index. These queries can search by temporal and spatial extent as well as any other fields associated with a dataset. Dataset extended metadata, including lineage, are stored in JSONB objects within the database - which provides enough flexibility and information for most use cases (at least the ones we have encountered so far).
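
As a rough illustration of such a query, the sketch below assembles a search on temporal extent, spatial extent and one extra metadata field, then passes it to `Datacube.find_datasets`. The product name and the `platform` value are hypothetical, and which extra fields are searchable depends on the metadata type configured for the product, so treat the field name as an assumption.

```python
def build_search(product, time, lat, lon, **metadata_fields):
    """Assemble keyword arguments for a search of the ODC index."""
    query = {"product": product, "time": time, "lat": lat, "lon": lon}
    # Any other searchable dataset field can be added as a keyword
    query.update(metadata_fields)
    return query


def search_index(query):
    """Return matching datasets (needs datacube and a populated index)."""
    import datacube  # imported lazily; not needed to build the query

    dc = datacube.Datacube(app="search-example")  # hypothetical app name
    return dc.find_datasets(**query)


# Hypothetical product and field value; extents are (min, max) pairs
query = build_search(
    "ls8_example",
    time=("2018-01-01", "2018-12-31"),
    lat=(-35.5, -35.0),
    lon=(149.0, 149.5),
    platform="LANDSAT_8",
)
```

Calling `search_index(query)` returns dataset records from the index without reading any pixel data, which makes it a cheap way to check what a query matches before loading.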

We have an online user interface (demo at: http://ec2-52-201-154-0.compute-1.amazonaws.com/) that can run any number of spatio-temporal queries using many common application algorithms (e.g. mosaics, indexes, water detection). We also have several Jupyter Notebook interfaces that allow users to run custom Python scripts and build their own algorithms.

