Archaeoscape: LiDAR archaeology ML dataset
We present Archaeoscape, a novel open-access dataset for archaeological research, spanning 888 km² in Cambodia with 31,141 expert-annotated archaeological features from the Angkorian period. Archaeoscape is over four times larger than comparable datasets, and the first LiDAR archaeology resource with open-access data, annotations, and models.
The imagery was collected during the KALC (2012) and CALI (2015) aerial LiDAR campaigns. The annotations are the product of systematic documentation and field verification spanning over 30 years, from initial pre-LiDAR surveys in 1993 through continuous mapping until 2024.
Description
The 888 km² dataset is split into 23 non-overlapping parcels, each assigned to one of three subsets:
- Training set: 623 km², 16 parcels.
- Validation set: 97 km², 3 parcels.
- Test set: 168 km², 4 parcels.
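As a quick sanity check on the split listed above, the following sketch recomputes the totals and each subset's share of the surveyed area. The dictionary layout and variable names are illustrative only and are not part of any tooling shipped with the dataset.

```python
# Areas (km²) and parcel counts as listed above; the dict layout is illustrative.
SPLITS = {
    "train": {"area_km2": 623, "parcels": 16},
    "val": {"area_km2": 97, "parcels": 3},
    "test": {"area_km2": 168, "parcels": 4},
}

total_area = sum(s["area_km2"] for s in SPLITS.values())
total_parcels = sum(s["parcels"] for s in SPLITS.values())

# Share of the total surveyed area covered by each subset.
area_share = {name: s["area_km2"] / total_area for name, s in SPLITS.items()}

if __name__ == "__main__":
    print(f"total: {total_area} km², {total_parcels} parcels")
    for name, share in area_share.items():
        print(f"{name}: {share:.1%}")
```

By area, this works out to roughly a 70/11/19 split between training, validation, and test.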
The dataset includes high-resolution (0.5 m) orthophotos and LiDAR-derived normalized Digital Terrain Models (nDTM), totaling over 3.5 billion pixels with RGB values and elevation data, together with polygonal annotations.
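The pixel count follows directly from the surveyed area and the ground resolution. A minimal check, assuming square pixels 0.5 m on a side:

```python
AREA_KM2 = 888        # total surveyed area, from the description above
RESOLUTION_M = 0.5    # ground resolution; square pixels are assumed

area_m2 = AREA_KM2 * 1_000_000          # 1 km² = 10^6 m²
pixels = area_m2 / (RESOLUTION_M ** 2)  # each pixel covers 0.25 m²

print(f"{pixels / 1e9:.2f} billion pixels")
```

This gives about 3.55 billion pixels, consistent with the "over 3.5 billion" figure.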
The annotations cover five classes:
- Temple (827 instances, 0.2% of pixels). From monumental complexes to small shrines.
- Mound (14,400 instances, 8.6% of pixels). Earthen features indicating habitation, embankments, crafting sites.
- Hydrology (16,184 instances, 10.4% of pixels). Hydro-engineering features like rivers, ponds, canals and reservoirs.
- Void (3,145 instances, 2.5% of pixels). Ambiguous areas, excluded from evaluation.
- Background (78.3% of pixels). Regions lacking distinguishable archaeological features.
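Because Void pixels are excluded from evaluation, any metric has to skip them before scoring. The sketch below shows one common way to do this for per-class intersection-over-union (IoU). The integer class encoding, the function name `per_class_iou`, and the choice of IoU as the metric are all assumptions made for illustration; the dataset's actual label values and evaluation protocol may differ.

```python
# Hypothetical integer encoding of the five classes; the dataset's actual
# label values may differ.
BACKGROUND, TEMPLE, MOUND, HYDROLOGY, VOID = 0, 1, 2, 3, 4
EVALUATED = (BACKGROUND, TEMPLE, MOUND, HYDROLOGY)


def per_class_iou(y_true, y_pred, classes=EVALUATED, ignore=VOID):
    """Per-class intersection-over-union over flat label sequences.

    Pixels whose ground-truth label equals `ignore` are skipped entirely,
    so they count neither as hits nor as errors.
    """
    inter = {c: 0 for c in classes}
    union = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == ignore:
            continue
        for c in classes:
            in_true, in_pred = (t == c), (p == c)
            inter[c] += in_true and in_pred
            union[c] += in_true or in_pred
    # A class absent from both truth and prediction has no defined IoU.
    return {c: (inter[c] / union[c] if union[c] else None) for c in classes}


if __name__ == "__main__":
    truth = [0, 0, 1, 2, 3, 4, 4, 2]
    pred = [0, 2, 1, 2, 3, 1, 0, 0]
    print(per_class_iou(truth, pred))
```

In the toy example, the two pixels labeled Void in the ground truth have no effect on any score, whatever the model predicts there. For real rasters, the same logic is usually vectorized with an array library rather than looped in pure Python.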
To protect sensitive archaeological sites, the data is distributed without georeferencing and released through credentialized open access: users must provide their credentials and explicitly agree to license terms that prohibit re-georeferencing, commercial use, and redistribution.
License
The Archaeoscape dataset is distributed under a custom license, which prohibits redistribution, commercial use, and attempts to identify the location of the data. The full text of the license is given below.
The École française d'Extrême-Orient (EFEO) makes the Archaeoscape dataset (the “DATASET”) available for research and educational purposes to individuals or entities ("USER") that agree to the terms and conditions stated in this License.
- 1. The USER may access, view, and use the DATASET without charge for lawful non-commercial research purposes only. Any commercial use, sale, or other monetization is prohibited. The USER may not use the DATASET for any unlawful activities, including but not limited to looting, vandalism, and disturbance of archaeological sites.
- 2. The USER may not attempt to identify the location of any part of the DATASET and must exercise all reasonable and prudent care to avoid the disclosure of the locations referenced in the DATASET in any publication or other communication.
- 3. The USER may not share access to the DATASET with anyone else. This includes distributing the download link or any portion of the DATASET. Other users must register separately and comply with all the terms of this License.
- 4. The USER must use the DATASET in a manner that respects the cultural heritage of Cambodia and its people, and in compliance with the relevant Cambodian authorities. Any use of the DATASET that could harm or exploit these cultural sites or their environment is strictly prohibited.
- 5. The USER must properly attribute the EFEO as the source of the data in any publications, presentations, or other forms of dissemination that make use of the DATASET.
- 6. This agreement may be terminated by either party at any time, but the USER's obligations with respect to the DATASET shall continue after termination. If the USER fails to comply with any of the above terms and conditions, their rights under this License shall terminate automatically and without notice.
THE DATASET IS PROVIDED "AS IS," AND THE EFEO DOES NOT MAKE ANY WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. IN NO EVENT SHALL THE EFEO OR ITS COLLABORATORS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY ARISING FROM THE USE OF THE DATASET.