Validation and Thoughts of Improvement

Validation - CART and Random Forest

To validate our classifications, we need to collect validation data. As with the training data, this can be done using the geometry tools.
Collect validation data for the same classes you used for the classification, and make sure to use the same property names and values.

Once the validation classes are defined, we again need to merge them into one FeatureCollection. Then we can validate the classifications by calculating an error matrix and the overall accuracy.
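To make the idea of an error matrix and overall accuracy concrete, here is a minimal plain JavaScript sketch (not Earth Engine code). It assumes the convention of Earth Engine's `errorMatrix`, where rows hold the reference (actual) classes and columns the predicted classes. The overall accuracy is then the number of correctly classified samples (the diagonal) divided by the total number of samples. The matrix values and the class order are hypothetical, not results from this exercise.

```javascript
// Hypothetical error matrix for four classes, in the assumed order
// 0 = water, 1 = vegetation, 2 = urban, 3 = bare soil.
// Rows = reference (actual) class, columns = predicted class.
// The counts are made up for illustration only.
const errorMatrix = [
  [20, 0, 0, 0],
  [1, 19, 0, 0],
  [0, 0, 20, 0],
  [0, 1, 6, 13],
];

// Overall accuracy = correctly classified samples (diagonal) / all samples.
function overallAccuracy(matrix) {
  let correct = 0;
  let total = 0;
  for (let i = 0; i < matrix.length; i++) {
    for (let j = 0; j < matrix[i].length; j++) {
      total += matrix[i][j];
      if (i === j) {
        correct += matrix[i][j];
      }
    }
  }
  // Guard against an empty matrix.
  return total === 0 ? 0 : correct / total;
}

console.log(overallAccuracy(errorMatrix)); // 0.9 (72 correct out of 80)
```

Note that the overall accuracy alone hides which classes are confused with each other; for that, the individual cells of the matrix have to be inspected.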

//Merge the validation classes into one FeatureCollection
var valMerge = vwater.merge(vvegetation).merge(vurban).merge(vbaresoil);
print(valMerge, 'Merged Validation data');


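//Validation for the CART classifier:
//sample the classified image at the validation points, keeping each point's reference 'landcover' label next to the predicted 'classification' value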
var cartvalidation = cartclassified.sampleRegions({
  collection: valMerge,
  properties: ['landcover'],
  scale: 30,
});
print(cartvalidation.limit(5000), 'Cart Validation');

//Error Matrix and Overall Accuracy for the CART-Classifier
var cartTestAccuracy = cartvalidation.errorMatrix('landcover', 'classification');
print(cartTestAccuracy, 'Validation of the CART Error Matrix');
//The error matrix also gives us the overall accuracy via .accuracy()
print(cartTestAccuracy.accuracy(), 'CART Overall Accuracy');

//This rough CART classification has an overall validation accuracy of 88.91%, which is a good result for such a quick classification.

//Validation for the Random Forest classifier (same procedure as for CART)
var rfvalidation = rfclassified.sampleRegions({
  collection: valMerge,
  properties: ['landcover'],
  scale: 30,
});
print(rfvalidation.limit(5000), 'Random Forest Validation');

//Error Matrix and Overall Accuracy for the Random Forest-Classifier
var rfTestAccuracy = rfvalidation.errorMatrix('landcover', 'classification');
print(rfTestAccuracy, 'Validation of the Random Forest Error Matrix');
//The error matrix also gives us the overall accuracy via .accuracy()
print(rfTestAccuracy.accuracy(), 'Random Forest Overall Accuracy');

//This rough Random Forest classification has an overall validation accuracy of 85.68%, slightly lower than the CART result.
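Besides `.accuracy()`, Earth Engine's confusion matrix also provides `.kappa()` (Cohen's kappa), which corrects the overall accuracy for the agreement that would be expected by chance from the row and column totals alone. This can be useful when comparing two classifiers such as CART and Random Forest. The plain JavaScript sketch below shows the computation, kappa = (p_o - p_e) / (1 - p_e), on a hypothetical matrix (rows = reference, columns = predicted; counts made up, not our results).

```javascript
// Hypothetical error matrix (rows = reference class, columns = predicted class).
// Assumed class order: water, vegetation, urban, bare soil. Counts are made up.
const errorMatrix = [
  [20, 0, 0, 0],
  [1, 19, 0, 0],
  [0, 0, 20, 0],
  [0, 1, 6, 13],
];

// Cohen's kappa: agreement beyond what the row/column totals would give by chance.
// Undefined (NaN) in the degenerate case where chance agreement is 1.
function kappa(matrix) {
  const k = matrix.length;
  const rowTotals = new Array(k).fill(0);
  const colTotals = new Array(k).fill(0);
  let total = 0;
  let diagonal = 0;
  for (let i = 0; i < k; i++) {
    for (let j = 0; j < k; j++) {
      rowTotals[i] += matrix[i][j];
      colTotals[j] += matrix[i][j];
      total += matrix[i][j];
    }
    diagonal += matrix[i][i];
  }
  const observed = diagonal / total; // p_o, the overall accuracy
  let chance = 0;
  for (let i = 0; i < k; i++) {
    chance += rowTotals[i] * colTotals[i];
  }
  const expected = chance / (total * total); // p_e, chance agreement
  return (observed - expected) / (1 - expected);
}

console.log(kappa(errorMatrix).toFixed(3)); // "0.867"
```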

Compare both classified images! Do you think they did a good job classifying our scene?

Consider the following factors:
How many classes did we choose?
How many sample points did we take per class?
Is point-data an adequate source of data for our task? Would polygons be better suited, and why?
Which sampling strategy did we choose? Is it adequate, or might another approach, like a stratified one, be better suited?

And, most importantly: how can we improve our classification?

Considerations:
Considering how little effort went into them, the classifications show surprisingly good results. Of course, they should only be judged in light of that low effort: their purpose is to give a rough overview of the spatial distribution of our classes.
Water and vegetation areas look quite solid, whereas the algorithm had difficulty differentiating urban and bare soil areas. In particular, bare areas with very high reflectance values are classified as urban. On the other hand, all of the actual urban areas are classified correctly.
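A confusion like urban versus bare soil shows up in the per-class accuracies rather than in the overall accuracy. Earth Engine's confusion matrix offers `.producersAccuracy()` and `.consumersAccuracy()` for this. The plain JavaScript sketch below shows what these measures compute, assuming rows = reference and columns = predicted. The matrix is hypothetical, chosen to mimic the pattern described above (bare soil often predicted as urban), and is not our actual result.

```javascript
// Hypothetical error matrix mimicking the urban / bare soil confusion:
// rows = reference class, columns = predicted class. Counts are made up.
const classNames = ['water', 'vegetation', 'urban', 'bare soil'];
const errorMatrix = [
  [20, 0, 0, 0],
  [1, 19, 0, 0],
  [0, 0, 20, 0],
  [0, 1, 6, 13],
];

// Producer's accuracy: share of reference samples of a class that were
// classified correctly (diagonal / row total). Low values = omission errors.
function producersAccuracy(matrix) {
  return matrix.map(function (row, i) {
    const rowTotal = row.reduce(function (a, b) { return a + b; }, 0);
    return rowTotal === 0 ? 0 : row[i] / rowTotal;
  });
}

// Consumer's (user's) accuracy: share of samples predicted as a class that
// really belong to it (diagonal / column total). Low values = commission errors.
function consumersAccuracy(matrix) {
  return matrix.map(function (_, j) {
    let colTotal = 0;
    for (let i = 0; i < matrix.length; i++) {
      colTotal += matrix[i][j];
    }
    return colTotal === 0 ? 0 : matrix[j][j] / colTotal;
  });
}

const pa = producersAccuracy(errorMatrix);
const ca = consumersAccuracy(errorMatrix);
classNames.forEach(function (name, i) {
  console.log(name + ': producer ' + pa[i].toFixed(2) + ', consumer ' + ca[i].toFixed(2));
});
// Among the printed lines:
// urban: producer 1.00, consumer 0.77
// bare soil: producer 0.65, consumer 1.00
```

In this made-up example, urban has a perfect producer's accuracy but a low consumer's accuracy, while bare soil shows the opposite: the signature of bare soil pixels being labelled as urban.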

To evaluate how we could improve this classification, let's go through the points brought up just before.

How many classes did we choose? Four classes. For such a large research area, adding a few more classes could improve the results; useful additions would be, for example, cultivated and uncultivated fields.

How many sample points did we take per class? 20 points per class. This is by no means enough for a representative land cover classification, especially for such a large research area. The more samples the better, but there should be at least around 100 per class.

Is point data an adequate source of data? It works, but polygons would provide much better results, because they capture a wider range of pixel values for each class. If possible, polygons should always be used.

Which sampling strategy did we choose? Equalized sampling, so every class is evenly represented. A stratified approach might be better: water and vegetation are already classified well despite our low effort, so it makes sense to put more effort into the urban and bare soil classes by creating more training samples for them.
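One way to plan such an uneven sampling effort is sketched below in plain JavaScript. The helper `allocateSamples` is hypothetical (not part of Earth Engine or the tutorial) and implements one simple heuristic rather than a standard stratified-sampling formula: every class first receives a minimum number of samples (around 100, as suggested above), and the remaining budget is shared in proportion to each class's error rate (1 - producer's accuracy). The accuracies and the budget in the example are made up.

```javascript
// Hypothetical helper (not from the tutorial or any library): split a total
// training-sample budget across classes. Every class first gets minPerClass
// samples; the rest is shared in proportion to each class's error rate
// (1 - producer's accuracy), so poorly classified classes get more samples.
function allocateSamples(producersAccuracy, totalBudget, minPerClass) {
  const n = producersAccuracy.length;
  const extra = totalBudget - n * minPerClass;
  if (extra < 0) {
    throw new Error('Budget is too small for the per-class minimum');
  }
  const errors = producersAccuracy.map(function (pa) {
    return Math.max(0, 1 - pa);
  });
  const errorSum = errors.reduce(function (a, b) { return a + b; }, 0);
  const allocation = errors.map(function (e) {
    // If every class is already perfect, share the extra samples evenly.
    const share = errorSum === 0 ? 1 / n : e / errorSum;
    return minPerClass + Math.round(extra * share);
  });
  // Rounding can leave the total slightly off; give the difference
  // to the class with the largest error so the budget is met exactly.
  const assigned = allocation.reduce(function (a, b) { return a + b; }, 0);
  const worst = errors.indexOf(Math.max.apply(null, errors));
  allocation[worst] += totalBudget - assigned;
  return allocation;
}

// Hypothetical producer's accuracies (water, vegetation, urban, bare soil),
// a budget of 600 points and a minimum of 100 points per class.
console.log(allocateSamples([1.0, 0.95, 1.0, 0.65], 600, 100)); // [100, 125, 100, 275]
```

Weighting by producer's accuracy targets omission errors; weighting by consumer's accuracy instead would give the extra samples to the urban class in our scenario, since it is the class receiving the misclassified bare soil pixels.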

Department of Earth Sciences

GEO-IT - The Technology of Data Acquisition for Sustainable Development and Crisis Management
