The methodology of SemanticGIS is built around three interconnected concepts:
- A structuring of a GIS project into five well-defined phases
- Documenting the GIS project through the use of a Design Rationale
- Structuring the analytical process through the use of a domain-specific language.
This article focuses on the five phases of the GIS project. The phases are named after the primary intellectual task undertaken in each, and they serve to isolate the tasks, allowing for a focused and well-documented intellectual process.
1. Project Scoping
The first phase of any project is to define its purpose, boundaries, and real-world context. All projects exist within an organisational framework of rules, resources, and requirements. The first and most critical step is to understand this framework, as it defines the practical limits and directs the intellectual focus of your work.
The Organisational Framework: Understanding Your Constraints
Before defining your specific research goals, you must identify and document the external factors that will shape your project. These are often non-negotiable constraints imposed by clients, employers, or legal bodies.
- The Brief: You must begin by analysing the project brief from the client (or teacher). This document dictates the entire structure of your project. Your first task is to determine which of two primary workflows it requires:
  - Explorative Workflow: The brief poses an open-ended question (e.g., “Investigate the relationship between green space and property values”). Your process will be iterative, and the final output may not be known at the start.
  - Reverse Engineering Workflow: The brief demands hard deliverables (e.g., “Produce a map of all land parcels suitable for urban agriculture”). Here, your process is determined by working backward from the required output to identify the necessary data and analytical steps.
  It is always best practice to reformulate the brief in your own words, clarifying the workflow and deliverables, and to have it approved by the client to ensure a shared understanding. Also note that a brief can seldom be answered through an entirely Explorative Workflow or an entirely Reverse Engineering Workflow; often the two work in tandem. This article presents the phases in the order of an Explorative Workflow, since this is the simplest. At the end of the article we will discuss how to modify the process for a more Reverse Engineering Workflow.
- The Stakeholder Landscape: Beyond the primary client (or teacher) who provides the brief, most projects have a wider network of stakeholders. You should identify who they are and what their interests or requirements might be.
  - Who are they? Stakeholders could include the IT department that controls data access and software, the end-users who will interact with your final map, a communications team that must approve public-facing materials, or even community groups affected by the analysis.
  - Why does it matter? A brilliant analysis that the IT department can’t support or that the end-users find confusing is a failed project. Mapping these stakeholders early helps you anticipate needs and manage expectations.
- Resources: You need to assess the practical resources available. This includes the time allocated for the project, the budget, and your access to crucial, task-specific data, software, or expertise within the organisation.
- The Technical Infrastructure: This is a more specific aspect of “Resources.” It’s not just about what you have access to, but what you are required to use.
  - What is it? This includes constraints like a mandatory software ecosystem (e.g., “This organization works exclusively with ESRI products”), required database formats (e.g., “All data must be stored in the central PostgreSQL/PostGIS database”), or specific programming languages for scripts (e.g., “All automation must be done in Python 3.9”).
  - Why does it matter? These constraints dictate your tool choices from the very beginning. Knowing them prevents you from building a solution in QGIS and R when the client can only support ArcGIS Pro and its proprietary tools.
- Project Management: Document any formal project management requirements. This includes deadlines and milestones, but also any specific formats for external documentation or reporting. While you must meet these external demands, your semanticGIS Design Rationale remains your non-negotiable internal tool for ensuring analytical rigour.
- Legislation and Compliance: You must identify any legal or regulatory constraints, such as privacy regulations (GDPR), data protection laws, or copyright on data, that govern how you can acquire, store, and use information. In addition to legislation, you should also consider ethical guidelines. This goes beyond strict “Legislation and Compliance.” Even if an analysis is legal, it may have significant ethical implications that you must consider.
  - What is it? This involves assessing the potential social impact of your work. For example, will a crime hotspot analysis unintentionally stigmatise a neighbourhood and reinforce social biases? Does a site suitability analysis for a new waste facility fairly consider the impact on all communities?
  - Why does it matter? As a researcher and consultant, you have a professional responsibility to “do no harm.” Documenting your consideration of these ethical dimensions is a hallmark of a mature and responsible analytical process.
- Design and Branding: Identify any constraints on your final visualisations. Your client or organisation may have a design manual with specific rules for fonts, colours, and logos that you must follow.
Defining Your Intellectual Core: Your Analytical Goals
Now that you have established the practical boundaries of your project, you can define its specific intellectual core. This involves translating the brief into a focused, actionable, and achievable research plan that fits within the identified constraints.
Your task is to:
- Frame the Problem: In your own words, write a clear statement of the research question or problem you will address.
- Define Objectives: List the specific, measurable goals you will achieve to answer that question.
- Establish Scope: Clearly define what is included and excluded from your analysis, including your geographical study area and any thematic or temporal limits.
The deliverable for this phase is a formal project scope that clearly outlines the external constraints and your specific analytical goals. This will be the first chapter in your Design Rationale.
2. Data Modelling: Defining the Characters in Your Story
Think of the Project Scope you just completed as the plot outline for a story. This Data Modelling phase is where you define the characters who will act in that story.
The process is a stepwise refinement. You’ll move from a vague idea of a character (e.g., “the main road”) to a precise, detailed “character sheet” that leaves no room for ambiguity. This ensures every piece of data you use has a clear and well-defined role to play in your analysis.
Step 1: Sketching Your Characters (The Universe of Discourse)
Your first task is to create a “cast list” for your story. This is your Universe of Discourse (UoD)—an initial list of all the key concepts, or “characters,” in your story (e.g., ‘building’, ‘forest’, ‘road’).
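To give these concepts a concrete form, you must define each one using a fundamental character archetype. There are two archetypes, the Object and the Property, and the Property comes in two forms, giving three templates in total. These archetypes provide the basic template for how you will describe your data: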
- The Object: This is the archetype for discrete, countable “things.” The template for an Object requires a well-defined boundary. Examples include ‘building’, ‘lake’, or ‘parcel of land’.
- The Property: This is the archetype for the measurable characteristics that describe your Objects or the space they inhabit. This archetype has two distinct templates:
  - As an Attribute: A template for a single value attached to a specific Object. For example, room_temperature is an attribute of a ‘room’ object.
  - As a Field: A template for a value that varies continuously across space. For example, the ambient outdoor_temperature across a landscape is a field.
This initial sketch—defining your characters using these core archetypes—is the foundation for the next crucial step: finding data that can bring them to life.
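The archetypes above can be sketched as simple data structures. The following is only an illustration, not part of the methodology itself: the class names SpatialObject and SpatialField, the polygon coordinates, and the temperature gradient are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class SpatialObject:
    """Object archetype: a discrete, countable thing with a well-defined boundary."""
    concept: str                                  # the UoD concept, e.g. 'room'
    boundary: Tuple[Tuple[float, float], ...]     # polygon vertices as (x, y)
    # Property as Attribute: single values attached to this one object
    attributes: Dict[str, float] = field(default_factory=dict)


@dataclass
class SpatialField:
    """Property as Field: a value that can be asked for at any location."""
    concept: str
    value_at: Callable[[float, float], float]     # (x, y) -> value


# An Object with one Attribute (illustrative numbers only)
room = SpatialObject(
    concept="room",
    boundary=((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)),
    attributes={"room_temperature": 21.5},
)

# A Field: hypothetical temperature that falls by 0.5 per unit of northing
outdoor_temperature = SpatialField(
    concept="outdoor_temperature",
    value_at=lambda x, y: 15.0 - 0.5 * y,
)
```

The distinction the sketch tries to make visible is that an Attribute is stored once per Object, whereas a Field has no owner and must be able to return a value for any coordinate.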
Step 2: The Critical Juncture - The Data Search
With your preliminary ontology in hand, your first and most important action is to search for existing datasets that can represent your concepts. The results of this search will determine the entire course of your project, sending you down one of two distinct paths.
Path A: Your Project is Based on Existing Data
This is the most common path. You’ve used the informal “character list” from your UoD as a search filter to find promising datasets. Your task now is a deep, critical investigation of those potential datasets, remembering that “The devil, as always, is in the detail”.
1. The Investigation: From UoD to DoD
The core of this path is to scrutinise the details of the datasets you’ve found. We describe these formal, detailed specifications of a dataset in terms of a Domain of Discourse (DoD). You can think of the DoD as the dataset’s official “rulebook,” which you’ll typically find in its metadata or technical documentation. Your investigation involves comparing your informal concepts (your UoD) with the precise rules of a dataset’s DoD.
Let’s look at the GeoDanmark example:
- Your UoD has the concept of a 'road' for a traffic analysis.
- You find the vejmidter layer in the GeoDanmark dataset.
- Your investigation of the dataset’s DoD (its technical specification) reveals their definition of “road” includes not only existing roads but also planned and demolished roads.
This investigation reveals a critical mismatch. The dataset, in its raw form, is not fit for your specific purpose.
2. The Decision: Defining Your Project’s DoD
Based on the findings of your investigation, you must now make a strategic decision. This decision will, in turn, formally define your project’s DoD for that character.
You have three choices:
- Accept: You determine the dataset’s DoD is fit for your purpose. By choosing this, the dataset’s DoD becomes your project’s official DoD.
- Transform: You find the data valuable but flawed for your needs. You decide to modify it (e.g., filter out “planned” roads). Your project’s DoD is then defined by the transformation rules you create.
- Reject: You conclude that the dataset’s DoD is fundamentally incompatible with your project’s needs, and no reasonable transformation is possible. You must then either search for other data or move to Path B: Creating New Data.
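To make the comparison behind this decision concrete, here is a deliberately simplified sketch. It assumes that both your concept and the dataset's DoD can be reduced to a set of category values; the function name decide and the status values are hypothetical and are not taken from the GeoDanmark specification. In practice the decision is a judgement that you document in your Design Rationale, not something a function can make for you.

```python
from typing import Set


def decide(required: Set[str], offered: Set[str]) -> str:
    """Suggest a strategy by comparing category values.

    required: the values your project's concept should cover
    offered:  the values the dataset's DoD admits
    """
    if offered == required:
        return "accept"       # the dataset's definition matches yours
    if required < offered:
        return "transform"    # everything you need is there, plus extras to filter out
    return "reject"           # something you need is missing from the dataset


# The road example: you want existing roads only (status values are assumed)
suggestion = decide(
    required={"existing"},
    offered={"existing", "planned", "demolished"},
)
```

Here the suggestion is "transform", because the dataset contains what you need together with categories that a filter can remove.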
3. Data Sourcing and Preparation
In the Data Modelling phase, you finalised your DoD and chose your data strategy. Now, in the Data Sourcing phase, you will execute that strategy to build your clean and documented project database.
1.2 Data Modelling:
The project’s Universe of Discourse (UoD) is the conceptual boundary of your analysis. It’s the complete, informally defined list of concepts that are relevant to your scientific question. These concepts are the names we give to the raw phenomena we observe. The concepts within our UoD can be used to describe two fundamental things:
- Objects: These are concepts describing discrete, countable things that are understood to have a well-defined boundary. Examples include the concepts of a ‘building’, a ‘lake’, or a ‘parcel of land’.
- Properties: These are concepts describing the measurable characteristics of our phenomena. A property, like ‘temperature’, can be conceptualised in two distinct ways:
  - As an Attribute of an Object: A single value attached to a discrete object. For example, room_temperature is an attribute of a specific ‘room’ object.
  - As a Field: A value that varies continuously across space. For example, the ambient outdoor_temperature across a landscape is a field.
1.3 Data Sourcing and Preparation
Having defined your Domain of Discourse (DoD) and chosen your data strategy in the previous step, you must now execute that strategy to assemble a complete, clean, and project-ready database. This phase is the practical work of preparing the specific inputs for your analysis.
The specific actions you take here are determined by the strategy you chose.
1. Adopting an Existing Dataset
If you determined that an existing dataset’s DoD is fit for your purpose, this is the most straightforward path.
1. Acquire the Data: Download the dataset from the official provider.
2. Document Provenance: Immediately record the source, download date, version number, and a link to the original metadata in your project documentation. This is critical for reproducibility.
3. Initial Quality Check: Load the data into your GIS environment to ensure it is not corrupt, covers your study area, and has the expected attributes.
The outcome of this strategy is a “raw” dataset that is documented and ready for analysis.
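One possible way to capture the provenance described in step 2 is a small machine-readable record stored next to the data. The sketch below is an assumption about how you might do this: the function name, the field names, and the example values are all placeholders, and the SHA-256 checksum is an addition not mentioned above, included so that you can later verify that the file has not changed.

```python
import hashlib
import json
from datetime import date


def provenance_record(data: bytes, source: str, version: str,
                      metadata_url: str, downloaded: date) -> dict:
    """Build a provenance record for one downloaded dataset."""
    return {
        "source": source,
        "download_date": downloaded.isoformat(),
        "version": version,
        "metadata_url": metadata_url,
        # Checksum of the raw bytes (an extra, assumed safeguard)
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Placeholder values; in practice `data` would be the bytes of the downloaded file
record = provenance_record(
    data=b"example file contents",
    source="Example data provider",
    version="1.0",
    metadata_url="https://example.org/metadata",
    downloaded=date(2024, 1, 15),
)
record_as_json = json.dumps(record, indent=2)
```

Writing record_as_json to a file alongside the dataset keeps the source, date, version, and metadata link together with the data they describe.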
2. Transforming an Existing Dataset
This is the most common strategy in practice. It involves acquiring a raw dataset and modifying it to conform precisely to your project’s DoD.
1. Acquire and Check: Perform all the steps from Strategy 1 first.
2. Develop a Transformation Recipe: Document the exact sequence of steps you will take to adapt the data. This recipe might include:
- Filtering rows: Selecting a subset of features that match your definition (e.g., selecting only roads with status = 'existing').
- Modifying attributes: Renaming or recalculating fields to match your schema.
- Geometric operations: Clipping the data to your study area boundary.
3. Execute and Document: Run the transformation process, saving the output as a new derived dataset. Your documented recipe ensures this process is transparent and can be repeated.
The outcome is a new, processed dataset that is perfectly aligned with your DoD.
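A transformation recipe can be written down as an ordered list of named steps, so that the documentation and the execution are the same thing. The sketch below uses plain Python dictionaries in place of a real GIS layer. The field names road_nm and road_name, the coordinates, and the rectangular study area are hypothetical, and the bounding-box test is only a crude stand-in for a true clip: it keeps features that lie wholly inside the rectangle and does not cut geometries at the boundary.

```python
from typing import Callable, Dict, List, Tuple

Feature = Dict[str, object]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Step = Callable[[List[Feature]], List[Feature]]


def filter_existing(features: List[Feature]) -> List[Feature]:
    """Filtering rows: keep only features whose status is 'existing'."""
    return [f for f in features if f["status"] == "existing"]


def rename_road_name(features: List[Feature]) -> List[Feature]:
    """Modifying attributes: rename 'road_nm' to 'road_name'."""
    return [{("road_name" if k == "road_nm" else k): v for k, v in f.items()}
            for f in features]


def make_bbox_filter(bbox: BBox) -> Step:
    """Geometric operation (simplified): keep features wholly inside bbox."""
    min_x, min_y, max_x, max_y = bbox

    def inside(features: List[Feature]) -> List[Feature]:
        return [f for f in features
                if all(min_x <= x <= max_x and min_y <= y <= max_y
                       for x, y in f["coords"])]
    return inside


def run_recipe(features: List[Feature], recipe: List[Tuple[str, Step]]):
    """Apply each step in order and log how many features remain after it."""
    log = []
    for description, step in recipe:
        features = step(features)
        log.append((description, len(features)))
    return features, log


raw_roads = [
    {"id": 1, "status": "existing", "road_nm": "A", "coords": [(1.0, 1.0), (2.0, 2.0)]},
    {"id": 2, "status": "planned", "road_nm": "B", "coords": [(1.0, 1.0), (3.0, 3.0)]},
    {"id": 3, "status": "existing", "road_nm": "C", "coords": [(9.0, 9.0), (12.0, 12.0)]},
]

recipe = [
    ("keep only existing roads", filter_existing),
    ("rename road_nm to road_name", rename_road_name),
    ("keep roads wholly inside study area", make_bbox_filter((0.0, 0.0, 10.0, 10.0))),
]

derived_roads, recipe_log = run_recipe(raw_roads, recipe)
```

Because every step returns a new list, raw_roads is left untouched and derived_roads is the new derived dataset. The recipe_log gives you a feature count after each step, which is worth copying into your documentation.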
3. Recording New Data
This is the most resource-intensive strategy and should only be undertaken when no suitable data exists.
1. Finalise Methodology through a Pilot Study: Before full-scale recording, you should conduct a pilot study to test your DoD and recording methodology in the field. This iterative process identifies practical issues and allows you to refine your ontology into a final, robust set of rules.
2. Record the Data: Carry out the whole data recording process (e.g., fieldwork, digitising from imagery) according to your finalised methodology.
3. Quality Assurance: Implement a quality control process to check for errors, inconsistencies, and missing values as the data is being created.
The outcome is an entirely new, bespoke dataset, created from scratch to match your DoD.
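The quality control in step 3 could be partly automated with a check that runs over the records as they are created. This is a minimal sketch under stated assumptions: the function name qa_issues, the fields id and status, and the allowed values are hypothetical, and a real check would be derived from the rules in your own DoD.

```python
from typing import Dict, Iterable, List, Set, Tuple


def qa_issues(records: List[Dict[str, object]],
              required: Iterable[str],
              allowed_status: Set[str]) -> List[Tuple[int, str]]:
    """Return (record index, problem) pairs for simple, checkable errors."""
    issues = []
    seen_ids = set()
    required = tuple(required)
    for i, rec in enumerate(records):
        # Missing values: a required field is absent or empty
        for name in required:
            if rec.get(name) in (None, ""):
                issues.append((i, f"missing {name}"))
        # Inconsistencies: the same id recorded twice
        rid = rec.get("id")
        if rid is not None:
            if rid in seen_ids:
                issues.append((i, f"duplicate id {rid}"))
            seen_ids.add(rid)
        # Errors: a value that the DoD does not allow
        status = rec.get("status")
        if status not in (None, "") and status not in allowed_status:
            issues.append((i, f"status {status!r} not in DoD"))
    return issues


recorded = [
    {"id": 1, "status": "existing"},
    {"id": 1, "status": "existing"},   # same id entered twice
    {"id": 2, "status": ""},           # value left empty in the field
    {"id": 3, "status": "unknown"},    # value outside the allowed set
]

problems = qa_issues(recorded, required=("id", "status"),
                     allowed_status={"existing"})
```

Running such a check at the end of each recording session, rather than once at the end of the project, makes it more likely that errors are caught while they can still be corrected in the field.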
Regardless of the path taken, this phase concludes when you have a final, documented project database. This clean dataset is the official input for the next step, the Analytical Approach.
4. Analytical Approach
In this phase, you deconstruct the problem into spatial operations and analytical steps. Document which GIS methods and tools you use (e.g., buffering, density analysis, spatial joins), and why you selected these over alternatives. Discuss how your operations align with the problem as defined in your conceptual ontology.
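As an illustration of pairing an operation with its justification, the sketch below records one analytical step together with the reasoning behind it and then carries out a very simple version of that step. Everything here is an assumption made for the example: the AnalysisStep structure, the building names, and the coordinates are invented, and the distance test is only equivalent to a buffer around a single point in a planar coordinate system measured in metres.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AnalysisStep:
    """One documented step of the analysis."""
    operation: str
    rationale: str
    alternatives_rejected: List[str]


def within_distance(points: Dict[str, Tuple[float, float]],
                    centre: Tuple[float, float],
                    radius: float) -> List[str]:
    """Select the named points that fall inside a circular buffer."""
    return [name for name, xy in points.items()
            if math.dist(xy, centre) <= radius]


step = AnalysisStep(
    operation="buffer (10 m) around the road point, then select buildings inside",
    rationale="the question concerns buildings in the immediate vicinity of the road",
    alternatives_rejected=["nearest-neighbour search: returns one building only"],
)

# Hypothetical planar coordinates in metres
buildings = {"A": (3.0, 4.0), "B": (30.0, 40.0)}
selected = within_distance(buildings, centre=(0.0, 0.0), radius=10.0)
```

Keeping the step description next to the code that executes it means that the reasoning and the operation cannot drift apart as the analysis evolves.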
5. Dissemination and Communication
Finally, you explain how you choose to present your findings. Consider who your audience is (e.g., the public, policymakers, clients), and how your choice of maps, visuals, or text will shape their understanding. Justify your medium (e.g., interactive map, printed report) and design choices in terms of clarity, accessibility, and impact.