Wednesday, September 03, 2014

Brainstorm on improving the Nationaal Georegister tomorrow at Geonovum

Unfortunately I can't be at this session tomorrow, but I would still like to share my thoughts on it. The session is organised by Geonovum to find out from users (you and me) what could be improved about the NGR.

- The most important point, of course: the interface that was hip 5 years ago (the ice age) really won't do anymore; it doesn't scale, it's too technical, too complex, and so on. Good news: GeoNetwork 3 appears at the end of this year, with an interface rebuilt from scratch in Angular/Bootstrap/Less. So it scales to mobile devices and is hip by default, but more importantly: it is much easier to adapt to your needs. [editor's note: NGR is based on GeoNetwork version 2.9.x; an upgrade to 3.0 will take some effort (porting the custom modifications)]

- An important discussion, on the other hand, is: where do you position the register, and what are the most important use cases around it?

- Would you want all datasets (services, software, documents, sensor networks) with a spatial component to become available in the register, which leads to an overload of (duplicate) information (and thereby a poor clone of Google), or would you, for example, only make national datasets from government bodies available in the register?

- How do you position yourself relative to the open data portals? Many initiatives in the geo-catalogue sector focus on rebuilding a simple CKAN-like interface (or use CKAN as an interface on top of GeoNetwork). Someone else argued instead that we (as the geo community) should distinguish ourselves precisely by putting the map front and centre. If people want a simple interface, they can just go to http://data.overheid.nl. A map-oriented catalogue starts with a map, and through a search interface you add map data to that map. But the map can also suggest datasets based on the themes you clicked and the location (and period) you have zoomed in on. http://map.geo.admin.ch/ is a very nice example of this. The truth probably lies somewhere in the middle; I see more value in offering two search entry points, the textual way and the map way. Fortunately, the developments in the upcoming version 3 of GeoNetwork are heading in that direction. One point in this context is clear: the mapping from ISO 19139 to DCAT has to improve (see the sketch below)! Hopefully the "IPM dataset" that the Ministry of the Interior is working on will facilitate this process. Geo data providers also need to realise that their data will be unlocked via other portals as well, and that they may have to adapt their data description (and data format) accordingly.
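To make that mapping concrete: a minimal sketch, assuming an ISO 19139 record is at hand, that maps just the title and abstract to DCAT with rdflib. The namespaces are the standard ISO/DCAT ones; everything else (file name, dataset URI) is illustrative, and a real mapping covers far more fields.

```python
# Minimal ISO 19139 -> DCAT sketch; only title and abstract are mapped, as an
# illustration of the kind of transformation the portals need to agree on.
from lxml import etree
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
ISO = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def iso_to_dcat(iso_xml: bytes, dataset_uri: str) -> Graph:
    doc = etree.fromstring(iso_xml)
    g = Graph()
    ds = URIRef(dataset_uri)
    g.add((ds, RDF.type, DCAT.Dataset))
    title = doc.xpath("string(//gmd:citation//gmd:title/gco:CharacterString)",
                      namespaces=ISO)
    abstract = doc.xpath("string(//gmd:abstract/gco:CharacterString)",
                         namespaces=ISO)
    if title:
        g.add((ds, DCTERMS.title, Literal(title)))
    if abstract:
        g.add((ds, DCTERMS.description, Literal(abstract)))
    return g

# Usage: feed it a metadata record and a dataset URI of your choosing.
# print(iso_to_dcat(open("record.xml", "rb").read(),
#                   "http://example.org/dataset/1").serialize(format="turtle"))
```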

An important use case for the register is compliance with the INSPIRE legislation; this may be brought to the end user's attention too little, so the user is scared off by the terminology used. In my view INSPIRE also chose the terms "view service" and "download service" somewhat unfortunately: a non-geo user can be misled into thinking that such a service makes data easy to download, while the opposite is true. Simply using the technical term WFS avoids this. Incidentally, with data pipes it is quite possible to make many of the technical formats easily available as JSON/CSV after all (with the exception of some complex GML and basisregistratie schemas).
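As a rough sketch of such a data pipe, assuming a WFS that supports GeoJSON output (many GeoServer/MapServer instances do); the endpoint and the type name are placeholders:

```python
# Sketch of a WFS -> CSV "data pipe"; endpoint and feature type are
# placeholders, and we assume the server supports GeoJSON output.
import csv
import sys

import requests

WFS_URL = "https://example.org/geoserver/wfs"  # placeholder endpoint

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "demo:buildings",  # placeholder feature type
    "outputFormat": "application/json",
    "count": "100",
}
features = requests.get(WFS_URL, params=params, timeout=30).json()["features"]

# Flatten the properties of each feature into CSV rows on stdout.
fieldnames = sorted({k for f in features for k in f["properties"]})
writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
writer.writeheader()
for f in features:
    writer.writerow(f["properties"])
```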

Another discussion that comes up a lot: is the NGR for the geo professional and PDOK-kaart for the semi-pro? Positioning them that way could be a choice, but the NGR covers far more data than is available via PDOK (the water boards, RDW, RCE, TNO, RIVM, KNMI, the Ministry of Defence, the provinces and others do not publish via PDOK). But you could, for example, build a simple catalogue search into PDOK-kaart.
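Such an embedded catalogue search could be little more than a CSW query. A minimal sketch with OWSLib follows; the CSW URL below is my assumption of the NGR endpoint, so verify it against the NGR documentation first:

```python
# Minimal CSW full-text search sketch using OWSLib; the endpoint URL is an
# assumption and should be verified before use.
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

csw = CatalogueServiceWeb("http://nationaalgeoregister.nl/geonetwork/srv/dut/csw")
query = PropertyIsLike("csw:AnyText", "%bodem%")  # free-text search on "bodem"
csw.getrecords2(constraints=[query], maxrecords=10)
for rec in csw.records.values():
    print(rec.title)
```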

Another improvement to the NGR I see is in the possibilities for linked data. The current DCAT-RDF endpoint of the NGR still contains too many bugs (partly due to a poor ISO 19139 to DCAT mapping). Data providers should additionally get a simple interface to attach semantics to the attributes of their CSV/shapefile (or, of course, register an RDF dataset or endpoint directly); the dataset can then be converted relatively easily to JSON-LD (an RDF encoding). Next, a SPARQL endpoint could be set up on the NGR that searches both the metadata and the data. With that, the simplest-to-ask-but-hardest-to-answer geo question can be posed in one go: "what does the government record around my house?" Google benefits too, since it can index the data more semantically.
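If such an endpoint existed, the question could be approximated along these lines; the endpoint URL is hypothetical, and a real implementation would need a GeoSPARQL filter on the geometry around the house:

```python
# Hypothetical SPARQL query against an NGR endpoint that indexes both metadata
# and data; the endpoint URL is invented and the spatial filter is a sketch.
import requests

ENDPOINT = "https://example.org/ngr/sparql"  # hypothetical endpoint

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title ;
           dct:spatial ?area .
  # In a real GeoSPARQL store this would be a geof:sfIntersects filter
  # against the bounding box around "my house".
}
LIMIT 20
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"},
                    timeout=30)
for row in resp.json()["results"]["bindings"]:
    print(row["title"]["value"], row["dataset"]["value"])
```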

Another point is playing with data, something the open data portals are picking up very well (e.g. Socrata), but in the geo domain CartoDB does too. Adjusting legends, clustering, filtering and animating data (WMS is outdated, WMTS is only for background data; you need WFS and data files). And why always a map? In some cases a chart is much more telling. You describe such a visualisation, store it in the catalogue and share it with your colleagues, who should then also be able to comment on it.

One more small point: I would like to see better support for sensor and IoT standards (sensor locations on a small map, click a sensor, get a chart with measured values). And how are we going to visualise 3D in the NGR? OpenLayers 3 is not a limitation...

And the homepage could be a bit more appealing... It should be clear at a glance that you have landed on the coolest geo portal in the Netherlands: what's new, a Twitter feed, the number of datasets per category, and so on.
 
A "for developers" section was a good initiative, but it needs to be developed further. And dear developers: NGR = GeoNetwork, so if you find a bug in the NGR, register (and fix) it in the GeoNetwork GitHub; it will then automatically roll through to the NGR in due course. "stable-develop" is the branch that will culminate in the GeoNetwork 3 release.

I wish you all a great day tomorrow.



Monday, September 23, 2013

#Foss4g13 wake-up-call

At #foss4g13 last week... After Friday evening's entertainment (the movie The Blues Brothers) I started with this wake-up call in the auditorium on Saturday morning... (I guess some of it should soon be available in the conference recordings)

Good morning ladies and gentlemen at the Nottingham Campus Auditorium. We're so glad to see so many of you lovely people here at this time, and we would especially like to welcome all the representatives of the local OSGeo Community, who deserve your credits right after this show. We do sincerely hope you'll all enjoy the show, and please remember, people, that no matter who you are, and what you code to live, thrive and survive, we chose a license that makes us all the same. You, me, them, everybody.

* based on works registered at Universal Studios

Wednesday, July 03, 2013

The internet of things and geo services



Today I visited the concluding conference of a linked open data pilot in the Netherlands (http://pilod.nl). In this blog, some thoughts on the matter from a (spatial) catalogue perspective. In recent years a lot of energy has been put into publishing open datasets on the web using open geo standards like CSW/WFS (triggered by regulations like INSPIRE, the basisregistraties, etc.). Unfortunately, due to their nature, geo services can hardly be used in the linked open data web. As a geo community I think we should investigate how to support the internet of things from our existing services. It would be a shame if they were left out. Linked data might even bridge the gap that quite a few people experience between the geo community and mainstream ICT.

CSW/WFS to RDF
Most geo data is tagged with metadata these days. In the metadata, references are made to contact point info and to keywords from SKOS thesauri (like GEMET). If we were able to present this metadata as RDF along with the spatial data itself, the data would instantly link to existing resources in the internet of things. GeoNetwork does have a basic RDF endpoint implementation these days (/srv/eng/rdf.search, funded by the EEA); the actual challenge is in creating RDF output from the WFS services (and linked shapefiles) out there and linking to that from the RDF metadata. A potential solution would be to extend products like GeoServer and MapServer, which already support a range of output formats for WFS (like GML, KML, SHP, CSV, JSON), with an RDF output format. From the RDF metadata one could then link to a WFS GetFeature request in the format application/rdf+xml. Spiders would be able to index the data and use it as linked data to support GeoSPARQL queries.
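As a client-side sketch of the kind of output such an extension could produce (a naive GML-element-to-triple mapping; the endpoint, feature type and vocabulary namespaces are placeholders):

```python
# Sketch: turn a WFS 2.0 GetFeature response (GML) into RDF with rdflib; the
# endpoint, feature type and vocabulary namespaces are placeholders.
import requests
from lxml import etree
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/feature/")   # placeholder resource namespace
VOCAB = Namespace("http://example.org/vocab#")  # placeholder vocabulary

resp = requests.get(
    "https://example.org/geoserver/wfs",        # placeholder endpoint
    params={"service": "WFS", "version": "2.0.0", "request": "GetFeature",
            "typeNames": "demo:parcels", "count": "50"},
    timeout=30,
)
doc = etree.fromstring(resp.content)

g = Graph()
# Each wfs:member becomes a resource; each simple child element a property.
for i, member in enumerate(doc.iter("{http://www.opengis.net/wfs/2.0}member")):
    feature = member[0]
    fid = feature.get("{http://www.opengis.net/gml/3.2}id") or "feature-%d" % i
    subject = EX[fid]
    g.add((subject, RDF.type, VOCAB.Feature))
    for prop in feature:
        if prop.text and prop.text.strip():
            g.add((subject, VOCAB[etree.QName(prop).localname],
                   Literal(prop.text.strip())))

print(g.serialize(format="turtle"))
```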

Implementing RDF support could be quite easy if it were a pure XSLT transformation (GML to RDF). However, RDF is far more useful if additional items are added to each GetFeature response document. Items like contact point and keywords (as defined in the service definition) link the record to other resources on the web.

ISO 19110
An OGC practice described in ISO 19110 (http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=39965), but unfortunately not commonly implemented, is feature catalogue metadata (for example http://sdi.eea.europa.eu/catalogue/srv/eng/search?uuid=99b05c69-1532-48b8-bfeb-4cb0df539c88). This could become an essential link in creating RDF from WFS/CSW. ISO 19110 describes the attributes used in a dataset. From this metadata, references can be made to common (linked data) vocabularies, which then (hopefully) can be used to convert the GML to optimally linked RDF.

Fixed location
The internet of things is based on the fact that each resource on the web has a unique location (URL) which will not change over time. In GIS servers a record or feature is accessed via a GetRecord/GetFeature request. One can imagine a scenario in which you use this type of request as the unique URL of the resource. However, this URL is actually not that fixed. For example, the order of parameters in the request can change, it can be a POST request, and the format parameter is explicitly defined in the URL (instead of as an Accept header, as generally used in open data content negotiation). But more challenging is the version parameter of the standard: after a while a new version of the catalogue standard becomes available, and the old standard might get deprecated. The URL of the request will change accordingly. This could be managed by adding additional links in the referring document: a link for each supported version, format, language, spatial reference system, etc...
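For contrast, a sketch of what content negotiation looks like from the client side: one stable URI (a placeholder here), with the representation chosen via the Accept header instead of a format parameter in the URL:

```python
# Content negotiation sketch: the same (placeholder) resource URI, different
# representations selected via the HTTP Accept header rather than a format
# parameter in the URL.
import requests

uri = "http://example.org/dataset/parcels/record/42"  # placeholder, stable URI

as_rdf = requests.get(uri, headers={"Accept": "application/rdf+xml"}, timeout=30)
as_json = requests.get(uri, headers={"Accept": "application/json"}, timeout=30)

print(as_rdf.headers.get("Content-Type"))
print(as_json.headers.get("Content-Type"))
```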

Geonetwork as a gateway
Considering the above drawbacks, it might be an idea to extend the functionality of GeoNetwork to act as a gateway between geo and RDF. GeoNetwork has all of the required ISO 19115/ISO 19110 metadata and the links to the actual downloads and services available. It could (daily/hourly) harvest all of that data and convert it to RDF quite optimally. Some additional advantages: a search query in GeoNetwork would include a search into the actual data, and it would solve the URI strategy for the geo domain instantly. All Dutch geodata (PDOK) would, for example, be available at:
nationaalgeoregister.nl/geonetwork/dataset/{namespace}:{uuid}/record/{id}
For sure, some exceptions would have to be made for frequently changing data like sensor services, and the processing should be delegated to other nodes in the cluster.

For sure, the above approach will have legal limitations (who is responsible for the data/service?). In the long run each organisation will need an RDF endpoint (and register that endpoint in the ISO 19139 metadata?). But my proposal can offer a temporary solution for organisations that are not yet ready for the full RDF implementation.

Read more
A search on the web showed there are plenty of initiatives in this area, most importantly the FP7 GeoKnow project (http://geoknow.eu). I'm curious about their results.
Also check "Opportunities and Challenges for using Linked Data in INSPIRE" by Sven Schade and Michael Lutz: http://ceur-ws.org/Vol-691/paper5.pdf


Monday, July 01, 2013

Open Data Hackathon Amsterdam

Last Saturday I visited the Open Data Hackathon in Amsterdam organised by "HackdeOverheid" (http://www.hackdeoverheid.nl). It was my first visit to such an event. I really liked the vibrant atmosphere, but in the days after, some thoughts kept coming back to me which I would like to share with you here.

Identifying OpenData
These days it's quite popular for governments to publish data as open data. For sure, any data published is interesting. However, quite a few of the published datasets are aggregated in some way, which makes them less useful for use cases the publisher hadn't thought of. Sometimes the aggregation is done to facilitate developers, but most of the time it's done for other reasons, such as protecting the privacy of the persons mentioned in the data.
For example, at this hackathon a dataset was presented by SVB (http://www.hackdeoverheid.nl/voornamen-data-beschikbaar-voor-apps); they summarised the given first names over the whole country, which limits the dataset to a single purpose: first-name popularity. If they had aggregated by street/postcode/area (or not at all), people might have used the dataset to relate name-giving to region or even economic status.
Which leads me to a suggestion for publishers: please provide us with the raw data. Offer aggregations as a separate download.

Open standards
At the event there were frequent requests for open formats like CSV/JSON/TXT, and people also requested APIs. But there was not much awareness of open standards. At such a point I always realise that as a geo community we're quite far along in the development and implementation of open standards. The risk of every organisation implementing its own formats and APIs is that a data miner has to develop specific format conversions for each organisation he wants to extract data from. Think of the Dutch municipalities: we have some 250 of them, and if they would all develop a specific API on their data, it would be very hard to extract similar data from all those APIs. Quite a few people are aware of this risk; that's why the government developed the "basisregistraties": indications of how to store and communicate data in certain thematic areas, to be implemented by, for example, all municipalities. And quite important for the open data movement, since most of the data available via the basisregistraties will be open data. A first example of this is the "Basisregistratie gebouwen", a dataset (+ SOAP and WFS API) which contains all buildings in the Netherlands. OK, this is not a simple JSON REST API, but hey, we're developers, we're not afraid of a little XML. My colleague pointed me to http://5stardata.info/, where complying with unified data models is indeed not mentioned as a star; they point at linked data as the way to go. That might indeed be a better pattern to interact with data from different origins, but AFAIK it is still quite experimental at most organisations.
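To back up the "we're not afraid of a little XML" claim, a minimal sketch of pulling buildings from such a WFS with OWSLib; the endpoint URL, type name and bounding box are placeholders, so check the actual service documentation first:

```python
# Sketch: fetch some building features from a WFS with OWSLib. The endpoint,
# type name and bounding box are placeholders, not the real BAG configuration.
from owslib.wfs import WebFeatureService

wfs = WebFeatureService("https://example.org/bag/wfs", version="1.1.0")
print(list(wfs.contents))  # list the available feature types

# Request a handful of features for a bounding box (x_min, y_min, x_max, y_max
# in the service's default CRS) and save the raw GML.
gml = wfs.getfeature(typename=["bag:pand"],
                     bbox=(120000, 480000, 121000, 481000),
                     maxfeatures=10)
with open("pand.gml", "wb") as f:
    f.write(gml.read())
```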

GeoJSON in GitHub
Recently the GitHub team added GeoJSON support to GitHub. Uploaded GeoJSON files are displayed as maps on a nice backdrop using LeafletJS. Since then, people have started uploading masses of GeoJSON files, also in preparation for this hackathon. For sure there is the risk that this remains a one-off action and the data will soon be outdated, but if done correctly it could mean a real change in how we're used to publishing data. Imagine:
- An automated process updates the GeoJSON data in GitHub every... In the git history you can then see very nicely the historic development of the dataset.
- You can fork the dataset, reformat it and publish it again, or even open a pull request with data optimisations.
- To make the data accessible in traditional GIS you could add a WMS/WFS server which uses the GitHub GeoJSON as input storage (using OGR; see the sketch after this list).
- In the end people will love git as storage, introduce git servers in their own organisation as master storage, and just clone to GitHub.
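The OGR side of that idea is already possible today: a sketch of opening a GeoJSON file straight from a (placeholder) GitHub raw URL through GDAL's /vsicurl/ virtual file system, which a WMS/WFS server could use as its input:

```python
# Sketch: open a GeoJSON file hosted on GitHub directly with OGR, via GDAL's
# /vsicurl/ virtual file system. The raw URL is a placeholder.
from osgeo import ogr

url = "/vsicurl/https://raw.githubusercontent.com/example/repo/master/data.geojson"
ds = ogr.Open(url)
layer = ds.GetLayer(0)

print(layer.GetFeatureCount())
for feature in layer:
    print(feature.ExportToJson())  # each feature back as GeoJSON
```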
Related to this there is a proposal by Max Ogden & OKFN (https://github.com/maxogden/dat) and another proposal by OpenGeo (https://github.com/opengeo/GeoGit). Today I noticed a blog by Rufus Pollock on the matter (http://blog.okfn.org/2013/07/02/git-and-github-for-data); amazing to see the movement on this theme these days.

OGC vs best practices
The last thought is on OGC versus best-practice standards. These days we see projects like MapBox, CartoDB, LeafletJS and GeoJSON being very popular, but dissociating themselves from the OGC standards.
For sure they use conventions between the products (EPSG:900913, TMS, GeoJSON), but those conventions are the result of best practices in the community, not of a highly technological and political design process at OGC. These best-practice standards are lightweight, focus on performance, are much easier to implement, are widespread, and offer a more fluent user experience than any application using OGC standards. OGC should really focus on these lighter standards. We are at a point where data distributors and proprietary spatial software implementors get much pressure from users to also support the best-practice standards, with the result that the best-practice standards are widely implemented in both open source and proprietary systems without ever having been adopted by OGC.


Friday, June 21, 2013

Which OGC standard to use to receive info from mobile phones being used as sensors

We're cooperating in a European FP7 project called COBWEB (http://cobwebproject.eu) on enabling citizens to contribute nature observations in biosphere reserves. From an interoperability perspective it's quite useful to use an OGC standard to publish observations from the registered phones. But which one to use? There are a couple of them out there. I did some research, but I'm still not sure. Maybe this research helps others; comment if you have additional ideas on the matter.


Most prominent is the OpenLS standard, which is actually a set of "core" services around location, one of them being the location service: a client registers with a server, the server sends out requests for position at regular intervals, and the client then responds with its location. I'm not sure, though, whether such a model would work on mobile phones. I think a phone can't have a listener for incoming requests like this (what if it's on stand-by or offline at that moment?). You can probably run some web server like i-jetty or NanoHTTPD, but I guess it will drain the battery if you keep it alive constantly.

GeoSMS offers location info over SMS, which is a rather costly method, but could work out on phones without internet access.  

NetCDF is a set of formats for the creation, access and sharing of array-oriented data. NetCDF is quite optimal in size and can be accessed with the THREDDS data server using OGC CSW or OPeNDAP. It seems not very optimal for single sensor transactions, but quite strong in aggregations. NetCDF is mainly used in oceanography and meteorology sensing.

SensorWeb  
SOS/O&M: standards for querying and inserting sensor observations and registering sensors (might be useful for mobile use, although it seems to follow the same path as OpenLS in that the server takes the initiative to read the sensor, and I'm not sure if that is possible on a mobile phone)
SES (event service): register for certain events, and get notified if an event occurs (might be needed in some use cases)
SPS (planning service): can be used to query whether a sensor is capable of performing a task and to assign the actual task (use case: send a message to all registered phones: "who is in the neighbourhood of xxx,yyy?"; if yes: "please go to xxx,yyy")

Geopackage is mostly a database for use on a mobile phone in case of a lack of network connectivity. Users can store their measurements in the local database and, when back online, use the data from the database to trigger the posting of observations. Users might also cache image or vector tiles for an area so they still have a map while offline. Geopackage offers some nice options over plain file or SQLite storage, but comes at a price (extra dependencies for the app).
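As a baseline for what Geopackage would replace, a minimal store-and-forward sketch with plain SQLite; the schema and the upload hook are invented for illustration:

```python
# Store-and-forward sketch with plain SQLite: queue observations locally,
# flush them when connectivity returns. Schema and upload hook are invented.
import sqlite3

db = sqlite3.connect("observations.db")
db.execute("""CREATE TABLE IF NOT EXISTS obs
              (id INTEGER PRIMARY KEY, lat REAL, lon REAL,
               value TEXT, uploaded INTEGER DEFAULT 0)""")

def record(lat, lon, value):
    """Store an observation locally, regardless of connectivity."""
    db.execute("INSERT INTO obs (lat, lon, value) VALUES (?, ?, ?)",
               (lat, lon, value))
    db.commit()

def flush(upload):
    """Try to upload all pending observations; mark the ones that succeed.

    `upload` is a callable, e.g. one wrapping the WFS-T insert sketched below.
    """
    pending = db.execute(
        "SELECT id, lat, lon, value FROM obs WHERE uploaded = 0").fetchall()
    for row_id, lat, lon, value in pending:
        if upload(lat, lon, value):
            db.execute("UPDATE obs SET uploaded = 1 WHERE id = ?", (row_id,))
    db.commit()
```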

WFS-T (transactional WFS) can be used to upload measurements in GML format (see the sketch after the WPS item below).

WPS can be used for any other request/response interaction, even asynchronously, and can also be used to upload measurements.
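To make the WFS-T option concrete, a sketch of an Insert transaction posted over plain HTTP; the endpoint, the feature namespace and the attribute names are placeholders:

```python
# Sketch of a WFS-T Insert posting one observation as GML; endpoint, feature
# namespace and attribute names are placeholders.
import requests

WFS_T_URL = "https://example.org/wfs"  # placeholder transactional endpoint

transaction = """<?xml version="1.0" encoding="UTF-8"?>
<wfs:Transaction service="WFS" version="1.1.0"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:obs="http://example.org/obs">
  <wfs:Insert>
    <obs:observation>
      <obs:value>7.3</obs:value>
      <obs:geom>
        <gml:Point srsName="EPSG:4326">
          <gml:pos>52.09 5.12</gml:pos>
        </gml:Point>
      </obs:geom>
    </obs:observation>
  </wfs:Insert>
</wfs:Transaction>"""

resp = requests.post(WFS_T_URL, data=transaction.encode("utf-8"),
                     headers={"Content-Type": "text/xml"}, timeout=30)
print(resp.status_code, resp.text[:200])
```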

Conclusion:
  • Most suitable would be SOS or OpenLS, but I wonder if they'd fit a use case where the phone takes the initiative to send observations as soon as it's back online.
  • If the above doesn't work out, WFS-T or WPS seem most appropriate.