Friday, April 09, 2021

The Leanstack Way

The Oceans of Data Lab is honored to be a part of PropelICT's startup accelerator. We had our kickoff meeting a few days ago, and the current focus is on learning the Leanstack methodology and using the Lean Canvas to tease out ALL the important details for success. The inline supporting learning modules available through Leanstack are very helpful. 

The Lean Canvas - numbers indicate the order of completion

I feel fortunate that I have been familiar with Agile and Lean approaches for over 20 years. I've got two favorite sayings I use when running software teams, and I like to think I run many aspects of my life with an Agile / Lean mindset.

  1. Ship and ship often (deliver new releases as often and frequently as possible)
  2. Fail and fail often (take risks, innovate, don't apologize, keep moving); success comes from failure.

For me, the use of Lean in startups all began with Eric Ries, when I watched a YouTube interview of him conducted a decade ago. That interview became part of a 2011 blog post where I describe lean approaches within the Director of Technology role. Since then, I have revisited the works of Eric Ries every few years; he has a lot of useful insights into lean startups. One of my all-time favorites is the talk he gave at Google 10 years back.

Google Talk: Eric Ries and the Lean Startup

Thursday, April 08, 2021

ODL Newsletter - March 2021

The Oceans of Data Lab (ODL) monthly newsletter is also finding its footing. It will still include monthly updates on the progress we make, AND it will start with a few articles of interest from the data lab technology world. I am discovering so many interesting technologies and approaches within the data realm. I'm going to fold my 30 years of data experience into explaining why I believe these are of interest to those working with large amounts of data.

Apache Data Lab

The Apache data lab comes from the same organization that has brought us so many important technologies over the years. And specifically, think of all the big data technologies they have delivered in recent years... there are just too many to list. What I like most about the data lab is its ability to be deployed to the big three cloud hosting environments. Super smart, given that the storage and compute requirements for data projects shouldn't be the responsibility of Apache.

DataOps and the DataKitchen

DataOps is a fairly recent concept / term, about seven years old... and it makes sense that it has become a discipline in itself, as it is not just DevOps for Data; it is so much more. The DataKitchen looks to be doing some amazing work in this capacity and has published a good read to help get your head into this important and emerging technology space.

I'm another 3 people into working towards my 100 conversations. It is said that you need to have 100 conversations as you solidify your business / startup idea. So I managed to get another three conversations in. I know this isn't that many, but that's ok, as this month was more about setting up technology and thinking about risk, revenue, and the elevator pitch for the startup. I still need to talk with people, and I need people's help, always. If you know anyone who analyzes data or works for a business with a growing interest in their data, I'd love to talk with them.

What has changed this month?

Over this month my thinking has broadened and become more focused on the needs of organizations and their data. No real pivot, but a clarifying of what the business will be. The changes fell into three main themes:

  • A broader interest in helping people with their data. The backstory of my career has always been the information technology around data. For 30 years I have focused on managing, moving, and building software for data. This will continue with the data lab. We are still interested in ocean data and a reference architecture for the digitization of oceans; these subjects will become part of the bigger data lab.
  • It's a Data Lab. It became very clear this month that what I want to do is stand up and run a data lab. I had a great conversation with Graham Truax at Innovation Island, and this identified the alignment between my accelerator pitch and the data lab concept. After I re-read my proposal (and subsequent acceptance) to the PropelICT accelerator, I confirmed... the startup is focused on creating a data lab with related products and services.
  • Start with a services focus, rather than product. We need revenue and the data lab is not a small product with a near MVP that can generate revenue. There are a number of MVP's that could bring business value for our customers, but nothing with significant revenue possibility. So our focus needs to be on services where we can leverage the skills and knowledge of the founder and identify projects that align well with the overall vision for Oceans of Data Lab.

It's been a business and technology focused month

This really was a more technology-focused month. It was about getting all the infrastructure in place to have the lab, fetch some data, and display a basic analytics dashboard. So while setting things up, we weren't that focused on reaching out to potential customers.

What are the risks and assumptions?

We also thought about the business risks and the assumptions we are making that could work against our success. We are not going to get into these in detail; writing them down and publishing them helps attract attention and, hopefully, the feedback we need to reduce the risks and prove or disprove the assumptions. We are also focused on what can be a product rather than what is a service.


Assumptions:

  1. Companies / Organizations will participate in a publish - subscribe business model for data sets
  2. The data lab concept for preparing data sets for publishing will become accepted by SMBs


Risks:

  1. The MVP doesn't generate enough revenue or provide business value
  2. The primary founder not having the knowledge, energy, or bandwidth to keep up the pace
  3. Finding skilled employees with a deep understanding of data engineering
  4. The high cost of cloud-based infrastructure

Where is the revenue?

  • The transactional costs in the publish and subscribe (every data set transaction earns money)
    • this is definitely my riskiest assumption
  • SMBs pay for services in preparing the data sets for themselves and the marketplace.
    • does the rise of the data engineer role show a willingness to pay for data preparation?

What do we consider our Elevator Pitch?

These are early times and we don't yet have a story to tell. Gak! The elevator pitch is hard, and we really don't know what we are doing when it comes to an elevator pitch.

  • We help SMBs realize new revenue possibilities from their existing data.
  • We reduce the cost of data preparation for internal analysis and business intelligence.
  • We provide the services and technology to help you make sense of all your data. 
  • We make it easy for you to see the value and opportunities based upon your unique business data.

Next Steps:

  • We need to focus on the customer. We need to find the customers and talk to them.
  • We need to reduce our risk and prove, or disprove, our assumptions.
  • We need a technical platform to host a Minimum Viable Product (MVP). I need to identify and prioritize a few MVPs.

If you find the Data Lab an interesting idea or have the need to bring greater value to your existing data, please feel free to contact me. We are building a business and we want to help you bring greater value from your data.

Tuesday, March 23, 2021

It's Alive! The Elastic Stack as our Data Lab

So much technical work, so little time! I finished my first three sprints toward standing up the data lab. Standing up infrastructure from scratch so you have clean new compute power is fun, and also a lot of work. Particularly when you include: doing it right, taking no shortcuts, and making sure it is secure.

Sprint 0: Setup Ubuntu 20.04 Server with ELK stack.

This was mostly rehydrating virtual server infrastructure I hadn't used in 8.5 years. It needed an upgrade from all perspectives, including a completely new OS. I implemented the ELK stack and made a couple of security changes to lock it all down. I ran a few tests by setting up a couple of websites, getting the JSON confirmation from ElasticSearch, and calling up the Kibana dashboard. Oooo... sweet success!

Sprint 1: Vulnerability Assessment. Security changes if required.

This evening I spent some time poking at the overall vulnerability of the server and of the ElasticSearch and Kibana services. I made a few additions and changes to further lock down the services and believe they are as secure as they can be for this first release. Very happy to feel reasonably confident about its being locked down. Maybe I'll get lucky and get some free PEN testing. Ha.

Sprint 2: Identify and register some well aligned domain names.

I registered the following domain names, and even considered buying one... it would have been too expensive. I'll implement the data lab on the site when it gets closer to being a minimal viable product (MVP).


Thursday, March 18, 2021

ODE Data Lab has its technology footing

This month has been more about standing up technology than talking to people about their ocean data needs. That's ok... if you are building a technology company, you need to build technology. Ocean of Data Endeavours (ODE) is about building and utilizing software to make it super easy to work with data, large amounts of data.

The last 10 days have been about refreshing a server infrastructure I stood up 12 years ago for a number of other projects. What was left was a couple of simple websites, some domain hosting, and all the related mail server infrastructure. All of this needed a complete refresh to be brought up to date:

  1. Rebuild the server infrastructure to have more horsepower. - DONE
  2. Upgrade the Ubuntu OS from 10.04 (Lucid Lynx) to 20.04 (Focal Fossa). - DONE
  3. Rework all the domain aliases to remove dependency on a domain I no longer owned. - DONE
  4. Do some basic security work to the server. Mostly SSH focused. - DONE
  5. Create a new mail server, and do some mailbox maintenance. - DONE
  6. Install Apache2 httpd host. - DONE
  7. Configure Apache2 for a couple of web sites. - DONE
  8. Code some basic HTML to confirm the sites are working. - DONE
  9. Celebrate!

So good to have all this done. The server will provide a strong foundation and is well prepared for the ELK stack and the first load of ocean data. So excited! 

Tuesday, March 16, 2021

An Important difference between DevOps and DataOps

Where DevOps is automation, technology, and delivery focused, DataOps is more customer focused. I like these descriptions from Wikipedia for DevOps and DataOps:

  • DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality.
  • DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.

The similarities between these two are many, particularly from a process and automation perspective. I see DevOps as really focused on delivering quality software, and DataOps as focused on delivering visualized data analytics to the customer.

Customer focused DataOps assists with Agility

Having a customer-focused data analytics team fits well with an Agile approach. The data analytics team needs very involved customer analysts (or product owners). The customer analyst identifies the KPIs, models, or intelligence that need to be fulfilled. These become part of the backlog, and as new sprints are defined they become focused on the analytic item(s). A sprint can be built around a few analytics, then iterate through these questions for a DataOps sprint:
  1. Where is the data? How do we get at it?
  2. How do we best move it? How often? What are the security or privacy issues?
  3. What needs to be cleansed or transformed? Is the data at the correct granularity?
  4. Do we already have any related data to improve the intelligence? Is this a new build or do we use / alter an existing pipeline?
  5. What models or analytics do we apply?
  6. How do we best visualize the data?

Not to say that DevOps can't fit well within Agile approaches; it can... the backlog is more technically focused and fits into the sprint more from a continuous perspective than a customer perspective. (What DevOps features go into a sprint are often negotiated with the product owner.) The focus of DataOps is shipping features that fulfill a visualized analytic or more... The focus of DevOps is CI / CD...
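The sprint iteration above can be sketched as a minimal pipeline, one stage per question. This is an illustrative skeleton only; the stage names, sample records, and the mean-temperature "model" are my assumptions, not a real implementation.

```python
# Illustrative DataOps pipeline skeleton: one stage per sprint question.
# All data and logic here are made-up placeholders.

def acquire():
    # Questions 1 & 2: where is the data, and how do we move it?
    # A real stage would pull from an API, a file drop, or a queue.
    return [{"sensor": "buoy-1", "temp_c": "7.2"},
            {"sensor": "buoy-1", "temp_c": "bad"},
            {"sensor": "buoy-2", "temp_c": "6.9"}]

def cleanse(records):
    # Question 3: cleanse / transform; drop rows that fail validation.
    cleaned = []
    for r in records:
        try:
            cleaned.append({**r, "temp_c": float(r["temp_c"])})
        except ValueError:
            continue  # discard unparseable readings
    return cleaned

def analyze(records):
    # Questions 4 & 5: apply a model -- here, just a mean temperature.
    temps = [r["temp_c"] for r in records]
    return {"mean_temp_c": sum(temps) / len(temps)}

def visualize(result):
    # Question 6: hand the analytic off for display (dashboard, report).
    return f"Mean temperature: {result['mean_temp_c']:.2f} C"

if __name__ == "__main__":
    print(visualize(analyze(cleanse(acquire()))))
```

Each stage maps to a backlog item the customer analyst can prioritize, which is what makes the sprint customer-focused rather than purely technical.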

This approach worked well for us when working on a Business Intelligence project, where our nine-week sprints usually focused on 4 to 9 KPIs. The organization was in aerospace; they had many legacy data sources, with new data sources coming online. As with many organizations, they were in a state of improvement and transformation. Fitting new cubes, representing KPIs, into sprints allowed us to show progress and success. The biggest challenge wasn't the technical or delivery side of getting the data to the customer. The challenge came in developing a data team where every team member understood the process end-to-end and the effort required during each step of the DataOps pipeline. Acquiring, cleansing, and transforming data takes as much effort and understanding as visualizing the data for the customer.

Saturday, March 13, 2021

For Contract: Database Administrator

Do you require contracted database administration? Small to medium organizations using database technology to store corporate data definitely need database administration to care for and feed their database technologies. This care and feeding includes, but is not limited to:

  • Install and maintain database servers.
  • Optimize database security.
  • Build and maintain ETL pipelines.
  • Performance tuning of databases.
  • Storage optimization for databases.
  • Implement DataOps for up-to-date business analytics and its need for continuous data.
  • Install, upgrade, and manage database applications.
  • Create automation, and schedule, repeating database tasks.
  • Ensure recoverability of database systems.

If you require any, or all, of these database administration tasks, we can help. With over 30 years of database experience, complemented by a technology degree in database management, we can work remotely to keep your databases healthy and reduce business risk. Part-time or full-time, reasonable rates.

Friday, March 12, 2021

Data Challenge Panel Session: The Power of data

I listened in on this presentation about the power of data, which featured Susan Hunt from Canada's Ocean Supercluster as one of the presenters. All the presenters had very valuable insights. It was an hour well spent!

The Session: The Power of data

Description: Come learn with us. Our panel guests are engaged in transformational data products and projects. Together, we’ll learn what opportunities they are creating when using the newest technology to exploit the power of data. And we’ll talk about careers. The opportunities are limitless.

Moderator: Cathy Simpson | Chief Executive Officer, TechImpact

Panelists: Susan Hunt | Chief Technical Officer, Canada’s Ocean Supercluster; Justin Kamerman | Chief Product Officer, Instnt Inc.; Jason Lee | Partner, MNP Technology Solutions

Items for my follow-up:

A number of subjects from the presenters' discussions sparked my interest. I believe these are the three that need follow-up from the current ODE perspective:

  • Building Models for Data, or transforming Data for Models
  • What is the business case for the datalab / data workbench?
  • What is the current state of DataOps?

Peter, we need to integrate data more easily.

I liked this article on Fundamental truths when it comes to innovators. I do think Elon Musk and Jeff Bezos have figured out how to be crazy successful in business. I do wish they were more philanthropic with the monies they have accumulated from their success. I digress...

I like the idea of a fundamental truth as a foundation for your business that doesn't change through time. And when I think about my commitment to build a data services business, I have now started to think about what would be the fundamental truths?

  1. We need to access and integrate data more easily.
  2. We need better ways to visualize, communicate, and understand, the data.
  3. We would like to reduce the compute and storage costs of data.

These are what I came up with in my initial review of fundamental truths. I know there will be a few more, and these three will be edited as my idea grows and gains greater footing.

Monday, March 08, 2021

Building the Data Lab Technology Stack

The idea of building a data lab is emerging from my ocean data conversations and from thinking about how to best utilize my knowledge and skillset within this opportunity. In my mind, the service offering would be twofold:

  1.  Data Engineering / Software Development consulting and services with focus on ocean data. We will do the heavy lifting of extracting, cleansing, transforming, and loading your data. And then we will help with analysis and visualizing the data. We are comfortable working in both the open source and Microsoft technology stacks.
  2. Standing up (and data loading) the technology stack for the data lab. You are going to need to host all this compute power and storage somewhere. It could be on-premise. Most likely, it will be in the cloud. We can help with this also. We could build it in Azure, using the Microsoft technology stack. Or we could build it using an Open Source stack on top of Linux in any of the hosted environments of Azure, AWS, or Rackspace.
How do you build a low-priced, large-compute technology stack to support data engineering efforts, implement a data lab, and showcase these new services capabilities? The low price is the key factor given the current startup state of this ocean data endeavour, particularly when you think of the cost of compute for processing and storing large amounts of data. I believe the best way forward is as follows:

  1. Use open source where you can. Fortunately, many of the infrastructures, tools, frameworks, and programming languages for the data lab are open source.
  2. Automate the build so it can be built and torn down with ease. This would eliminate the need for the stack to always be running.
  3. Store the data at its source, if possible. Fetch, and load, the data when you automatically rebuild the stack. Keep in mind this limits the amount of big data you can store locally, and loading large amounts of data can cumulatively take days. Be mindful of this. 

Note: this stack is to showcase the services capabilities. A full data lab would also need the ability to both persist and fetch data. It's going to take some time to build the data lab!
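As a sketch of points 2 and 3, the reload step of an automated rebuild can chunk fetched records into batches suitable for a bulk-load API (such as Elasticsearch's _bulk endpoint). The batch size and the commented helper names are my assumptions, not part of the actual build.

```python
# Sketch of the reload step: chunk fetched records into batches for a
# bulk-load API. Batch size is an arbitrary assumption; a real loader
# would also handle retries and backpressure.

def batches(records, size=500):
    """Yield successive lists of at most `size` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Usage sketch (fetch_from_source and bulk_load are hypothetical helpers):
# for chunk in batches(fetch_from_source(), size=1000):
#     bulk_load(chunk)
```

Because the loader is a plain generator, the same code works whether the stack was just rebuilt or has been running for weeks, which is what makes tearing down and rebuilding cheap.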

The Data Lab Technology Stack

The deployment of this technology stack will use open source wherever possible, running on a Linux (Ubuntu) server hosted at Rackspace. The rationale for these decisions is:

  • Little to No licensing costs 
  • Strong familiarity with Rackspace as hosting company
  • Existing domain name hosted with Rackspace
  • Extensive experience with Ubuntu Linux in a hosted environment
  • Familiarity with deploying data intensive solutions using the ELK stack
  • Experience programming in Python

Note: The deployment of this technology stack will happen in phases, where each phase will complete with some basic tests to ensure the stack behaves as desired.

Phase 0

Phase 0 will be a basic ELK stack running on an Ubuntu Linux server hosted at Rackspace, with access via the web domain. The use case, where the data comes from, how we transform it, the analysis, and the visualization, is still to be determined. This use case will be used for testing this first iteration of the newly stood-up data lab. Exciting times!
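One simple way to test the stood-up stack is to check Elasticsearch's cluster health endpoint (GET /_cluster/health returns JSON). A small helper to interpret that response might look like the following; treating "yellow" as acceptable is my assumption for a single-node lab, where replica shards are unassigned by design.

```python
import json

# Elasticsearch reports cluster health as "green", "yellow", or "red".
# On a single-node lab, "yellow" (unassigned replicas) is usually fine,
# so this check treats green and yellow as healthy -- an assumption
# appropriate for a one-server lab, not for production.

def cluster_is_healthy(health_json: str) -> bool:
    health = json.loads(health_json)
    return health.get("status") in ("green", "yellow")

# Usage sketch against a running stack:
# import urllib.request
# body = urllib.request.urlopen("http://localhost:9200/_cluster/health").read()
# print(cluster_is_healthy(body.decode()))
```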

Phase 1

During phase 1 we will add the Python programming language to the technology stack and use it for two purposes:

  • Apply a model to the data using Python.
  • Present the processed data to a web page for display.
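As a stand-in for the first purpose, applying a model, here is a minimal sketch using a simple moving average. The window size and the sample readings are made up; the real model for the lab is still to be chosen.

```python
# Minimal stand-in for "apply a model to the data": a simple moving
# average over a numeric series. Window size is an arbitrary assumption.

def moving_average(values, window=3):
    """Return the moving average; positions before a full window are skipped."""
    if window <= 0 or window > len(values):
        return []
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# e.g. hourly sea-surface temperatures (made-up numbers)
readings = [7.0, 7.2, 7.4, 7.1, 6.9]
smoothed = moving_average(readings)
```

The smoothed series is what would then be handed to the second purpose, presenting the processed data on a web page.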

Phase 2

During phase 2 we will add Kafka as an infrastructure resource, identify some additional data sources, and pre-process the data before it gets loaded into ElasticSearch.
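The pre-processing step would sit between the Kafka consumer and the Elasticsearch load. A sketch of such a transform follows; the field names, units, and validation rule are all my assumptions for illustration.

```python
import json

# Sketch of a pre-processing transform that would run between a Kafka
# consumer and the Elasticsearch load. Field names, the Fahrenheit-to-
# Celsius conversion, and the validation rule are assumptions.

def preprocess(raw_message: str):
    """Parse a raw JSON message, normalize fields, reject bad readings."""
    record = json.loads(raw_message)
    temp = record.get("temperature_f")
    if temp is None:
        return None  # drop messages without the reading we need
    return {
        "source": record.get("source", "unknown"),
        "temp_c": round((float(temp) - 32) * 5 / 9, 2),
    }

# Usage sketch with a real Kafka client (e.g. kafka-python), where
# index_into_elasticsearch is a hypothetical loader:
# for message in consumer:
#     doc = preprocess(message.value.decode())
#     if doc is not None:
#         index_into_elasticsearch(doc)
```

Keeping the transform a pure function makes it easy to test without either Kafka or Elasticsearch running.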

Phase 3 and beyond

Investigate the Apache Data lab stack, add Spark to our lab, add a data workbench...

Friday, March 05, 2021

ODE Newsletter - February 2021

I'm 7 people into working towards my 100 conversations. It is said that you need to have 100 conversations as you solidify your business / startup idea. So this is where I am, seven conversations in. If you know anyone who works with ocean data or works for a business that has an interest in oceans, I'd love to talk with them.

Given the time constraints of being deep into a large data / database migration project, I consider February to have been a good month for conversations. It provided me a good view into the horizon of ocean data. I followed the conversations that were presented to me without directing the focus. This was the first month, and I had yet to gain clarity on the gaps where I need more information. This makes sense given I am at the beginning and don't know what I don't know. Now that it is the end of February, I have identified the need to talk with customers of ocean data. This could become a focus for March. The conversations for February unfolded in the following order, with the following summaries and highlights:

PropelICT

I reached out to a past co-worker in a leadership position within PropelICT. PropelICT is an Atlantic Canada e-accelerator for tech startups. The conversation was very encouraging and initiated my application to their April cohort. Looking forward to their support in the coming months (and years).


Highlights:

  • The idea of 100 conversations.
  • My first suggested conversation contact.
  • Being a candidate for their e-accelerator.

eOceans

I spoke with one of the principals of eOceans. Time very well spent, thank you! So many details to digest from this conversation. This organization clearly understands ocean data and where it intersects with social media! A bulleted list seems the best way to call out the highlights:

  • There are many open standards and organizations working in this space. The data standards seem to be "standardizing" and there are many organizations working toward bringing the data standards together. More open organizations are contributing than the closed proprietary types. CIOOS is the standout for Canada. EU and US are much further down the standards and open data path than Canada.
  • Both ends (data storage and end-points (IoT)) of the data collection are well serviced, with lots of business and startup activity. It's the middle where the greater opportunity exists: the data integration, with consideration for all the standards and granularity. "It would be nice to dust off a 10 year old data set and be able to easily use it".
  • Working with ocean data initiatives is very project based and finding the revenue sources / the business model for an open reference architecture for the digitization of oceans could prove difficult.


Highlights:

  • Many open organizations already working in the ocean data space. 
  • The business side of what you are exploring (reference architecture) may be difficult; so much work is project based and gov't funded. A reference architecture seems like an NGO or consortium kind of thing.
  • Middle ground of software and data integration could be a big need given my skillset.


Super fortunate to reconnect with an older friend who has loads of experience; small devices, programming, data, startups to a favorable exit, machine learning, etc... many skills that align well with what I am doing. And on top of all this, I really enjoy the meandering conversations we share!

The one area of strong overlap between my ocean data focus and the mentor's previous experience is the integration of data. And yes, he confirmed: integrating data from different devices to a common standard is a lot of work when creating a single view into a broad data realm.

Highlight: He agreed to provide me mentorship within this endeavour. So great!

New Brunswick Ocean Strategy: Our Opportunities in the Blue Economy

This was an excellent online conference put together by the Ocean Supercluster. What I did most was listen, and a good thing too... I have so much to learn. I really liked the breakout sessions, where there was more individual participation. Some names and acronyms are becoming more familiar to me. 

Highlight: A small list of contacts I could reach out to. All good!

TechNL

I spoke with one of the leaders at TechNL, and we talked about what I want to do with data, in particular ocean data. The conversation pointed me towards two relevant contacts.

Highlight: If I am going to be successful in this endeavour, I am going to need partners. The time required for setting up an organization isn't the best use of my focus at this stage of the startup. And given that this startup needs to work in the open, the partnership route may be the best way to go...

Canadian Integrated Ocean Observing System (CIOOS)

So fortunate to have the attention of two CIOOS employees! They were so gracious and provided a broad and deep amount of information regarding the state of ocean data. Super helpful! CIOOS clearly knows the data. The best way to summarize my conversation is by including the important questions and their answers:

With ocean data where is the greatest pain?

Resources, as in financial and skills / knowledge.

At the more general project level: governance, and the people who know how to organize and steward data through its lifecycle. This is more a reference to the industry in general... it's a project issue. And having the ability to integrate with a project that happened years ago...

Do open data standards have an influence?

Absolutely! There are many references to open data. Most of what we deal with are open.

How easily integrated are the existing data sets?

It’s getting better. It can be difficult to get an older data set and want to integrate it. These older sets often lack the granularity or metadata that would make them easier to ingest. There is a definite need here at a project level. Developing an expertise here could become a strong business.

Most initiatives within this space are project based. Which makes it difficult for longer initiatives that have some data sustainability. Rarely are there long term funding initiatives.


Highlights:

  • So many acronyms, references, and URLs. The CIOOS folks provided me many references, all pointing in the right direction, including some of the ISO standards. 
  • The need for better stewardship of data so that as data ages it still has usefulness.

Pisces Research Project Management

Another fortunate conversation, with a person deep into ocean data and with the added bonus of being very technical. This was a contact I harvested from the New Brunswick Ocean Strategy Conference. There are many topics I could summarize from this outstanding conversation; much of the information confirmed things I discovered in the previous conversations described above. This is good!

I did pitch my idea about mooring buoys as fixed points of data collection, having these buoys be like the personalized weather stations that have become so popular. This person loved the idea.

The exciting part of this conversation was the discussion of the technical stack used within the open data within the oceans sector. It was good to add this to the knowledge I had of the proprietary technical stack used when I was managing the software engineering dept. at Provincial Aerospace.

What is the most common tech stack for Ocean Data?

This person has extensive experience working with Government Organizations and Academics. From what they have seen the most common, and emerging, technology stack includes;

    • Python
    • Assorted data storage approaches. Often NOT an RDBMS.
    • QGIS is common.

These are the tools he finds most effective and common. Using QGIS pushes you into the geo representation of data. Much ocean data requires different kinds of models: more 3D, more oceans… not necessarily geographic, etc.

The ability to prove models with real data is the biggest need from a technical perspective. This is why python has such good traction. It is easy for non-programmers and also rich enough for programmers. A good language for data, and useful across the technical skills working with data.

NetCDF is the most common data store. Also CSV and proprietary data storage. Remember, data people are mostly not programmers or overly technical.

Also take a look at CKAN.

What are people looking for from a technical perspective?

    • Proving models with real data.
    • Integrating data


Highlights:

  • A deep discussion about the technical stack: the preferred programming languages, data storage, integration approaches, and technical issues.
  • Confirmation that integrating data and proving models is an area of software development opportunity.

Lessons Learned

  1. A reference architecture for the digitization of oceans is not enough to hang a startup or business upon at this time! I do believe it is still a good idea that will form through time. There is so much work already going on toward a common open architecture that another doesn't need to be started. I truly believe a reference architecture will emerge; it is a matter of when it will happen, not if it will happen.
  2. There is a big need for technical and software development skills and knowledge in the data engineering space of ocean data. I believe the opportunity exists for a software development / data engineering consulting firm with the specialty of ocean data.
  3. The idea of an anchored (or fixed) buoy for ocean data collection is very compelling to me. Kind of like the personal weather station, but as a fixed mooring buoy. Anyone who has a mooring buoy could replace it with the data buoy and have real-time data about the conditions at the buoy in preparation for mooring.

Next Steps

  1. March will be the month of broadening my reach. I need to talk with a broader section of people working in the oceans space. I need to find potential customers for the processing and software development in, and around, ocean data. 
  2. I need to start building software tools for the processing of ocean data. I need a reference technology stack showcasing our abilities to work with data.
  3. I need to start developing an elevator pitch for the ocean data software consulting firm. I need customers and revenue to get the real feedback to focus the business mission.

Sunday, February 21, 2021

The Beginning of Ocean Data Endeavours (ODE)

Thirty-nine months ago I started on a deep dive into developing a reference architecture for the digitization of oceans. The idea was initiated by the Canadian Government awarding Atlantic Canada the Ocean Supercluster initiative, and by all my recent work leading the software engineering group at Provincial Aerospace. My writing and research into ocean data took me to the point where I needed to deepen my understanding of a number of subjects before I could continue the writing and research (even though you could consider deepening understanding as research). I needed an intermediate understanding of what had come before, and the current state of a reference architecture for the digitization of oceans. In particular, I needed to work directly with ocean data and the standards that influence its structure.

Over the last three years I have been lucky to work with Triware Technologies Inc., and together we have found projects that align with this need to deepen my understanding of all things digital and all things ocean. My recent project successes include:

OCIO Digital by Design - I was fortunate to be awarded the opportunity to be the data architect for the initial phase of digitizing the NL government's citizen-facing portal. I remained on the project for the first 12 months, through to the portal's launch. Being on the design team to create the data tier and integrate with legacy data was a great achievement. And I deeply enjoyed using a Scrum / Jira approach with a multi-vendor, multi-disciplined team. We achieved a lot in a short period of time.

Lessons Learned - Agile, Scrum and Jira can scale well to a government organization with multiple scrum teams working toward an integrated solution.

Ocean Sector Search - We needed a way to index the Canadian Ocean Sector. So we built a search engine seeded by as many oceans related URLs as our analysts could gather. The technical architecture of this ocean specific search engine can be found in this previous post.

Lessons Learned - With reasonable technical effort Nutch can be configured and seeded to crawl a specific industry sector (in this case Canada Oceans Sector). The Nutch crawl harvested a significant number of pages (> 32000) that were then loaded into the ElasticSearch (ELK) stack while relevancy scoring each page along the way.
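As a toy illustration of per-page relevancy scoring (the project's actual Nutch / ELK scoring is not detailed in this post), a simple keyword-count scorer might look like the following; the seed terms are assumptions.

```python
# Toy relevancy scorer: count occurrences of sector keywords in a page's
# text. Illustration only; the project's real scoring (Nutch/ELK based)
# is not described here.

def relevancy_score(text: str, keywords) -> int:
    """Score a page by total keyword occurrences (case-insensitive)."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)

KEYWORDS = ["ocean", "marine", "fisheries"]  # assumed seed terms
```

At crawl scale, a score like this can be computed per page as it is loaded, then stored alongside the document so ElasticSearch queries can sort or filter by it.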

NLCHI - My work with the Newfoundland and Labrador Centre for Health Information (NLCHI) was a quick engagement to focus their requirements backlog into a few manageable sprints. I was super fortunate to help get an important project underway and gain insights into the concept of a customer focused data workbench for a specific subject domain.

Lessons Learned - The idea of a personal data workbench is very compelling when you consider the number of data sets already available in the oceans sector. And if we could fold in open and proprietary data sets, while honoring security and privacy we may be onto something...

Nalcor Energy Database Consolidation - So many databases, so little time. One of my favorite enterprise type projects is when the project pays for itself, over time, by the savings created by the projects downstream accomplishments. Not revenue generation, but operational expense savings. I believe one of the best KPI's for IT is not new systems implemented, but old systems retired.

Lessons Learned - an amazing amount of data can be moved with the correct use of tools, well built and managed ETL (pipelines), and a mindset of automation.

Argo Floats - 2018


Over the last month I have revisited how to best develop my intermediate understanding of oceans data. After a number of conversations with experts in oceans data, I believe my next steps are twofold: I need to focus on the existing standards for oceans data, and I need to write some code to integrate some open oceans data sets.

I need to find opportunities to work directly with oceans data. If you are in the oceans sector, in any way, and you have the need of a very experienced data engineer, then I would love to help with your project. If you know of an oceans data project in need of a data engineer, please forward my credentials. Thanks to everyone for reading this far. And thank-you Triware for your ongoing career support!