Wednesday, February 19, 2025

Navigating Data Governance in the Age of AI

The Data Guardian.

Data governance has become crucial in the age of AI, particularly with technologies like Retrieval Augmented Generation (RAG) that combine language models with internal and external knowledge sources. Whether used personally by individuals at home or organizationally for customer service, AI systems' effectiveness depends entirely on the quality and governance of their underlying data. This guide explores five essential elements of data governance for AI systems: data provenance (tracking data origins), data lineage (mapping data journeys), data quality (ensuring accuracy), data security (protecting information), and data access (managing permissions). Understanding and implementing these elements is vital for building trustworthy AI systems that can deliver accurate, unbiased, and compliant results while fostering innovation and protecting sensitive information.

Introduction: Retrieval Augmented Generation (RAG) - Your AI Co-Pilot

Imagine having a personal AI assistant that can instantly answer any question, grounded in reliable information. That's the promise of Retrieval Augmented Generation (RAG). RAG systems combine the power of large language models (LLMs) with the ability to retrieve information from external knowledge sources.

  • Personal RAG: Think of a student using RAG to research a paper. The AI can access a library of academic articles, textbooks, and credible websites to provide accurate and up-to-date information, tailored to the student's specific query.
  • Organizational RAG: Now picture a company using RAG to improve customer service. The AI can access internal knowledge bases, product manuals, and FAQs to provide instant and consistent answers to customer inquiries, reducing response times and improving customer satisfaction.

But here's the catch: the effectiveness of RAG, and any AI system, hinges on the quality and governance of the underlying data. Just like a faulty GPS can lead you astray, ungoverned data can lead AI to generate inaccurate, biased, or even harmful outputs. That's where data governance comes in.
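To make the retrieval step concrete, here is a minimal, illustrative sketch in Python. The keyword-overlap scoring and the sample documents are stand-ins of my own; production RAG systems use embeddings and vector search, and would pass the assembled prompt to an LLM.

```python
# Minimal RAG sketch: retrieve the most relevant document, then build a
# grounded prompt for the language model. The scoring here is naive
# keyword overlap; real systems use embeddings and vector search.

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble a prompt that grounds the model in retrieved context."""
    context = retrieve(query, documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Returns are accepted within 30 days with a receipt.",
    "Shipping takes 5-7 business days within Canada.",
]
prompt = build_prompt("What is the returns policy?", docs)
print(prompt)
```

The point of the sketch is the shape of the pipeline: retrieve first, then generate from the retrieved context rather than from memory alone.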

Why Data Governance Matters for AI: Personal and Organizational Perspectives

Data governance is not just a set of rules; it's a framework for ensuring that data is accurate, reliable, secure, and used ethically. In the context of AI, data governance is crucial for:

  • Building Trust: AI systems are only as trustworthy as the data they are trained on.
  • Mitigating Risk: Poor data quality can lead to flawed AI conclusions, increasing the risk of bad decisions and non-compliance.
  • Ensuring Compliance: Data governance helps organizations comply with data privacy regulations like GDPR and CCPA.
  • Driving Innovation: High-quality, well-governed data fuels AI innovation and enables organizations to unlock the full potential of their data assets.


Five Key Elements of Data Governance for AI

Here are five main elements of data governance that are critical for both personal and organizational use of AI:

1.  Data Provenance: Tracing the Origin

  • Definition: Data provenance is the "who, what, when, where, and why" of data. It involves tracking the origins of data, how it has been transformed, and who has accessed it.
  • Personal Use: Imagine using an AI tool to analyze your personal finances. Data provenance would help you understand where the AI is getting your financial data (e.g., bank accounts, credit cards), how it's being processed, and who has access to it.
  • Organizational Use: In an organization, data provenance is essential for tracking the source of training data for AI models. This helps ensure that the data is reliable, unbiased, and compliant with regulations. Tools like blockchain can be leveraged for provenance tracking of AI assets. Standards, such as those proposed by the Data & Trust Alliance (D&TA), aim to surface metadata on source, legal rights, privacy and protection, generation date, data type, generation method, intended use and restrictions, and lineage.
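As a rough illustration of the kind of metadata such standards surface, a provenance record might look like the sketch below. The field names are my own shorthand, not the official D&TA schema.

```python
# Sketch of a data provenance record carrying D&TA-style metadata.
# Field names are illustrative, not the official standard's schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceRecord:
    source: str                # who/where the data came from
    legal_rights: str          # licence or usage rights
    privacy_category: str      # e.g. "contains PII" or "anonymized"
    generation_date: date
    data_type: str
    generation_method: str     # e.g. "survey", "web scrape", "synthetic"
    intended_use: str
    restrictions: list[str] = field(default_factory=list)

record = ProvenanceRecord(
    source="internal CRM export",
    legal_rights="internal use only",
    privacy_category="contains PII",
    generation_date=date(2025, 1, 15),
    data_type="tabular",
    generation_method="database export",
    intended_use="customer-service RAG index",
    restrictions=["no model training", "no external sharing"],
)
print(record.source, record.restrictions)
```

Attaching a record like this to every dataset an AI system ingests is what makes the "who, what, when, where, and why" answerable later.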

2.  Data Lineage: Mapping the Data Journey

  • Definition: Data lineage is the chronological journey of data from its origin to its current state. It provides a complete audit trail of all transformations and processes that the data has undergone.
  • Personal Use: If you're using an AI-powered fitness tracker, data lineage would show how your activity data is collected, processed (e.g., calculating calories burned), and used to generate personalized recommendations.
  • Organizational Use: For AI applications, data lineage is crucial for understanding how data quality issues may have been introduced during processing. It also helps in debugging AI models and ensuring that the results are reproducible.
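As an illustrative sketch (the function and field names are my own), a lineage log can be as simple as recording every transformation a dataset undergoes, giving the audit trail described above:

```python
# Sketch of an append-only lineage log: each transformation a dataset
# undergoes is recorded, giving an audit trail from origin to current state.
from datetime import datetime, timezone

lineage: list[dict] = []

def transform(data, step_name: str, fn):
    """Apply a transformation and record it in the lineage log."""
    result = fn(data)
    lineage.append({
        "step": step_name,
        "at": datetime.now(timezone.utc).isoformat(),
        "rows_in": len(data),
        "rows_out": len(result),
    })
    return result

raw = [{"steps": 9500}, {"steps": None}, {"steps": 12000}]
cleaned = transform(raw, "drop_missing",
                    lambda d: [r for r in d if r["steps"] is not None])
print([entry["step"] for entry in lineage])
```

When a quality issue surfaces downstream, the log shows exactly which step changed the row counts, which is what makes AI results debuggable and reproducible.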

3.  Data Quality: Ensuring Accuracy and Reliability

  • Definition: Data quality refers to the accuracy, completeness, consistency, and timeliness of data. High-quality data is essential for building trustworthy AI systems.
  • Personal Use: If you're using an AI-powered medical diagnosis tool, you want to be sure that the data it's using (e.g., your medical history, lab results) is accurate and up-to-date.
  • Organizational Use: Organizations need to implement data quality checks and validation procedures to ensure that AI models are trained on reliable data. This includes monitoring data for bias and implementing mitigation strategies. The characteristics that define data quality are accuracy, completeness, reliability, and timeliness.
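As a sketch of what such validation might look like in practice, three of these characteristics can be tested directly. The thresholds and field names below are illustrative only, not a standard:

```python
# Sketch of simple data quality checks: completeness (no missing fields),
# accuracy (values in a sane range), and timeliness (recent enough).
# Thresholds and field names are illustrative.
from datetime import date, timedelta

def check_completeness(record: dict, required: list[str]) -> bool:
    return all(record.get(f) is not None for f in required)

def check_accuracy(record: dict) -> bool:
    return 0 <= record["age"] <= 120          # domain-specific sanity range

def check_timeliness(record: dict, max_age_days: int = 365) -> bool:
    return (date.today() - record["updated"]) <= timedelta(days=max_age_days)

patient = {"age": 42, "updated": date.today(), "name": "A. Smith"}
checks = {
    "complete": check_completeness(patient, ["age", "updated", "name"]),
    "accurate": check_accuracy(patient),
    "timely": check_timeliness(patient),
}
print(checks)
```

Running checks like these before data reaches a training set or a RAG index is how "garbage in" gets caught before it becomes "garbage out."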

4.  Data Security: Protecting Sensitive Information

  • Definition: Data security involves implementing measures to protect data from unauthorized access, use, disclosure, disruption, modification, or destruction.
  • Personal Use: When using AI tools, you need to be confident that your personal data is protected from cyber threats and unauthorized access.
  • Organizational Use: Data security is paramount for AI applications that handle sensitive data, such as customer information, financial records, or medical data. This includes implementing access controls, encryption, and data loss prevention measures.

5.  Data Access: Balancing Openness and Control

  • Definition: Data access refers to the policies and procedures for granting access to data. It involves balancing the need for open access to data for innovation with the need to protect sensitive information.
  • Personal Use: You should have control over who has access to your data when using AI applications, and be able to grant or revoke access as needed.
  • Organizational Use: Organizations need to establish clear data access policies that define who can access what data, for what purpose, and under what conditions. This includes implementing role-based access control and data masking techniques. Semantic models also contribute to data anonymization processes.
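A minimal sketch of role-based access control with data masking might look like the following. The roles, fields, and masking rule are illustrative only:

```python
# Sketch of role-based access control with data masking: each role sees
# only its permitted fields, and some roles see sensitive values masked.
# Roles, fields, and the masking rule are illustrative.

ROLE_FIELDS = {
    "analyst": {"region", "purchase_total"},
    "support": {"region", "purchase_total", "email"},
    "admin": {"region", "purchase_total", "email", "sin"},
}
MASKED_FOR = {"support": {"email"}}   # support sees email only masked

def view(record: dict, role: str) -> dict:
    """Return the record as the given role is allowed to see it."""
    allowed = ROLE_FIELDS[role]
    out = {}
    for key, value in record.items():
        if key not in allowed:
            continue
        if key in MASKED_FOR.get(role, set()):
            out[key] = value[:2] + "***"   # crude masking for illustration
        else:
            out[key] = value
    return out

customer = {"region": "ON", "purchase_total": 250.0,
            "email": "jane@example.com", "sin": "123-456-789"}
print(view(customer, "analyst"))   # no email, no SIN
print(view(customer, "support"))   # email masked
```

The design point is that access decisions live in one policy table rather than being scattered through application code, which is what makes them auditable.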

Conclusion: Embrace Data Governance for AI Success

AI has the potential to transform every aspect of our lives, but it's not a silver bullet. To harness the power of AI responsibly and effectively, we need to embrace data governance as a core principle. By focusing on data provenance, lineage, quality, security, and access, we can build AI systems that are trustworthy, reliable, and beneficial for all.

Call to Action

For Individuals: Take data governance seriously. Understand where your data comes from, how it's being used, and what your rights are.
For Organizations: Invest in data governance tools and processes. Establish clear policies, train your employees, and foster a data-driven culture.

The future of AI depends on it!

In the coming workshop I will cover these topics and more. I will provide a number of use cases and examples of data governance practices for individuals as they integrate AI into their daily practices.

Friday, February 14, 2025

Data Governance for Reliable AI: From Source to Insight

I am building a workshop to help adult students and professionals make the best use of emerging AI tools for both organizational and personal use. Two themes will run through the workshop: using Retrieval Augmented Generation (RAG) to improve context, and being mindful of information privacy in the context of AI.

The overall theme of the workshop is to unlock the potential of AI while ensuring quality and reliability. This workshop explores the critical role of continuous improvement, data governance, data provenance, and data lineage in building trustworthy AI systems. Discover practical strategies for implementing robust data management frameworks to address challenges in data quality, compliance, and model performance, leading to more effective AI solutions.

Key Topics:

  • Continuous Improvement: Learn why iteratively refining data is crucial for reliable AI outcomes.
  • Data Governance: Understand the importance of data governance in AI.
  • Data Provenance & Lineage: Discover how tracking data's origin, journey, and transformations enhances transparency, supports ethical practices, improves decision-making, and reduces hallucinations in AI applications.

AI Approaches: RAG

The workshop will also touch on Retrieval-Augmented Generation (RAG), a technique that fundamentally improves AI systems by enabling them to access and utilize specific, real-time information from organizational documents, databases, and knowledge bases, rather than relying solely on their training data. RAG enhances accuracy and reliability, as demonstrated in applications like healthcare and legal work. Unless an organization builds its own Large Language Model (LLM), everything it does with AI could be considered RAG.

For more detail on moving beyond good prompt engineering and why you need to consider RAG, see this blog post: https://criticaltechnology.blogspot.com/2024/12/rag-and-agents-how-ai-is-learning-to.html

Tools for Data Governance

The workshop will include a demo of NotebookLM. NotebookLM is designed with robust privacy features that make it particularly relevant for Canadian professionals handling sensitive information. The platform's key privacy feature is that uploaded documents are never used to train its AI models, ensuring data remains private and secure.

Most of the demos will use NotebookLM, the RAG tool built by Google. To better understand NotebookLM and its security posture, see this recent blog post: https://criticaltechnology.blogspot.com/2025/02/keeping-your-data-private-in-notebooklm.html

PIPEDA and Data Governance

It's important to understand Canada's Personal Information Protection and Electronic Documents Act (PIPEDA). While PIPEDA doesn't explicitly address AI, its technology-neutral principles establish crucial guidelines for handling personal data in AI projects. These include obtaining proper consent, limiting data collection, implementing security measures, maintaining transparency, ensuring data accuracy, and practicing accountability.

For more detail on how AI intersects with PIPEDA, see this blog post highlighting seven important impacts: https://criticaltechnology.blogspot.com/2025/02/ai-and-your-personal-project-navigating.html

If you are interested in attending this workshop feel free to sign up. All are welcome. Reserve your spot here: https://lnkd.in/ecKQ-reB


Wednesday, February 12, 2025

Keeping Your Data Private in NotebookLM: A Canadian Professional's Guide


As a Canadian professional, you understand the importance of data privacy. Whether you're working with client information, sensitive research, or proprietary business strategies, keeping your data secure is paramount.  That's why when exploring new tools like NotebookLM, understanding its privacy features is crucial.

NotebookLM offers a powerful way to interact with your documents, but how does it handle your sensitive information?  The good news is that NotebookLM is designed with privacy in mind. Here's a breakdown of what you need to know:

Your Data Stays Yours:

  • No Training Data:  Let's get the biggest concern out of the way first.  Your uploaded documents are never used to train NotebookLM's AI models.  Think of it this way: your data is for *your* use only, and it doesn't contribute to improving the system for other users. This is a critical distinction and a significant advantage for professionals handling confidential material.
  • Workspace Account Protection: If you're accessing NotebookLM through a work or school account with a qualifying Workspace edition, you get an extra layer of protection. In this scenario, your uploads, queries, and the model's responses are shielded from human review. This is particularly important for professionals in regulated industries or those dealing with highly sensitive data.

A Note on Personal Accounts and Feedback:

If you're using a personal Google account, the situation is slightly different.  If you choose to provide feedback on NotebookLM, human reviewers *might* see your queries, uploads, and the AI's responses.  Therefore, it's best practice to avoid submitting anything you wouldn't be comfortable sharing if you're using a personal account.  Consider this carefully when deciding how to use the platform.

Key Considerations for Canadian Professionals:

  • Copyright:  As always, respect Canadian copyright laws.  Ensure you have the necessary rights to share any content you upload to NotebookLM.  This is a fundamental principle regardless of the platform you're using.
  • Terms of Service: Your use of NotebookLM, whether through a personal or Workspace account, is subject to Google's Terms of Service or the Google Workspace Terms of Service, respectively.  Familiarize yourself with these terms to fully understand your rights and responsibilities.

The Bottom Line:

NotebookLM is built with privacy at its core.  The platform emphasizes keeping your documents confidential and separate from its AI training processes.  For Canadian professionals, this is a vital consideration.  By understanding these privacy features and adhering to best practices, you can leverage the power of NotebookLM while maintaining the confidentiality of your valuable data.  If you have any further questions or concerns, always refer to Google's official documentation and privacy policy for the most current information.


Monday, February 10, 2025

AI and Your Personal Project: Navigating PIPEDA's Privacy Landscape

Artificial intelligence is rapidly changing the landscape of what's possible, even in personal projects.  Many hobbyists and professionals are exploring the power of AI for everything from creative endeavors to data analysis. But with this power comes responsibility, especially when dealing with personal information.  In Canada, the Personal Information Protection and Electronic Documents Act (PIPEDA) sets the ground rules for how we handle such data, and it applies even to your personal AI projects.

PIPEDA doesn't specifically mention "AI," but its core principles are technology-agnostic.  Think of it as a set of best practices for responsible data handling, regardless of the tools you use. So, how does this impact your AI tinkering? Let's break it down:

  1. Consent is Key: If your AI project uses any personal information, you generally need consent to collect, use, or disclose it.  This is crucial, even if you're not selling anything or sharing the data widely.  Think about what data your project requires and how you'll obtain consent.
  2. Stick to the Purpose: You can only use the personal information for the specific purpose you stated when you got consent.  Don't collect data for one reason and then use it for something completely different without obtaining new consent.  Be clear and upfront about your intentions from the start.
  3. Less is More (Data Minimization):  Only collect the personal information you *actually* need for your project.  Avoid the temptation to gather extra data "just in case."  The less you collect, the less you have to protect.
  4. Protect What You Collect (Safeguards):  You're responsible for protecting the personal information you collect with appropriate security measures.  This is especially important if you're dealing with sensitive data. Think about encryption, access controls, and secure storage.
  5. Be Transparent:  Be open and honest about how you're using personal information in your AI project.  People have a right to know how their data is being used, even in seemingly harmless projects.  Consider a simple privacy notice or explanation.
  6.  Accuracy Matters:  If your AI project involves making decisions about individuals, you need to ensure the personal information you're using is accurate and up-to-date. Inaccurate data can lead to unfair or incorrect outcomes.
  7. Accountability is Your Responsibility:  Ultimately, you're responsible for complying with PIPEDA, even in a personal project.  This means being able to demonstrate how you're protecting personal information and adhering to the principles outlined in the Act.

The Bottom Line:

PIPEDA might seem daunting, but its principles are fundamentally about respect for privacy.  By considering these points, you can ensure your AI projects are not only innovative but also responsible. Remember, these are just some key considerations. PIPEDA is a complex piece of legislation.  If you have specific questions about how it applies to your project, consulting with a privacy expert or legal professional is always a good idea.  Protecting privacy is not just a legal obligation; it's the right thing to do.

Friday, December 27, 2024

Getting Started with AI: NotebookLM

If you followed along with a previous post on a holiday challenge for learning AI, you may now be wondering: where to next? Great question; it shows you have learned about prompt engineering and are now thinking there has to be more. There is more, a lot more. A good set of skills and a solid understanding of prompt engineering will serve you very well, and you could stop there for a while, particularly if you iterate on your prompts and keep building your prompting literacy. And remember, AI can help you improve your prompting.

For many people, I have found that once an intermediate understanding of prompting is achieved, the question is no longer "how do I prompt better?" It becomes: can this be automated? Can my LLM be more subject-specific? Can it be constrained to my own knowledge? In short, people want the AI to give more weight to a narrower or more personal domain of knowledge.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is like giving an AI system a personalized library to reference while it's talking to you. Instead of only relying on what it learned during training, RAG lets AI search through specific documents or data to find relevant information before generating a response. Think of it like a student who first checks their textbook and notes before answering a question, rather than just going off memory. This helps the AI give more accurate and up-to-date answers based on reliable sources. 

There are a few online options that provide a personal RAG platform. Currently, my two favorites are perplexity.ai and NotebookLM. Both platforms allow you to upload or reference other resources (text, video, and more) to augment and focus your use of AI. They are remarkably good at supporting you in creating subject-specific AI mentors. I strongly suggest you begin to play with NotebookLM.

Consider using NotebookLM

  1.  Set up an account (or use it with your existing Google account). https://notebooklm.google.com/
  2. Watch a NotebookLM introductory overview video: https://youtu.be/UG0DP6nVnrc?si=2bGoT7ZMI-VKsU6_
  3. Think about business and personal use cases: https://youtu.be/U3SgtCWsjXg?si=eR_ESarUJTHenPki
  4.  Consider the history of NotebookLM development at Google: https://youtu.be/sOyFpSW1Vls?si=F9gVrxXrc2vihRnf

If you remain curious about where RAG and automated agents fit into the near future of AI, I published a post last week discussing these two innovations. Twenty twenty-five will be an interesting year.

Wednesday, December 18, 2024

RAG and Agents: How AI is Learning to Think and Act

In the rapidly evolving landscape of artificial intelligence, two technologies are fundamentally changing how AI systems interact with the world: Retrieval-Augmented Generation (RAG) and AI Agents. While both enhance AI capabilities, they serve distinctly different yet complementary purposes in advancing machine intelligence.

RAG: The Power of Grounded Knowledge

Imagine trying to navigate a foreign city using only your general knowledge of how cities work. You might make educated guesses about where to find the downtown area or how the transit system operates, but you'd likely make many mistakes. This is similar to how traditional Large Language Models (LLMs) operate – they rely on their training data to make informed but potentially inaccurate assumptions.

RAG transforms this paradigm by giving LLMs access to specific, relevant information in real-time. Instead of relying solely on their training data, RAG-enabled systems can pull precise information from your organization's documents, databases, and knowledge bases. This means when you ask a question about your company's Q4 2023 results, the AI isn't generating a plausible-sounding response – it's retrieving and synthesizing actual data from your financial reports.

The impact of RAG on accuracy and reliability cannot be overstated. In healthcare, for instance, RAG-enabled systems can access the latest medical research rather than relying on potentially outdated training data. In legal applications, they can reference specific case law and regulations rather than generating generic legal-sounding language.

Agents: From Knowledge to Action

While RAG revolutionizes how AI systems access information, Agents take things a step further by adding autonomous action to the mix. An AI Agent is more like a capable assistant than a simple question-answering system. It can:

  1. Plan and execute multi-step tasks
  2. Interact with external tools and systems
  3. Maintain context across conversations
  4. Learn from past interactions
  5. Make decisions based on evolving situations

Consider a customer service scenario. A RAG-enabled system might accurately answer questions about your return policy by referencing your documentation. An Agent, however, could actually process the return, check inventory for replacements, schedule a pickup, and update your CRM – all while maintaining a natural conversation with the customer.
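To make the distinction concrete, here is a toy agent loop. The tools and the canned plan are purely illustrative; a real agent would get each next step from an LLM's decisions rather than a hard-coded list:

```python
# Toy agent loop: a plan of (tool, argument) steps is dispatched to tool
# functions one at a time. The tools and the canned plan are illustrative;
# a real agent would ask an LLM for each next step.

def check_inventory(item: str) -> str:
    return f"{item}: 3 in stock"

def schedule_pickup(order_id: str) -> str:
    return f"pickup booked for order {order_id}"

TOOLS = {"check_inventory": check_inventory, "schedule_pickup": schedule_pickup}

# Canned multi-step plan standing in for the LLM's decisions.
plan = [("check_inventory", "blue kettle"), ("schedule_pickup", "A-1042")]

log = []
for tool_name, argument in plan:
    result = TOOLS[tool_name](argument)   # dispatch the tool call
    log.append(result)

print(log)
```

The shape is what matters: where RAG ends at an answer, an agent carries the interaction forward by invoking tools and feeding their results into its next decision.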

The Synergistic Future

The real magic happens when RAG and Agents work together. Imagine an AI system that can not only access your entire corporate knowledge base but also take action based on that information. It could:

  • Monitor market trends and automatically adjust your digital advertising strategy
  • Analyze customer feedback across channels and initiate appropriate response workflows
  • Review legal documents and prepare necessary compliance filings
  • Manage complex project timelines while adapting to real-time changes

Practical Implications for Businesses

The combination of RAG and Agents represents a significant leap forward in business process automation. Organizations can now build systems that don't just provide information but actually complete complex workflows with minimal human intervention.

However, this power comes with responsibility. As these systems become more capable, it's crucial to implement proper governance structures, ensuring that AI actions align with business objectives and ethical considerations.

Looking Ahead

As both RAG and Agent technologies continue to mature, we're likely to see increasingly sophisticated applications that blur the line between knowledge systems and autonomous actors. The key will be finding the right balance between automation and human oversight, ensuring that these powerful tools enhance rather than replace human decision-making.

The future of AI isn't just about smarter systems – it's about systems that can both understand and act upon that understanding in meaningful ways. RAG and Agents are just the beginning of this transformative journey.

Saturday, December 14, 2024

Getting Started with AI: A 15-Hour Learning Journey

Want to become AI-savvy in just two weeks? Here's a focused learning path that requires only about an hour a day. This guide explores four essential themes that will transform how you interact with AI:

  1. Learning from AI Experts: How to leverage AI podcasts to build your knowledge foundation
  2. The Art of Iteration: Mastering the technique of refining your prompts to get better results
  3. Trust but Verify: Developing critical thinking skills to verify AI-generated content
  4. Smart Summarization: Converting lengthy AI conversations into powerful, reusable prompts

Let's dive into these themes through practical exercises and real-world examples that will help you harness AI effectively in your daily life.

Week 1: Building Your Foundation (7.5 hours)

Deep Dive into AI Through Podcasts (2.5 hours)

Start your journey by listening to carefully selected podcasts during your commute or daily routine. I recommend you choose only one or two for your regular listening.

Mastering Prompt Iteration (2 hours)

Spend time practicing with AI chatbots, focusing on refining your prompts. Here's a fun example:

Initial Prompt:

"Write about dogs"

Improved Iteration:

"Write a 300-word guide about choosing the right dog breed for apartment living, including considerations for size, energy level, and noise"

Final Iteration:

"Create a comprehensive guide for apartment dwellers considering a dog. Include:

  • Top 5 breeds suited for apartment living
  • Exercise requirements for each breed
  • Noise levels and training tips
  • Space considerations
  • Estimated monthly costs

Format this as a practical guide with clear headings and bullet points"

Summary Iteration:

I often ask the chatbot to provide an improved prompt based upon the contents of the session.

  • "Please rewrite the prompts within this session into a single well-engineered prompt"

Asking the AI chatbot to rewrite your prompt really helps in deepening your understanding of prompt engineering.

And to make things interesting I sometimes ask the chatbot to rewrite the response with different literacy levels.

  • "Please rewrite this response for a grade five literacy level"
  • "Please rewrite this response for a PhD literacy level"

I actually find the response for the grade eight literacy level more interesting than the PhD level.

Using the different AI chatbots (3 hours)

There are many emerging AI chatbots; build some prompts within each. Experiment with different AI chatbots: play, get curious, ask the AI bot to rewrite your prompt, try the rewrites in each of these chatbots, and compare and contrast their responses.

  • ChatGPT: Excellent for creative writing and coding
  • Claude: Strong at analysis and detailed explanations
  • Gemini: Particularly good with multimodal tasks
  • Perplexity: Specialized in real-time information retrieval and citation

Week 2: Advanced Techniques (7.5 hours)

Verification Strategies (4 hours)

Learn to verify AI outputs effectively with these examples:

When verifying historical facts:

"You mentioned the Wright brothers' first flight was in 1903. Can you:

  1. Provide specific sources for this date
  2. Break down the key events of that day
  3. Highlight any details you're uncertain about"

When verifying technical advice:

"You've suggested this Python code solution. Can you:

  1. Explain why each line is necessary
  2. Identify potential edge cases
  3. Compare it with alternative approaches"

When verifying financial analysis:

"You've provided a financial forecast for my small business. Can you:

  1. Explain the key assumptions behind your projections
  2. Identify potential economic factors that could impact these numbers
  3. Compare this forecast with industry benchmarks
  4. Highlight any areas where you have limited data or uncertainty
  5. Suggest additional data points that could improve the accuracy of this analysis"

These are three examples of verification prompts for your AI outputs. It is always a good idea to request verification, as it reduces AI hallucinations and increases your knowledge of the topic being discussed.

Work through the sessions from last week and write prompts to verify the information in an AI output. Spend a few hours creating verification prompts; ask the AI to write these for you. Improve upon the verification prompts and iterate.

An AI hallucination occurs when an artificial intelligence generates information that appears plausible but is factually incorrect or nonsensical. 

Session Summarization (3.5 hours)

Master the art of creating comprehensive prompts from AI sessions. Here's an example:

Original Conversation:

  • Human: "How can I improve my public speaking?"
  • AI: [Provides tips about preparation]
  • Human: "What about handling nervousness?"
  • AI: [Shares anxiety management techniques]
  • Human: "How should I structure my speech?"
  • AI: [Explains speech organization]

Summarized into Single New Prompt:

"Create a comprehensive public speaking guide for beginners that covers:

  1. Essential preparation steps
  2. Anxiety management techniques
  3. Speech structure and organization
  4. Delivery tips
  5. Common pitfalls to avoid

Include specific examples for each section and actionable steps for implementation"

Practical Exercise Examples

Try these exercises during your learning journey:

1. Content Creation:

  1. Ask AI to write a blog post, then iterate three times, each time making it more specific
  2. Example progression:
    • "Write about healthy eating"
    • "Write about healthy eating for busy professionals"
    • "Create a 7-day meal prep guide for busy professionals who have only 30 minutes for dinner"

2. Problem Solving:

  • Start with a complex problem like home organization
  • Break it into smaller tasks
  • Ask AI to verify the feasibility of each step
  • Create a final, comprehensive action plan

Reminder: Ask AI to summarize a session and all its progressive steps into a single new prompt. Use this prompt in the different chatbots.

Key Takeaways

After completing this learning path, you'll have:

  • A solid understanding of current AI capabilities and limitations
  • Practical experience in prompt engineering
  • The ability to verify and validate AI outputs
  • Skills to maintain efficient AI conversations

Remember: Success with AI tools comes from systematic practice and refinement. Start with simple queries and gradually increase complexity as you become more comfortable with the interaction patterns.

Pro Tip: Keep a "prompt journal" documenting your most effective prompts and the situations where they worked best. This will help you develop your own library of reliable AI interaction strategies.