Wednesday, October 24, 2012

Federated Databases (with a digital badges example)

The concept of federation is important when entering into discussions about distributed technology. I like the definition given from a google search on "federation".
fed·er·a·tion /fedəˈrāSHən/
1. A group of states with a central government but independence in internal affairs.
2. An organization or group within which smaller divisions have some degree of internal autonomy.
So a federation is how things collaborate / work together for a greater good and how a number of things bubble up into a larger collective. So how does this apply to the idea of federated databases. Federated databases provide technologies to make a collection of databases look like a single database. This is a simplified description that will do for the scope of this post. A good way to describe database federation is to compare it to a centralized database.
Centralized vs Federated databases
 The above diagram puts images of the centralized and federated database next to each other. The left side of the image represents the centralized database where the right side is the federated database. Databases store data and with the centralized model all the data is stored in the single centralized database. With the federated model the data is stored in a collection of related and autonomous databases. Each database within the federated approach may not have the exact same structure. This is both a blessing and a curse. It provides the ability for each database to have a different structure to meet their specific information needs, but it also makes it difficult to merge all the databases into a single common structure. All the databases within a federation have similar elements which provide the ability to link (or map) all the data together. Therefore, providing a single view of data; a federation of data.

Note 1: the amount of "centralized" data stored to link the federated databases together can vary. In some situations there is minimal amount of centralized data storage and all the databases are linked via mix of technology, good design and well managed metadata. The range of differences in how federated databases are implemented is well described in this technical paper on "Federated Database Systems".

Example: a federation of digital badges
Currently there are a number of emerging digital badge systems. Each of these different systems is designed to serve their particular badge issuer needs. Each has both similar and different attributes for what is a digital badge. A basic comparison of their similarities and differences would assist in describing a federation of digital badges. The following five sites issue digital badges, and each of them store user data regarding their earned badges in a database hosted on their respective servers;
The following differences and similarities come from a shallow analysis of these different badge systems. They serve as an example for this discussion on federated databases.

Data Differences:
  • badge criteria - the implementation of badge criteria can vary. It varies from a simple url location (mozilla), a collection of other badges or accomplishments (khan), and a dynamic criteria based on live contribution data (stackoverflow).
  • badge evidence - can also vary in its implementation, some of the evidence will follow the format required / specified by the criteria. While other evidence formats include a variety of different media and online contributions.
With these differences each badge system cannot be implemented in exactly the same database because they each use different data types and / or data structures. To overcome these differences when building the federation the data fields need to be mapped to a common data structure so they can be viewed as a single common set of data fields. This information is transformed into a common structure for the data federation. When this mapping occurs something is usually lost due to a systems specific data structure being mapped to a common data structure.

Data Similarities: 
  • badge name - the title of the badge
  • badge description - the description of the badge
  • issuer url - the internet url of the badge issuer
  • badge image - the url of the image used
  • earner identifier - the unique identifier of the badge earner
  • earner email - the earner email address
All the badge systems have these common (or similar) attributes. They are stored in each badge issuers database under different field names, but the data structures are similar enough that they could be merged together into one database without needing to transform the data.

To create a federated database of these different badge systems all the data would be merged. The similar data fields could be easily merged for the format is the same and would require no transformation. The different data fields could also be merged with some transformations, though it is likely some detail would be lost having the differences conform to the similar structure. If all goes well from a merge and data transformation perspective you would end up with a single view into all the different badge systems.

Note 2: Federated systems can have different amounts of merge and transform. In some situation the data is copied and moved into a new database that contains the federated data. In other situations technology sits on top of all the different databases and the merge and transform occurs in real-time and no data is moved or copied.

Note 3: This is a simplified discussion of federated databases. There are many design and technical details that have been simplified for this discussion. Feel free to email me if you would like to engage in a deeper discussion about database federation.