Changes

EngineeringProductivity/Projects/ActiveData

58 bytes added, 13:29, 18 December 2014
reformatting
This project is inspired by the data warehouse and data mart technology that is common inside large corporations, but largely non-existent in the public space. Since Mozilla's mandate is an open web, and we have a lot of data to share, it is only logical we make our data active.
= Architecture =
== Features ==
An active data instance distinguishes itself from a static resource, or database, or big data solution, by delivering a particular set of features:
* '''A service, open to third party clients''' - By providing the service, clients save the need to stand up their own datastore.
* '''Fast filtering''' - Sub-second filtering over the contents of the whole datastore, independent of size, saves the application developer from declaring and managing indexes that do the same. There is sufficient information in the queries to determine which indexes should be built to deliver a quick response.
* '''Fast aggregates''' - Sub-second calculation of statistics over the whole datastore saves the application developer from building and managing caches of those aggregates.
* '''API is a query language (SQL, MDX)''' - By building upon the formalisms, and familiarity, of existing query languages, we reduce the learning curve and give ActiveData implementations more insight into the intent of the client application, so they can optimize for its use cases.
* '''Uniform, Cartesian space of values''' - Mozilla has a mandate of data driven decision making. Data analysis tools, like R, Scipy, Numpy, and Pandas, are what is used to perform data analysis, and they all require uniform data in multi-dimensional arrays. ActiveData's objective is to provide query results in these formats.
* '''Metadata on dimensions and measures''' - ActiveData also provides context to the data it holds. This metadata lets third parties explore and discover the data by describing units of measure, how dimensions relate to one another, and possibly human-readable descriptions of the stored columns. It is also invaluable in automating the orientation and formatting of dashboard charts: knowing the domain of an axis allows code to decide the best (default) chart form, and provides logically reasonable aggregate options.
* '''Has a security model''' - Simpler applications can avoid the complications of a security model if it is baked into the ActiveData solution.
If ActiveData is to become mainstream, it is important that it can manage sensitive data and PII.
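The "uniform, Cartesian space of values" and "metadata on dimensions" features can be illustrated with a minimal Python sketch. It is not ActiveData's implementation: the function name <code>to_cube</code>, the field names, and the sample records are all hypothetical, invented only to show how flat records become a dense array whose axes are described by their domains.

```python
def to_cube(records, dimensions, measure, default=0):
    """Pivot flat records into a dense, nested-list cube.

    Returns (domains, cube). `domains` maps each dimension to its sorted
    list of distinct values (the metadata a chart would need for its axes).
    `cube` is indexed in the same order as `dimensions`.
    """
    # Metadata: the domain of every dimension.
    domains = {d: sorted({r[d] for r in records}) for d in dimensions}

    # Sum the measure for every coordinate that actually occurs.
    totals = {}
    for r in records:
        key = tuple(r[d] for d in dimensions)
        totals[key] = totals.get(key, 0) + r[measure]

    # Fill the full Cartesian product, using `default` for missing cells.
    def build(prefix, remaining):
        if not remaining:
            return totals.get(prefix, default)
        head, tail = remaining[0], remaining[1:]
        return [build(prefix + (value,), tail) for value in domains[head]]

    return domains, build((), tuple(dimensions))


# Hypothetical sample data, made up for this illustration.
records = [
    {"platform": "linux", "suite": "mochitest", "failures": 3},
    {"platform": "linux", "suite": "reftest", "failures": 1},
    {"platform": "win", "suite": "mochitest", "failures": 2},
    {"platform": "linux", "suite": "mochitest", "failures": 4},
]
domains, cube = to_cube(records, ["platform", "suite"], "failures")
```

Every cell of the product of the two domains is present in <code>cube</code>, including the combination that never occurred in the records, which is what array-oriented tools expect.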
=Context=
== Problem ==
Columnar datastores have solved many (but not all) problems with changing schema. Query-directed indexing has been around for decades in Oracle's query optimization algorithms, and is available for free in ElasticSearch. We now have the technology to build an ActiveData solution.
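As a rough illustration of query-directed indexing, the toy Python sketch below builds an equality index the first time a query filters on a column, so the client never declares an index. It does not describe how Oracle or ElasticSearch work internally; the class <code>SelfIndexingStore</code> and its sample rows are hypothetical.

```python
from collections import defaultdict


class SelfIndexingStore:
    """Toy in-memory store that creates indexes based on incoming queries."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.indexes = {}  # column name -> {value: [row positions]}

    def _index_for(self, column):
        # The query itself tells us which column needs an index;
        # build it lazily on first use and reuse it afterwards.
        if column not in self.indexes:
            index = defaultdict(list)
            for position, row in enumerate(self.rows):
                index[row.get(column)].append(position)
            self.indexes[column] = index
        return self.indexes[column]

    def filter_eq(self, column, value):
        """Return all rows where row[column] == value."""
        index = self._index_for(column)
        return [self.rows[position] for position in index.get(value, [])]


# Hypothetical sample rows, made up for this illustration.
store = SelfIndexingStore([
    {"platform": "linux", "suite": "mochitest"},
    {"platform": "win", "suite": "mochitest"},
    {"platform": "linux", "suite": "reftest"},
])
hits = store.filter_eq("platform", "linux")
```

After this call only the <code>platform</code> column is indexed; the <code>suite</code> column stays unindexed until some query filters on it.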
By defining an ActiveData standard, we can innovate on both sides of the ActiveData abstraction layer independently.
== Client Architecture ==
Applications that leverage an active data warehouse can forgo significant server side development, if not all, and put the logic on the client side.
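A small Python sketch of this division of labour follows. It assumes an in-memory SQLite database as a stand-in for the service, since SQL is one of the query languages named above; the table <code>test_results</code>, its columns, and its rows are hypothetical. The point is only that the client sends a query and keeps all application logic on its own side.

```python
import sqlite3


def make_stand_in_service():
    """Create an in-memory SQLite database that plays the role of the service."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_results (platform TEXT, suite TEXT, duration REAL)"
    )
    # Hypothetical sample rows, made up for this illustration.
    conn.executemany(
        "INSERT INTO test_results VALUES (?, ?, ?)",
        [
            ("linux", "mochitest", 12.0),
            ("linux", "reftest", 8.0),
            ("win", "mochitest", 20.0),
        ],
    )
    return conn


def slowest_platform(service):
    """Client-side logic: ask for aggregates, then decide locally."""
    rows = service.execute(
        "SELECT platform, AVG(duration) FROM test_results GROUP BY platform"
    ).fetchall()
    # The service only filters and aggregates; the decision is made here.
    return max(rows, key=lambda row: row[1])[0]


service = make_stand_in_service()
result = slowest_platform(service)
```

No server-side code specific to this application was written: the aggregate is expressed in the query, and the choice of what to do with it stays in the client.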
==Non Solutions==
ActiveData makes specific tradeoffs to achieve its goals, and there are situations in which active data will not provide a benefit:
* large memory requirements
* low add/update/remove speeds