import blazetime for flexible data

Blaze presents a pleasant and familiar interface regardless of which computational backend or database we use (e.g. Spark, Impala, SQL databases, NoSQL data stores, raw files). One Blaze query can work across data ranging from a CSV file to a distributed database.
It's a bookstore! We've got:
(open notebook, fiddle, watch things break…)
Things get really interesting when you can query across data sources, as more and more people have to do.
(open notebook again, see more things break…)
accounts = blaze.resource('postgres://cpa:cpa@server/db::accounts')
Translates a string pointing to data into a Python object pointing to that data.
from blaze import resource
from pandas import read_excel

@resource.register(r'.*\.(xls|xlsx)')
def resource_xls(uri, **kwargs):
    return read_excel(uri, **kwargs)
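To make the idea of "a string pointing to data" concrete, here is a minimal sketch of a regex-keyed registry in plain Python. It is an illustration only, not Blaze's implementation: the names `register`, `open_resource`, `open_excel` and `open_csv` are hypothetical, and the handlers return tagged tuples instead of loading any data.

```python
import re

# Hypothetical stand-in for a regex-keyed registry; not Blaze's actual code.
_REGISTRY = []  # list of (compiled pattern, handler) pairs, in registration order


def register(pattern):
    """Decorator that records a handler for URIs matching `pattern`."""
    def decorator(func):
        _REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator


def open_resource(uri, **kwargs):
    """Call the first registered handler whose pattern matches the whole URI."""
    for pattern, handler in _REGISTRY:
        if pattern.fullmatch(uri):
            return handler(uri, **kwargs)
    raise NotImplementedError("no handler registered for %r" % uri)


@register(r'.*\.(xls|xlsx)')
def open_excel(uri, **kwargs):
    # A real handler would load the spreadsheet; this one only tags the URI.
    return ('excel', uri)


@register(r'.*\.csv')
def open_csv(uri, **kwargs):
    return ('csv', uri)


excel_result = open_resource('books.xlsx')
csv_result = open_resource('accounts.csv')
```

The point of the design is that supporting a new format means registering one more function, without touching the lookup code.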
in_debt = blaze.compute(t[t.balance < 0], {t: accounts})
Does all the work – evaluates an expression against a set of data sources.
accounts = blaze.data('postgres://cpa:cpa@server/db::accounts')
in_debt = blaze.compute(accounts[accounts.balance < 0])
Combines resource & compute – extremely handy for interactive exploration.
from blaze import (join, by, concat, transform, merge, abs, sqrt, sin, sinh,
                   cos, cosh, tan, tanh, exp, expm1, log, log10, radians,
                   degrees, ceil, floor, trunc, isnan, greatest, least, coerce,
                   distinct, min, max, mean, std, count, map)
Supports a lot of expressions – documentation at http://blaze.readthedocs.io/en/latest/api.html#expressions doesn't cover all of them, but is a good start.
Expressions are internally described as trees of operations.
Lots of detail at http://blaze.readthedocs.io/en/latest/expr-design.html
>>> bz.to_tree(accounts['$'])
{
    'op': 'Field',
    'args': [
        {
            'op': 'Symbol',
            'args': ['_2', dshape("2 * {'$': int64, u: string}"), 0]
        },
        '$'
    ],
}
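A tree like the one above is easy to work with using ordinary recursion. The following sketch builds a similar nested-dict tree by hand and walks it; the helpers `symbol`, `field`, `leaves` and `render` are hypothetical names chosen for illustration and are not part of Blaze, and the `Symbol` node here carries only a name and a datashape string.

```python
def symbol(name, dshape):
    """Leaf node: a named placeholder for some data with a given datashape."""
    return {'op': 'Symbol', 'args': [name, dshape]}


def field(child, name):
    """Interior node: select one named column from the child expression."""
    return {'op': 'Field', 'args': [child, name]}


def leaves(tree):
    """Yield every Symbol node in the tree, depth-first."""
    if not isinstance(tree, dict):
        return  # plain arguments such as column names are not nodes
    if tree['op'] == 'Symbol':
        yield tree
    else:
        for arg in tree['args']:
            yield from leaves(arg)


def render(tree):
    """Turn the tree back into a readable dotted expression."""
    if tree['op'] == 'Symbol':
        return tree['args'][0]
    if tree['op'] == 'Field':
        child, name = tree['args']
        return render(child) + '.' + name
    raise ValueError('unknown op: %r' % tree['op'])


tree = field(symbol('accounts', 'var * {name: string, balance: float64}'),
             'balance')
leaf_names = [leaf['args'][0] for leaf in leaves(tree)]
text = render(tree)
```

Because the expression is only data, the same tree can be inspected, rewritten, or handed to different backends before anything is computed.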
http://blaze.readthedocs.io/en/latest/computation.html
1. pre_compute all the leaves of the tree that represent data
2. optimize the expression
3. compute_down on the entire expression tree
4. compute_up. Repeat this until the data significantly changes type (e.g. list to int after a sum operation)
5. optimize on the expression and pre_compute on all data elements
6. post_compute on the result

Internally, a lot of Blaze is implemented as simple functions that handle just one combination of possible inputs to an expression – like here, we see the case where we're computing a selection on pure-Python data.
@dispatch(Selection, Sequence, Sequence)
def compute_up(expr, seq, predicate, **kwargs):
    preds = iter(predicate)
    return filter(lambda _: next(preds), seq)
(the decorator is from multipledispatch)
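To show the "one function per combination of input types" idea without any dependencies, here is a hand-rolled sketch. It is a simplified stand-in, not multipledispatch itself: this toy `dispatch` picks the first registered signature that matches rather than the most specific one, and the `Selection` and `Sum` classes are empty hypothetical markers rather than Blaze's expression types.

```python
from collections.abc import Sequence

_HANDLERS = {}  # maps a tuple of types to the function handling that combination


def dispatch(*types):
    """Toy decorator: register a function for one combination of argument types."""
    def decorator(func):
        _HANDLERS[types] = func
        return func
    return decorator


def compute_up(*args):
    """Call the first registered handler whose types match all the arguments."""
    for types, func in _HANDLERS.items():
        if len(types) == len(args) and all(
                isinstance(arg, typ) for arg, typ in zip(args, types)):
            return func(*args)
    names = ', '.join(type(arg).__name__ for arg in args)
    raise NotImplementedError('no handler for (%s)' % names)


class Selection:
    """Marker node: keep the items where a predicate holds."""


class Sum:
    """Marker node: add up all values."""


@dispatch(Selection, Sequence, Sequence)
def select_from_sequence(expr, seq, predicate):
    # Keep each item whose matching predicate entry is truthy.
    return [item for item, keep in zip(seq, predicate) if keep]


@dispatch(Sum, Sequence)
def sum_sequence(expr, seq):
    return sum(seq)


balances = [120.0, -35.5, 0.0, -4.5]
mask = [balance < 0 for balance in balances]
in_debt = compute_up(Selection(), balances, mask)
total = compute_up(Sum(), in_debt)
```

Note how the result type changes from a list to a single number after the sum, which is the kind of change the computation steps above refer to.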
blaze.server.Server can host your data
blaze.server.Client can be your API

accounts = data('blaze://accounts.bank.com:6363')
in_debt = accounts[accounts.balance < 0]
df = odo.odo('accounts.csv', 'postgresql://accounts::db')
var * { name: string, balance: float64 }
contact: @necaris