[hemmerling] Data Processing 5/5 - Business Intelligence, Data Mining

Organizations

Conferences

Software

Free ETL Software

Free BI Software

Free BI Tools

Free BI Frameworks

Data Cleanup

  • The OpenSource OpenRefine, GitHub "OpenRefine" ( formerly: “Google Refine” ) - “A powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data”.

Machine Learning

Important Machine Learning Tools according to "Developer Economics - National Trends Survey", 2016-04
  • IBM Watson
  • BigML
  • SAS Enterprise Miner
  • R platform
  • Python machine learning libraries
  • Google Prediction API
  • WEKA Machine Learning Workbench
  • Microsoft Azure Machine Learning
  • AWS Machine Learning
  • TensorFlow
  • PredictionIO
  • MATLAB or Octave
  • DMTK
  • IBM SPSS Modeler
  • Apache Spark
  • RapidMiner
Tools
  • The OpenSource library TensorFlow, GitHub "tensorflow" - “TensorFlow is an Open Source Software Library for Machine Intelligence”, “Open source software library for numerical computation using data flow graphs”.
  • GitHub "google/skflow" - “Simplified interface for TensorFlow (mimicking Scikit Learn)”.
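The phrase “numerical computation using data flow graphs” can be illustrated with a minimal sketch in plain Python. This is not the TensorFlow API: the `Node` class and the `constant` and `evaluate` helpers are hypothetical names made up for this illustration. Each node holds an operation and the nodes that feed it, and a value is only computed when a node that depends on it is evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    """One vertex of the graph: an operation plus the nodes feeding it (hypothetical class)."""
    name: str
    op: Callable[..., float]
    inputs: Tuple["Node", ...] = ()


def constant(name: str, value: float) -> Node:
    # A constant is modelled as an operation without inputs.
    return Node(name, lambda: value)


def evaluate(node: Node, cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate a node by first evaluating its inputs; each node is computed once."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [evaluate(dep, cache) for dep in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# Build the graph first, then run it.
a = constant("a", 2.0)
b = constant("b", 3.0)
total = Node("total", lambda x, y: x + y, (a, b))
result = Node("result", lambda s, x: s * x, (total, a))

print(evaluate(result))  # (2.0 + 3.0) * 2.0 = 10.0
```

Separating graph construction from evaluation is what allows a real library to optimize or distribute the computation before running it.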
Resources

Microsoft BI Tools

The free standalone Microsoft Power BI

The Tool
Downloads
Resources
  • “Microsoft Power BI” is a dashboard tool, not a reporting tool!
    • It is powered by “Microsoft SQL Server Analysis Services (SSAS)”.
    • It is now easy to import data from HTML pages ( e.g. data tables on Wikipedia ).
    • Data visualisation:
      • Built-in graphics ( by a graphics engine similar or identical to the one known from Microsoft Excel, Microsoft SQL Server Reporting Services and Microsoft SQL Server Analysis Services ).
      • There is an API by which third parties may offer their own visualisations.
    • With Microsoft Excel 2016 you can indeed create the same dashboards, but you can't publish them on the web.
  • You may save a “Microsoft Power BI Desktop” project as a single “.pbix” file. Experts told me that saving a project requires about 3 times as much RAM as the loaded data occupies. So you may be able to load a large amount of data but then be unable to save it. This can be a problem not only with the 32-bit edition ( on Windows PCs with 2 or 3 GByte of RAM ), but also with the 64-bit edition ( i.e. a project may be saved on a Win64 computer with 16 GB RAM, but not on a Win64 computer with 4 GB RAM ).
  • You might create test cases ( e.g. at business intelligence trainings ) with “Microsoft Power BI”, as an alternative to NBi.
  • “Microsoft Power BI Desktop” also supports the “M” and “DAX” query languages, via the “Advanced Editor” ( “Home / Edit Queries ( = Query Editor ) / Advanced Editor” ).
  • If you install “Microsoft R” prior to “Microsoft Power BI Desktop”, you may use “R” for data processing and graphical reporting → See Mathematical Engineering.
  • I was told by experts, in 2016-03:
    • Versions of “Microsoft Power BI Desktop” that are now legacy were able to process 10.000 data records for visualisation.
    • Current versions of “Microsoft Power BI Desktop” ( as of 2016-03 ) are able to process 10.0000 data records for visualisation.
    • With “R” called by “Microsoft Power BI Desktop”, you may visualize 150.000 data records.
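The rule of thumb for saving “.pbix” files mentioned above ( about 3 times the RAM of the loaded data ) can be turned into a quick check. This is only a sketch: the factor 3 is the experts' estimate quoted above, not an official Microsoft figure, and the function name `can_save_project` as well as the 2 GB example project are assumptions made for illustration.

```python
# Assumption: factor taken from the experts' rule of thumb, not from official documentation.
SAVE_RAM_FACTOR = 3.0


def can_save_project(loaded_data_gb: float, available_ram_gb: float,
                     factor: float = SAVE_RAM_FACTOR) -> bool:
    """Return True if saving is expected to fit into RAM under the rule of thumb."""
    return loaded_data_gb * factor <= available_ram_gb


# Hypothetical project with 2 GB of loaded data, i.e. about 6 GB needed for saving:
print(can_save_project(2.0, 16.0))  # True: 6 GB needed, 16 GB available
print(can_save_project(2.0, 4.0))   # False: 6 GB needed, 4 GB available
```

The check ignores the memory used by the operating system and other programs, so in practice the usable limit is lower.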

Commercial Editions of Microsoft SQL Server with BI Support

Power BI Gateway - Personal

The Tool
Documentation
Concept
  • To get a BI report, connect with a client ( Internet browser, smartphone app, .. ) to the “Microsoft Power BI” cloud service. The basic “Power BI” account is free.
  • This way, you may access any cloud document.
  • The cloud service may access a registered on-premises Windows server at your company ( or at your company's datacenter ) by a VPN connection, if that computer runs the free “Power BI Gateway - Personal” service.
    • Experts told me that the free VPN service shipped with Windows has not been under development for some time. Therefore it isn't advisable to use it in production environments.
Resources

Free Addons for the commercial Microsoft Excel

Tools

Query and Data Modeling Languages

Analysis Services Scripting Language ( ASSL )
Data Analysis Expressions ( DAX ) - Data Modeling Language
"M" Language ( "Power Query Formula Language" / "Microsoft Power Query for Excel Formula Language" )
MultiDimensional eXpressions ( MDX )
Tabular Modeling Scripting Language ( TMSL )

Datazen

Some other commercial BI Software

Services

BigML

  • The online, cloud-based predictive analysis service BigML - “Machine Learning for everyone. Easily add data-driven decisions and predictive power to your company”, “NOW FREE. Unlimited tasks ( up to 16 MB/task )”.

SAP HANA

The Service

Education

Events

Resources

Complex event processing

Data Warehouse Modeling

Dan Linstedt & Michael Olschimke - Data Vault Modeling 2.0

Dan Linstedt - Data Vault Modeling

Michael Olschimke

Literature

Some other Data Vault Literature

Commercial Data Warehouse Automation Tools, for use with Data Vault

AnalytiX DS
Varigence BimlFlex
WhereScape

OpenSource Data Warehouse Automation Tools, for use with Data Vault

  • I don't know any, but you should consider using Xtext as a design tool for your domain-specific tools.

Resources

Bill Inmon

Len Silverston

Realtime Data Warehouse

Metamodels and Languages

Common Warehouse Metamodel

Business Intelligence Markup Language ( BIML )

Data Analysis & Dashboards

Full Text Search

Webcasts, Webinars

Resources

Forums, Newsgroups


en/bintelligence.html.txt · Last modified: 2017/10/20 12:59 (external edit)