[hemmerling] Data Processing 5/7 - Business Intelligence, Data Mining

Organizations

Conferences, Seminars, Trainings

Germany

  1. Data Modeling Zone in Hannover ( 2013-09-23 - 2013-09-24 ).
  2. TDWI Young Guns.
      1. Full-day online training DatenPioniere GmbH "Microsoft Power BI Training - Wir machen Euch fit", 2023-02-03, 2023-03-03, 08:30-16:30. I joined :-).
        • “Kostenfreie Anmeldung. Werde mit uns zum Power BI Champion und melde Dich heute noch zur kostenfreien Basis Schulung an”.
        • “Jeden Monat bietet Dir Niklas ein kostenfreies Tagestraining an. Wir sind davon überzeugt, dass mehr gute Entscheidungen in deinem Unternehmen getroffen werden, wenn mehr Mitarbeitende Power BI beherrschen. Melde dich heute noch an”.
        • As of 2023-01:
          • Power BI Pro doesn't provide machine learning / artificial intelligence functions :-(.
          • Power BI Premium provides machine learning / artificial intelligence functions :-).
        • Starscheme: Separation of fact table and dimension table.
          • Fact table ( FACT_xxx, F_xxx ): German w-questions ( who, what, when, amount, rebate.. ), what is relevant for a financial transaction ( buy / sell / contract ).
          • Dimension table ( DIM_xxx, D_xxx ): Product ID, product category, country of customer.
          • If a customer buys 1000x the same good, you don't want all the 1000 data items in the dimension table.
        • Spreadsheet Cell vs. Calculated column vs. Measure.
          • In opposite to spreadsheets, BI tools are not based on the “cell” principle :-(, you can just address columns, not single cells :-(.
          • Calculated columns require filespace, Measures don't require filespace, as they are calculated in-memory.
          • Calculations
            Umsatz = 'F_Order Details'[Unit Price] * 'F_Order Details'[Quantity] *(1-'F_Order Details'[Discount])
            Einkaufskosten = 'F_Order Details'[Quantity] * RELATED(D_Products[UnitCost])
            Gewinn = 'F_Order Details'[Umsatz] - 'F_Order Details'[Einkaufskosten]
            
            Datumstabelle =
            VAR ErsteErfassung = Min(F_Orders[Order Date])
            VAR Kalender = CALENDAR(Date(Year(ErsteErfassung),1,1), Date(Year(Now())+1,12,31))
            return
            ADDCOLUMNS( Kalender,  
            "Jahr", Year([Date]),
            "JahrQuartal", FORMAT([Date], "yyyy-Q"),
            "JahrMonat", FORMAT([Date],"yyyy-mm"),
            "JahrKW", COMBINEVALUES("-", Year([Date]), FORMAT(WEEKNUM([Date],21),"00")),
            "Quartal", FORMAT([Date], "\QQ"),
            "Kalenderwoche", WEEKNUM([Date], 21),
            "Monat", FORMAT([Date], "mmm"),
            "Monat#", Month([Date]),
            "Wochentag", FORMAT([Date], "ddd"),
            "Wochentag#", Weekday([Date], 2)
            )
            
      • Important settings:
        • “File / Options and settings / Options” - “Global / Data Load”.
          • ”[_] Time intelligence”.
        • “File / Options and settings / Options” - “Current File / Data Load”.
          • ”[_] Time intelligence”.
          • ”[_] Import relationships from data soruces on first load”.
          • ”[_] Autodetect new relationships after data is loaded”.
      • Drilldown fields “Category Name, County aus Customers, Last Name aus Employes”.
      • After creating a Measure move it to a “Card” ( “123” ) visualisation!
        Umsatz Measure = Sum('F_Order Details'[Umsatz])
        !
        Umsatz Adams I =
        CALCULATE(
            Sum('F_Order Details'[Umsatz]),
            D_Employees[Last Name] = "Adams"
        )
        
        Umsatz Adams in Deutschland =
        Calculate(
            Sum('F_Order Details'[Umsatz]),
            D_Employees[Last Name] = "Adams",
            D_Customers[Country] = "Germany"
        )
        
        Umsatz Adams und Buchanan =
        CALCULATE(
            [Umsatz Measure],
            D_Employees[Last Name] = "Adams" ||
            D_Employees[Last Name] = "Buchanan"
        )
        
        Umsatz Adams II =
        Calculate(
            Sum('F_Order Details'[Umsatz]),
            D_Employees[Employee ID] = 4 //4 ist die ID für Adams
        )
        
        Gesamtumsatz =
        CALCULATE(
            [Umsatz Measure],
            All('F_Order Details')
            
        Anteil Umsatz = DIVIDE([Umsatz Measure],[Gesamtumsatz])
        
        Umsatz nach Shipped Date =
        Calculate(
            Sum('F_Order Details'[Umsatz]),
            USERELATIONSHIP(Datumstabelle[Date], F_Orders[Shipped Date])
        )
        
      1. The free event DatenPioniere "BI Arena auf Schalke" at Veltins Arena, Rudi-Assauer-Platz 1, 45891 Gelsenkirchen, 2025-01-29, 8:00 – 14:30.
  3. “M365 Summits”.
  4. Online events by BI or DIE Self Service.
  5. Free conferences and expositions Tech Show Frankfurt / Big Data & AI World, Cloud Expo Europe Frankfurt / Data Centre World in Frankfurt, 2023-05-10 - 2023-05-11.

International

  1. DataTalks.Club - “Free Data Engineering course!” ( on demand ).
  2. Global Data Summit in USA ( 2017, 2019 ).
  3. The affordable commercial online event Power BI Summit, 2023-03-06 - 2023-03-10.
  4. The free live virtual event Open Source Data Summit ( OSDS ).
    • 2023-11-15.
    • 2024-??-??.

Software

Free ETL Software

Free Tools

Resources

Free BI Software

Free Data Warehouse Solutions

Free BI Tools

Free BI Frameworks

Data Cleanup

  • The OpenSource OpenRefine, GitHub "OpenRefine" ( formerly: “Google Refine” ) - “A powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data”.

Machine Learning

Important Machine Learning Tools according to "Developer Economics - National Trends Survey", 2016-04
  • IBM Watson
  • BigML
  • SAS Enterprise Miner
  • R platform
  • Python machine learning libraries
  • Google Prediction API
  • WEKA Machine Learning Workbench
  • Microsoft Azure Machine Learning
  • AWS Machine Learning
  • TensorFlow
  • PredictionIO
  • MATLAB or Octave
  • DMTK
  • IBM SPSS Modeler
  • Apache Spark
  • RapidMiner
Tools
  • The OpenSource library Tensorflow, GitHub "tensorflow" - “TensorFlow is an Open Source Software Library for Machine Intelligence”, “Open source software library for numerical computation using data flow graphs”.
  • GitHub "google/skflow" - “Simplified interface for TensorFlow (mimicking Scikit Learn)”.
Resources

Microsoft BI Tools

The free standalone Microsoft Power BI

The Tool
Online Services
  • “You've selected Microsoft Power BI. Let's get you started. XXX looks like a personal email address. Enter your work address so we can connect you with others in your company. And don't worry. We won't share your address with anyone. Sign in with a work email address” :-(.
    • The publishing service “Power BI”, just for users of the commercial “Office 365” service?! - “The easy way to see your important data in one place. With a few clicks, connect to data from applications you use and get started with pre-built dashboards from experts”.
Downloads
Resources
  • “Microsoft Power BI” is a dashboard tool, not a reporting tool!
    • It is powered by “Microsoft SQL Server Analysis Services (SSAS)”.
    • It is now easy to catch data from HTML pages ( e.g. data tables of Wikipedia ).
    • Data visualisation:
      • Built-in graphics ( by a graphics engine similar / identical to that known from Micrsoft Excel, Microsoft SQL Server Reporting Services and Microsoft SQL Server Analysis Services ).
      • There is an API by which third parties may offer own visualisations.
    • Indeed with Microsoft Excel 2016, you may create the same dashboards, but you can´t publish it on the web.
  • You may save “Microsoft Power BI Desktop” as in single ”.pbix” files. Experts told me, that for saving a project, the software needs 3 times of the RAM memory as needed for the loaded data. So it might be that you may load big data, but are not able to save it. In general, this might not be a problem with the 32-bit edition ( on Windows PCs with 2 or 3 GByte of RAM ), but also with the 64-bit edition ( i.e. a project may be saved on a Win64 computer with 16 GB RAM, but not on a Win64 computer with 4 GB RAM ).
  • You might create test cases ( e.g. at business intelligence trainings ) with “Microsoft Power BI”, as alternative to NBI.
  • “Microsoft Power BI Desktop” also supports the “M” and “DAX” query languages, by the “Advanced Editor” ( “Home / Edit Queries ( = Query Editor ) / Advanced Editor” ).
  • If you install “Microsoft R” prior to “Microsoft Power BI Desktop”, you may use “R” for data processing and graphical reporting → See Mathematical Engineering.
  • I was told by experts, in 2016-03:
    • Now legacy versions of “Microsoft Power BI Desktop” were able to process 10.000 datasets for visualisation.
    • Current versions of “Microsoft Power BI Desktop” were able to process 10.0000 datasets for visualisation.
    • With “R” called by “Microsoft Power BI Desktop”, you may visualize 150.000 datasets.

Commercial Editions of Microsoft SQL Server with BI Support

Power BI Gateway - Personal

The Tool
Documentation
Concept
  • To get a BI report, connect with a client ( Internet browser, smartphone app,..) to the “Microsoft Power BI” cloudservice. The basic “Power BI” account is free.
  • By this, you may access any cloud document.
  • The cloud servce may access a registered onpremise Windows server at your ( the datacenter of your ) company, if the computer runs a free “Power BI Gateway - Personal” service, by a VPN connection.
    • Experts told me, that the free VPN service shipped with Windows is not under development for some time. Therefore it isn´t very advisable to use it in production environments.
Resources

Free Addons for the commercial Microsoft Excel

Tools

Query and Data Modeling Languages

Azure Log Analytics Query Language
  • See on this page.
Analysis Services Scripting Language ( ASS )
Data Analysis Expressions ( DAX ) - Data Modeling Language
"M" Language ( "Power Query Formula Language" / Microsoft Power Query for Excel Formula Language" )
MultiDimensional eXpressions ( MDX )
Tabular Modeling Scripting Language ( TMSL )

Datazen

Data Science Platform by Joshua Görner

"TWDE Datalab" by ThoughtWorks

Some other commercial BI Software

Services

BigML

  • The online cloud based predictive analysis service BigML - “Machine Learning for everyone. Easily add data-driven decisions and predictive power to your company”, “NOW FREE. Unlimited tasks ( up to 16 MB/task )”.

EdgeImpulse Inc.

Google

Google Colab ( Colaboratory )

  • The free online service Google Colab ( Colaboratory ).
  • A free Jupyter Notebook server, most packages of Anaconda are installed, but not 100% compatible ( different version of Python packages, few missing packages ).
  • User experience of Google Drive users ( so you must be logged into Google Drive by your Google account ):
    • Create new Jupyter cells.
      • Button ”+Code” and menu option “Insert / Code cell” create a new code cell, while ”+Text” and menu option “Insert / Text cell” create a new text cell, at the point of the Jupyter notebook, where your cursor is, i.e. where you have hit your mouse by mouse-leftlick.
      • In local Jupyter notebooks, there is a button of the extra button bar to provide this. This button bar is not available with Google Colab.
    • You may create a copy of a Jupyter notebook put on Google Docs, by “File / Save a copy in Drive”.
    • But its most easy to transfer additional data files by downloading them on your local terminal ( PC with browser ), and to upload it into the copied Jupyter Notebook.
    • You might use “Files / Mount Drive” to create the code
      from google.colab import drive
      drive.mount('/content/drive')
      

Google Looker

Microsoft KUSTO

The Service

Azure Log Analytics Query Language

Resources

SAP HANA

The Service

Education

Events

Resources

Some other commercial BI Tools

Complex event processing, Streaming Analytics

Realtime Data Warehouse

FIWARE "Orion ( Context Broker )"

Google Cloud Stream Analytics Solution

Microsoft Technologies for Realtime Data Warehouses

Microsoft Azure Stream Analytics

Microsoft SQL Server - StreamInsight

Microsoft KUSTO

  • See on this page.

Resources

Data Analysis & Dashboards

Full Text Search

Artificial Intelligence

Webcasts. Webinars

Experts

Literature

Resources

Forums, Newsgroups


When this document changes ! Site Navigation ( My Business ! My Topics ! Imprint / Contact ! Privacy Policy ! Keyword Index ! ! Google+ Publisher "hemmerling" )

 
en/bintelligence.html.txt · Last modified: 2025/01/15 19:55 (external edit) · []
Recent changes RSS feed Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki