You will have N databases no matter what. Almost every company has some legacy systems around, or simply uses different tools for different jobs. However, you still want to copy & combine your data into a single database. More precisely, it is a common pattern to split the analytics into three parts: 1) collecting the data (using lots of adapters) into a "data lake", 2) filtering and preprocessing that data lake into a uniform data structure whose shape is determined by the analysis goal rather than the operational system, and 3) analyzing that uniform data with statistical and other tools. It usually makes no sense to combine those phases into a single overall tool. First, these are very different tasks, where different specialized tools will evolve anyway. For example, you don't want the performance of the operational system to depend on how many analysis tools are querying it at any point in time. Second, you want to keep the intermediate results anyway - for caching, as well as to have an audit trail and reproducibility of the results. Also, you don't want to work on a constantly changing dataset while fine-tuning your analysis methods.

I agree, the "one database at a time" approach is the way vendors ignore the harder problem of connecting data across platforms. I used to work for a company that built the "automagic ETL" kind of solution. You could write a single straight SQL statement across virtual "tables" (if you had non-tabular data, say NoSQL, you could query it through a table-like projection). It could literally join data across heterogeneous systems by shuffling data between them, or to a dedicated "analytical processing platform", for join processing. Or you could do things like create a table on one system as the result of a select from a join of tables on three other systems. At the time, this was way ahead of what anyone else was doing. However, it is a hard problem to solve; the company is/was small, and funding was a problem because inventing the tech took a long time. Also, it was an enterprise solution and closed source - when it really probably needed to be open source to be able to support the diversity of data sources. These days, between Apache Drill, Spark, Ignite, etc., and any number of commercial solutions, we're starting to see solutions to the problem you're talking about. I bet this Metabase UI on top of Apache Spark, and your databases, would be a killer. That's a common pattern (a BI tool on top of Spark); see Apache Zeppelin for how it uses Spark, for example. That said, as long as your data isn't truly ridiculously huge - if it can be centralized, centralization still works just fine.

I always find database & "data helper" tools fun to experiment with. After playing with this with a MySQL database for 10 minutes on my Mac, I have 2 initial reactions:

1. It is a glorified query tool that knows your tables, helps you build queries with visualizations, and gives you the results. It still allows raw queries if you're into that and just want their GUI for queries.

2. The tool (or the Mac app, at least) still has plenty of bugs to iron out: I tried creating a "Dashboard" and it wouldn't actually create it or close the modal window. Then I refreshed the application and it had created 10 of the same Dashboard. I tried deleting the database and the button just doesn't work. Many of the queries I ran on my own tiny sample db seemed to just not run. I feel like the bugs are largely to do with the OSX binary specifically, and not the actual platform.
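To make the three-phase split described in the first comment concrete, here is a minimal sketch in plain Python. Everything in it is invented for illustration - the two source databases (webshop.db, billing.db), their tables, and the trivial "analysis" - and it assumes those two files exist; a real pipeline would persist the lake and the uniform snapshot rather than hold them in memory:

    import sqlite3
    import statistics

    SOURCES = {
        # name: (db file, extraction query); both databases are hypothetical
        "webshop": ("webshop.db", "SELECT customer_id, total FROM orders"),
        "billing": ("billing.db", "SELECT cust_id, amount FROM invoices"),
    }

    def extract(sources):
        """Phase 1: pull raw rows from each source into a 'data lake'."""
        lake = {}
        for name, (path, query) in sources.items():
            with sqlite3.connect(path) as conn:
                lake[name] = conn.execute(query).fetchall()
        return lake

    def transform(lake):
        """Phase 2: normalize every source into one shape chosen for the
        analysis goal - (customer, amount) pairs - not for any source."""
        return [(cust, float(amount))
                for rows in lake.values()
                for cust, amount in rows]

    def analyze(snapshot):
        """Phase 3: analyze the frozen, uniform snapshot; the operational
        systems are no longer touched, so reruns are cheap and repeatable."""
        return statistics.mean(amount for _, amount in snapshot)

    snapshot = transform(extract(SOURCES))
    print("mean transaction:", analyze(snapshot))

Persisting the phase-2 snapshot is what buys the caching, audit trail, and stable dataset the comment argues for.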
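The single-SQL-statement-across-heterogeneous-systems idea, and the Metabase-on-Spark pairing suggested in the second comment, can be approximated today with stock Spark. A rough sketch, assuming PySpark plus the MySQL and Postgres JDBC drivers are installed; the hosts, schemas, and credentials below are placeholders, not real endpoints:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("federated-join").getOrCreate()

    # Project each remote system as a virtual "table".
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://mysql-host/shop")
              .option("dbtable", "orders")
              .option("user", "reader").option("password", "secret")
              .load())
    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:postgresql://pg-host/crm")
                 .option("dbtable", "customers")
                 .option("user", "reader").option("password", "secret")
                 .load())

    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    # One straight SQL statement spanning both systems; Spark pulls rows
    # out of each source and does the join/shuffle itself.
    revenue = spark.sql("""
        SELECT c.name, SUM(o.total) AS revenue
        FROM orders o JOIN customers c ON o.customer_id = c.id
        GROUP BY c.name
    """)

    # "Create a table as the result of a select from a join of tables on
    # other systems" - here materialized into Spark's own catalog.
    revenue.write.mode("overwrite").saveAsTable("revenue_by_customer")

A BI front end (Metabase, Zeppelin, etc.) then only needs to talk to Spark, which fronts the N underlying databases.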