
Data Analysis Intro


jbante



If you're only using your database to keep track of your stuff, but not to decide what to do next with your stuff, you're falling behind. The "big data" hype has been droning on about the value of analyzing data for more than a decade, and there's no reason FileMaker can't play, too: statistics, business intelligence, data visualization, machine learning, or whatever other flavor of data analysis you want. I concede that the FileMaker community has some catching up to do relative to other software tools — and we should be ashamed of ourselves for it.

Some folks might think it's FileMaker Inc.'s job to build better data analysis features before we can do anything interesting. While that would be helpful, it's unreasonable to expect FileMaker's in-house engineering team to catch up with other tools. The standard data analysis tools for other languages are not built-in features of those languages, but add-ons built by the communities of users of those languages. It's up to us to make FileMaker useful for non-trivial analyses.

Some folks might think "big data" is too big for little FileMaker to handle. First, "big data" is not a well-defined concept, but it usually has something to do with any data analysis task on enough data to be difficult to handle with the tools at hand. "Big data" is hard for everyone else, too, no matter what tools they're using, or else it wouldn't be called "big data." Second, there's no reason your data have to be "big" for an analysis of your data to be valuable.  Granted, more is often better, statistically speaking. But some of the best minds on Earth have spent the last century or so extracting as much insight as possible from as little data as possible. They came up with some really good techniques. The techniques that are still most common today were originally designed to be executed with pencil and paper at a time when "calculator" was a job title rather than a pocket-sized electronic device. We have FileMaker, so we have no excuse. The "small" and "medium" data in our FileMaker applications are just as worthy of analysis as anything Google has.

"Big data" gets all the news coverage, but most data analysis is not big data. FileMaker's biggest competition, market-share-wise, isn't Python or R or C++ or Hadoop. FileMaker's biggest competition is our ancient arch-nemesis, Excel. You're not going to let the rival team have a leg up on us, are you!? We can do better than Excel.

FileMaker presents some advantages for data analysis that have been neglected for long enough. The most sophisticated statistical analysis in the world is useless to an organization that fails to act on its conclusions. Analysis has to feed back into operations. Conventional analysis workflows export data out of operational systems and import it into separate, analytically oriented data warehouses after excruciating conversion processes; then the data are analyzed by another menagerie of separate tools; and finally those results have a long trip back through layers of strategy meetings and bureaucracy before influencing action. Cleaning up and converting data as it passes between each of these systems accounts for the overwhelming majority of the labor spent on "big data" projects. That's lame. Our operational data is already in FileMaker. Doing analysis in place in FileMaker spares us the most expensive problems of "big data," making us more nimble. The results of our analyses can immediately feed back into operations.

Some folks have suggested that FileMaker may just not be the right tool for most analyses. Why skin a cat with FileMaker's multi-tool when you could use R's bowie knife? Because "the right tool for the job" is often whatever tool happens to be in your hand. FileMaker is the tool in our hands (or else what are you doing here?), and we could do a lot worse.

So what help do you need to add data analysis features to your FileMaker applications? Have you done anything cool to show off? The administrators of this site saw fit to grace us with this forum to discuss such issues. (Thanks!) So start talking already!


Thanks, Jeremy, for a well-articulated (except for the "we should be ashamed of ourselves" part :laugh: ) call to arms.  

 

I do biomedical research and have certainly found FileMaker up to the task for a fair chunk of our analytical needs. We leave the Box-Cox transformations and multi-variable regression models to SAS, SPSS, etc., but FileMaker has proved an indispensable tool for getting raw, instrument-generated data ready for sending on to our data analyst for those final analyses. In our hands, for example, gene expression measurements leverage a well-normalized (in the relational sense) schema in order to "normalize" (in the biomarker measurement sense) the raw data. That normalization process (correction for laboratory-introduced sources of measurement variability) is not a function of the big stats packages, and has to be handled before we ship the results off to our data analyst. FileMaker is a great tool for that. Obviously, this is one small, fairly arcane area of data analysis, but then again this whole topic is, by definition, likely to involve a lot of disparate and discipline-specific methods and approaches. I'm looking forward to seeing the directions it takes.

 

Mark


Fascinating!!  Prior to identifying the tools, the questions must be asked ... not the question of whether to analyse our business data in this broad sense, but what we want to achieve from the analysis.  For example, many years ago, I migrated 10 years of complete business data from a Wholesaler into a new FileMaker solution which integrated accounting, inventory, sales - the entire business.  As I viewed the data, there appeared to be patterns and spikes in sales, and I thought it was because of the full moon.

 

I wondered if Buyers contacted our Wholesaler company in patterns.  Why would I care?  Because if we could predict a pattern of contact, we could predict sufficient sales staffing and even contact those Buyers who didn't call us  ... thinking they probably wanted to call us (like all other Buyers) but were too busy.  Maybe the sales calls to these chains during low periods were less fruitful ... we could change our methods accordingly.

 

So the questions I asked (and finally COULD ask because we had gathered the data) were:

  • Gender: Is same sex (Buyer to Sales Rep) more productive or the reverse?
  • Who contacts first ... if we call them or wait for them to call us?
  • Time of day, day of week, day of month?
  • ... and my favourite ... do sales spike during a full moon?

Back then, there were no databanks on the web of surveys and studies like there are today.  Nor did I have a college background, so I had no statistics or analytical skills.  But even winging it, we DID find patterns, and sales increased by 20%.  The full moon comparison showed a negligible spike, but the biggest percentages were day of month and day of week.  :-)
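
A frequency count like this is simple to sketch in most languages. A minimal Python example with made-up order dates (purely illustrative; in practice the dates would come from the sales table):

```python
from collections import Counter
from datetime import date

# Hypothetical order dates; in practice these would come from the sales table.
order_dates = [date(2014, 1, 6), date(2014, 1, 7), date(2014, 1, 13), date(2014, 1, 20)]

# Tally contacts by day of week to look for a weekly pattern.
by_weekday = Counter(d.strftime("%A") for d in order_dates)

print(by_weekday.most_common(1))   # [('Monday', 3)]
```

The same tally is easy in FileMaker with a summary by a calculated day-of-week field.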

 

I look forward to this discussion because the power of data modelling is underestimated.  Thank you for opening this discussion (and to Stephen for opening this forum)!  It is a subject which has always intrigued me and I look forward to learning from everyone about tools, calculations or methods to take full advantage of our data.


Mark, it's great to hear a perspective from the sciences on this. FileMaker's biggest footprint is in business, where feeding analysis results back to operations quickly is an advantage. This isn't necessarily the case in research workflows, I suppose. Data preparation, clean-up, and a general hub of integration is still a great use for FileMaker as part of multi-tool workflows. Luke Rochester highlighted this in his DevCon session earlier this year.

 

But part of my intent is to suggest that there's no reason we can't do Box-Cox transforms and multi-variable regressions in FileMaker, too. I'm just eagerly awaiting the combination of someone who thinks it's a good idea (which, at the very least, is me) and someone who actually has a use for it.

 

I've been toying around with the idea of creating some analysis modules for FileMaker. Is there anything in particular you think would be good to start with? I had the thought to put something out that includes all the statistics functions in Excel, which might be an interesting symbolic gesture, but it's not particularly sexy, and sexy may be more what we need right now.


Jeremy, I'm certainly intrigued. The use case for (bio)statistical testing in FileMaker is a bit hard to define. Take our epidemiologic cohort study of the natural history of HPV infection as an example, the particulars of which are unique, but the general study architecture of which is not:

 

In the lab, we generate masses of immune marker (gene expression, protein analysis, etc.) data, testing samples from among 200,000, collected over a 25-year period and cataloged in a hosted FileMaker db. We perform a lot of massaging of the raw data (as mentioned previously) in order to turn, say, raw fluorescence or colorimetric measurements into some sort of meaningful quantitative measurement (e.g., picograms/ml of protein), correct ("normalize") for methodologic sources of measurement variability, and send it off to our stats analyst. Most epi studies like ours employ a full-, or at least part-time, stats analyst. They generally come armed with one of several well-regarded, and quite expensive, stats analysis packages — the standard "tools of their trade" — and graduate-level skills in using those particular tools (and knowing which tests to use for which question).

 

With our lab measurement data in hand, our analyst then combines it with data collected by our study nurse practitioners through a 500-question interview and physical exam. The ultimate goal of a particular substudy is usually answering a question something along the lines of "Is the relative risk of clearing an HPV infection associated with a longitudinal change in the expression of one or more immune markers, corrected for confounding effects of smoking — yes, smokers, nicotine and its metabolites can affect your immune system in places well removed from the mouth and lungs! — oral contraceptive use, other infections, etc.?" He's constantly shaking up the tests and transformations he uses to fit the particular question and peculiarities of a given dataset. So much of it is mysterious to me; I guess that's why we employ someone with his expertise.

 

A junior investigator, or one for whom (usually small) research projects are but a minor part of their work, otoh, may not have the benefit of (i.e., funding to hire) a dedicated data analyst. Perhaps they're our use case! Simple research questions and designs sometimes need only fairly basic approaches. My first paper, way back when, for example, used only the time-honored Student's t-test, for comparing simple parametric measurements between two groups (HIV+ and HIV-uninfected, in that case). That and other bread-and-butter tests (Chi square, Cohen's Kappa, etc.) shouldn't be too hard to wrangle out of FileMaker. They're not the sexy ones, however, as you point out.
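
To illustrate how little machinery a bread-and-butter test needs, here's a sketch of the Student's t statistic (equal-variance form) in Python; everything in it is arithmetic that FileMaker's calculation engine also has:

```python
from math import sqrt

def students_t(a, b):
    """Two-sample Student's t statistic (equal-variance form), from scratch."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Sample variances (n - 1 denominator).
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    # Pooled variance, then the t statistic.
    vp = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / sqrt(vp * (1 / na + 1 / nb))

print(students_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))   # -1.0
```

The p-value lookup (against the t distribution with na + nb - 2 degrees of freedom) is the only part that takes real work to reproduce without a stats library.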

 

What would be sexy, yet not readily available in other, purpose-fit tools? I'd have to think about that a while.

 

As an aside, and just to illustrate a case of "sometimes the right tool is the one you have," I once created a fairly comprehensive oligonucleotide (short DNA sequence) and full gene-sequence database and analysis tool in FileMaker. At the time, the online oligo calculators available today didn't exist. Even if they had, I wanted to keep a database of the oligos we used in the lab, and have a tool to compare them analytically. By analytically, I mean things like determining the temperature at which the double-stranded DNA unzips (dependent on the exact sequence of bases: As, Cs, Gs, and Ts, if you remember your biology), calculating the reverse complement sequence, and determining where (if at all) a short oligo sequence falls within a full gene sequence hundreds to thousands of bases long. The reverse complement is the sequence of the opposite strand in the famous double helix, remembering that A and T always pair together (T is the "complement" of A and vice versa), and C and G pair together. The "reverse" part of reverse complement refers to the fact that the two strands are read (and synthesized) in opposing directions. Thus, the direct complement of AATACG is (according to the above pairing rules) TTATGC, but that is not the sequence we need. Rather, given the sequence AATACG, I need the reverse complement, CGTATT. Not hard if all oligos were only 6 bases long (as in my example), but they actually have varying lengths, up to 20 or 30 bases. Still not hard in the post-FM7 custom function era, with recursion, but the "Yikes!" part of the story is that I did this way back in FM5!
The system showed the all-important melting temperatures graphically; flagged oligos with "problematic" sequences, such as tail ends that have reverse complementarity (and thus a risk of the oligo creating a "hairpin" secondary structure, in which its ends stick to one another rather than binding correctly to the genes in a sample); and showed whether the oligo spans a "splice site" (where the RNA synthesized from the original DNA sequence is "spliced" together to code a protein sequence; oligos that span splice sites are desirable, so you can know you're binding to [and measuring] RNA, indicative of gene expression, rather than DNA).
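
For anyone curious, the reverse-complement logic described above is tiny in any language. A Python sketch using the AATACG example:

```python
# Complement pairing rules: A<->T, C<->G.
PAIRS = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse to match the opposite strand's reading direction."""
    return seq.translate(PAIRS)[::-1]

print(reverse_complement("AATACG"))   # CGTATT
```

In FileMaker terms this is a character-by-character substitution followed by a reversal, which is exactly the kind of thing a recursive custom function handles (and, as noted, was done with more effort in FM5).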

 

Was FileMaker (5!) the best tool for that sort of analysis? Arguably not. But it was the one at hand, and I learned a ton from the exercise.

 

Back on topic, I'll keep an open-minded lookout for more cases where FileMaker just might be a good, if non-obvious, fit with an analytical need.

 

Mark


Most of the analytical applications I've had success with in FileMaker so far have been less of the kind of exploratory analysis your staff analyst would do, and more analyses that are part of an automatable process — often what a statistics specialist would rather call operations research than anything associated with their own field.

 

The example I use when I'm trying to lower the bar for statistically informed applications of FileMaker is something I did for order fulfillment quality control. I once worked with a company that verifies that all the items for an order were picked correctly by putting all the picked items on a scale at once and checking that the total weight is within appropriate bounds. The company was using a linear range around what the total weight should be (±2%), which will of course be systematically too large: independent item-weight errors grow with the square root of the item count, not linearly. The revised solution used the variances for all the item weights to arrive at tighter bounds, reducing the number of errors. It doesn't even require any math functions that aren't built into FileMaker. It doesn't have to be advanced to be valuable.
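
A sketch of the idea in Python (the item weights and variances are invented for illustration): the variances of independent errors add, so the standard deviation of the total grows with the square root of the item count, which is why the linear ±2% band is systematically too wide.

```python
from math import sqrt

# Hypothetical picked items: (expected weight in grams, variance in grams^2).
items = [(100.0, 0.25)] * 10   # ten items, each ~100 g with a 0.5 g standard deviation

expected_total = sum(w for w, _ in items)
sd_total = sqrt(sum(v for _, v in items))   # variances of independent errors add

# Accept the order if the measured total falls within three standard deviations.
lo, hi = expected_total - 3 * sd_total, expected_total + 3 * sd_total

print((round(lo, 1), round(hi, 1)))   # (995.3, 1004.7), vs. the ±2% rule's 980-1020
```

Sum, square root, and multiplication are all native FileMaker calculation functions, so the same bound is a one-field calculation there.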

 

Of course I'm still hoping for exploratory applications, too, but we may have to wait for the community to build up the appropriate tools for exploratory analysis workflows to be as practical as with SAS or R.


The application of this is way beyond my abilities, but it could be so useful to me as a teacher:

  • Identifying pupils who are under / over-achieving.
  • Identifying projects that produce the best / worst results.
  • Moderating teacher assessment.
  • Identifying factors that contribute to achievement.
  • etc!.....

Mark,

I enjoyed your description of using FileMaker to determine the reverse complement of a DNA sequence.  Most of my FileMaker experience has revolved around the creation of a lab information management solution.  The oligonucleotide module includes a function to "flip" a sequence, converting it to its reverse complement.  It also dates back to FM5.

 

Another component of the system is an inventory of lab research mice. FileMaker has served well for this purpose.  However, when we want to do survival analysis, I usually export data from FileMaker for analysis in R.  It would be nice if FileMaker exports were a little more customizable, e.g. allowing custom headers and column names for the data.  I sometimes wish I could perform R calculations in FileMaker.  The rapid retrieval from and assignment of data to FileMaker datasets with R syntax would also be amazing.


 

The application of this is way beyond my abilities, but it could be so useful to me as a teacher:

  • Identifying pupils who are under / over-achieving.
  • Identifying projects that produce the best / worst results.
  • Moderating teacher assessment.
  • Identifying factors that contribute to achievement.
  • etc!.....

 

 

Mike, why don't you start separate threads to take these on one at a time, with more specific details for each?


Mike, why don't you start separate threads to take these on one at a time, with more specific details for each?

 

 

I will at some stage (hopefully fairly soon!) but want to try to get my thoughts in order first, so I know exactly what I'm trying to achieve.

Also, my existing database could do with a tidy up as it's a bit of a jumble at the moment and my next priority is to sort that!!

 

Many thanks though!


I will at some stage (hopefully fairly soon!) but want to try to get my thoughts in order first, so I know exactly what I'm trying to achieve.

 

"Getting your thoughts in order" and identifying a clear goal, or even just a concrete yardstick, is often half the challenge of an analysis anyway! Don't be afraid to post what you think is an underdeveloped question. I think pursuing a better question could be a very illuminating part of the conversation.


I completely agree, Jeremy!  Once the question has been formulated, now what?  Taking it from that point forward into the analysis will be helpful to many folks, as "Here is the question I want answered ... now what do we do?"  We all will benefit.


Most of my filemaker experience has revolved around the creation of a lab information management solution.

 

Well, Matthew, it sounds like we should be sitting down together and talking some serious shop at next year's DevCon!  I suspect we've got a lot of shared experiences here, and a lot of fertile ground to cover.

 

However, when we want to do survival analysis, I usually export data from FileMaker for analysis in R.  It would be nice if FileMaker exports were a little more customizable, e.g. allowing custom headers and column names for the data.  I sometimes wish I could perform R calculations in FileMaker.  The rapid retrieval from and assignment of data to FileMaker datasets with R syntax would also be amazing.

 

Tying this back to Jeremy's comments, what's your take on the feasibility of bringing some of the stuff you do in R into FileMaker?


 

 

what's your take on the feasibility of bringing some of the stuff you do in R into FileMaker?

Well I hadn't thought about it in that way very much.  Mostly I do the opposite:  Use Filemaker for data acquisition and as a repository. After all it is great for making pretty user interfaces.  I think more about how to get data out of Filemaker into R for analysis or graphics.  

 

Once in a while, though, I wish I could use R syntax in a FileMaker calculation.  Even without the advanced statistics tools, R's data accession and assignment is so unbelievably quick, easy, and flexible.   Take this example: You have a table with two columns of numbers.  You decide that you want to invert the sort order of column A but not column B.  In FileMaker you'd probably end up assigning A to some other field and then looping over all the records to reset field A.  In R, it can be done in one step. Columns are not locked to one another.

[off-topic]

 

They are much more customizable than that.

 

Yes, I agree Filemaker gives a wonderful assortment of export options.  

 

I plead ignorance, though... How do you export data with custom column headers?  I always get Field Names as column headers. 


Yes, I agree Filemaker gives a wonderful assortment of export options.  

 

I plead ignorance, though... How do you export data with custom column headers?  I always get Field Names as column headers.

 

 You need to export as XML and apply a custom XSLT stylesheet during the export. This opens a whole new world of possibilities of transforming the exported data. One of the most powerful - and certainly the most under-utilized - features of Filemaker.

 

See a very simple example here:

http://fmforums.com/forum/topic/48901-exporting-to-a-csv/?p=228812

 

Take this example: You have a table with two columns of numbers. You decide that you want to invert the sort order of column A but not column B. In FileMaker you'd probably end up assigning A to some other field and then looping over all the records to reset field A. In R, it can be done in one step. Columns are not locked to one another.

 

I am probably missing something here. Why is this such a problem? You can sort each field ascending or descending, independently of other fields in the sort order.


Comment,

Thanks for the XML tip.

r.e. Sorting... Maybe that wasn't a good example. I was referring to the batch reassignment of data to a field, not the sorting of records per se. In R, like many languages, a data column is a vector, so it's easy to manipulate or reassign all or part of its data in a single script step, without reordering records overall.
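
For readers who haven't used a vector-oriented language, here is the same idea sketched in Python (standing in for R): one column is rewritten in a single step, and the other column is untouched.

```python
# Two parallel columns, as lists standing in for table fields.
col_a = [5, 3, 9, 1]
col_b = [10, 20, 30, 40]

# Reassign all of column A in one step (here, reversed); column B is untouched.
col_a[:] = col_a[::-1]

print(col_a)   # [1, 9, 3, 5]
print(col_b)   # [10, 20, 30, 40]
```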


Hey Matthew,

 

r.e. Sorting... Maybe that wasn't a good example.

 

It was an interesting example, nonetheless. No doubt that sort of flexible manipulation is what makes an analysis environment like R so powerful.

 

Ironically, taking your column-sorting example in reverse, it becomes one of the most approachable arguments I've found to make to people as to why they should store their data in FileMaker (or a relational DB in general) rather than something like Excel. It's far from the only reason, of course, but people do seem to immediately grasp the issue when I point out how easy it is to inadvertently sort a single column in Excel and totally hose your data.


Mark, I agree with you there.  I know a researcher whose microarray data was stored in Excel files; he was unaware that there was a frameshift in the identifiers halfway through the data set.  A biostatistician working in R told him his data didn't jibe, fortunately before it was too late.  In actuality, the flexibility of R makes it even more dangerous.  However, because R is scripted, you have a running log of the entire workflow from raw data to end product, and ought to be able to reproduce a data analysis precisely.    In R we generally produce a pipeline of sequential data objects, from raw data through less complex objects down to the finished analysis. I've never done the same in FileMaker.  I suppose, though, you could have one table for raw data and another for normalized data.  But on seriously big data sets I'm not sure what the utility would be of keeping things in FileMaker.  The layout tools are kind of pointless when you have 100,000+ rows of data.

 

If you want another reason not to keep genomics data in Excel: it converts certain gene names into dates (unless you're careful about the import process). For example, Oct4 becomes October 4, 2014. (The same happens to Jun and March family genes.)  I can't tell you how many research papers' supplemental data I've seen this in.


it becomes one of the most approachable arguments I've found to make to people as to why they should store their data in FileMaker (or a relational DB in general) rather than something like Excel. It's far from the only reason, of course, but people do seem to immediately grasp the issue when I point out how easy it is to inadvertently sort a single column in Excel and totally hose your data.

 

In the defense of Filemaker, it too has no shortage of features to allow you to totally hose your data - if you wish to use it improperly.


  • 4 weeks later...

Stumbled upon your discussion and have enjoyed it. I am at the NIH (on contract with SI) working on LIMS and data management systems, mostly in FileMaker (or FileMaker as the user-friendly interface for SQL and Oracle). We do a lot of analysis in R, more often by exporting, but we also use ODBC to have R directly access data in our databases, skipping the export step completely, and also have the results displayed in FileMaker. You can also use the ScriptMaster plugin and some stats Java code to do a great deal of complicated analysis. No need to ever leave FM for most people in the lab. Maybe this would be an option for some of your analyses also.


I should also say that we use Google APIs a great deal for graphing and displaying maps of collections. So I guess I would love better (more complete) graphing tools built into FileMaker. Many of our R uses are for graphing as well. I think I would really like a complete scripting language built into FileMaker. Something akin to Perl. With this I could imagine extending and sharing modules (packages?) like we do with R and BioPerl, which really would make FileMaker complete. I am not sure if this answers your question exactly, but for me that's what makes real analysis possible. I need regressions/correlations, principal component analysis, etc. Real hardcore stats. But I also understand that this is way beyond what most might want or need, or even what FM as a company would bother with. What has always bothered me is that if FM as a company doesn't want to provide these tools (which I can understand, as they are "niche" tools), they should provide the tools that allow users to extend FM. That way FM as a company doesn't have to anticipate all needs, but merely integrate "REAL" extensibility into FM via a real language like Perl, Python, or whatever. ScriptMaster is great, but it is a plugin and limited. I would like a "real" programming language built in so that users can be unchained. Uh, did I go on too long?  :hmm:


You did not go too long at all. I have mixed feelings about the tools FileMaker makes available. The tools currently in FileMaker could be better, but folks are creating top-notch graphics in FileMaker using D3.js in a web viewer, and the practices advocated by ModularFileMaker.org are doing a lot to promote the development of portable functionality.

 

FileMaker scripting and calculations (custom functions, anyway) are both Turing-complete, so saying that it isn't a "real" programming language is to have a very narrow understanding of programming. At one point, I might have said that saying that any Turing-complete tool isn't "real" programming clearly signals that the speaker isn't a "real" programmer, or maybe a "programmer" but not a "software engineer," or some garbage like that. In my mellower old age, I concede that certain conveniences do make meaningful differences in developer productivity. FileMaker functionality can often be made portable, but FileMaker doesn't always make it easy.

 

I'm not asking what you'd like FileMaker, Inc., to build into FileMaker. I'm asking what you'd like FileMaker developers to build for FileMaker. Charting? Map-based data visualization? Regression, correlation, PCA — there's nothing stopping developers from implementing these techniques in FileMaker. I personally hope that the only reason folks don't want or need sophisticated statistics in FileMaker is that it isn't already there. One point from my introductory spiel for this thread was that the programming languages more commonly used for intensive data processing — Python, C++, you brought up Perl — don't have built-in statistics either! This isn't necessarily FileMaker, Inc.'s responsibility, when you really look at how data analysis tools were made for other languages. The communities of developers using those languages built statistics modules for them, and there's nothing stopping FileMaker developers from doing the same.

 

So what do you want to see first? Heat maps with OpenStreetMap and D3.js in a web viewer? A bundle of all the statistics-related functions available in Excel? I'm personally keen on a stream statistics library. Something else?
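
For context, by "stream statistics" I mean single-pass algorithms like Welford's running mean and variance, which never need the whole dataset in memory at once. A Python sketch:

```python
def running_stats(stream):
    """Welford's online algorithm: one pass over the data, numerically stable."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # uses both the old and the new mean
    variance = m2 / (n - 1) if n > 1 else 0.0   # sample variance
    return mean, variance

print(running_stats([2, 4, 4, 4, 5, 5, 7, 9]))   # mean 5, sample variance about 4.57
```

The same accumulation works record-by-record in a FileMaker script loop, which is exactly why it suits large found sets: no second pass, no giant intermediate fields.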


Since the topic of graphing came up, there are a few graph types that are very common in epi studies. Often, I'll receive some regression results back from our data analyst, either with or without accompanying low-res plots, and the first thing I'll do is fire up Adobe Illustrator to produce some publication-ready (high quality typography, EPS output) graphs. I don't expect FM to be exporting submission-ready EPS files, but these might be kind of cool graphs to be able to produce in FileMaker for visualizing data:  

 

Kaplan-Meier survival plots display a cumulative change of some sort within a cohort (such as survival [hence the name], viral clearance, etc.) plotted against time. One tricky aspect might be accounting for "right-censored" cases (e.g., lost to follow-up before the "event" can be measured): these are usually represented as tick marks on the plot. Kaplan-Meier estimation usually yields confidence intervals as well, which can also be added to the plot.
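
The estimator behind the plot is just a running product over event times, stepping the survival fraction down at each event while right-censored cases simply leave the at-risk count. A Python sketch with invented follow-up data:

```python
def kaplan_meier(observations):
    """Kaplan-Meier curve from (time, event) pairs.
    event=True marks the event of interest; event=False marks right-censoring."""
    at_risk = len(observations)
    survival, curve = 1.0, []
    # Standard convention: events sort before censorings at the same time.
    for time, event in sorted(observations, key=lambda o: (o[0], not o[1])):
        if event:
            survival *= 1 - 1 / at_risk   # step down at each event
            curve.append((time, survival))
        at_risk -= 1                      # censored cases just leave the risk set
    return curve

# Three subjects: events at t=1 and t=3, one censored at t=2.
print(kaplan_meier([(1, True), (2, False), (3, True)]))   # [(1, 0.666...), (3, 0.0)]
```

Stepping one observation at a time is equivalent to the usual grouped form S(t) = prod(1 - d_i/n_i), since successive factors telescope within a tie.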

 

Box-and-whisker plots display medians and inter-quartile ranges for some non-normal parameter of interest, in different groups or treatment arms. The whiskers, then, often show max and min. Outliers may be added as individual dots beyond the whiskers. The basic box plot should not be too difficult, depending only on determining the median value and the 25th and 75th percentiles (doable with custom functions).
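
A sketch of that arithmetic in Python, using Tukey's hinges (medians of each half) for the quartiles and the common 1.5 x IQR fence for outliers; the data are invented:

```python
def box_stats(data):
    """Median, quartiles (Tukey hinges), and outliers beyond 1.5 * IQR."""
    xs = sorted(data)
    n = len(xs)

    def median(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    q1, q2, q3 = median(lower), median(xs), median(upper)
    fence = 1.5 * (q3 - q1)
    outliers = [x for x in xs if x < q1 - fence or x > q3 + fence]
    return q1, q2, q3, outliers

print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))   # (3, 5.5, 8, [100])
```

Note there are several quartile conventions (Tukey hinges vs. interpolated percentiles); they differ slightly on small samples, so whichever is implemented should be documented.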

 

Forest plots are used in meta-analyses to compare odds ratios or relative risks, along with 95% confidence intervals, between studies. Can also be used within a study to compare, say, effects of different treatment arms.
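
Each row of a forest plot boils down to an odds ratio and its 95% confidence interval, computable from a 2x2 table via the standard log-odds formula. A sketch with hypothetical counts:

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval from a 2x2 table:
         exposed:   a events, b non-events
         unexposed: c events, d non-events"""
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lo, hi = exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)
    return odds_ratio, lo, hi

print(odds_ratio_ci(10, 20, 5, 40))   # (4.0, ~1.20, ~13.28)
```

With exp, ln, and sqrt all available as FileMaker calculation functions, the numbers for each row are straightforward; the drawing is where a web viewer or charting tool comes in.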

 

Just some food for thought.

 

Mark


While FileMaker does have a built-in programming language, it is severely limited. While most of the limits can be overcome, it shouldn't be so hard to do so. But as I said, I don't expect FM as a company to address everyone's needs; I wish they would provide a full-featured and extensible programming language (like Perl, Python, etc.) built in, without the need for plugins.


  • 4 weeks later...

Hi Mark & Jeremy,

 

I've just found this thread and find myself in complete agreement with so many of the sentiments expressed here.

 

After listening to Luke Rochester's session at DevCon last year, I decided that one of my priorities would be to follow his recommendations and plot out some historical data on sales of various product lines from our own company, especially as we've experienced declining sales over the last two years.

 

After pulling the data together and graphing it along with the average for the previous 12 months as Luke demonstrated, I wanted to calculate the regression lines with their standard errors. As I've NEVER been a fan of Excel, with all its buried formulas for stat functions, I turned to a former colleague at the Animal Research Station where I'd been a scientist many years earlier. He pointed me towards R as the modern way to do stats, and I've become fascinated by the power and dexterity of that open-source package, though I'm far from conversant and dexterous with its cryptic command-line interface.

 

Even though R has a long and fairly steep learning curve, it is now my firm opinion that it is an ideal partner to FMP for any form of multivariate data analysis. Thus I'd love to see a gathering of like-minded FMP developers at DevCon in Las Vegas.

 

Several contributors to this thread obviously have the statistical support that each needs but I'm hopeful that you all see some benefit from FM developers being encouraged to add some rudimentary R expertise to their developer skill set.

 

I will be interested to hear your reactions.

 

With best regards,

 

John Wolff

Hamilton, New Zealand


JW, I agree with your colleague that R is the standard for exploratory data analysis (operational analysis is more likely to be in something like Python or C++), and I'd love to see a gathering of developers at DevCon about these topics. Perhaps an unconference session (or several) is in order.

 

However, I also hope people can think more broadly than reflexively looking to non-FileMaker tools to do sophisticated data analyses. FileMaker scripting and calculations are both Turing-complete, and therefore capable of any calculation possible with any other computational tool. The technical environments other tools operate in may present certain advantages, especially performance and network effect, but oftentimes "the right tool for the job" is the tool that happens to already be in your hand. FileMaker can be an excellent hub for connecting other tools involved in data collection and analysis workflows, but there's no reason FileMaker can't stand on its own, too.


Hi Jeremy & Mark.

 

I appreciate your responses and the enthusiasm you share for extending this knowledge amongst our fellow developers.

 

As I've never attended an unconference session, I'm uncertain of its probable outcome. We'd each be likely to have divergent views that we'd want to share, and these could easily present an incoherent message to interested bystanders who could, and should, become more conversant with the best tools for "Date Analysis", irrespective of how you wish to define that requirement.

 

I therefore believe that some planning is required ahead of any unconference session so that a coherent message can result from divergent contributions. Might that be held on Sunday 19th July before DevCon gets underway?

 

Thus far I've listened to several webinars from gurus at RStudio, which provides an enhanced UI for running R analyses. I've also started reading several books, each of which aims to provide an introduction to the core R commands for basic statistical analyses. By the time of DevCon I'd hope to have gained some dexterity in applying those analyses to our own requirements.

 

I'm certainly supportive if you, Jeremy, want to develop modules that can interface FM with the R commands but that would undoubtedly be a considerable undertaking. Perhaps you can gain some indication of the value of proposed modules from the feedback that becomes evident at DevCon.

 

We have time to discuss options. No doubt there are other contributors who have wisdom to share.

 

Regards,

 

John


I'm certainly supportive if you, Jeremy, want to develop modules that can interface FM with the R commands but that would undoubtedly be a considerable undertaking.

 

IIUC, Jeremy wants to develop ways and means to replicate R's functionality within Filemaker, using (only) Filemaker's own scripting and calculation tools - which would undoubtedly be a much more considerable undertaking.

 

BTW, I've never had the opportunity to use R; does it not support AppleScript in some way? That should make the task of "interfacing FM with the R commands" much easier (on the Mac OS X platform).

 

 

the best tools for "Date Analysis" irrespective of how you wish to define that requirement.

 

Well, "Date Analysis" is really simple; it's a Boolean...   :Whistle:


As I've never attended an unconference session, I'm uncertain of its probable outcome. We'd each be likely to have divergent views that we'd want to share, and these could easily present an incoherent message to interested bystanders who could, and should, become more conversant with the best tools for "Data Analysis", irrespective of how you wish to define that requirement.

 

Having room in the air for multiple opinions to peacefully coexist, rather than presenting a false consensus, is actually one of the advantages presented by unconference sessions. There doesn't have to be a coherent message. Bystanders are free to request whatever clarification they want, and that discussion often overwhelms what the organizers originally expected to talk about.

 

 

 IIUC, Jeremy wants to develop ways and means to replicate R's functionality within Filemaker, using (only) Filemaker's own scripting and calculation tools - which would undoubtedly be a much more considerable undertaking.

 

 

That's right. I also want to integrate FileMaker with more established tools based in R and Python as appropriate. I advocate using FileMaker solo more because of the gap between its actual and popularly perceived merit than because of its current place among other tools.

 

Which task is more difficult depends on your background. I like math more than I like computers, so for me, reimplementing computations from R using only FileMaker would involve less of a learning curve than integrating the two.

