Is there an equivalent of a data frame in OCaml?


I have been on the R side for some years. I don't do any hardcore statistics; I mostly use R as a sophisticated CSV-file manipulator. Nevertheless, I do need to process a huge amount of data, in a distributed way.
I have found that R is no longer fast enough for my application, and I am now investigating other languages.
My first choice is Python with pandas, which is faster. I have also read that OCaml could be 10x faster than Python, which sounds very attractive to me.
However, the standard libraries of OCaml seem to be quite low-level. I cannot find any high-level container like R's data frame.
How do you represent data frames in OCaml? Do you use a list of tuples? Can anyone share a bit of knowledge here?
Thanks!
I am not familiar with R and had to google data frames, but it seems like you are looking for records, or perhaps a list of records. As you suggest, a list of tuples could have similar properties to R data frames if you add some functions to access the data in the tuples more easily. But I think records would be closer, since you can refer to a field of the record by name.
See the chapter on Records in Real World OCaml.
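
To make the list-of-records idea concrete, here is a minimal sketch. The `row` type, its field names, and the sample values are all made up for illustration; a real data set would have its own columns. It uses only functions from the standard `List` module.

```ocaml
(* Hypothetical row type: the field names are illustrative only. *)
type row = { name : string; age : int; score : float }

(* Made-up sample data standing in for the rows of a CSV file. *)
let rows = [
  { name = "alice"; age = 31; score = 82.5 };
  { name = "bob";   age = 27; score = 64.0 };
  { name = "carol"; age = 45; score = 91.0 };
]

(* Keep only the rows whose age exceeds n, like subsetting a data frame. *)
let older_than n rows = List.filter (fun r -> r.age > n) rows

(* Extract one "column" as a plain list. *)
let names rows = List.map (fun r -> r.name) rows

(* Mean of the score column; evaluates to nan for an empty list. *)
let mean_score rows =
  let total = List.fold_left (fun acc r -> acc +. r.score) 0.0 rows in
  total /. float_of_int (List.length rows)
```

For example, `names (older_than 30 rows)` selects the names of the rows that pass the filter. Lists are convenient for a sketch like this, but for large data an array is likely a better fit because it allows constant-time indexing.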
I am actually working on a dataframe class for OCaml right now. Hopefully I will have it finished in a few weeks. My progress so far is on GitHub. (Note: the current version on GitHub is not yet 100% functional.)
https://github.com/PamExx/TimeSeries/blob/master/TimeSeries.ml
As indicated in Thomas's answer, such a rich data structure would be provided by a specialized library. You can start with either an array of records or a record of arrays. If your rows do not consist solely of floating-point numbers, a record of arrays might be slightly preferable. But what perhaps matters more for cache locality is whether you work across rows (then use an array of records) or across columns (then use a record of arrays). Beware that you might want to base computations on low-level libraries such as LACAML or Stream Processing with OCaml; you should study their APIs for inspiration on how to implement your high-level data structure. It would be nice if someone provided the actual high-level library! You can also try to work with both OCaml and R using OCaml-R.
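
The two layouts mentioned above can be sketched roughly as follows. The `trade` and `frame` types, their field names, and the sample values are hypothetical, chosen only to show the difference; this is not the API of any existing library.

```ocaml
(* Layout 1: array of records (row-oriented). Hypothetical fields. *)
type trade = { id : int; price : float; qty : float }

(* Layout 2: record of arrays (column-oriented), one array per column. *)
type frame = { ids : int array; prices : float array; qtys : float array }

(* Made-up sample data. *)
let trades = [|
  { id = 1; price = 2.0; qty = 3.0 };
  { id = 2; price = 1.5; qty = 4.0 };
|]

(* Convert the row-oriented layout into the column-oriented one. *)
let of_rows (rows : trade array) : frame = {
  ids    = Array.map (fun r -> r.id) rows;
  prices = Array.map (fun r -> r.price) rows;
  qtys   = Array.map (fun r -> r.qty) rows;
}

(* Column-wise work: summing one column walks a single float array. *)
let sum_column (col : float array) = Array.fold_left (+.) 0.0 col

(* Row-wise work: each result needs several fields of the same row. *)
let row_totals (rows : trade array) =
  Array.map (fun r -> r.price *. r.qty) rows
```

With this sample data, `sum_column (of_rows trades).prices` adds up the price column, while `row_totals trades` computes price times quantity per row. OCaml stores a `float array` as a flat block of unboxed floats, which is one reason the column-oriented layout can suit mixed-type rows: the float columns stay compact even when other columns hold integers or strings.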
