openrefine


Use Google Refine on a CSV without headers and with a varying number of columns per record


I'm trying to import into OpenRefine a CSV file extracted from a NoSQL database (Cassandra). It has no headers, and the number of columns varies from record to record.
For instance, the fields are comma-separated and could look like this:
1 - userid:100456, type:specific, status:read, feedback:valid
2 - userid:100456, status:notread, message:"some random stuff here but with quotation marks", language:french
There is a maximum number of columns, and the column names need no cleansing.
How can I build one big Excel file that I could mine using a pivot table?
If you can get JSON instead, Refine will ingest it directly.
If that's not a possibility, I'd probably do something along the lines of:
import as lines of text
split into two columns containing the row ID and the fields
split multi-valued cells on the fields column using comma as the separator
split the fields column into two columns (key and value) using colon as the separator
use Transpose > Columnize by key/value columns on these two columns to unfold them into columns
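
If it helps to see the same reshaping outside Refine, here is a rough Python sketch of those steps. It assumes each line looks like the samples in the question (a row ID, then " - ", then comma-separated key:value pairs, with double quotes around values that may contain commas). The function names and the row_id column are made up for illustration and are not part of OpenRefine.

```python
import csv
import io
import re

# One key:value pair. The value is either a double-quoted string
# (which may contain commas) or a run of non-comma characters.
PAIR = re.compile(r'\s*(\w+)\s*:\s*("[^"]*"|[^,]*)\s*(?:,|$)')


def parse_line(line):
    """Split one line into a dict holding the row ID and its key/value pairs."""
    # Assumption: the row ID is separated from the fields by " - ".
    row_id, _, fields = line.partition(" - ")
    record = {"row_id": row_id.strip()}
    for key, value in PAIR.findall(fields):
        record[key] = value.strip().strip('"')
    return record


def unfold(lines):
    """Return (columns, records); columns is the union of keys in first-seen order."""
    records = [parse_line(line) for line in lines if line.strip()]
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns, records


def to_csv(lines):
    """Render the records as CSV text, leaving missing keys as empty cells."""
    columns, records = unfold(lines)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Calling to_csv on the lines of the export gives text with one header row and one row per record, which Excel can open for a pivot table. Keys that a record does not have simply end up as empty cells.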


