openrefine


Using Google Refine on a CSV without headers and with a varying number of columns per record


I'm attempting to import into OpenRefine a CSV extracted from a NoSQL database (Cassandra). It has no headers, and the number of columns varies from record to record.
For instance, the fields are comma-separated and could look like this:
1 - userid:100456, type:specific, status:read, feedback:valid
2 - userid:100456, status:notread, message:"some random stuff here but with quotation marks", language:french
There is a maximum number of columns, and no cleansing is required on the column names.
How do I turn this into one big Excel file that I could mine using a pivot table?
If you can get JSON instead, Refine will ingest it directly.
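If the database can't export JSON directly, one option is to convert the export yourself before importing. Below is a minimal Python sketch, assuming each line looks like the samples in the question: a row ID, then " - ", then comma-separated key:value pairs, with double quotes around any value that may itself contain commas. The names `PAIR_RE` and `line_to_record` are hypothetical, chosen for illustration.

```python
import json
import re

# One key:value pair. The value is either a double-quoted string (which may
# contain commas) or everything up to the next comma.
PAIR_RE = re.compile(r'(\w+):("[^"]*"|[^,]*)')


def line_to_record(line):
    """Convert '1 - userid:100456, type:specific, ...' into a flat dict."""
    # Assumption: the row ID is separated from the fields by " - ".
    row_id, sep, fields = line.strip().partition(" - ")
    if not sep:  # no "ID - " prefix: treat the whole line as fields
        row_id, fields = "", line.strip()
    record = {"row_id": row_id}
    for key, value in PAIR_RE.findall(fields):
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        record[key] = value
    return record


lines = [
    '1 - userid:100456, type:specific, status:read, feedback:valid',
    '2 - userid:100456, status:notread, message:"some random stuff here but with quotation marks", language:french',
]
print(json.dumps([line_to_record(line) for line in lines], indent=2))
```

Saved to a `.json` file, the resulting array of flat objects should import as one row per object, with a key that is missing from a record showing up as an empty cell.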
If that's not possible, I'd probably do something along these lines:
Import the file as lines of text.
Split that column into two columns, one holding the row ID and the other the fields.
Split multi-valued cells in the fields column, using a comma as the separator.
Split the fields column into two columns, using a colon as the separator.
Use key/value columnize (Transpose > Columnize by key/value columns) on these two columns to unfold them into columns.
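For comparison, the same five steps can be sketched outside Refine in Python, ending with a CSV that Excel can open for the pivot table. This is a minimal illustration, not how Refine implements these operations; `unfold` and `to_csv` are hypothetical names, and it assumes the " - " row-ID prefix from the sample lines. Like the Refine recipe, step 3 splits naively on every comma, so a quoted value that contains a comma would be broken apart. Step 4 splits on the first colon only, so a value that itself contains a colon stays intact.

```python
import csv
import io


def unfold(lines):
    """Lines of text -> (row ID, fields) -> key/value pairs -> one column per key."""
    rows = {}  # row ID -> {key: value}, in order of first appearance
    keys = []  # column names, in order of first appearance
    for line in lines:  # step 1: one line of text per row
        if not line.strip():
            continue  # skip blank lines
        row_id, _, fields = line.strip().partition(" - ")  # step 2: row ID and fields
        record = rows.setdefault(row_id, {})
        for cell in fields.split(","):  # step 3: split multi-valued cells on ","
            key, _, value = cell.strip().partition(":")  # step 4: split on the first ":"
            if not key:
                continue
            if key not in keys:
                keys.append(key)
            record[key] = value.strip().strip('"')  # step 5: each key becomes a column
    header = ["row_id"] + keys
    return header, [{"row_id": rid, **rec} for rid, rec in rows.items()]


def to_csv(header, records):
    """Write the wide table as CSV; missing keys become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


sample = [
    '1 - userid:100456, type:specific, status:read, feedback:valid',
    '2 - userid:100456, status:notread, message:"some random stuff here but with quotation marks", language:french',
]
header, records = unfold(sample)
print(to_csv(header, records))
```

Each distinct key becomes one column and each row ID one row, which is the shape a pivot table expects.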
