

Use OpenRefine (Google Refine) on a CSV without headers and with a varying number of columns per record


I'm trying to import into OpenRefine a CSV extracted from a NoSQL database (Cassandra). The file has no headers, and the number of columns differs from record to record.
For instance, the fields are comma-separated and could look like this:
1 - userid:100456, type:specific, status:read, feedback:valid
2 - userid:100456, status:notread, message:"some random stuff here but with quotation marks", language:french
There is a maximum number of columns, and the column names need no cleansing.
How do I build one big Excel file that I could mine using a pivot table?
If you can get JSON instead, Refine will ingest it directly.
If that's not a possibility, I'd probably do something along the lines of:
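1. import the file as lines of text (line-based text file)
2. split that column into two columns, one containing the row ID and one containing the fields
3. split multi-valued cells on the fields column, using comma as the separator
4. split the fields column into two columns (key and value), using colon as the separator
5. use Transpose > Columnize by key/value columns on these two columns to unfold the keys into columns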
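
If pre-processing the file outside OpenRefine is an option, the same unfolding can be sketched in Python. This is only an illustration under assumptions, not part of OpenRefine: it assumes every line has the form "ID - key:value, key:value, ...", that keys are plain words, and that any value containing a comma is wrapped in double quotes. The names parse_line, unfold and to_csv are made up for this example.

```python
import csv
import io
import re

# A pair is key:value, where the value is either a double-quoted string
# or a run of characters up to the next comma (assumed input format).
PAIR = re.compile(r'(\w+):\s*("[^"]*"|[^,]*)')


def parse_line(line):
    """Turn 'ID - key:value, key:value' into a dict that includes the row ID."""
    row_id, _, fields = line.partition(" - ")
    record = {"row_id": row_id.strip()}
    for key, value in PAIR.findall(fields):
        # Drop surrounding whitespace and the optional quotation marks.
        record[key] = value.strip().strip('"')
    return record


def unfold(lines):
    """Parse all non-empty lines and collect column names in first-seen order."""
    records = [parse_line(line) for line in lines if line.strip()]
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns, records


def to_csv(lines):
    """Return CSV text with one column per key; missing keys become empty cells."""
    columns, records = unfold(lines)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling to_csv on the lines of the export and writing the result to a file gives a regular CSV with headers, which OpenRefine or Excel can open directly.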
