

Pandas dataframe without copy


How can I avoid taking a copy of the dictionary supplied when creating a Pandas DataFrame?
>>> import numpy as np
>>> import pandas as pd
>>> a = np.arange(10)
>>> b = np.arange(10.0)
>>> df1 = pd.DataFrame(a)
>>> a[0] = 100
>>> df1
     0
0  100
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
>>> d = {'a': a, 'b': b}
>>> df2 = pd.DataFrame(d)
>>> a[1] = 200
>>> d
{'a': array([100, 200, 2, 3, 4, 5, 6, 7, 8, 9]), 'b': array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])}
>>> df2
     a  b
0  100  0
1    1  1
2    2  2
3    3  3
4    4  4
5    5  5
6    6  6
7    7  7
8    8  8
9    9  9
If I create the DataFrame from just a, changes to a are reflected in df1 (and vice versa).
Is there any way of making this work when supplying a dictionary?
There is no way to 'share' a dict and have the frame update when the dict changes. The copy argument is not relevant for a dict: the data is always copied, because it is transformed into an ndarray.
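One way to see the copy for yourself is to mutate the source array after construction and then ask NumPy whether the frame's column still shares memory with it. This is a minimal sketch, assuming a reasonably recent pandas and NumPy; the names a, b and df2 simply mirror the question, and np.shares_memory is used as the check.

```python
import numpy as np
import pandas as pd

a = np.arange(10)
b = np.arange(10.0)

# Build the frame from a dict of arrays, using the default copy behaviour.
df2 = pd.DataFrame({"a": a, "b": b})

# Mutate the original array after the frame has been constructed.
a[1] = 200

# The frame holds its own copy, so the change is not visible there.
print(df2["a"].iloc[1])                          # 1
print(np.shares_memory(df2["a"].to_numpy(), a))  # False
```

The np.shares_memory check is more direct than eyeballing printed values, because it tells you whether any view exists at all rather than whether a particular element happened to change.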
However, you can get this kind of dynamic behavior in a limited way.
In [9]: arr = np.random.rand(5, 2)
In [10]: df = pd.DataFrame(arr)
In [11]: arr[0, 0] = 0
In [12]: df
Out[12]:
          0         1
0  0.000000  0.192056
1  0.847185  0.609028
2  0.833997  0.422521
3  0.937638  0.711856
4  0.047569  0.033282
Thus, at construction time, a DataFrame built from a passed ndarray is a view onto that underlying NumPy array. Depending on how you operate on the DataFrame, you could trigger a copy (e.g. if you assign a new column or change a column's dtype). This also only works for a frame with a single dtype.
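The view behaviour, and how a dtype change breaks it, can be checked directly. This is a minimal sketch under two assumptions: it passes copy=False explicitly, because newer pandas releases with Copy-on-Write may copy NumPy arrays in the constructor by default; and it uses astype as the example of a dtype change, since converting float64 to int64 necessarily produces new memory.

```python
import numpy as np
import pandas as pd

arr = np.zeros((5, 2))

# copy=False is an assumption-driven safeguard: newer pandas versions may
# otherwise copy arr in the constructor.
df = pd.DataFrame(arr, copy=False)

# Mutating the array is reflected in the frame, because df views arr.
arr[0, 0] = 1.5
print(df.iloc[0, 0])      # 1.5

# Changing the dtype yields a new frame backed by its own memory.
df_int = df.astype("int64")
arr[1, 0] = 7.0
print(df.iloc[1, 0])      # 7.0 (still a view)
print(df_int.iloc[1, 0])  # 0   (copy made before the change)
```

Only in-place changes to arr itself are shown here; whether writes made through df propagate back to arr is version dependent (Copy-on-Write changes it), so the sketch does not rely on that direction.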
