pandas


VLOOKUP equivalent function to look up value in pandas DataFrame


I have a pandas dataframe with the following structure:
DF_Cell, DF_Site
C1,A
C2,A
C3,B
C4,B
C5,B
I also have a very long loop (100 million iterations) that processes, one at a time, strings corresponding to values in the "DF_Cell" column (the first iteration produces C1, the second produces C2, and so on).
For each cell processed in the loop, I would like to look up the corresponding DF_Site in the DataFrame.
One approach I could think of was to put the cell in a one-cell DataFrame and do a left merge against it, but that is far too inefficient for data this large.
Is there a better way?
Perhaps you want to set DF_Cell as the index*:
In [11]: df = pd.read_csv('foo.csv', index_col='DF_Cell')
# or df.set_index('DF_Cell', inplace=True)
In [12]: df
Out[12]:
DF_Site
DF_Cell
C1 A
C2 A
C3 B
C4 B
C5 B
You can then refer to the row, or specific entry, using loc:
In [13]: df.loc['C1']
Out[13]:
DF_Site A
Name: C1, dtype: object
In [14]: df.loc['C1', 'DF_Site']
Out[14]: 'A'
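*Assuming the file has only these two columns, you could pass squeeze=True to read_csv to get a Series instead of a one-column DataFrame. (That argument was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze("columns") on the result instead.)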
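For anyone who wants to try this without a foo.csv on disk, here is a minimal sketch that builds the same frame in memory and does the lookups. The variable names (indexed, sites, site_c1 and so on) are mine, not from the question, and using .at for scalar access is my suggestion rather than part of the answer above.

```python
import pandas as pd

# Same data as in the question, built in memory instead of read from foo.csv.
df = pd.DataFrame({
    "DF_Cell": ["C1", "C2", "C3", "C4", "C5"],
    "DF_Site": ["A", "A", "B", "B", "B"],
})

# Index by DF_Cell so each lookup goes through the index instead of a merge.
indexed = df.set_index("DF_Cell")

# .loc works for rows or single entries; .at is the scalar-only accessor.
site_c4 = indexed.loc["C4", "DF_Site"]
site_c1 = indexed.at["C1", "DF_Site"]

# Selecting the single column gives a Series, so a lookup needs only the label.
sites = indexed["DF_Site"]
site_c3 = sites["C3"]

print(site_c1, site_c3, site_c4)
```

With 100 million iterations it is probably worth timing .loc against .at on your own data before committing to one of them.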
I don't fully understand your first paragraph, but for looking up a field's value from the corresponding key in another column, I agree that Alexis' example is the most idiomatic and efficient way to do it in pandas. However, if this is really representative of your data structure, you can just use a dict.
data = {'a': 1, 'b': 2, 'c': 3}
data['a']
# 1
list(map(lambda y: data[y] + 1, ['c', 'b', 'a']))
# [4, 3, 2]
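Applied to the data in the question, the dict approach might look like the sketch below. The names cell_to_site and lookup_site are hypothetical, and the short range loop only stands in for the real 100-million-iteration loop. It assumes every DF_Cell value is unique; with duplicates, the last row wins.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "DF_Cell": ["C1", "C2", "C3", "C4", "C5"],
    "DF_Site": ["A", "A", "B", "B", "B"],
})

# Build the lookup table once, outside the loop.
cell_to_site = dict(zip(df["DF_Cell"], df["DF_Site"]))


def lookup_site(cell, default=None):
    """Return the site for a cell, or `default` if the cell is unknown."""
    return cell_to_site.get(cell, default)


# Stand-in for the long loop: C1..C5 exist, C6 does not.
results = [lookup_site(f"C{i}") for i in range(1, 7)]
print(results)
```

Using .get instead of cell_to_site[cell] avoids a KeyError when the loop produces a cell that is not in the table, which is roughly what a left merge would give you as a missing value.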
