pandas


VLOOKUP equivalent function to look up value in pandas DataFrame


I have a pandas dataframe with the following structure:
DF_Cell, DF_Site
C1,A
C2,A
C3,B
C4,B
C5,B
I also have a very long loop (100 million iterations) that processes, one at a time, strings corresponding to values in the "DF_Cell" column (the first iteration produces C1, the second produces C2, and so on).
I would like to look up in the DataFrame the DF_Site corresponding to the cell (DF_Cell) processed in each iteration.
One approach I thought of was to put the current cell in a one-row DataFrame and left-merge it with the table, but that is far too inefficient for this much data.
Is there a better way?
Perhaps you want to set DF_Cell as the index*:
In [11]: df = pd.read_csv('foo.csv', index_col='DF_Cell')
# or df.set_index('DF_Cell', inplace=True)
In [12]: df
Out[12]:
        DF_Site
DF_Cell
C1            A
C2            A
C3            B
C4            B
C5            B
You can then refer to the row, or specific entry, using loc:
In [13]: df.loc['C1']
Out[13]:
DF_Site A
Name: C1, dtype: object
In [14]: df.loc['C1', 'DF_Site']
Out[14]: 'A'
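*Assuming the file has only these two columns, you could pass squeeze=True to read_csv to get a Series rather than a one-column DataFrame. That argument was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze("columns") on the result instead.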
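As a rough sketch of the index-based lookup, the snippet below builds the sample table inline instead of reading foo.csv, and keeps it as a Series indexed by DF_Cell. The names site_by_cell, single, cells and sites are made up for illustration. It also shows Series.map, which may be worth considering if the cells can be collected up front rather than looked up one per iteration.

```python
import pandas as pd

# Sample data copied from the question; a real script would load it from a file.
df = pd.DataFrame({
    "DF_Cell": ["C1", "C2", "C3", "C4", "C5"],
    "DF_Site": ["A", "A", "B", "B", "B"],
})

# Series of sites indexed by cell name (hypothetical variable name).
site_by_cell = df.set_index("DF_Cell")["DF_Site"]

# Scalar lookup for one cell; .at is intended for single-value access.
single = site_by_cell.at["C3"]

# Vectorized alternative: map a whole batch of cells in one call.
cells = pd.Series(["C5", "C1", "C3"])
sites = cells.map(site_by_cell).tolist()

print(single)
print(sites)
```

Whether batching is possible depends on how the loop produces its cells, which the question does not say.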
I don't really understand what you mean in your first paragraph, but for looking up a field's value by the corresponding key in a different column, I agree that Alexis' example is the most idiomatic and efficient way to do it in pandas. However, if this is really representative of your data structure, you can just use a dict.
data = {'a': 1, 'b': 2, 'c': 3}
data['a']
# 1
list(map(lambda y: data[y] + 1, ['c', 'b', 'a']))
# [4, 3, 2]
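Applied to the table in the question, the dict idea might look like the sketch below. It assumes the DF_Cell values are unique and that the whole mapping fits in memory. The names cell_to_site and lookup_site are hypothetical, and the sample rows are typed in by hand.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DF_Cell": ["C1", "C2", "C3", "C4", "C5"],
    "DF_Site": ["A", "A", "B", "B", "B"],
})

# Build the mapping once, before the long loop starts.
cell_to_site = dict(zip(df["DF_Cell"], df["DF_Site"]))


def lookup_site(cell, default=None):
    """Return the site for a cell, or `default` if the cell is unknown."""
    return cell_to_site.get(cell, default)


found = lookup_site("C4")
missing = lookup_site("C9")  # not in the table, so the default is returned

print(found, missing)
```

Each lookup is then a plain hash-table access with no pandas overhead inside the loop.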
