pandas


Filtering columns in dataframe that begin with a specific string


I have the following df, and I would like to filter its columns by name, keeping only those that begin with a certain string.
This is my current df:
ruta2:
   Current SAN  Prev.1m SAN  Prev.2m SAN  Prev.3m SAN  Current TRE
A            5            6            7            6            3
B            6            5            7            6            6
C           12           11           11           11            8
Basically, I would like to filter the dataframe and keep only the columns that begin with Current.
Then the desired output would be:
ruta2:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
To do this I tried the following filter, but it raises a ValueError:
ruta2=ruta2[~(ruta2.columns.str[:4].str.startswith('Prev'))]
It seems you only need:
ruta2=ruta2.loc[:, ~(ruta2.columns.str[:4].str.startswith('Prev'))]
# same as
# ruta2 = ruta2.loc[:, ~ruta2.columns.str.startswith('Prev')]
print(ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
Or:
cols = ruta2.columns[~(ruta2.columns.str[:4].str.startswith('Prev'))]
ruta2 = ruta2[cols]
print(ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
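For context on why the attempt in the question fails: plain brackets interpret a boolean array as a row filter, so a mask with one flag per column (5) does not match the number of rows (3). The sketch below reproduces this on a frame rebuilt from the printed output in the question; the construction and the names `keep`, `row_filter_failed` and `current_only` are assumptions made for illustration.

```python
import pandas as pd

# Example frame rebuilt from the printed output in the question
# (the values are assumed from that printout).
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# One boolean per column: True for the columns to keep.
keep = ~ruta2.columns.str.startswith("Prev")

# Plain brackets treat a boolean array as a row filter,
# so 5 flags against 3 rows raises a ValueError.
try:
    ruta2[keep]
    row_filter_failed = False
except ValueError:
    row_filter_failed = True

# .loc[:, mask] applies the same mask along the column axis instead.
current_only = ruta2.loc[:, keep]
print(current_only)
```

Selecting the labels first (`ruta2.columns[keep]`) and then indexing with them works for the same reason: the brackets then receive column labels rather than a boolean array.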
But if you need only the Current columns, use filter (^ matches the start of the string in a regex):
ruta2=ruta2.filter(regex='^Current')
print(ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
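The `^` anchor matters because `filter` otherwise matches anywhere in the label. A small sketch, using a hypothetical frame `df` with an extra column named "Not Current" (not part of the question's data), contrasts the anchored regex with the `like` parameter:

```python
import pandas as pd

# Hypothetical frame: one label contains "Current" but does not start with it.
df = pd.DataFrame(
    [[5, 6, 3, 9]],
    columns=["Current SAN", "Prev.1m SAN", "Current TRE", "Not Current"],
    index=["A"],
)

anchored = df.filter(regex="^Current")  # labels that start with "Current"
anywhere = df.filter(like="Current")    # labels that contain "Current"

print(list(anchored.columns))
print(list(anywhere.columns))
```

With the question's data both calls would select the same two columns, so the difference only shows up when "Current" can appear later in a label.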
# filter the column names starting with 'Current'
ruta2[[e for e in ruta2.columns if e.startswith('Current')]]
Out[383]:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
Or you can use a mask array to filter columns:
ruta2.loc[:, ruta2.columns.str.startswith('Current')]
Out[385]:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
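As a quick sanity check, the approaches shown so far should select the same columns. The sketch below compares them on a frame rebuilt from the question's printed output; the construction and the names `by_list`, `by_mask`, `by_regex` and `same` are assumptions made for illustration.

```python
import pandas as pd

# Example frame rebuilt from the printed output in the question
# (the values are assumed from that printout).
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# Three ways to keep the columns whose label starts with "Current".
by_list = ruta2[[c for c in ruta2.columns if c.startswith("Current")]]
by_mask = ruta2.loc[:, ruta2.columns.str.startswith("Current")]
by_regex = ruta2.filter(regex="^Current")

# DataFrame.equals compares labels and values of both frames.
same = by_list.equals(by_mask) and by_mask.equals(by_regex)
print(same)
```

Note that the list comprehension calls `str.startswith` on each label, so it assumes every column label is a string.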
