pandas


Filtering columns in dataframe that begin with a specific string


I have the following DataFrame and would like to filter its columns, keeping only those whose names begin with a certain string.
This is my current df:
ruta2:
   Current SAN  Prev.1m SAN  Prev.2m SAN  Prev.3m SAN  Current TRE
A            5            6            7            6            3
B            6            5            7            6            6
C           12           11           11           11            8
Basically, I would like to filter the DataFrame and keep only the columns that begin with Current.
The desired output would be:
ruta2:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
To do this I tried the following filter, but it raises a ValueError:
ruta2=ruta2[~(ruta2.columns.str[:4].str.startswith('Prev'))]

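The mask has one entry per column, but ruta2[mask] applies a boolean mask to the rows, which is why the lengths do not match. It seems you only need to select columns with .loc[:, mask]:
ruta2 = ruta2.loc[:, ~(ruta2.columns.str[:4].str.startswith('Prev'))]
# same as (the [:4] slice is redundant, startswith already checks the prefix)
# ruta2 = ruta2.loc[:, ~ruta2.columns.str.startswith('Prev')]
print(ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8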
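To make the difference between row and column masking concrete, here is a minimal, self-contained sketch. It rebuilds the DataFrame from the values shown in the question; the names `is_prev`, `kept` and `row_mask_failed` are only illustrative.

```python
import pandas as pd

# DataFrame rebuilt from the values shown in the question.
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# Boolean array with one entry per column (length 5).
is_prev = ruta2.columns.str.startswith("Prev")

# ruta2[mask] treats a boolean mask as a ROW filter, so a mask of
# length 5 cannot be applied to a frame with 3 rows.
row_mask_failed = False
try:
    ruta2[~is_prev]
except ValueError:
    row_mask_failed = True

# .loc[rows, columns]: keep every row, keep only the unflagged columns.
kept = ruta2.loc[:, ~is_prev]
print(kept)
```

The first slot of `.loc` selects rows and the second selects columns, so the mask has to go in the second slot.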
Or:
cols = ruta2.columns[~(ruta2.columns.str[:4].str.startswith('Prev'))]
ruta2 = ruta2[cols]
print(ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
But if you need only the Current columns, use filter (^ means start of string in a regex):
ruta2 = ruta2.filter(regex='^Current')
print(ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
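The `^` anchor matters because `filter` can also match a string anywhere in the label. The sketch below contrasts `regex='^Current'` with `like='Current'`; the column "Non-Current X" is hypothetical and was added only to show the difference.

```python
import pandas as pd

# One-row frame; "Non-Current X" is a made-up column, not from the question.
df = pd.DataFrame(
    [[5, 6, 3, 0]],
    columns=["Current SAN", "Prev.1m SAN", "Current TRE", "Non-Current X"],
    index=["A"],
)

# regex with ^ : the label must START with "Current".
anchored = df.filter(regex="^Current")

# like : the label only has to CONTAIN "Current" somewhere.
anywhere = df.filter(like="Current")

print(list(anchored.columns))
print(list(anywhere.columns))
```

For a prefix match, the anchored regex is therefore the safer choice of the two.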

# keep the columns whose names start with 'Current'
ruta2[[e for e in ruta2.columns if e.startswith('Current')]]
Out[383]:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
Or you can use a mask array to filter columns:
ruta2.loc[:, ruta2.columns.str.startswith('Current')]
Out[385]:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
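As a quick sanity check, the following sketch runs the regex filter, the list comprehension and the mask on the same data and compares the results with `DataFrame.equals`. The DataFrame is rebuilt from the question's values, and the variable names are illustrative.

```python
import pandas as pd

# DataFrame rebuilt from the values shown in the question.
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# Three ways to keep the columns whose names start with "Current".
by_regex = ruta2.filter(regex="^Current")
by_list = ruta2[[c for c in ruta2.columns if c.startswith("Current")]]
by_mask = ruta2.loc[:, ruta2.columns.str.startswith("Current")]

# equals() compares shape, labels and values.
same = by_regex.equals(by_list) and by_list.equals(by_mask)
print(same)
```

Note that the list comprehension calls Python's own `str.startswith` on each label, so it assumes every column name is a string.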


