pandas
Filtering columns in dataframe that begin with a specific string
I have the following DataFrame and would like to filter its columns, keeping only those whose names begin with a certain string. This is my current df:

    ruta2:
       Current SAN  Prev.1m SAN  Prev.2m SAN  Prev.3m SAN  Current TRE  \
    A            5            6            7            6            3
    B            6            5            7            6            6
    C           12           11           11           11            8

Basically, I would like to filter the DataFrame and keep only the columns that begin with Current. The desired output would be:

    ruta2:
       Current SAN  Current TRE
    A            5            3
    B            6            6
    C           12            8

To do this I tried the following filter, but it raises a ValueError:

    ruta2=ruta2[~(ruta2.columns.str[:4].str.startswith('Prev'))]
It seems you only need:

    ruta2=ruta2.loc[:, ~(ruta2.columns.str[:4].str.startswith('Prev'))]
    #same as
    #ruta2=ruta2.loc[:, ~ruta2.columns.str.startswith('Prev')]
    print (ruta2)
       Current SAN  Current TRE
    A            5            3
    B            6            6
    C           12            8

Or:

    cols = ruta2.columns[ ~(ruta2.columns.str[:4].str.startswith('Prev'))]
    ruta2=ruta2[cols]
    print (ruta2)
       Current SAN  Current TRE
    A            5            3
    B            6            6
    C           12            8

But if you need only the Current columns, use filter (^ means start of string in regex):

    ruta2=ruta2.filter(regex='^Current')
    print (ruta2)
       Current SAN  Current TRE
    A            5            3
    B            6            6
    C           12            8
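To see why the original attempt fails while `.loc[:, ...]` works, here is a minimal sketch that rebuilds the frame from the question. The names `keep`, `row_mask_failed`, `by_loc` and `by_filter` are only illustrative and do not come from the question. The mask holds one boolean per column, so passing it to plain `[]` indexing, which treats a boolean array as a row selector, should raise a ValueError because the lengths do not match.

```python
import pandas as pd

# Rebuild the frame from the question.
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# One boolean per column (5 values), True for the columns to keep.
keep = ~ruta2.columns.str.startswith("Prev")

# ruta2[keep] treats the mask as a row selector: 5 flags cannot match 3 rows.
try:
    ruta2[keep]
    row_mask_failed = False
except ValueError:
    row_mask_failed = True

# .loc[:, keep] applies the same mask along the column axis instead.
by_loc = ruta2.loc[:, keep]

# filter(regex=...) matches against the column labels by default.
by_filter = ruta2.filter(regex="^Current")

print(by_loc)
```

Note that the two selections are only equivalent here because every column starts with either Prev or Current; excluding Prev and including Current would differ on a frame with other column names.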
    #filter the column names starting with 'Current'
    ruta2[[e for e in ruta2.columns if e.startswith('Current')]]
    Out[383]:
       Current SAN  Current TRE
    A            5            3
    B            6            6
    C           12            8

Or you can use a mask array to filter columns:

    ruta2.loc[:,ruta2.columns.str.startswith('Current')]
    Out[385]:
       Current SAN  Current TRE
    A            5            3
    B            6            6
    C           12            8
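If the prefix changes often, both variants can be wrapped in small helpers. This is only a sketch: the function names `select_by_list` and `select_by_mask` are hypothetical, and the list-comprehension version assumes every column label is a string, since it calls `str.startswith` on each label directly.

```python
import pandas as pd


def select_by_list(df, prefix):
    """Keep columns whose label starts with prefix (assumes string labels)."""
    return df[[col for col in df.columns if col.startswith(prefix)]]


def select_by_mask(df, prefix):
    """Same selection, using a boolean mask over the column axis."""
    return df.loc[:, df.columns.str.startswith(prefix)]


# Frame rebuilt from the question for a quick check.
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

current_by_list = select_by_list(ruta2, "Current")
current_by_mask = select_by_mask(ruta2, "Current")
print(current_by_mask)
```

Both helpers return a new DataFrame and leave `ruta2` itself unchanged, so the result has to be assigned back if the filtered frame should replace the original.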