Pandas - replacing strings with a certain length in multiple columns

2019-03-06 python-3.x pandas

Question

I have an (n, m)-dimensional DataFrame whose columns all have dtype object and hold strings of varying lengths. The df looks like the following:

      col1    col2    col3    col4    ...   colm
    |---------------------------------------------    
row1| str1,1  str1,2  str1,3  str1,4  ...   str1,m
row2| str2,1  str2,2  str2,3  str2,4  ...   str2,m
.   | .       .       .       .       ...   .
.   | .       .       .       .       ...   . 
.   | .       .       .       .       ...   .
rown| strn,1  strn,2  strn,3  strn,4  ...   strn,m

I want to replace strings with NaN when the string is shorter than 10 characters, but only in certain columns.

Here's my code:

column_list = ['col1','col3']
df.loc[:,column_list] = df.apply(lambda x: x.str.replace(x,np.NaN) if len(x) < 10 else x)

The code is running without error, but unfortunately not actually doing anything to my values in those columns. I believe my issue has to do with the following part:

x.str.replace(x,np.NaN) 

I don't think "x" should be in the "replace" function.

Appreciate the help.

Thanks

Solution

Get the string lengths with str.len, then use mask to replace the short entries with NaN:

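# True where the string in a selected column is shorter than 10 characters
s = df[column_list].apply(lambda x: x.str.len()) < 10
# mask() replaces the True positions with NaN
df.loc[:, column_list] = df.loc[:, column_list].mask(s)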
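
A likely reason the code in the question changes nothing: inside df.apply the lambda receives a whole column as a Series, so len(x) is the number of rows, not the length of any string. Here is a minimal, self-contained sketch of the mask approach. The sample data is made up for illustration; only column_list and the threshold of 10 come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "col1": ["short", "a much longer string", "tiny"],
    "col2": ["keep me", "keep me too", "and me"],
    "col3": ["exactly10!", "nine char", "another long string"],
})
column_list = ["col1", "col3"]

# Element-wise string lengths, computed for the selected columns only.
lengths = df[column_list].apply(lambda col: col.str.len())

# mask() puts NaN wherever the condition is True (length < 10).
df[column_list] = df[column_list].mask(lengths < 10)

print(df)
```

Computing the lengths on df[column_list] rather than on the whole df keeps the .str accessor away from any non-string columns the frame might contain, and col2 is left untouched even though all of its strings are short.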