dask.dataframe.Series.str.contains
- dataframe.Series.str.contains(pat, case: bool = True, flags: int = 0, na=None, regex: bool = True)
Test if pattern or regex is contained within a string of a Series or Index.
This docstring was copied from pandas.core.strings.accessor.StringMethods.contains.
Some inconsistencies with the Dask version may exist.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
- Parameters
- pat : str
Character sequence or regular expression.
- case : bool, default True
If True, case sensitive.
- flags : int, default 0 (no flags)
Flags to pass through to the re module, e.g. re.IGNORECASE.
- na : scalar, optional
Fill value for missing values. The default depends on dtype of the array. For object-dtype, numpy.nan is used. For StringDtype, pandas.NA is used.
- regex : bool, default True
If True, assumes the pat is a regular expression.
If False, treats the pat as a literal string.
- Returns
- Series or Index of boolean values
A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.
See also
match
Analogous, but stricter, relying on re.match instead of re.search.
Series.str.startswith
Test if the start of each string element matches a pattern.
Series.str.endswith
Same as startswith, but tests the end of string.
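The difference between contains and match noted above comes down to re.search versus re.match. The sketch below uses only the standard-library re module on a hypothetical list of strings to illustrate that difference; it is not the pandas or Dask implementation.

```python
import re

# Hypothetical sample strings, chosen for illustration.
strings = ["dog", "hotdog", "doghouse"]
pattern = "dog"

# re.search finds the pattern anywhere in the string (contains-like behaviour).
contains_like = [re.search(pattern, s) is not None for s in strings]

# re.match only succeeds if the pattern matches at the start (match-like behaviour).
match_like = [re.match(pattern, s) is not None for s in strings]

print(contains_like)  # [True, True, True]
print(match_like)     # [True, False, True]
```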
Examples
Returning a Series of booleans using only a literal pattern.
>>> s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4      NaN
dtype: object
Returning an Index of booleans using only a literal pattern.
>>> ind = pd.Index(['Mouse', 'dog', 'house and parrot', '23.0', np.nan])
>>> ind.str.contains('23', regex=False)
Index([False, False, False, True, nan], dtype='object')
Specifying case sensitivity using case.
>>> s1.str.contains('oG', case=True, regex=True)
0    False
1    False
2    False
3    False
4      NaN
dtype: object
Specifying na to be False instead of NaN replaces NaN values with False. If the Series or Index contains no NaN values, the resulting dtype is bool; otherwise it is object.
>>> s1.str.contains('og', na=False, regex=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
Returning True when either ‘house’ or ‘dog’ occurs in a string.
>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4      NaN
dtype: object
Ignoring case sensitivity using flags with regex.
>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True)
0    False
1    False
2     True
3    False
4      NaN
dtype: object
Matching any digit using a regular expression.
>>> s1.str.contains('\\d', regex=True)
0    False
1    False
2    False
3     True
4      NaN
dtype: object
Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.
>>> s2 = pd.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
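One way to avoid the surprise above is to pass regex=False, or to escape the pattern before using it as a regex. The sketch below illustrates the escaping approach with the standard-library re module alone, assuming regex matching behaves like re.search on each element; the variable names are illustrative.

```python
import re

values = ["40", "40.0", "41", "41.0", "35"]

# Unescaped: "." matches any character, so "40" matches as well.
unescaped = [re.search(".0", v) is not None for v in values]

# re.escape turns "." into a literal dot, so only strings containing ".0" match.
escaped_pat = re.escape(".0")
escaped = [re.search(escaped_pat, v) is not None for v in values]

print(unescaped)  # [True, True, False, True, False]
print(escaped)    # [False, True, False, True, False]
```

The escaped pattern can be passed as pat with regex=True when a literal fragment has to be combined with other regex syntax; for a purely literal search, regex=False is simpler.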