How to Convert Object to Float in Pandas

Manav Narula Feb 02, 2024
  1. Convert an Object-Type Column to Float in Pandas
  2. Use the astype() Method to Convert Object to Float in Pandas
  3. Use the to_numeric() Function to Convert Object to Float in Pandas
  4. Use the apply() Function With a Lambda Function to Convert an Object to Float in Pandas
  5. Conclusion

Data manipulation is a cornerstone of any data science or analysis endeavor. Often, datasets arrive in formats that require careful preprocessing to unlock their full analytical potential.

One common challenge is converting object-type columns, which may contain numerical information stored as strings, into a numeric type such as float. Pandas is the go-to library for data manipulation in the Python ecosystem, and it offers several methods for this conversion.

This tutorial will focus on converting an object-type column to float in Pandas.

Convert an Object-Type Column to Float in Pandas

An object-type column holds arbitrary Python objects, typically strings or a mix of types, whereas a float column holds floating-point numbers. We will work with the following DataFrame in this article.

import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

print(df)
print("---------------------------")
print(df.info())

The above code first imports the Pandas module and then creates a DataFrame named df with three rows and four columns labeled 'a', 'b', 'c', and 'd'. The initial values are a mix of strings and integers.

After creating the DataFrame, it prints the contents of df and then adds a separator line.

Following that, it prints information about the DataFrame using the info() method. This provides details like the data types of each column, as well as the number of non-null entries, which is useful for understanding the structure of the dataset.

Output:

      a  b   c   d
0  10.0  6   7   8
1   1.0  9  12  14
2   5.0  8  10   6
---------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      object
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: int64(3), object(1)
memory usage: 224.0+ bytes
None

Notice that column 'a' has the object dtype. We will convert it to float using the astype() method, the pd.to_numeric() function, and the apply() method in Pandas.
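Before converting, it can help to confirm what a non-numeric column actually holds. The snippet below is a minimal sketch that reuses the tutorial's DataFrame; the variable names value_types and non_numeric_columns are illustrative and not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

# Python type of each value stored in column 'a' (plain str objects here).
value_types = df["a"].map(type).unique().tolist()
print(value_types)

# Names of the columns that are not stored with a numeric dtype.
non_numeric_columns = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_columns)
```

Listing the non-numeric columns this way is one quick method for finding conversion candidates in a wider DataFrame.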

Note
This tutorial won’t cover the convert_objects() function, which is deprecated and removed.

Use the astype() Method to Convert Object to Float in Pandas

Pandas provides the astype() method to convert a column to a specific type. We pass float to the method and set the errors parameter to 'raise' (the default), which means an exception is raised for invalid values.

Syntax:

DataFrame.astype(dtype, copy=True, errors="raise")
  1. dtype: The data type that we want to assign to our object.
  2. copy: A Boolean parameter. It returns a copy when True.
  3. errors: It controls the raising of exceptions on invalid data for the provided data type. It has two options.
    3.1. raise: allows exceptions to be raised.
    3.2. ignore: suppresses exceptions. If an error exists, then it returns the original object.
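To illustrate the errors parameter, here is a small sketch of what happens when a value cannot be parsed. The Series named dirty and its contents are made up for this example.

```python
import pandas as pd

# Hypothetical column in which one value is not a number.
dirty = pd.Series(["10.0", "abc", "5.0"], dtype=object)

try:
    dirty.astype(float)  # errors="raise" is the default behavior
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print(f"Conversion failed: {exc}")
```

Catching the ValueError explicitly makes it clear which column caused the problem, which tends to be easier to debug than silently keeping the original strings.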

The following code uses the astype() method to convert the object to float in Pandas.

import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

df["a"] = df["a"].astype(float, errors="raise")

print(df.info())

This code imports the Pandas library and aliases it as pd. Next, it creates a DataFrame called df using a list of lists, where each inner list represents a row of data.

The columns of the DataFrame are labeled as 'a', 'b', 'c', and 'd'. The data in column 'a' is initially stored as strings, but the subsequent line of code attempts to convert them into floating-point numbers using the astype() method.

The errors = 'raise' argument means that if there are any issues with the conversion, it will raise an error. Finally, it prints out information about the DataFrame using the info() method, which provides details like the column data types and memory usage.

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      float64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: float64(1), int64(3)
memory usage: 224.0 bytes
None

This method is efficient and suitable for cases where the data is clean and consistent.
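When several columns need converting at once, astype() also accepts a dictionary that maps column names to target types. The following is a minimal sketch on the tutorial's DataFrame; converting column 'b' as well is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

# Map column names to target dtypes; columns not listed keep their dtype.
converted = df.astype({"a": float, "b": float})
print(converted.dtypes)
```

Because astype() returns a new object here, the original df is left unchanged unless the result is assigned back.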

Use the to_numeric() Function to Convert Object to Float in Pandas

The Pandas to_numeric() function converts a scalar, list, tuple, 1-d array, or Series to a numeric data type, that is, a signed or unsigned integer or a float. Its errors parameter controls what happens to values that cannot be parsed.

Syntax:

pandas.to_numeric(arg, errors="raise", downcast=None)
  1. arg: It is a scalar, list, tuple, 1-d array, or Series. It is the argument that we want to convert to numeric.
  2. errors: It is a string parameter. It has three options: ignore, raise, or coerce. If it is set to raise, then an invalid value will raise an exception. If it is set to coerce, then an invalid value will be set as NaN. If it is set to ignore, then the input is returned unchanged when an invalid value is found; this option is deprecated as of pandas 2.2.
  3. downcast: It is a string parameter. It has four options: integer, signed, unsigned, or float.
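The effect of the errors and downcast parameters can be seen in a short sketch. The input values below are invented for illustration.

```python
import pandas as pd

# Hypothetical input with one unparseable value.
raw = pd.Series(["1.5", "oops", "2.5"], dtype=object)

# errors="coerce" turns the unparseable value into NaN instead of raising.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)

# downcast="float" asks for the smallest float dtype that can hold the data.
small = pd.to_numeric(pd.Series(["1.5", "2.5"], dtype=object), downcast="float")
print(small.dtype)
```

Downcasting can reduce memory use, at the cost of lower precision than the default float64.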

An example of converting the object type to float using to_numeric() is shown below.

import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

df["a"] = pd.to_numeric(df["a"], errors="coerce")

print(df.info())

This code first imports the Pandas library and then creates a DataFrame (a table-like data structure) with three rows and four columns labeled 'a', 'b', 'c', and 'd'. The values in the 'a' column are initially given as strings, like '10.0'.

The code then converts the values in the 'a' column to numeric format using the pd.to_numeric function. The errors='coerce' argument is used, which means that if any conversion errors occur (e.g., if a value cannot be converted to a number), those cells will be replaced with NaN (Not a Number).

Finally, it prints out information about the DataFrame using the df.info() function, which provides details about the DataFrame, including the data types of each column and the number of non-null entries.

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      float64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: float64(1), int64(3)
memory usage: 224.0 bytes
None

This method provides more flexibility when dealing with messy data.
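As a sketch of what handling messy data might look like, the example below strips thousands separators and padding before calling to_numeric(). The prices Series and its values are hypothetical.

```python
import pandas as pd

# Hypothetical messy values: a thousands separator, padding, and a placeholder.
prices = pd.Series(["1,200.50", " 35.0 ", "n/a"], dtype=object)

# Remove commas and surrounding whitespace, then coerce what is left.
cleaned = prices.str.replace(",", "", regex=False).str.strip()
numbers = pd.to_numeric(cleaned, errors="coerce")

print(numbers)
print("Unparseable values:", int(numbers.isna().sum()))
```

Counting the NaN values after coercion gives a quick measure of how much of the column could not be parsed.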

Use the apply() Function With a Lambda Function to Convert an Object to Float in Pandas

The apply() function is a versatile tool in Pandas that allows us to apply a given function along an axis of a DataFrame or a Series. It can be used to transform data in a multitude of ways.

Syntax for DataFrame:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
  1. func: This is the function that you want to apply to each element or row/column of the DataFrame or Series.
  2. axis: Specifies the axis along which the function is applied. For DataFrames, 0 applies the function to each column, and 1 applies it to each row.
  3. raw: A Boolean parameter. If set to True, the function will receive NumPy arrays as input. If set to False, it will receive a Series.
  4. result_type: For DataFrames, you can specify the desired return type (e.g., 'expand', 'reduce', 'broadcast', or None).
  5. args: A tuple of additional arguments to pass to the function being applied.
  6. **kwds: Keyword arguments for the function func.

Syntax for Series:

Series.apply(func, convert_dtype=True, args=(), **kwds)

convert_dtype is a Boolean parameter. If set to True, it tries to infer better data types for the output.

A lambda function, also known as an anonymous function, is a small throwaway function defined without a name. It’s particularly useful for short, one-off operations.

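When you use apply() with a lambda function, it provides a concise way to perform element-wise operations, although it runs Python code for every value and is usually slower than astype() or to_numeric() on large columns. Look at the following code as an example.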

import pandas as pd

df = pd.DataFrame(
    [["10.0", 6, 7, 8], ["1.0", 9, 12, 14], ["5.0", 8, 10, 6]],
    columns=["a", "b", "c", "d"],
)

# Assuming df is your DataFrame and 'a' is the column to be converted
df["a"] = df["a"].apply(lambda x: float(x) if x.replace(".", "", 1).isdigit() else None)

print(df.info())

This code starts by importing the Pandas library and then creates a DataFrame named df with three rows and four columns labeled 'a', 'b', 'c', and 'd'. The values in the 'a' column are initially given as strings, like '10.0'.

The code then applies a lambda function to the 'a' column using df['a'].apply(...). This lambda function checks if each value in column 'a' can be converted to a float.

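If it can, it performs the conversion; otherwise, it assigns None to that cell. The check x.replace('.', '', 1).isdigit() removes at most one decimal point and tests whether only digits remain. It therefore accepts plain non-negative decimals such as '10.0' but rejects negative numbers and scientific notation such as '-1.5' or '1e3', and it assumes every value is a string.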

Finally, it prints out information about the DataFrame using df.info(), which provides details about the DataFrame, including the data types of each column and the number of non-null entries. This code effectively attempts to convert valid string representations of floats in column 'a' while handling invalid ones appropriately.

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      float64
 1   b       3 non-null      int64
 2   c       3 non-null      int64
 3   d       3 non-null      int64
dtypes: float64(1), int64(3)
memory usage: 224.0 bytes
None

This method provides a way to implement custom logic during the conversion process.
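One possible form of such custom logic is a named helper that simply tries float() and falls back to NaN. This is a sketch, not the approach used in the example above; to_float_or_nan and the mixed Series are hypothetical names introduced here.

```python
import pandas as pd


def to_float_or_nan(value):
    """Hypothetical helper: return float(value), or NaN if conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


# Hypothetical column mixing strings, an integer, a missing value, and text.
mixed = pd.Series(["10.0", "-3.5", "1e3", 7, None, "abc"], dtype=object)

converted = mixed.apply(to_float_or_nan)
print(converted)
```

Relying on float() itself rather than a string check means negative numbers, scientific notation, and non-string values are handled the same way Python would parse them.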

Conclusion

This tutorial covered three ways to convert an object-type column to float in Pandas: the astype() method, the to_numeric() function, and the apply() function combined with a lambda function.

Always choose the method that best fits your data and use case. Remember, it’s essential to validate your data after conversion to ensure accuracy in your analysis.
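A validation step after conversion could look like the following sketch, which checks the resulting dtype and lists the original values that became NaN. The DataFrame contents and the variable names is_float and lost are assumptions made for this example.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed.
df = pd.DataFrame({"a": ["10.0", "1.0", "bad"]})
converted = pd.to_numeric(df["a"], errors="coerce")

# Confirm the result really has a float dtype.
is_float = pd.api.types.is_float_dtype(converted)

# Values that were present before conversion but are NaN afterwards.
lost = df.loc[converted.isna() & df["a"].notna(), "a"]

print(is_float)
print(lost.tolist())
```

Inspecting the lost values before overwriting the original column helps decide whether they should be cleaned, dropped, or left as missing.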
