Go getters: a monadic way
Apr 9, 2024
I talked about some of the functional ideas behind monads in the previous post. Let’s put them to work. In this post, I’ll show you how to implement a simple monadic type in Python to handle safe getters.
Call me Maybe
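This is a trivial example, but it shows the basic idea. We’ll create a Maybe type that can either hold a value or be empty. We’ll implement map and bind methods to work with this type. For reference, here is how Haskell defines the Monad class and its Maybe instance, followed by a Python counterpart: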
class Monad m where
    return :: a -> m a
    (>>=) :: m a -> (a -> m b) -> m b

instance Monad Maybe where
    return x = Just x
    Nothing >>= f = Nothing
    Just x >>= f = f x
class Maybe:
    def __init__(self, value=None):
        self.value = value

    def is_nothing(self):
        return self.value is None

    def map(self, func):
        if self.is_nothing():
            return self
        else:
            return Maybe(func(self.value))

    def bind(self, func):
        if self.is_nothing():
            return self
        else:
            return func(self.value)

    def __repr__(self):
        if self.is_nothing():
            return "Nothing"
        else:
            return f"Just({self.value})"

# Example usage:
def safe_divide(a, b):
    try:
        return Maybe(a / b)
    except ZeroDivisionError:
        return Maybe()

def increment(x):
    return x + 1

result = Maybe(10).bind(lambda x: safe_divide(x, 2))
print(result)  # Just(5.0)
result = result.map(increment)
print(result)  # Just(6.0)
result = Maybe(10).bind(lambda x: safe_divide(x, 0))
print(result)  # Nothing
result = result.map(increment)
print(result)  # Nothing
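One way to sanity-check a type like this is to test it against the three monad laws. The sketch below re-declares a trimmed-down Maybe so that it runs on its own; the __eq__ method and the helper names unit, half, and inc are additions made for this illustration, not part of the class above.

```python
class Maybe:
    """Trimmed-down copy of the Maybe class, plus __eq__ for comparisons."""

    def __init__(self, value=None):
        self.value = value

    def is_nothing(self):
        return self.value is None

    def bind(self, func):
        return self if self.is_nothing() else func(self.value)

    def __eq__(self, other):
        # Added for this example only, so two Maybe objects can be compared.
        return isinstance(other, Maybe) and self.value == other.value


def unit(x):
    """Plays the role of Haskell's `return`."""
    return Maybe(x)


def half(x):
    """Fails (returns Nothing) for odd numbers."""
    return Maybe(x // 2) if x % 2 == 0 else Maybe()


def inc(x):
    return Maybe(x + 1)


m = Maybe(10)

# Left identity: unit(a).bind(f) == f(a)
left_identity = unit(10).bind(half) == half(10)
# Right identity: m.bind(unit) == m
right_identity = m.bind(unit) == m
# Associativity: m.bind(f).bind(g) == m.bind(lambda x: f(x).bind(g))
associativity = m.bind(half).bind(inc) == m.bind(lambda x: half(x).bind(inc))

print(left_identity, right_identity, associativity)  # True True True

# Caveat of using None as the "empty" marker: a legitimate None value
# cannot be told apart from Nothing.
print(Maybe(None).is_nothing())  # True
```

The last line is worth keeping in mind: with this representation, a function that legitimately returns None is treated as a failure.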
Some refactoring to make the interface more user-friendly:
def safe_divide(maybe_a: Maybe, maybe_b: Maybe) -> Maybe:
    def inner_divide(a, b):
        # Return an empty Maybe instead of raising, so the failure stays inside the monad
        if b == 0:
            return Maybe()
        return Maybe(a / b)
    # Use bind on both operands because inner_divide already returns a Maybe
    return maybe_a.bind(lambda a: maybe_b.bind(lambda b: inner_divide(a, b)))

# Example usage:
a = Maybe(10)
b = Maybe(2)
result = safe_divide(a, b)
print(result)  # Just(5.0)

a = Maybe(10)
b = Maybe(0)
result = safe_divide(a, b)
print(result)  # Nothing

a = Maybe(10)
b = Maybe()
result = safe_divide(a, b)
print(result)  # Nothing

a = Maybe()
b = Maybe()
result = safe_divide(a, b)
print(result)  # Nothing
The value of using a Maybe monad over a simple division operation in Python, especially in the context of handling operations that might fail (like division by zero), lies in several key areas:
- Error Handling and Safety
  - Explicit Handling of Failure Cases: The Maybe monad makes the handling of errors and exceptional cases explicit and a fundamental part of the type system. This contrasts with simple division, where errors must be handled through conditional checks or exception handling, which can be more error-prone and verbose.
  - No Exception Required for Control Flow: Using exceptions for control flow is generally considered a bad practice because it can make the code harder to understand and maintain. The Maybe monad allows you to encode potential failure in the type system, making the flow of data and errors more explicit and less reliant on side effects or exceptions.
- Composability and Chaining
  - Chaining Operations: With the Maybe monad, you can easily chain operations that might fail without having to check for errors after each step. This leads to cleaner and more readable code, especially when dealing with multiple operations that can fail.
  - Unified Interface for Nullable Operations: It provides a unified interface for dealing with operations that might return a null or undefined value, reducing the need for null checks scattered throughout the code.
- Functional Programming Paradigm
  - Encourages Pure Functions: The use of monads encourages the design of pure functions that don’t have side effects, making the code easier to reason about, test, and reuse.
  - Declarative Code Style: It promotes a more declarative style of programming, where you describe what you want to achieve rather than how to do it step by step. This can lead to more concise and readable code.
- Null Safety
  - Prevents Null Reference Errors: By encapsulating the presence or absence of a value in a Maybe object, you avoid the common pitfall of null reference errors, which are a frequent source of bugs in many programming languages.
Safe getters
Implementing a monad specifically designed to handle attribute access (getattr) operations safely, dealing gracefully with None values (or any situation where an attribute might not exist), is a practical way to manage data access in a more functional style. This can be particularly useful when working with deeply nested data structures where any part of the chain might be None (or missing).
Let’s call this monad SafeGetter. It will encapsulate a value and allow us to chain attribute access operations safely, returning a special value (e.g., None or a custom default) if any operation in the chain fails due to the target being None or the attribute not existing.
Here’s how you might implement and use such a monad:
class SafeGetter:
    def __init__(self, value):
        self.value = value

    def get(self, attr, default=None):
        """Attempts to get an attribute from the current value, safely."""
        if self.value is None:
            return SafeGetter(default)
        try:
            return SafeGetter(getattr(self.value, attr, default))
        except AttributeError:
            return SafeGetter(default)

    def or_else(self, default):
        """Returns the contained value or a default if None."""
        if self.value is None:
            return default
        return self.value

    def __repr__(self):
        return f"SafeGetter({repr(self.value)})"

# Example usage:
class Person:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

# Constructing a nested structure
grandparent = Person("Grandparent")
parent = Person("Parent", grandparent)
child = Person("Child", parent)

# Safe attribute access
child_name = SafeGetter(child).get("parent").get("parent").get("name").or_else("No name")
print(child_name)  # Output: Grandparent

# Handling missing attributes safely
unknown = SafeGetter(child).get("parent").get("sibling").get("name").or_else("No name")
print(unknown)  # Output: No name

# Handling None at any level
none_test = SafeGetter(None).get("parent").get("name").or_else("No value")
print(none_test)  # Output: No value
This provides a way to safely access attributes in a chain of nested objects, handling missing attributes or None values gracefully without raising exceptions or requiring explicit null checks at each step.
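The chain of get calls is essentially a fold over a list of attribute names. As a sketch of that idea, the hypothetical helper get_path below does the same walk with functools.reduce. The dotted-path syntax and the Node class are assumptions made for this example only.

```python
from functools import reduce


def get_path(obj, path, default=None):
    """Walk a dotted attribute path, stopping at the first None or missing attribute."""

    def step(current, attr):
        # Once the chain is broken, keep propagating None.
        return None if current is None else getattr(current, attr, None)

    result = reduce(step, path.split("."), obj)
    return default if result is None else result


class Node:
    """Hypothetical stand-in for the Person class above."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


root = Node("Grandparent")
leaf = Node("Child", Node("Parent", root))

print(get_path(leaf, "parent.parent.name", "No name"))   # Grandparent
print(get_path(leaf, "parent.sibling.name", "No name"))  # No name
print(get_path(None, "parent.name", "No value"))         # No value
```

The monadic version trades this string-based path for method chaining, which is easier to extend with extra steps such as map or bind.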
PEP 505. Several modern programming languages have so-called “null-coalescing” or “null-aware” operators, including C#, Dart, Perl, Swift, and PHP (starting in version 7). There are also stage 3 draft proposals for their addition to ECMAScript (a.k.a. JavaScript). These operators provide syntactic sugar for common patterns involving null references.
- The “null-coalescing” operator is a binary operator that returns its left operand if it is not null. Otherwise it returns its right operand.
- The “null-aware member access” operator accesses an instance member only if that instance is non-null. Otherwise it returns null. (This is also called a “safe navigation” operator.)
- The “null-aware index access” operator accesses an element of a collection only if that collection is non-null. Otherwise it returns null. (This is another type of “safe navigation” operator.)
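Python has no such operators, so each of the three has to be spelled out by hand. The sketch below writes them as small functions; the names coalesce, attr_or_none, and item_or_none are made up for this example, and the operator spellings in the docstrings are those used by other languages, not Python syntax.

```python
def coalesce(left, right):
    """Null-coalescing: `left ?? right` in languages that have the operator."""
    return left if left is not None else right


def attr_or_none(obj, name):
    """Null-aware member access: `obj?.name`."""
    return None if obj is None else getattr(obj, name)


def item_or_none(collection, key):
    """Null-aware index access: `collection?[key]`."""
    return None if collection is None else collection[key]


class Config:
    """Hypothetical object with a single attribute."""
    timeout = 30


print(coalesce(None, 5))                  # 5
print(coalesce(0, 5))                     # 0 (only None triggers the fallback)
print(attr_or_none(Config(), "timeout"))  # 30
print(attr_or_none(None, "timeout"))      # None
print(item_or_none({"a": 1}, "a"))        # 1
print(item_or_none(None, "a"))            # None
```

Note that these helpers only guard against None. A missing attribute or key on a non-None object still raises, which is stricter than the SafeGetter above.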
Getters in the wild
It’s common to use nested pydantic models to represent complex data structures in Python.
from pydantic import BaseModel, Field
from typing import List, Optional

class CashHandling(BaseModel):
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    cashflow: Optional[float] = Field(None)

class AssetWeight(BaseModel):
    asset_id: Optional[str] = Field(None)
    trade_direction: Optional[str] = Field(None)
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)

class RebalanceSettings(BaseModel):
    cash_handling: CashHandling = Field(default_factory=CashHandling)
    asset_weights: List[AssetWeight] = Field(default_factory=list)

#%%
rebal_settings = RebalanceSettings(**{
    # The key must match the field name cash_handling declared above
    "cash_handling": {
        "min_weight": 0.005,
        "max_weight": 0.01,
        "cashflow": 1000.0
    },
    "asset_weights": [
        {
            "asset_id": "AAPL",
            "trade_direction": "BUY",
            "min_weight": 0.05,
            "max_weight": 0.06
        },
        {
            "asset_id": "MSFT",
            "trade_direction": "SELL",
            "min_weight": 0.05,
            "max_weight": 0.07
        }
    ]
})
It is left to the users to access nested attributes in a safe way to avoid exceptions.
if (self.rebal_settings is not None) and (self.rebal_settings.cash_handling is not None):
    cashflow = self.rebal_settings.cash_handling.cashflow
else:
    cashflow = 0.0
v1: Using a monadic getter
We can apply the SafeGetter idea here, implemented as a generic OptionalMonad class, to achieve this.
from pydantic import BaseModel, Field, ValidationError
from typing import List, Optional, Any, TypeVar, Generic

T = TypeVar('T')

class OptionalMonad(Generic[T]):
    def __init__(self, value: Optional[T]):
        self.value = value

    def bind(self, func):
        if self.value is None:
            return OptionalMonad(None)
        try:
            # func must itself return an OptionalMonad, so the result is not wrapped again
            return func(self.value)
        except (AttributeError, ValidationError, KeyError):
            return OptionalMonad(None)

    def or_else(self, default: T) -> T:
        return self.value if self.value is not None else default

    def __repr__(self):
        return f"OptionalMonad({self.value})"

class SettingsModel(BaseModel):
    def get(self, attr: str) -> OptionalMonad:
        return OptionalMonad(getattr(self, attr, None))

class CashHandling(SettingsModel):
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    cashflow: Optional[float] = Field(None)

class AssetWeight(SettingsModel):
    asset_id: Optional[str] = Field(None)
    trade_direction: Optional[str] = Field(None)
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)

class RebalanceSettings(SettingsModel):
    cash_handling: CashHandling = Field(default_factory=CashHandling)
    asset_weights: List[AssetWeight] = Field(default_factory=list)

# Example usage
rebal_settings = RebalanceSettings(**{
    "cash_handling": {
        "min_weight": 0.005,
        "max_weight": 0.01,
        "cashflow": 1000.0
    },
    "asset_weights": [
        {
            "asset_id": "AAPL",
            "trade_direction": "BUY",
            "min_weight": 0.05,
            "max_weight": 0.06
        },
        {
            "asset_id": "MSFT",
            "trade_direction": "SELL",
            "min_weight": 0.05,
            "max_weight": 0.07
        }
    ]
})

# Accessing and using the monadic getter
cashflow = rebal_settings.get('cash_handling').bind(lambda ch: ch.get('cashflow')).or_else(0)
print(cashflow)  # Output: 1000.0
This implementation is technically correct. It is more verbose than the simple getattr approach, but it provides a more functional and safe way to access nested attributes in a chain of objects, especially when dealing with complex data structures like those defined by Pydantic models. But the interface is not user-friendly, and few programmers would buy into this pattern of binding an anonymous function just to access an attribute.
v2: Simplifying the interface
We can simplify the interface by overriding the __getattr__ method in the SettingsModel class to return a SafeAccess object for any attribute access. This way, users can access attributes directly without having to call the get method explicitly.
from pydantic import BaseModel, Field, ValidationError
from typing import List, Optional, Any, TypeVar, Generic, Callable

T = TypeVar('T')

class SafeAccess:
    def __init__(self, value: Optional[T]):
        self.value = value

    def __getattr__(self, name: str) -> 'SafeAccess':
        if self.value is None:
            return SafeAccess(None)
        try:
            return SafeAccess(getattr(self.value, name, None))
        except AttributeError:
            return SafeAccess(None)

    def or_else(self, default: T) -> T:
        return self.value if self.value is not None else default

    def __call__(self, *args, **kwargs) -> 'SafeAccess':
        if callable(self.value):
            try:
                return SafeAccess(self.value(*args, **kwargs))
            except Exception:
                return SafeAccess(None)
        return SafeAccess(None)

    def __repr__(self):
        return f"SafeAccess({self.value})"

class SettingsModel(BaseModel):
    def __getattr__(self, item: str) -> SafeAccess:
        # This method is only called when accessing an undefined attribute,
        # so we wrap the result in SafeAccess for safety.
        value = self.__dict__.get(item, None)
        return SafeAccess(value)

class CashHandling(SettingsModel):
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    cashflow: Optional[float] = Field(None)

class AssetWeight(SettingsModel):
    asset_id: Optional[str] = Field(None)
    trade_direction: Optional[str] = Field(None)
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)

class RebalanceSettings(SettingsModel):
    cash_handling: CashHandling = Field(default_factory=CashHandling)
    asset_weights: List[AssetWeight] = Field(default_factory=list)

# Example usage
rebal_settings = RebalanceSettings(**{
    "cash_handling": {
        "min_weight": 0.005,
        "max_weight": 0.01,
        "cashflow": 1000.0
    },
    "asset_weights": [
        {
            "asset_id": "AAPL",
            "trade_direction": "BUY",
            "min_weight": 0.05,
            "max_weight": 0.06
        },
        {
            "asset_id": "MSFT",
            "trade_direction": "SELL",
            "min_weight": 0.05,
            "max_weight": 0.07
        }
    ]
})

# Accessing attributes without explicit bind
cashflow = rebal_settings.cash_handling.cashflow.or_else(0)
print(cashflow)  # Output: 1000.0
If you run this code, you’ll get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/workspaces/d-.github.io/scripts/test.py in line 80
57 rebal_settings = RebalanceSettings(**{
58 "cash_handling": {
59 "min_weight": 0.005,
(...)
76 ]
77 })
79 # Accessing attributes without explicit bind
---> 80 cashflow = rebal_settings.cash_handling.cashflow.or_else(0)
81 print(cashflow) # Output: 1000.0
AttributeError: 'float' object has no attribute 'or_else'
This is a common bug in monadic code. The or_else method is not defined for the float type, which is the type of cashflow. The expression rebal_settings.cash_handling.cashflow does not return a SafeAccess object but a plain float: Python only calls __getattr__ when normal attribute lookup fails, and cash_handling and cashflow are regular fields, so our override never runs. We could fix this by returning a SafeAccess object for every attribute access, not only for missing attributes. This is the moral of the story: these getters should be a one-way street into the monadic world. If you take the purist functional approach, you should not be able to get out of the monadic world except through the or_else escape hatch. You could wrap all primitive types in a SafeAccess object, but that would be overkill. Instead, we just assume the last attribute in the chain is a primitive type and return it directly.
#unsafe
cashflow = rebal_settings.cash_handling.cashflow
print(cashflow) # Output: 1000.0 instead of SafeAccess(1000.0)
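The root cause can be reproduced without pydantic. The sketch below uses two hypothetical plain classes, Wrapper and Settings, to show that __getattr__ is a fallback hook: it runs only for names that normal attribute lookup cannot find.

```python
class Wrapper:
    """Hypothetical stand-in for SafeAccess."""

    def __init__(self, value):
        self.value = value


class Settings:
    """Plain-Python stand-in for a model with one real field."""

    def __init__(self):
        self.cashflow = 1000.0  # a real attribute, found by normal lookup

    def __getattr__(self, name):
        # Only reached when normal lookup fails.
        return Wrapper(None)


s = Settings()
print(type(s.cashflow).__name__)  # float
print(type(s.missing).__name__)   # Wrapper
```

Existing fields come back unwrapped, which is exactly why the chained or_else call above lands on a float.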
v3: handling safe access to lists
You still need to handle safe access to lists. You can create a SafeList class that inherits from list and overrides the __getitem__ method to return a SafeAccess object for the accessed element. This way, you can safely access elements in a list without worrying about index out of range errors.
from pydantic import BaseModel, Field
from typing import List, Optional, Generic, TypeVar

T = TypeVar('T')

class SafeAccess:
    def __init__(self, value: Optional[T]):
        self.value = value

    def __getattr__(self, name: str) -> 'SafeAccess':
        if self.value is None:
            return SafeAccess(None)
        try:
            return SafeAccess(getattr(self.value, name, None))
        except AttributeError:
            return SafeAccess(None)

    def or_else(self, default: T) -> T:
        return self.value if self.value is not None else default

    def __call__(self, *args, **kwargs) -> 'SafeAccess':
        if callable(self.value):
            try:
                return SafeAccess(self.value(*args, **kwargs))
            except Exception:
                return SafeAccess(None)
        return SafeAccess(None)

    def __repr__(self):
        return f"SafeAccess({self.value})"

class SafeList(List[T]):
    def __getitem__(self, index: int) -> SafeAccess:
        try:
            return SafeAccess(super().__getitem__(index))
        except IndexError:
            return SafeAccess(None)

class SettingsModel(BaseModel):
    def __getattr__(self, item: str) -> SafeAccess:
        value = self.__dict__.get(item, None)
        return SafeAccess(value)

    class Config:
        arbitrary_types_allowed = True

class CashHandling(SettingsModel):
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    cashflow: Optional[float] = Field(None)

class AssetWeight(SettingsModel):
    asset_id: Optional[str] = Field(None)
    trade_direction: Optional[str] = Field(None)
    min_weight: Optional[float] = Field(None, ge=0.0, le=1.0)
    max_weight: Optional[float] = Field(None, ge=0.0, le=1.0)

class RebalanceSettings(SettingsModel):
    cash_handling: CashHandling = Field(default_factory=CashHandling)
    asset_weights: List[AssetWeight] = Field(default_factory=SafeList)

# Example usage modified to use SafeList
rebal_settings = RebalanceSettings(**{
    "cash_handling": {
        "min_weight": 0.005,
        "max_weight": 0.01,
        "cashflow": 1000.0
    },
})
#%%
print(rebal_settings.asset_weights[0].asset_id)  # Output: SafeAccess(None)
Passing the list explicitly, however, fails: judging by the traceback below, the field then holds a plain list rather than a SafeList, so an out-of-range index raises as usual. I think it’s reasonable to require users of RebalanceSettings to rely on the default factory of List[AssetWeight] to handle empty lists, instead of passing an empty list directly. This way, the default factory can return a SafeList object, ensuring that all list accesses are safe.
# Example usage modified to use SafeList
rebal_settings = RebalanceSettings(**{
    "cash_handling": {
        "min_weight": 0.005,
        "max_weight": 0.01,
        "cashflow": 1000.0
    },
    "asset_weights": []
})

rebal_settings.asset_weights[0].min_weight  # raises the IndexError shown below
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/workspaces/d-.github.io/scripts/test.py in line 74
64 # Example usage modified to use SafeList
65 rebal_settings = RebalanceSettings(**{
66 "cash_handling": {
67 "min_weight": 0.005,
(...)
71 "asset_weights": []
72 })
---> 74 rebal_settings.asset_weights[0].min_weight
IndexError: list index out of range
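An alternative that does not depend on how the list was constructed is to do the wrapping at the access site instead of inside the list type. The sketch below uses a hypothetical safe_index helper together with a cut-down SafeAccess and a stand-in Weight class. None of these names come from pydantic, and this is one possible workaround rather than the approach taken in this post.

```python
class SafeAccess:
    """Cut-down version of the SafeAccess class above."""

    def __init__(self, value):
        self.value = value

    def __getattr__(self, name):
        # Only called for names that normal lookup cannot find on SafeAccess itself.
        if self.value is None:
            return SafeAccess(None)
        return SafeAccess(getattr(self.value, name, None))

    def or_else(self, default):
        return self.value if self.value is not None else default


def safe_index(seq, index):
    """Hypothetical helper: wrap seq[index], or wrap None if it cannot be indexed."""
    try:
        return SafeAccess(seq[index])
    except (IndexError, KeyError, TypeError):
        return SafeAccess(None)


class Weight:
    """Hypothetical stand-in for AssetWeight."""

    def __init__(self, min_weight):
        self.min_weight = min_weight


weights = [Weight(0.05)]

print(safe_index(weights, 0).min_weight.or_else(0.0))  # 0.05
print(safe_index([], 0).min_weight.or_else(0.0))       # 0.0
print(safe_index(None, 0).min_weight.or_else(0.0))     # 0.0
```

Because the helper works on any sequence, it behaves the same whether the field holds a SafeList or a plain list.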
DataFrameMonads
To make the DataFrame operations more expressive and chainable in a monadic style, we can wrap the DataFrame in a class that implements the monadic operations more seamlessly. This approach allows us to use method chaining for a more fluent and readable syntax. Below is an example of how you might implement such a class:
import pandas as pd

class DataFrameMonad:
    def __init__(self, df):
        self.df = df

    @staticmethod
    def return_df(value):
        """Static method to encapsulate a value into the monad."""
        return DataFrameMonad(pd.DataFrame(value))

    def bind(self, func):
        """Apply a function to the DataFrame and return a new monad."""
        try:
            # Apply the function to the DataFrame
            result = func(self.df)
            # Ensure the result is a DataFrame
            if not isinstance(result, pd.DataFrame):
                raise ValueError("The function did not return a DataFrame")
            return DataFrameMonad(result)
        except Exception as e:
            # Handle or log the error
            print(f"Error during bind: {e}")
            # Optionally, return a monad with an empty DataFrame or some error indicator
            return DataFrameMonad(pd.DataFrame())

    def to_dataframe(self):
        """Utility method to get the underlying DataFrame."""
        return self.df

def filter_rows(df):
    """Example function to filter rows of the DataFrame."""
    return df[df['value'] > 10]

def add_column(df):
    """Example function to add a new column to the DataFrame."""
    # assign returns a new DataFrame instead of mutating the (possibly filtered) input
    return df.assign(new_column=df['value'] * 2)

df_monad = DataFrameMonad.return_df({'id': [1, 2, 3], 'value': [5, 15, 25]})
result_monad = df_monad.bind(filter_rows).bind(add_column)
result_df = result_monad.to_dataframe()
print(result_df)
The choice to use a return_df method (or any similarly named method) for initialization, instead of directly using the initializer of the DataFrameMonad class, is primarily a matter of adhering to the monadic pattern and its terminology. However, this choice also offers flexibility and clarity in certain contexts. For comparison, here is the same class constructed directly through its initializer, which now validates its input:
class DataFrameMonad:
    def __init__(self, df):
        if not isinstance(df, pd.DataFrame):
            raise ValueError("Expected a pandas DataFrame")
        self.df = df

    def bind(self, func):
        # Implementation remains the same...
        ...

df_monad = DataFrameMonad(pd.DataFrame({'id': [1, 2, 3], 'value': [5, 15, 25]}))
To give a DataFrameMonad class the same methods and attributes as a usual pandas DataFrame, you essentially want to make the DataFrameMonad behave like a DataFrame. This can be achieved through a combination of delegation and dynamic attribute access. The goal is to allow users of DataFrameMonad to invoke DataFrame methods directly on a DataFrameMonad instance, with the class transparently passing these calls through to the underlying DataFrame.
import pandas as pd

class DataFrameMonad:
    def __init__(self, df):
        self._df = df

    def __getattr__(self, name):
        # Delegate attribute access to the underlying DataFrame
        attr = getattr(self._df, name)
        if callable(attr):
            def wrapper(*args, **kwargs):
                # Call the DataFrame method and wrap the result in a new DataFrameMonad
                result = attr(*args, **kwargs)
                if isinstance(result, pd.DataFrame):
                    return DataFrameMonad(result)
                else:
                    return result
            return wrapper
        else:
            return attr

    def __getitem__(self, key):
        # Indexing with [] bypasses __getattr__, so it has to be delegated explicitly
        result = self._df[key]
        return DataFrameMonad(result) if isinstance(result, pd.DataFrame) else result

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
monad = DataFrameMonad(df)
result = monad.sum()
print(result)  # This will print the sum of the DataFrame columns
filtered = monad[monad['a'] > 1]
print(filtered._df)  # Accessing the underlying DataFrame to display it
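The delegation trick is not specific to pandas. As a dependency-free sketch, the hypothetical Wrapped class below applies the same __getattr__ pattern to a str. It also shows a limitation worth knowing: implicit special-method calls such as len() or [] are looked up on the type and never reach __getattr__.

```python
class Wrapped:
    """Hypothetical wrapper that delegates to a str and re-wraps str results."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Stay inside the wrapper when the result has the wrapped type.
            return Wrapped(result) if isinstance(result, str) else result

        return wrapper


w = Wrapped("hello")
shout = w.upper().replace("L", "_")
print(shout._inner)        # HE__O
print(w.startswith("he"))  # True

try:
    len(w)  # special methods bypass __getattr__
    len_works = True
except TypeError:
    len_works = False
print(len_works)           # False
```

Any operator or built-in that relies on a special method (indexing, len, arithmetic, iteration) therefore needs its own explicit delegating method on the wrapper.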