Intro To Data Structures — Pandas 1.4.3 Documentation
Series#
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
s = pd.Series(data, index=index)

Here, data can be many different things:
a Python dict
an ndarray
a scalar value (like 5)
The passed index is a list of axis labels. Thus, this separates into a few cases depending on what data is:
From ndarray
If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
In [3]: s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])

In [4]: s
Out[4]:
a    0.469112
b   -0.282863
c   -1.509059
d   -1.135632
e    1.212112
dtype: float64

In [5]: s.index
Out[5]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [6]: pd.Series(np.random.randn(5))
Out[6]:
0   -0.173215
1    0.119209
2   -1.044236
3   -0.861849
4   -2.104569
dtype: float64

Note
pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.
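As a minimal sketch of the note above, the following builds a Series with a repeated label and then attempts one operation that requires unique labels. The variable names are illustrative, and reindex is only one example of an operation that rejects duplicates.

```python
import pandas as pd

# A Series whose index contains the label "a" twice.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])

# Construction and label lookup both succeed; a duplicated label
# returns every matching value as a Series rather than a scalar.
matches = dup["a"]
print(matches.tolist())

# reindex requires unique labels, so the exception is raised only now.
try:
    dup.reindex(["a", "b"])
    reindex_error = None
except ValueError as exc:
    reindex_error = exc

print(type(reindex_error).__name__)
```

Checking dup.index.is_unique beforehand is one way to detect the situation before calling such an operation.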
From dict
Series can be instantiated from dicts:
In [7]: d = {"b": 1, "a": 0, "c": 2}

In [8]: pd.Series(d)
Out[8]:
b    1
a    0
c    2
dtype: int64

If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
In [9]: d = {"a": 0.0, "b": 1.0, "c": 2.0}

In [10]: pd.Series(d)
Out[10]:
a    0.0
b    1.0
c    2.0
dtype: float64

In [11]: pd.Series(d, index=["b", "c", "d", "a"])
Out[11]:
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

Note
NaN (not a number) is the standard missing data marker used in pandas.
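A short sketch of how the missing marker can be detected, reusing the dict from the example above. It relies on Series.isna() and on the default behaviour of Series.sum(), which skips missing values.

```python
import numpy as np
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(d, index=["b", "c", "d", "a"])

# isna() returns a boolean Series that is True where the value is missing.
missing = s.isna()
print(missing["d"])  # True: "d" has no entry in the dict

# Reductions such as sum() skip NaN by default.
print(s.sum())  # 3.0
```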
From scalar value
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
In [12]: pd.Series(5.0, index=["a", "b", "c", "d", "e"])
Out[12]:
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

Series is ndarray-like#
Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.
In [13]: s.iloc[0]
Out[13]: 0.4691122999071863

In [14]: s.iloc[:3]
Out[14]:
a    0.469112
b   -0.282863
c   -1.509059
dtype: float64

In [15]: s[s > s.median()]
Out[15]:
a    0.469112
e    1.212112
dtype: float64

In [16]: s.iloc[[4, 3, 1]]
Out[16]:
e    1.212112
d   -1.135632
b   -0.282863
dtype: float64

In [17]: np.exp(s)
Out[17]:
a    1.598575
b    0.753623
c    0.221118
d    0.321219
e    3.360575
dtype: float64

Note
We will address array-based indexing like s.iloc[[4, 3, 1]] in the section on indexing.
Like a NumPy array, a pandas Series has a single dtype.
In [18]: s.dtype
Out[18]: dtype('float64')

This is often a NumPy dtype. However, pandas and 3rd-party libraries extend NumPy’s type system in a few places, in which case the dtype would be an ExtensionDtype. Some examples within pandas are Categorical data and Nullable integer data type. See dtypes for more.
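To make the distinction concrete, here is a small sketch that compares a NumPy-backed dtype with the two extension dtypes named above. The sample values and variable names are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

floats = pd.Series([0.5, 1.5])                       # plain NumPy dtype
cats = pd.Series(["x", "y", "x"], dtype="category")  # categorical data
ints = pd.Series([1, 2, None], dtype="Int64")        # nullable integer

print(isinstance(floats.dtype, np.dtype))      # True
print(isinstance(cats.dtype, ExtensionDtype))  # True
print(isinstance(ints.dtype, ExtensionDtype))  # True

# The nullable integer dtype keeps integers while still marking the gap.
print(ints.isna().tolist())  # [False, False, True]
```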
If you need the actual array backing a Series, use Series.array.
In [19]: s.array
Out[19]:
<NumpyExtensionArray>
[ 0.4691122999071863, -0.2828633443286633, -1.5090585031735124,
 -1.1356323710171934,  1.2121120250208506]
Length: 5, dtype: float64

Accessing the array can be useful when you need to do some operation without the index (to disable automatic alignment, for example).
Series.array will always be an ExtensionArray. Briefly, an ExtensionArray is a thin wrapper around one or more concrete arrays like a numpy.ndarray. pandas knows how to take an ExtensionArray and store it in a Series or a column of a DataFrame. See dtypes for more.
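The following sketch illustrates what "without the index" means in practice: adding two Series aligns by label, while adding their backing arrays pairs values by position. The labels and values are made up for the illustration.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["z", "y", "x"])

# Series arithmetic matches labels: "x" pairs 1.0 with 30.0.
aligned = a + b
print(aligned["x"])  # 31.0

# The backing arrays carry no labels, so values pair up by position.
positional = a.array + b.array
print(list(positional))  # [11.0, 22.0, 33.0]

print(isinstance(a.array, ExtensionArray))  # True
```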
While Series is ndarray-like, if you need an actual ndarray, then use Series.to_numpy().
In [20]: s.to_numpy()
Out[20]: array([ 0.4691, -0.2829, -1.5091, -1.1356,  1.2121])

Even if the Series is backed by an ExtensionArray, Series.to_numpy() will return a NumPy ndarray.
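As a hedged illustration of that last point, the sketch below converts a Series backed by a nullable integer array. Passing dtype and na_value is a choice made here so that the missing entry has a well-defined representation in the result; it is not the only way to call to_numpy().

```python
import numpy as np
import pandas as pd

# Backed by a nullable integer ExtensionArray, not a NumPy array.
nullable = pd.Series([1, 2, None], dtype="Int64")

# Request a float ndarray and state how the missing value is encoded.
arr = nullable.to_numpy(dtype="float64", na_value=np.nan)

print(type(arr).__name__)  # ndarray
print(arr.dtype)           # float64
```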
Series is dict-like#
A Series is also like a fixed-size dict in that you can get and set values by index label:
In [21]: s["a"]
Out[21]: 0.4691122999071863

In [22]: s["e"] = 12.0

In [23]: s
Out[23]:
a     0.469112
b    -0.282863
c    -1.509059
d    -1.135632
e    12.000000
dtype: float64

In [24]: "e" in s
Out[24]: True

In [25]: "f" in s
Out[25]: False

If a label is not contained in the index, an exception is raised:
In [26]: s["f"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/work/pandas/pandas/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3811 try:
-> 3812     return self._engine.get_loc(casted_key)
   3813 except KeyError as err:

File ~/work/pandas/pandas/pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File ~/work/pandas/pandas/pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'f'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[26], line 1
----> 1 s["f"]

File ~/work/pandas/pandas/pandas/core/series.py:1133, in Series.__getitem__(self, key)
   1130     return self._values[key]
   1132 elif key_is_scalar:
-> 1133     return self._get_value(key)
   1135 # Convert generator to list before going through hashable part
   1136 # (We will iterate through the generator there to check for slices)
   1137 if is_iterator(key):

File ~/work/pandas/pandas/pandas/core/series.py:1249, in Series._get_value(self, label, takeable)
   1246     return self._values[label]
   1248 # Similar to Index.get_value, but we do not fall back to positional
-> 1249 loc = self.index.get_loc(label)
   1251 if is_integer(loc):
   1252     return self._values[loc]

File ~/work/pandas/pandas/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
   3814     if isinstance(casted_key, slice) or (
   3815         isinstance(casted_key, abc.Iterable)
   3816         and any(isinstance(x, slice) for x in casted_key)
   3817     ):
   3818         raise InvalidIndexError(key)
-> 3819     raise KeyError(key) from err
   3820 except TypeError:
   3821     # If we have a listlike key, _check_indexing_error will raise
   3822     #  InvalidIndexError. Otherwise we fall through and re-raise
   3823     #  the TypeError.
   3824     self._check_indexing_error(key)

KeyError: 'f'

Using the Series.get() method, a missing label will return None or a specified default:
In [27]: s.get("f")

In [28]: s.get("f", np.nan)
Out[28]: nan

These labels can also be accessed by attribute.
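A minimal sketch of attribute access follows. The label "min" is chosen deliberately to show one limitation: when a label has the same name as an existing Series method, the attribute resolves to the method, so bracket indexing is the reliable form.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "min"])

# A label that is a valid identifier can be read as an attribute.
print(s.a)  # 1.0

# "min" is also a Series method, so s.min is the method, not the value.
print(callable(s.min))  # True
print(s["min"])         # 3.0, bracket access still reaches the label
```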
Vectorized operations and label alignment with Series#
When working with raw NumPy arrays, looping through value-by-value is usually not necessary. The same is true when working with Series in pandas. Series can also be passed into most NumPy methods expecting an ndarray.
In [29]: s + s
Out[29]:
a     0.938225
b    -0.565727
c    -3.018117
d    -2.271265
e    24.000000
dtype: float64

In [30]: s * 2
Out[30]:
a     0.938225
b    -0.565727
c    -3.018117
d    -2.271265
e    24.000000
dtype: float64

In [31]: np.exp(s)
Out[31]:
a         1.598575
b         0.753623
c         0.221118
d         0.321219
e    162754.791419
dtype: float64

A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.
In [32]: s.iloc[1:] + s.iloc[:-1]
Out[32]:
a         NaN
b   -0.565727
c   -3.018117
d   -2.271265
e         NaN
dtype: float64

The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN). Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the pandas data structures set pandas apart from the majority of related tools for working with labeled data.
Note
In general, we chose to make the default result of operations between differently indexed objects yield the union of the indexes in order to avoid loss of information. Having an index label, though the data is missing, is typically important information as part of a computation. You of course have the option of dropping labels with missing data via the dropna function.
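A brief sketch of the option mentioned in the note: the labels that appear in only one operand are kept in the result as NaN, and dropna() removes them afterwards. The values are illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# "a" is missing from the left operand and "d" from the right one,
# so both labels survive in the result but hold NaN.
total = s.iloc[1:] + s.iloc[:-1]
print(list(total.index))  # ['a', 'b', 'c', 'd']

# dropna() keeps only the labels present in both operands.
trimmed = total.dropna()
print(trimmed.to_dict())  # {'b': 4.0, 'c': 6.0}
```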
Name attribute#
Series also has a name attribute:
In [33]: s = pd.Series(np.random.randn(5), name="something")

In [34]: s
Out[34]:
0   -0.494929
1    1.071804
2    0.721555
3   -0.706771
4   -1.039575
Name: something, dtype: float64

In [35]: s.name
Out[35]: 'something'

The Series name can be assigned automatically in many cases. In particular, when selecting a single column from a DataFrame, the name will be assigned the column label.
You can rename a Series with the pandas.Series.rename() method.
In [36]: s2 = s.rename("different")

In [37]: s2.name
Out[37]: 'different'

Note that s and s2 refer to different objects.
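The two naming behaviours described in this section can be sketched together. The DataFrame and its column names ("price", "qty") are hypothetical and exist only for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.5], "qty": [3, 4]})

# Selecting one column gives a Series named after the column label.
col = df["price"]
print(col.name)  # price

# rename() returns a new Series; the original keeps its name.
renamed = col.rename("cost")
print(renamed.name)    # cost
print(col.name)        # price
print(renamed is col)  # False
```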