Pandas: splitting a DataFrame into chunks

When a DataFrame is too large to process in one pass, it helps to break it into smaller pieces, work on each piece, and recombine the results. The recipes below cover fixed-size chunks, roughly equal parts, splits by column value or group, time-based splits, and data-driven boundaries. They scale from a few thousand rows to 200 million; there may be some fluctuation in chunk sizes when the row count does not divide evenly, but the patterns are the same.
Fixed-size chunks of n rows

Say you have to process a huge DataFrame and, because of its size, you need to split it into chunks and parse each one. You can use the following basic syntax to slice a pandas DataFrame into smaller chunks:

```python
# specify number of rows in each chunk
n = 3

# split DataFrame into chunks
list_df = [df[i:i+n] for i in range(0, len(df), n)]
```

The result is a list of DataFrames: access a specific chunk with `list_df[0]`, `list_df[1]`, `list_df[2]`, and so on, or iterate over the list to parse each chunk in turn. The same pattern scales up — for a multi-million-row frame, a chunk row size of `n = 200000` works just as well — and it needs only native Python and pandas, no third-party splitter.

To split at a fixed row position instead, slice with `.iloc`:

```python
# split DataFrame into two DataFrames at row 6
df1 = df.iloc[:6]
df2 = df.iloc[6:]
```

The same idea answers "split a pandas DataFrame in two if it has more than 10 rows": test `len(df) > 10`, then slice.

Splitting in two by a condition on a column

To split based on a column value, group by a Boolean mask. Use `==`, not `is`, to test equality: `is` has a special meaning in Python — it returns True only if two variables point to the same object, while `==` checks whether the values referred to are equal. Likewise, use `!=` instead of `is not`.

```python
In [1047]: df1, df2 = [x for _, x in df.groupby(df['Sales'] < 30)]

In [1048]: df1
Out[1048]:
   A  Sales
2  7     30
3  6     40
4  1     50

In [1049]: df2
Out[1049]:
   A  Sales
0  3     10
1  4     20
```

Splitting on runs of values

A related question: suppose a DataFrame has an observation column taking the values 1 and -1 over rows d1 through d20, where d1:d20 are datetimes — for example 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1 — and each chunk should be one consecutive run of equal values. A compare-and-cumsum trick handles this, as shown in the sketch below.
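A minimal sketch of the run-based split, assuming the data from the question above (the date index is invented for illustration):

```python
import pandas as pd

# toy data matching the example: runs of 1s and -1s over 20 dates
df = pd.DataFrame(
    {"observation": [1, 1, -1, -1, -1, -1, 1, 1, 1, 1,
                     -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]},
    index=pd.date_range("2024-01-01", periods=20, freq="D"),
)

# a new run starts wherever the value differs from the previous row;
# the cumulative sum of those start flags labels each run
run_id = (df["observation"] != df["observation"].shift()).cumsum()

# one DataFrame per consecutive run of equal values
chunks = [group for _, group in df.groupby(run_id)]
print(len(chunks))          # 5 runs in this example
print(chunks[1].index[0])   # first date of the second run
```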
Splitting into N roughly equal parts with numpy

numpy's `array_split` divides a DataFrame into a given number of parts whose sizes differ by at most one row. A simple demo:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# now let's split the DataFrame into 3 equal parts
df_split = np.array_split(df, 3)
```

Now you can treat `df_split` as a list of DataFrames: `df_split[0]`, `df_split[1]`, `df_split[2]`. Two things are worth noting. First, according to the `np.array_split` documentation, the second argument `indices_or_sections` can also specify chunk boundaries rather than a part count: if we pass an array with a first axis of length N and a list `fracs` with K elements, the resulting chunks correspond to the index ranges [0, fracs[0]), [fracs[0], fracs[1]), ..., [fracs[K-1], N). Second, `array_split` evens sizes out rather than fixing them — asking for 442 chunks from a 44,150-row frame gives 392 DataFrames of size 100 and 50 DataFrames of size 99, not uniform chunks of 100. If you instead want every chunk to contain exactly 100 rows except a shorter last one (say, sub-DataFrames of 100 rows with a final 50), use the fixed-size slicing shown earlier. Hand-rolling the arithmetic invites mistakes: with 1,363 data rows and 30 desired chunks, 1363/30 = 45.43, so the per-chunk size must be rounded up to 46; and when slicing three frames by hand it is easy to end up with the first and last looking good while the middle extends to the very end because its stop index was omitted. An end-exclusive `df.iloc[start:stop]` for every piece avoids both problems.

Below is a simple function implementation which splits a DataFrame into chunks of `n` rows, with a small example:

```python
import pandas as pd

def split_dataframe_to_chunks(df, n):
    df_len = len(df)
    count = 0
    dfs = []
    while True:
        if count > df_len - 1:
            break
        start = count
        count += n
        # print("%s : %s" % (start, count))
        dfs.append(df.iloc[start:count])
    return dfs

# Create a DataFrame with 10 rows and split it into chunks of 3
df = pd.DataFrame({"A": range(10)})
dfs = split_dataframe_to_chunks(df, 3)   # chunk sizes: 3, 3, 3, 1
```

Given a DataFrame of 1111 rows and a chunk size of 400, this returns three smaller DataFrames with sizes 400, 400 and 311. One common bug when chunking by hand: `df[i:i+size-1]` silently drops the last row of every chunk, and `df[i:i+size-1, :]` is not valid pandas indexing at all. To break a huge DataFrame into chunks of 1000 rows each, use `.iloc` with an end-exclusive slice:

```python
size = 1000
list_of_dfs = [df.iloc[i:i+size] for i in range(0, len(df), size)]
```

A variant is labeling rather than splitting: to group every 7,000 rows of a 77,000-row indexed frame into a higher dimension of a MultiIndex — 11 groups in all — assign `np.arange(len(df)) // 7000` as the outer index level instead of materializing 11 separate frames.

A related task is splitting rows into train, validate, and test DataFrames: assign the 10% most recent rows (using a 'dates' column) to test_df, then randomly assign 10% of the remaining rows to validate_df and the rest to train_df, checking that all rows are uniquely assigned. A sketch follows.
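A sketch of that three-way split; the 'dates' column name comes from the description above, while the toy data and the seed are illustrative:

```python
import numpy as np
import pandas as pd

# toy frame with a 'dates' column (assumed shape of the real data)
df = pd.DataFrame({
    "dates": pd.date_range("2020-01-01", periods=100, freq="D"),
    "x": np.arange(100),
})

rng = np.random.default_rng(0)

# most recent 10% of rows (by 'dates') become the test set
df_sorted = df.sort_values("dates")
n_test = int(len(df_sorted) * 0.10)
test_df = df_sorted.iloc[-n_test:]

# randomly send ~10% of the remaining rows to validation, the rest to train
remaining = df_sorted.iloc[:-n_test]
mask = rng.random(len(remaining)) < 0.10
validate_df = remaining[mask]
train_df = remaining[~mask]

# check that all rows are uniquely assigned
assert len(train_df) + len(validate_df) + len(test_df) == len(df)
```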
Why partition at all? Partitioning — "chunking" — a DataFrame can have several benefits, including reducing memory usage by working with smaller chunks of data at a time and enabling parallel processing of independent pieces. For example, with a 1 TB DataFrame we might split it into 10 GB chunks, allocate the chunks to 100 servers, process the chunks independently, and combine the metrics from each. By splitting and scaling out, massive computations become tractable.

Chunking a Spark DataFrame into pandas pieces

For a Spark DataFrame (say, 100,000 rows) that you want to work on locally with pandas, repartition it into the desired number of chunks and pull them to the driver one at a time:

```python
import pandas as pd

columns = spark_df.schema.fieldNames()
chunks = (
    spark_df.repartition(num_chunks)
    .rdd
    .mapPartitions(lambda iterator: [pd.DataFrame(list(iterator), columns=columns)])
    .toLocalIterator()
)
for pdf in chunks:
    # do work locally on chunk as pandas df
    ...
```

By using toLocalIterator, only one partition at a time is collected to the driver, so the driver never holds the whole frame in memory.

Splitting a groupby object into chunks

Sometimes the pieces should be whole groups rather than row ranges — given the df DataFrame, the chunk identifier needs to be one or more columns. Say you group by ['client', 'product', 'data']:

```python
grouped_data = raw_data.groupby(['client', 'product', 'data'])
print(len(grouped_data))  # 10000
```

and want to split the resulting groupby object into two chunks, one containing roughly 80% of the groups and the other one containing the rest. One approach is sketched below.
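A minimal sketch of the 80/20 group split, continuing from the snippet above (the seed and variable names are mine; the idea is simply to shuffle and partition the group keys):

```python
import numpy as np
import pandas as pd

keys = list(grouped_data.groups)       # every ('client', 'product', 'data') key
rng = np.random.default_rng(42)
rng.shuffle(keys)

cut = int(len(keys) * 0.8)             # roughly 80% of the groups
first_keys, rest_keys = keys[:cut], keys[cut:]

# rebuild one DataFrame per chunk from its groups
chunk_80 = pd.concat([grouped_data.get_group(k) for k in first_keys])
chunk_20 = pd.concat([grouped_data.get_group(k) for k in rest_keys])
```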
Writing chunks out to files

Chunking is often a prelude to writing smaller files: an Excel file with about 500,000 rows to be split into several Excel files of 50,000 rows each, or a 100,000-row frame written out to individual CSVs in a single for loop. (Dask can also split a CSV into multiple CSV files directly.) Slice the frame with any method above, then write each piece — a sketch follows.

Sometimes the chunks must respect the data, not just a target length. A common constraint: split the frame into relatively even chunks according to length, but equal elements in the CODE column must not end up in different chunks — rows sharing a CODE value should be added to the previous chunk even if its size is then exceeded. Iterating over `groupby('CODE', sort=False)` and starting a new chunk only when the current one has already reached the target size satisfies this. A stricter cousin is splitting into chunks where no chunk shares a non-zero element with any other. Chunking by columns works the same way as chunking by rows: create a new id every 13th column (or every n columns, to keep it reproducible) and collect the pieces into a dictionary of DataFrames.
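A sketch of the write-out loop; the 50,000-row chunk size comes from the Excel example above, while the toy frame and file names are assumptions:

```python
import pandas as pd

# toy stand-in for the 500,000-row spreadsheet
df = pd.DataFrame({"value": range(500_000)})

size = 50_000
for i, start in enumerate(range(0, len(df), size)):
    chunk = df.iloc[start:start + size]
    chunk.to_csv(f"part_{i:02d}.csv", index=False)
    # for Excel output instead (requires openpyxl):
    # chunk.to_excel(f"part_{i:02d}.xlsx", index=False)
```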
Random samples and shuffled splits

We can also select a random selection of rows: pandas comes with a very helpful `sample()` method that allows you to select either a number of records or a fraction. That covers tasks like "randomly select a sample from a DataFrame with a single column, split this sample into nr_of_chunks chunks with each chunk containing items_per_chunk, compute the mean of each chunk, and plot the means as a histogram", and it extends to stratified batching — for example, randomly splitting 300 rows into 50 batches of 6 values, where each batch should have one of each subgroup and an approximately equal distribution of groups.

To cut a whole frame into equal random parts, shuffle the row positions and split the positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"movie_id": np.arange(1, 25),
                   "borda": np.random.randint(1, 25, size=(24,))})
n_split = 5

# the indices used to select parts from dataframe
ixs = np.arange(df.shape[0])
np.random.shuffle(ixs)

# np.split cannot work when there is no equal division,
# so np.array_split finds the split points for us
parts = [df.iloc[ix] for ix in np.array_split(ixs, n_split)]
```

Chunking can also happen inside cells: given `df = pd.DataFrame({'text': ['Expression of H-2 antigenic specificities on', 'To study the distribution of myelin-associated'], 'id': [1, 2]})`, one question asked how to check whether each text is longer than 2 words, split it into 2-word chunks if so, and drop the row otherwise — a string-level variant of the same idea.

When the data does not fit in memory at all, let dask do the chunking: a dask DataFrame created with `dd.read_csv(filepath, blocksize=blocksize * 1024 * 1024)` arrives already partitioned into chunks of roughly `blocksize` MB, even when there is no column by which to divide the frame into a segmented fraction. Per-partition processing is sketched below.
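A sketch of per-partition processing with dask; `partial_results` echoes the original fragment, while the file name, blocksize, and the column being summed are assumptions:

```python
import dask.dataframe as dd

blocksize = 64  # MB per partition
df = dd.read_csv("data.csv", blocksize=blocksize * 1024 * 1024)

# one delayed pandas DataFrame per partition, processed one at a time
partial_results = []
for part in df.to_delayed():
    pdf = part.compute()                  # materialize just this chunk
    partial_results.append(pdf["value"].sum())

total = sum(partial_results)
```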
Keeping groups intact

Often the chunk identifier is one or more columns rather than a row count. The simplest version converts a DataFrame into multiple DataFrames by selecting each unique value in a given column and putting all of that value's entries into a separate DataFrame — a plain `groupby` over the column. A harder version chunks within groups: splitting a frame into two chunks of length 5 (a chunk_size) grouped by a symbol column, so that group A is split into two chunks starting at rows 0 and 5 while the chunks of group B start at rows 0 and 3; a start_idx column can record the row where each group's chunk begins. We can try iterating over a groupby on fruit, `array_split` each group into 2 DataFrames, then `zip` to transpose the list of lists of DataFrames, then `concat` to create a list of DataFrames, which can be unpacked into two variables:

```python
import numpy as np
import pandas as pd

df_split1, df_split2 = [
    pd.concat(lst)
    for lst in zip(*[np.array_split(v, 2) for _, v in df.groupby('fruit')])
]
```

For a fruit/count frame, `df_split1` ends up holding the first half of every fruit group (e.g. the apple rows at positions 0 and 4, with counts 1 and 17), and `df_split2` the second half.

Binning by a high-cardinality key

To split a DataFrame into a number of "bins" while keeping each DeviceID in a single bin, take the following approach: compute `value_counts()` for DeviceID — the result is a Series starting with the most numerous groups — convert it to a DataFrame, and add a column of bin numbers cycling from 0 up to binNo, the number of bins (e.g. `binNo = 3`); each device then maps to exactly one bin — see the sketch after the Spark note below. Alternatively, turn the key column into a categorical one and use `qcut` for uniform-height binning. Unfortunately, `qcut` fails to find unique bin edges for discontinuous distributions, so you might have issues if one user is over-represented; `duplicates="drop"` helps, but then you won't always get the number of bins you requested, as some will be clumped together.

In Spark, randomly splitting a DataFrame into 8 parts is built in:

```python
split_weights = [1.0] * 8
splits = df.randomSplit(split_weights)
for df_split in splits:
    # do what you want with the smaller df_split
    ...
```

Note that this will not ensure the same number of records in each df_split.
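A sketch of the DeviceID binning described above; `binNo` and the cycling-bin idea come from the original text, while the column names and toy data are assumptions:

```python
import numpy as np
import pandas as pd

# toy frame: a few devices with very different row counts
df = pd.DataFrame({"DeviceID": ["a"] * 6 + ["b"] * 3 + ["c"] * 2 + ["d"],
                   "reading": np.arange(12)})

binNo = 3  # Number of bins

# most numerous devices first, so big groups spread across bins
vc = df["DeviceID"].value_counts()

# cycle bin numbers 0, 1, 2, 0, 1, ... over the ranked devices
bins = pd.Series(np.arange(len(vc)) % binNo, index=vc.index)

# every row of a device lands in exactly one bin
df["bin"] = df["DeviceID"].map(bins)
chunks = [g for _, g in df.groupby("bin")]
```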
Time-based splitting

For timeseries data, we often want to analyze trends within periods. Given a DataFrame with a dates column of datetime objects (not strings), what is the simplest way to split it into multiple DataFrames of 1 week or 1 month worth of data? As an example, a DataFrame containing 1 year of data would be split into 52 DataFrames containing a week of data each, returned as a list of 52 DataFrames. Coarser or finer spans work the same way: individual DataFrames per 6-month date chunk, named period_1, period_2 and so on (period_1 covering 2010-10-18 to 2010-10-18 + 6 months, period_2 the next six months, ...); per-day frames from a Datetime column; per-hour groups from a Timestamp/Value log (readings at 12:32, 12:50, 13:01, 16:05, ...); or an hourly series split into specific dates versus all other dates. The goal is usually to iterate these chunks and pass each one individually to another function that can't handle gaps in data.

Quota-driven chunking is a close relative: with roughly 25K rows and an API daily limit of 2,500, the frame needs to be split approximately 10 times — and since debugging and development consume part of the daily quota, chunks of 2K are safer. A related ask is automatically naming each chunk as its own DataFrame — for example, dfmaster with 1000 records split by 200 into df1 through df5 — which is best handled with a list or dict of frames rather than separate variables. `np.array_split` or the fixed-size slicing above handles the splitting itself; for Spark DataFrames, avoid looping over 1000-row slices with `toPandas()` — it takes a very long time — and use the toLocalIterator pattern shown earlier instead.

Finally, not every split produces multiple frames: given columns N0_YLDF, ZZ and MAT, producing a new DataFrame with the N0_YLDF column split into 4 new columns, one for each unique value of ZZ, is a reshape rather than a chunking problem (e.g. `df.pivot(columns='ZZ', values='N0_YLDF')`). A period-based split is sketched below.
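A sketch of the period split using `pd.Grouper`, assuming a datetime column named 'date'; swap `freq="W"` for `"MS"` (month start) or `"6MS"` (half-year) as needed:

```python
import pandas as pd

# toy year of daily data
df = pd.DataFrame({"date": pd.date_range("2023-01-01", periods=365, freq="D"),
                   "value": range(365)})

# one DataFrame per calendar week, in chronological order
weekly = [g for _, g in df.groupby(pd.Grouper(key="date", freq="W"))]
print(len(weekly))  # ~53 week-frames for a year of data

# the same pattern with freq="MS" gives monthly frames,
# and freq="6MS" gives the period_1, period_2, ... half-year chunks
```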
Splitting at boundaries in the data

Sometimes the split points come from the data itself rather than from a target size. In a typical case, the time resolution of the data is 5 min, and the aim is to create a new DataFrame whenever the time difference between consecutive rows is greater than 5 min — or, with a default integer index, whenever the index grows by more than 1, which is the same criterion. Other boundary markers work the same way: splitting a df into multiple dfs at every row that is entirely NaN, without reindexing the pieces (one example frame yields 4 DataFrames with 5, 5, 1 and 2 rows); sectioning a frame into 'chunks' based on a tag column, where a run of 2 or more zeros marks a boundary and all of the data between the zero runs is stored in a separate DataFrame (or a list of DataFrames); or weight-based boundaries, such as splitting a frame of x/y/count rows whose count column sums to around 48,000 into around twelve chunks whose count totals are each around 4,000 (group on `df['count'].cumsum() // 4000`). All of these reduce to the trick used for runs earlier: build a Boolean "new chunk starts here" mask once — don't recompute masks per chunk — take its cumulative sum as a chunk id, and group by it, exactly as a snippet that generates a DF with 12 records carrying 4 chunk ids would then split on the id column.

For contrast, the arithmetic of fixed-size chunking is worth spelling out once: for the chunk at index i, we generate a sub-array of the original array as a[i * CHUNK : (i + 1) * CHUNK], where i * CHUNK is the index of the first element to put into the subarray and (i + 1) * CHUNK is one past the last element. Most of the general solutions to "How do you split a list into evenly sized chunks?" therefore apply to DataFrames unchanged. A gap-based sketch follows.
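A sketch of the gap-based split; the Datetime column name and the 5-minute threshold come from the example above, while the toy timestamps are invented:

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-05-19 05:05:00", "2021-05-19 05:10:00", "2021-05-19 05:15:00",
        "2021-05-19 06:00:00", "2021-05-19 06:05:00",  # 45-minute gap before this pair
    ]),
    "col1": [3, 4, 5, 6, 7],
})

# True wherever the gap to the previous row exceeds 5 minutes
gap = df["Datetime"].diff() > pd.Timedelta(minutes=5)

# the cumulative sum of gap flags labels each contiguous stretch
chunks = [g for _, g in df.groupby(gap.cumsum())]
print(len(chunks))  # 2 chunks: before and after the 45-minute gap
```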
Processing chunks in parallel

Filtering or transforming chunks of a DataFrame in parallel is where chunking pays off most. A cautionary note first: processing a huge pandas DataFrame (several tens of GB) on a row-by-row basis, where each row operation is quite lengthy (a couple of tens of milliseconds), can be sped up by splitting the frame into chunks and processing each chunk in parallel using multiprocessing — but memory consumption can become a nightmare if every worker receives a copy of too much data, so keep chunks modest. The basic pattern:

```python
import multiprocessing as mp
import pandas as pd

# split the dataframe into smaller chunks
chunks = [df[i:i+1000] for i in range(0, len(df), 1000)]

# define a function to process a single chunk
def process_chunk(chunk):
    ...
```

A thread pool follows the same shape when the per-chunk work releases the GIL (numpy-heavy code or I/O): feed each chunk to a thread from a ThreadPoolExecutor, wait for the threads to sync at the end, and concatenate the resulting DataFrames into one. Dask gives the same effect declaratively — filtering chunks of a dask DataFrame runs across its partitions in parallel without manual chunk bookkeeping. A completed multiprocessing sketch follows.
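A sketch completing the multiprocessing pattern; the toy frame, the doubling operation, and the worker count are all assumptions:

```python
import multiprocessing as mp
import pandas as pd

def process_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    # stand-in for the real per-chunk computation
    out = chunk.copy()
    out["value"] = out["value"] * 2
    return out

if __name__ == "__main__":
    df = pd.DataFrame({"value": range(10_000)})

    # split the dataframe into smaller chunks of 1000 rows
    chunks = [df.iloc[i:i+1000] for i in range(0, len(df), 1000)]

    # each chunk is pickled to a worker process and processed independently
    with mp.Pool(processes=4) as pool:
        results = pool.map(process_chunk, chunks)

    # reassemble the processed chunks in their original order
    final = pd.concat(results)
    print(final.shape)  # (10000, 1)
```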