question summary: how to determine the training duration for a replay algorithm with an infinite data stream
Hello continual learners,
currently I am trying to implement the vanilla replay algorithm to train a feed-forward neural network that predicts the power consumption of a machine (single-output regression). The supervised data arrive in a data stream. The machine's different production plans correspond to the tasks to learn, and they occur in a systematic order.
To test the replay algorithm, I want to implement a "pseudo-offline version" that simulates my data stream. In this offline version I have access to historic datasets of limited size for all tasks. I plan to train each new task on mini-batches that mix fresh training data for the current task, collected from the data stream, with replay data for previous tasks drawn from a memory buffer.
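For context, the mixed-mini-batch scheme I have in mind looks roughly like this minimal sketch. It is not my actual implementation; the class and function names are illustrative, and I assume reservoir sampling to keep the memory a uniform subset of the stream:

```python
import random

class ReplayBuffer:
    """Fixed-size memory holding (x, y) samples from previous tasks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []
        self.seen = 0  # total number of samples offered to the buffer

    def add(self, sample):
        # Reservoir sampling: every stream sample has equal probability
        # of ending up in the fixed-size memory.
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.samples[idx] = sample

    def draw(self, k):
        # Sample up to k stored examples without replacement.
        k = min(k, len(self.samples))
        return random.sample(self.samples, k)


def mixed_minibatch(new_batch, buffer, replay_size):
    """Concatenate fresh stream samples with replayed memory samples."""
    return list(new_batch) + buffer.draw(replay_size)
```

So each optimizer step would see `len(new_batch) + replay_size` examples, half of the gradient signal anchoring the old tasks.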
In the real-world application I have no information about how long the machine will run in a specific scenario and generate streaming data for a specific task. How can I decide when to stop training on a new task, given that in theory the data stream for one task could be infinitely long? Running the training on a task for too long will presumably lead to overfitting. During real-time training I won't have a test set available to check for overfitting or to do early stopping.
What method could I use to decide when to stop the training?