Julaiti Alafate, Yoav Freund · Published 2022
We present a novel approach for parallel computation in the context of machine learning that we call "Tell Me Something New" (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe "something new". TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splice-site prediction problem.

Recommendations

Overall similar

We present a simple dynamic batching approach applicable to a large class of dynamic architectures that consistently yields speedups of over 10x. We provide performance bounds when the architecture is not known a priori and …
Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, …
We propose a new data-centric synchronization framework for carrying out of machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application agnostic bulk synchronization …

Similar task

Thanks to rapidly evolving sequencing techniques, the amount of genomic data at our disposal is growing increasingly large. Determining the gene structure is a fundamental requirement to effectively interpret gene function and regulation. An important …
Bioinformatics incorporates information regarding biological data storage, accessing mechanisms and presentation of characteristics within this data. Most of the problems in bioinformatics and be addressed efficiently by computer techniques. This paper aims at building a …
Recently several deep learning models have been used for DNA sequence based classification tasks. Often such tasks require long and variable length DNA sequences in the input. In this work, we use a sequence-to-sequence autoencoder …

Similar method

Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data …
Lake and Baroni (2018) introduced the SCAN dataset probing the ability of seq2seq models to capture compositional generalizations, such as inferring the meaning of "jump around" 0-shot from the component words. Recurrent networks (RNNs) were …
While gradient descent has proven highly successful in learning connection weights for neural networks, the actual structure of these networks is usually determined by hand, or by other optimization algorithms. Here we describe a simple …

Similar dataset

Most of the problems in bioinformatics are now the challenges in computing. This paper aims at building a classifier based on Multiple Attractor Cellular Automata (MACA) which uses fuzzy logic. It is strengthened with an …
Bioinformatics incorporates information regarding biological data storage, accessing mechanisms and presentation of characteristics within this data. Most of the problems in bioinformatics and be addressed efficiently by computer techniques. This paper aims at building a …
Thanks to rapidly evolving sequencing techniques, the amount of genomic data at our disposal is growing increasingly large. Determining the gene structure is a fundamental requirement to effectively interpret gene function and regulation. An important …