Best ML resources for beginners
This blog post by AICorespot tackles a difficult question: which libraries, courses, papers and books are the best to recommend to a complete beginner in machine learning?
Deciding what to include and what to leave out was a real challenge. We therefore wrote this guide from the perspective of a programmer who is new to machine learning, and asked which resources would give that person the best head start.
Only the best was selected for each type of resource. If you are a complete newbie and excited to get started in machine learning, we hope you take away something useful. Our suggestion is to select one thing, one book or one library, and either read it cover to cover or work through all of its tutorials. Choose one and stick to it; once you’ve mastered it, choose another and repeat.
We are advocates of the maxim: “learn just enough to be dangerous and begin trying things.”
This is how many people working in machine learning learned to program, and how many others learned the field as well. Be aware of your limitations and play to your strengths. If you already know how to program, use that to dive into machine learning quickly. Then have the discipline to go and learn the mathematics behind a technique before you implement it in a production system.
Pick a library, read the documentation, follow the tutorials and begin trying things out. The following are the best open-source machine learning libraries available. We don’t believe that they are all appropriate for use in your production system, but they are the best for learning, exploration, and prototyping.
Begin with a library in a language you know well and then move on to other, more powerful libraries. If you’re a competent programmer, you know you can move from language to language reasonably easily. The logic is the same; only the syntax and APIs differ.
- R Project for Statistical Computing: This is an environment and a lisp-like scripting language. All the statistics you would ever wish to do is provided in R, including amazing plotting. The Machine Learning category on CRAN (think: third-party machine learning packages) contains code written by leaders in the field, state-of-the-art methods, and anything else you can think of. Learning R is a must if you wish to prototype and explore quickly, but it may not be the first place you begin.
- WEKA: This is a data mining workbench that provides an API and a number of command-line and graphical user interfaces for the entire data mining lifecycle. You can prepare data, visualize, explore, and develop classification, regression and clustering models. Many algorithms are built in, and more are provided by third-party plugins. Unrelated to WEKA, Mahout is a good Java framework for machine learning on Hadoop infrastructure if that is more your cup of tea. If you’re brand new to big data and machine learning, stick with WEKA and learn one thing at a time.
- scikit-learn: Machine learning in Python, built on top of NumPy and SciPy. If you are a Python or Ruby programmer, this is the library for you. It’s user-friendly, capable and comes with excellent documentation. Orange would be a good alternative if you’d like to try something different.
- Octave: If you are familiar with MATLAB, or you’re a NumPy programmer looking for something a bit different, consider Octave. It is an environment for numerical computing much like MATLAB, and it makes it simple to write programs that solve linear and non-linear problems, like those at the heart of most machine learning algorithms. If you have a background in engineering, this might be an appropriate place for you to begin.
- BigML: Perhaps you don’t wish to do any programming. You can drive tools such as WEKA entirely without programming. You can take it a step further and use services such as BigML, which provide machine learning interfaces on the web so that you can explore building models entirely within your browser.
Choose a platform and harness it to do your practical machine learning education. Practice makes perfect.
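To give a flavour of what "trying things out" can look like, here is a minimal sketch using scikit-learn. The iris dataset, the k-nearest-neighbours model, the 25% hold-out split and the function name `train_and_score` are our own illustrative choices, not recommendations from any of the libraries' authors; any small dataset and classifier from the library's tutorials would serve equally well.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_and_score(test_size=0.25, seed=0):
    """Fit a k-nearest-neighbours classifier on iris; return held-out accuracy."""
    # Load the small built-in iris dataset (150 flowers, 4 measurements each).
    X, y = load_iris(return_X_y=True)

    # Hold back part of the data so the model is scored on unseen examples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Illustrative model choice: classify by majority vote of the 5 nearest points.
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_train, y_train)

    # Fraction of held-out flowers whose species was predicted correctly.
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Held-out accuracy: {train_and_score():.2f}")
```

Once something like this runs, try swapping in a different model or dataset from the documentation and see how the score changes.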
Video is a very popular way for students to get into machine learning. There are a ton of machine learning videos available on YouTube and VideoLectures.net. The drawback is that you only consume without acting. We recommend that you always take notes when watching these videos, even if you throw the notes away later as you become more proficient. We also recommend trying out in practice whatever is being taught in the lecture.
To be frank, none of the video courses we have seen are really appropriate for a true beginner. They all assume a working knowledge of at least linear algebra and probability theory, and more.
Andrew Ng’s Stanford lectures are probably the best place to begin for a course. Otherwise, there are one-off videos that we recommend.
- Stanford Machine Learning: Available through Coursera and taught by Andrew Ng. Besides enrolling, you can watch all the lectures at any time and obtain the handouts and lecture notes from the actual Stanford CS229 course. The course includes homework and quizzes and focuses on linear algebra and using Octave.
- Caltech Learning from Data: Available through edX and taught by Yaser Abu-Mostafa. All the lectures and materials are available on the Caltech site. As with the Stanford class, you can take it at your own pace and complete the homework and assignments. It covers similar subjects, goes into a bit more detail and is a lot more mathematical. The homework is likely to be too challenging for a beginner.
- Machine Learning category on VideoLectures.net: This is an easy place to drown in the overload of content. Look for videos that appear interesting and try them out. Step back if a video is at the wrong level, or take notes if you’re enjoying it. Machine learning experts keep coming back to it to refresh themselves on subjects and to pick up completely new ones. Also, it’s amazing to see what the leaders of the field actually look like.
- Getting in shape for the sport of Data Science – Talk by Jeremy Howard: A talk to a local R users group on the practical process of doing well in competitive machine learning. This is a very valuable resource, as very few people talk about what it’s actually like to work on a problem and how to go about it.
If you aren’t accustomed to reading research papers, you will discover that the language is very stiff. A paper is a lot like a snippet of a textbook, but details an experiment or some other frontier of the domain. Nonetheless, there are a few papers that you might find to be of interest if you are seeking to delve deeper into the domain of machine learning.
- The discipline of machine learning: A white paper that defines the discipline of machine learning by Tom Mitchell. This was a piece of the argument Mitchell leveraged to convince the president of CMU to develop a standalone Machine Learning department for a topic that will still be around in a century.
- A few useful things to know about machine learning: This is a brilliant paper because it pulls back from particular algorithms and motivates a number of critical issues, such as feature selection, generalizability, and model simplicity. This is all brilliant stuff to get straight and think clearly about from the start.
We’ve listed only two key papers, because reading papers can really bog you down.
Machine learning books for novices and beginners
There is a wide array of machine learning books, but very few are written for beginners.
Who is a beginner?
Most probably you’re entering machine learning from another field, such as computer science, programming, or statistics. Even then, most of the literature expects you to have a grounding in at least linear algebra and probability theory.
Nonetheless, there are a few books out there that encourage eager programmers to get started by teaching the minimum intuition for an algorithm and pointing to tools and libraries, so that you can run off and try things.
The most noteworthy are Programming Collective Intelligence, Machine Learning for Hackers and Data Mining: Practical Machine Learning Tools and Techniques for Python, R, and Java respectively. When in doubt, pick up one of these three books.
- Programming Collective Intelligence: Building Smart Web 2.0 Applications: This book was written for you, dear programmer. It’s light on theory and heavy on code examples and practical web problems and solutions. Buy it, read it, and do the exercises.
- Machine Learning for Hackers: We’d recommend this book after reading Programming Collective Intelligence (above). It again provides practical worked examples, but it has more of a data analysis flavour and uses R.
- Machine Learning: An Algorithmic Perspective: This book is like a more advanced version of Programming Collective Intelligence (above). It has similar objectives (getting programmers started in machine learning), but it includes mathematics and references as well as examples and snippets in Python. We recommend reading this after Programming Collective Intelligence if you’re still interested.
- Data Mining: Practical Machine Learning Tools and Techniques: Many people began with this book, in its first edition, around the turn of the millennium. It is particularly suited to Java programmers: together with its companion library WEKA, it provides an ideal environment to try things, implement algorithms as plug-ins and generally practice machine learning and the wider process of data mining. This book and the path it lays out come highly recommended.
- Machine Learning: This is an old book that includes formulae and a ton of references. It’s a textbook, but it is also really accessible, with grounded motivation for every algorithm.
A ton of people ramble on about some amazing machine learning textbooks. A majority of them, however, are just not an ideal place for a beginner to start their journey.
We thought this post all the way through and we also went off and looked at other people’s listings of resources to ensure we didn’t overlook anything critical.
For the sake of completeness, here are some amazing listings of resources around the internet to begin in machine learning.
- A list of data science and machine learning resources: A carefully crafted list. Take your time, read through the author’s suggestions and click on the links. They are well worth your time.
- What are some good resources for learning about Machine Learning? The first answer to this question on Quora is amazing. Be sure to take notes and bookmark it the first time you read through it. The most worthwhile parts of this answer are the list of machine learning courses with lecture notes and the list of related posts on question-and-answer websites.
- Overwhelmed by Machine Learning: is there an ML101 book? A Stack Overflow question that is really a list of recommended machine learning books. The first answer, by Jeff Moser, is useful because it points to lecture videos and talks.