Revision as of 15 July 2014, 15:48 by Jj14 (edit summary: translation from en, lead; intermediate save) compared with revision as of 15 July 2014, 17:26 by Jj14 (edit summary: links)
In [[strojové učení|machine learning]] and [[statistika|statistics]], '''classification''' is the problem of determining which of a set of [[kategoriální proměnná|categories]] a given [[pozorování|observation]] belongs to. To do so, a [[trénovací množina|training set]] is available, containing observations (data, instances) whose categories are already known. The individual observations are analyzed into a set of quantifiable properties, known as [[nezávislá proměnná|independent variables]], features, etc. These properties may be categorical (e.g. "A", "B", "AB" or "O" for [[krevní skupina|blood types]]), [[ordinální data|ordinal]] (e.g. "large", "medium" or "small"), [[celé číslo|integer-valued]] (e.g. the number of occurrences of a word in an email) or [[reálné číslo|real-valued]] (e.g. a measurement of [[krevní tlak|blood pressure]]). Some algorithms work only with discrete values and require integer- or real-valued data to be ''discretized'', i.e. converted into groups of similar measurements (e.g. "less than 5", "between 5 and 10", "more than 10"). Examples of the problem are assigning an email to the class "spam" or "non-spam", or assigning a diagnosis to a patient on the basis of the patient's observed characteristics (sex, age, blood pressure, presence or absence of certain symptoms, etc.). An algorithm that implements classification is called a [[klasifikátor|classifier]]. The term is also used for the [[funkce (matematika)|mathematical function]], implemented by such an algorithm, that maps input data to classes. <!-- In [[machine learning]] and [[statistics]], '''classification''' is the problem of identifying to which of a set of [[categorical data|categories]] (sub-populations) a new [[observation]] belongs, on the basis of a [[training set]] of data containing observations (or instances) whose category membership is known. The individual observations are analyzed into a set of quantifiable properties, known as various [[explanatory variables]], ''features'', etc. 
These properties may variously be [[categorical data|categorical]] (e.g. "A", "B", "AB" or "O", for [[blood type]]), [[ordinal data|ordinal]] (e.g. "large", "medium" or "small"), [[integer|integer-valued]] (e.g. the number of occurrences of a particular word in an [[email]]) or [[real number|real-valued]] (e.g. a measurement of [[blood pressure]]). Some [[algorithm]]s work only in terms of discrete data and require that real-valued or integer-valued data be ''discretized'' into groups (e.g. less than 5, between 5 and 10, or greater than 10). An example would be assigning a given email into "spam" or "non-spam" classes or assigning a diagnosis to a given patient as described by observed characteristics of the patient (gender, blood pressure, presence or absence of certain symptoms, etc.). In [[strojové učení|machine learning]] terminology, classification is considered a form of [[učení s učitelem|supervised learning]], i.e. learning for which a training set of correctly classified examples is available. The corresponding method in [[učení bez učitele|unsupervised learning]] is known as [[klastrování|clustering]]; it groups data into categories according to some measure of inherent [[podobnost|similarity]] (e.g. the [[vzdálenost|distance]] between instances, viewed as vectors in a multidimensional [[vektor|vector]] space). An algorithm that implements classification, especially in a concrete implementation, is known as a '''[[Pattern recognition|classifier]]'''. The term "classifier" sometimes also refers to the mathematical [[function (mathematics)|function]], implemented by a classification algorithm, that maps input data to a category. Terminology is not uniform: it differs between statistics and machine learning, and also between application fields. In the terminology of machine learning,<ref>{{cite book|last=Alpaydin|first=Ethem|title=Introduction to Machine Learning|date=2010|publisher=MIT Press|isbn=978-0-262-01243-0|page=9}}</ref> classification is considered an instance of [[supervised learning]], i.e. 
learning where a training set of correctly identified observations is available. The corresponding [[unsupervised learning|unsupervised]] procedure is known as ''clustering'' or [[cluster analysis]], and involves grouping data into categories based on some measure of inherent similarity (e.g. the [[distance]] between instances, considered as vectors in a multi-dimensional [[vector space]]). Terminology across fields is quite varied. In [[statistics]], where classification is often done with [[logistic regression]] or a similar procedure, the properties of observations are termed [[explanatory variable]]s (or [[independent variable]]s, regressors, etc.), and the categories to be predicted are known as outcomes, which are considered to be possible values of the [[dependent variable]]. In machine learning, the observations are often known as ''instances'', the explanatory variables are termed ''features'' (grouped into a [[feature vector]]), and the possible categories to be predicted are ''classes''. There is also some argument over whether classification methods that do not involve a [[statistical model]] can be considered "statistical". Other fields may use different terminology: e.g. in [[community ecology]], the term "classification" normally refers to [[cluster analysis]], i.e. a type of [[unsupervised learning]], rather than the supervised learning described in this article. --> {{Překlad|en|Statistical classification|609043102}} [[Kategorie:Umělá inteligence]] [[Kategorie:Strojové učení]]
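The discretization described above can be illustrated with a minimal Python sketch. The group boundaries 5 and 10 are taken from the example in the text; the function name `discretize` and the decision to place the boundary values 5 and 10 themselves in the middle group are assumptions made only for this illustration.

```python
def discretize(value: float) -> str:
    """Map an integer or real value to one of three discrete groups.

    Boundaries 5 and 10 follow the example in the text; assigning exactly 5
    and exactly 10 to the middle group is an assumption of this sketch.
    """
    if value < 5:
        return "less than 5"
    if value <= 10:
        return "between 5 and 10"
    return "more than 10"


# Discretize a few hypothetical measurements.
print([discretize(v) for v in (3, 5, 7.5, 12)])
# → ['less than 5', 'between 5 and 10', 'between 5 and 10', 'more than 10']
```

An algorithm that accepts only discrete inputs would then work with these group labels instead of the original numbers.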

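To make the idea of a classifier as a function from input data to classes more concrete, the following sketch implements a 1-nearest-neighbour rule. This is only one of many possible classification algorithms, chosen here for brevity; the article does not prescribe it. The two features (occurrences of a particular word and number of links in an email), the training data, and the names `TRAINING_SET` and `classify` are hypothetical. Instances are treated as vectors and compared by Euclidean distance, the same kind of similarity measure mentioned for clustering; the difference is that here the classes of the training instances are known in advance (supervised learning).

```python
import math
from typing import Sequence

# Hypothetical training set: (feature vector, known class) pairs.
# Assumed features: (occurrences of a particular word, number of links in the email).
TRAINING_SET = [
    ((0.0, 1.0), "non-spam"),
    ((1.0, 0.0), "non-spam"),
    ((6.0, 8.0), "spam"),
    ((7.0, 5.0), "spam"),
]


def classify(features: Sequence[float]) -> str:
    """1-nearest-neighbour classifier: return the class of the training
    instance closest to `features` in Euclidean distance."""
    _, label = min(TRAINING_SET, key=lambda pair: math.dist(pair[0], features))
    return label


print(classify((5.0, 6.0)))  # → spam
print(classify((0.5, 0.5)))  # → non-spam
```

Any other classification algorithm could be substituted here; its role would stay the same, namely a function that maps a feature vector to a class.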
Classification (artificial intelligence): comparison of revisions