๐Ÿ—‚๏ธData Science/Aritificial Intelli

    Reinforcement Learning - 3. DQN

    # 1. Deep Q Network First, let us compare the Q-learning we have seen so far with DQN. Q-Learning works by updating a Q-Table: given a {State, Action} pair as input, it returns the corresponding Q-value. However, once the number of states and actions grows, maintaining the Q-Table becomes nearly impossible and very slow, and the values of pairs that have never been experienced cannot be known. DQN therefore replaces the Q-Table of Q-Learning with a neural network that estimates the Q-value. # 2. Naive DQN To train a neural network, a loss function is nee..
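The excerpt above contrasts table writes with a learned Q-function. A minimal sketch of that idea, under invented assumptions (the state/action counts, the tiny linear "network" `W`, and the single hand-coded transition are all hypothetical), fits Q-values by gradient steps on the squared TD error instead of updating table entries:

```python
import numpy as np

# Stand-in Q-network: a linear map from a one-hot state to one Q-value per
# action, so the forward pass for state s is simply the row W[s].
n_states, n_actions = 4, 2
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_states, n_actions))
gamma, lr = 0.9, 0.1

def q_values(state):
    # Forward pass: one_hot(state) @ W is just the row W[state]
    return W[state]

def td_step(s, a, r, s_next, done):
    # TD target: r + gamma * max_a' Q(s', a'), with no future value at terminal
    target = r + (0.0 if done else gamma * np.max(q_values(s_next)))
    td_error = target - W[s, a]
    W[s, a] += lr * td_error   # gradient step on the squared TD error
    return td_error

# Repeatedly fit one transition: from state 0, action 1 earns reward 1
# and ends the episode, so its true Q-value is 1.0.
for _ in range(200):
    td_step(0, 1, 1.0, 0, done=True)
```

A real DQN replaces `W` with a deep network trained by an optimizer, plus the replay buffer and target network tricks, but the loss being minimized is the same squared TD error.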

    Reinforcement Learning - 2. Q-learning

    # 1. Q-Learning https://repoji-dataengineer.tistory.com/entry/%EA%B0%95%ED%99%94%ED%95%99%EC%8A%B5Reinforcement-Learning (link to the previous post, Reinforcement Learning - 1. Value Function) There we covered everything up to the Q-Value Function. Now, for every {State, Action} Pair the corre..
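The tabular update behind this post can be sketched in a few lines; the two-state chain MDP below is invented purely for illustration:

```python
import numpy as np

# Tabular Q-learning on a toy chain (hypothetical MDP): from state 0,
# action 1 moves to terminal state 1 with reward 1; action 0 stays put.
Q = np.zeros((2, 2))        # the Q-Table: rows are states, columns actions
alpha, gamma = 0.5, 0.9

def q_update(s, a, r, s_next, done):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])

for _ in range(20):
    q_update(0, 1, 1.0, 1, done=True)    # the rewarding transition
    q_update(0, 0, 0.0, 0, done=False)   # staying put earns nothing now

# The greedy policy read off the table picks action 1 in state 0.
```

Note that `Q[0, 0]` still becomes positive through bootstrapping: staying put keeps the discounted option of taking action 1 later.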

    Reinforcement Learning - 1. Value Function

    # 1. What is reinforcement learning? Reinforcement learning is a learning method in which an Agent exploring an Environment perceives its current State and takes an Action, a Reward is given for that behavior, and the Agent finds a Policy that chooses the Actions maximizing the Reward. # 2. Value Function What is a Markov Decision Process? It is a mathematical framework for modeling sequential decision making, in which the best action is chosen from the current State without considering past history. It consists of the State, Action, Reward, State Transition Probability, Discount Factor, and Policy, and through the..
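The components listed in the excerpt can be made concrete with a small value-iteration sketch. The three-state deterministic MDP below is hypothetical: the transition table plays the role of the state transition probabilities, and `gamma` is the discount factor.

```python
gamma = 0.9   # discount factor

# Hypothetical deterministic MDP: (state, action) -> (next_state, reward).
# State 2 is terminal, so it keeps value 0.
P = {
    (0, "right"): (1, 0.0),
    (0, "stay"):  (0, 0.0),
    (1, "right"): (2, 1.0),
    (1, "stay"):  (1, 0.0),
}
V = {0: 0.0, 1: 0.0, 2: 0.0}

# Bellman optimality backup: V(s) = max_a [ r(s,a) + gamma * V(s') ]
for _ in range(50):
    for s in (0, 1):
        V[s] = max(r + gamma * V[s2]
                   for (st, a), (s2, r) in P.items() if st == s)

# V[1] = 1.0 (immediate reward) and V[0] = gamma * V[1] = 0.9; the optimal
# policy is read off as the argmax of the same bracketed expression.
```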

    Pattern mining

    # Frequent Pattern Mining Association Rules vs. Sequential Patterns An association rule finds co-occurrence relationships but does not consider order. For example, we can learn that people who bought chicken and pizza are likely to also buy cola, but not which purchase came first, nor whether people who bought cola and pizza mostly went on to buy chicken or not. Sequential pattern analysis fills this gap. This post covers only Association Rules. For association analysis we first need to understand its two key metrics, Support and Confidence. Suppose we want to measure the purchase association among three items A, B, and C. Sup..
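Support and confidence are simple frequency ratios, which a short sketch over made-up transactions can show (the basket data below is invented):

```python
# Hypothetical transaction database: each basket is a set of items.
transactions = [
    {"chicken", "pizza", "cola"},
    {"chicken", "pizza"},
    {"pizza", "cola"},
    {"chicken", "cola"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / n

def confidence(antecedent, consequent):
    # Estimate of P(consequent | antecedent): support(A and B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

sup = support({"chicken", "pizza"})                 # 2 of 4 baskets
conf = confidence({"chicken", "pizza"}, {"cola"})   # 1 of those 2 add cola
```

Both rules {chicken, pizza} → {cola} and {cola} → {chicken, pizza} share the same support; confidence is what makes the rule directional, though still without any notion of purchase order.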

    Principal Component Analysis

    # 1. What is Principal Component Analysis? Principal component analysis is widely used as a dimensionality reduction and feature extraction technique. PCA reduces dimensionality by finding low-dimensional axes onto which the high-dimensional data distribution is projected. The goal is to find axes that minimize the error between the axes and the data while preserving as much of the data's variance as possible; the figure below shows the process. # 2. Covariance What is covariance? It is a value that expresses the degree of correlation between two random variables. Cov(X,Y) > 0: if Y increases as X increases, the covariance is greater than 0. Cov(X,Y) < 0: if Y decreases as X increases, the covariance is less than..
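The covariance sign rules and the variance-maximizing axis can both be checked numerically. The sketch below, on synthetic correlated data (the data-generation choices are arbitrary), finds the first principal component as the top eigenvector of the covariance matrix:

```python
import numpy as np

# Synthetic 2-D data where the second coordinate rises with the first,
# so Cov(X, Y) > 0 and the main axis of variance points roughly along (1, 2).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)           # center the data first
C = np.cov(Xc, rowvar=False)      # 2x2 covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                   # axis with the largest variance
projected = Xc @ pc1                   # 1-D projection keeping most variance
```

The off-diagonal entry `C[0, 1]` is the sample Cov(X, Y), and projecting onto `pc1` is exactly the low-dimensional projection the excerpt describes.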