์•ž๋™๋„คJIHOON
{RepoJI}
Reinforcement Learning - 1. Value Function

2022. 12. 7. 01:01

# 1. What Is Reinforcement Learning?


Reinforcement learning is a learning method in which an Agent exploring an Environment observes its current State and takes an Action, receives a Reward for that action, and learns a Policy that selects the Actions maximizing the Reward it accumulates.
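
The interaction loop described above can be sketched in a few lines. This is only an illustration: the `CoinFlipEnv` environment, its `step` method, and the two fixed policies are hypothetical names invented here, not part of any library or of the original post.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin flip."""

    def __init__(self, p_heads=0.8, seed=0):
        self.p_heads = p_heads
        self.rng = random.Random(seed)
        self.state = "start"  # this toy environment has a single state

    def step(self, action):
        # Reward is 1 when the guess matches the flip, otherwise 0.
        flip = "heads" if self.rng.random() < self.p_heads else "tails"
        reward = 1.0 if action == flip else 0.0
        return self.state, reward


def run_episode(env, policy, n_steps=1000):
    """Observe the state, act, collect the reward, and repeat."""
    state = env.state
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward


def always_heads(state):
    return "heads"


def always_tails(state):
    return "tails"


heads_total = run_episode(CoinFlipEnv(seed=0), always_heads)
tails_total = run_episode(CoinFlipEnv(seed=0), always_tails)
print(heads_total, tails_total)
```

Because the coin lands heads most of the time, the policy that always guesses heads collects more reward; a learning algorithm would have to discover that preference from the rewards alone.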


# 2. Value Function


What is a Markov Decision Process?
A Markov Decision Process (MDP) is a mathematical framework for modeling sequential decision making. It relies on the Markov property: the best Action is decided from the current State alone, without considering the earlier history. An MDP is described by the State, the Action, the Reward, the State Transition Probability, and the Discount Factor, together with the Policy the agent follows, and solving it means using these components to find a better policy.

What is the Reward?

The reward function is the expected value of the Reward received at time t+1 when an Action is taken in the State at time t.

$$R_s^a = \mathbb{E}\left[R_{t+1} \mid S_t = s, A_t = a\right]$$
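
As a small worked example of this expectation, the sketch below averages the possible rewards for one state-action pair, weighted by their probabilities. The outcome table is made up purely for illustration.

```python
# Hypothetical outcomes for a single (state, action) pair:
# each entry is (probability, reward received at time t+1).
outcomes = [(0.7, 1.0), (0.2, 0.0), (0.1, -5.0)]


def expected_reward(outcomes):
    """Probability-weighted average of the reward at t+1."""
    total_prob = sum(p for p, _ in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * r for p, r in outcomes)


r_sa = expected_reward(outcomes)
print(round(r_sa, 3))  # 0.7*1.0 + 0.2*0.0 + 0.1*(-5.0), about 0.2
```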

What is the state transition probability?

The state transition probability is the probability of reaching the next State s' given the State and the Action at time t. It is written as follows:

$$P_{ss'}^a = \Pr\left[S_{t+1} = s' \mid S_t = s, A_t = a\right]$$
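
One way to make the transition probability concrete is to store it as a table and sample from it. The battery-level states (`high`, `low`), the actions, and all numbers below are assumptions chosen for illustration only.

```python
import random

# P[(s, a)] maps each next state s' to Pr[S_{t+1} = s' | S_t = s, A_t = a].
P = {
    ("high", "search"): {"high": 0.7, "low": 0.3},
    ("high", "wait"): {"high": 1.0},
    ("low", "search"): {"low": 0.6, "high": 0.4},
    ("low", "recharge"): {"high": 1.0},
}


def sample_next_state(state, action, rng):
    """Draw the next state according to the row P[(state, action)]."""
    dist = P[(state, action)]
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


rng = random.Random(0)
samples = [sample_next_state("high", "search", rng) for _ in range(10_000)]
empirical_low = samples.count("low") / len(samples)
print(empirical_low)  # should be close to the table entry 0.3
```

Each row of the table is a probability distribution over next states, so its entries must sum to 1.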

What is the discount factor?

The discount factor $\gamma$ (with $0 \le \gamma \le 1$) makes a reward the agent receives close to the present worth more than a reward received further in the future: a reward received $k$ steps later is weighted by $\gamma^k$.
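
A short sketch of how the discount factor changes the agent's preferences. The helper `present_value` and the reward amounts are hypothetical and only serve to show the weighting by a power of the discount factor.

```python
def present_value(reward, delay, gamma):
    """Value today of a reward received `delay` steps in the future."""
    return (gamma ** delay) * reward


for gamma in (0.5, 0.9):
    now = present_value(1.0, 0, gamma)    # reward of 1 received immediately
    later = present_value(2.0, 3, gamma)  # reward of 2 received three steps later
    better = "now" if now > later else "later"
    print(f"gamma={gamma}: now={now:.3f}, later={later:.3f}, prefer {better}")
```

With a discount factor of 0.5 the smaller immediate reward wins, while with 0.9 the larger delayed reward is worth more, so a value closer to 1 makes the agent more far-sighted.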

What is a policy?

A policy specifies the Action the agent takes in every State. It is written $\pi(a \mid s) = \Pr\left[A_t = a \mid S_t = s\right]$.
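
A policy can be represented as a simple lookup table. The sketch below shows a deterministic and a stochastic variant; the state and action names and the helper `act` are illustrative assumptions, not taken from the original post.

```python
import random

# Deterministic policy: exactly one action per state.
deterministic_policy = {"high": "search", "low": "recharge"}

# Stochastic policy: a probability for each action in each state.
stochastic_policy = {
    "high": {"search": 0.9, "wait": 0.1},
    "low": {"recharge": 0.8, "wait": 0.2},
}


def act(policy, state, rng):
    """Return the action the policy chooses in `state`."""
    choice = policy[state]
    if isinstance(choice, str):  # deterministic entry
        return choice
    actions = list(choice)
    weights = [choice[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
print(act(deterministic_policy, "low", rng))
print(act(stochastic_policy, "high", rng))
```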

So what a reinforcement learning algorithm does is search for the policy that maximizes the reward the agent accumulates.

What is the Value Function?

The Value Function estimates the cumulative Reward collected from a given State until the episode terminates, and is written $v_\pi(s)$.

When the agent, starting at time t, keeps choosing Actions in successive States and receiving Rewards, the sum of all those rewards, each multiplied by the matching power of the discount factor, is the return:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
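
The discounted sum of rewards can be computed directly for a finite episode. This is a minimal sketch; the reward sequence is made up, and `rewards[k]` is assumed to hold the reward received k+1 steps after time t.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over the whole episode."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


rewards = [1.0, 2.0, 3.0]  # hypothetical rewards at t+1, t+2, t+3
g_t = discounted_return(rewards, gamma=0.5)
print(g_t)  # 1.0 + 0.5*2.0 + 0.25*3.0 = 2.75
```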

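The expected value of this discounted sum $G_t$ is the Value of the State at time t:

$$v_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right]$$

Factoring the discount factor out of every term after the first and rearranging gives:

$$G_t = R_{t+1} + \gamma\left(R_{t+2} + \gamma R_{t+3} + \cdots\right) = R_{t+1} + \gamma G_{t+1}$$

$$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\right]$$

Therefore, the Value of the State at time t equals the expected value of the immediate Reward plus the discount factor times the Value of the State at time t+1.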
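
The recursive relationship between the value at time t and the value at time t+1 can be turned into an iterative computation. The sketch below repeatedly applies that update until the values stop changing. The three-state model, its rewards, and the discount factor of 0.5 are assumptions chosen so the result can be checked by hand; the policy is taken as already fixed and folded into the transition table.

```python
GAMMA = 0.5

# Dynamics under a fixed policy (hypothetical):
# state -> list of (probability, next_state, reward).
model = {
    "A": [(1.0, "B", 1.0)],
    "B": [(0.5, "A", 0.0), (0.5, "end", 2.0)],
    "end": [],  # terminal state, value stays 0
}


def evaluate_policy(model, gamma, tol=1e-10):
    """Apply v(s) <- E[R + gamma * v(S')] until the largest change is below tol."""
    v = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, transitions in model.items():
            if not transitions:
                continue  # terminal state
            new_v = sum(p * (r + gamma * v[s2]) for p, s2, r in transitions)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


v = evaluate_policy(model, GAMMA)
print({s: round(x, 4) for s, x in v.items()})
```

Solving the two equations v(A) = 1 + 0.5 v(B) and v(B) = 1 + 0.25 v(A) by hand gives v(A) = 12/7 and v(B) = 10/7, which is what the iteration converges to.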


# 3. Q-Value Function


The Value Function examined above does not take the Action into account, but the agent needs to know the best Action for each State, so the Value Function has to be modified slightly.

This section therefore looks at the Q-value Function, which accumulates Reward over {State, Action} pairs.

First, the Bellman Equation in terms of the Value Function obtained so far is:

$$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\right]$$

Adding the Quality of an Action to this gives the Q-value function. It is also called the State-Action Function.

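Value Function vs. Q-value Function:

$$v_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right], \qquad q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right]$$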

The Bellman equation for this Q-value function can be written as follows:

$$q_\pi(s, a) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]$$
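
The same iterative idea works for Q-values: each (State, Action) pair is updated from the expected reward plus the discounted Q-values of the next State, averaged over the actions the policy would pick there. The sketch below is an illustration under assumed dynamics; the battery-level states, the rewards, the uniform random policy, and the discount factor of 0.9 are all hypothetical.

```python
GAMMA = 0.9

# (state, action) -> list of (probability, next_state, reward); values are made up.
dynamics = {
    ("high", "search"): [(0.7, "high", 2.0), (0.3, "low", 2.0)],
    ("high", "wait"): [(1.0, "high", 1.0)],
    ("low", "search"): [(0.6, "low", 2.0), (0.4, "high", -3.0)],
    ("low", "recharge"): [(1.0, "high", 0.0)],
}

# Policy being evaluated: uniform over the actions available in each state.
policy = {
    "high": {"search": 0.5, "wait": 0.5},
    "low": {"search": 0.5, "recharge": 0.5},
}


def state_value(q, policy, s):
    """v(s) as the policy-weighted average of q(s, a)."""
    return sum(prob * q[(s, a)] for a, prob in policy[s].items())


def backup(q, policy, transitions, gamma):
    """Right-hand side of the Bellman equation for one (state, action) pair."""
    return sum(p * (r + gamma * state_value(q, policy, s2)) for p, s2, r in transitions)


def evaluate_q(dynamics, policy, gamma, tol=1e-10):
    q = {sa: 0.0 for sa in dynamics}
    while True:
        delta = 0.0
        for sa, transitions in dynamics.items():
            new_q = backup(q, policy, transitions, gamma)
            delta = max(delta, abs(new_q - q[sa]))
            q[sa] = new_q
        if delta < tol:
            return q


q = evaluate_q(dynamics, policy, GAMMA)
# Acting greedily on q answers "which Action is best in this State?".
greedy = {s: max(policy[s], key=lambda a: q[(s, a)]) for s in policy}
print({sa: round(x, 3) for sa, x in q.items()})
print(greedy)
```

Unlike the state value alone, the table `q` can be compared across actions within one state, which is what makes it usable for choosing an Action.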

 
