stage advantage value gt #16

@LQ0212

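Hello, and thank you very much for open-sourcing such great work. I have a question about the training estimator part. The target used for the value is progress = stage_progress_gt - his_-100_stage_progress_gt, where his_-100_stage_progress_gt is the completion of a randomly chosen point in the same episode. What, then, does this value actually mean? As I understand it, it is no longer a completion degree. Yet in Eval the advantage is still obtained by subtracting values. Thanks a lot!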
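
A minimal sketch of the target as described in the question, to make the concern concrete. The function names and the toy progress list are hypothetical and are not taken from the repository, and drawing the reference frame uniformly from the whole episode is an assumption; the actual sampling in the code may differ.

```python
import random


def relative_target(progress, t, ref):
    """Target as described in the question: progress at frame t minus
    progress at a reference frame `ref` from the same episode."""
    return progress[t] - progress[ref]


def sample_relative_target(progress, t, rng):
    """Draw the reference frame at random (assumption: uniformly over the
    episode) and return the resulting target together with the index used."""
    ref = rng.randrange(len(progress))
    return relative_target(progress, t, ref), ref


# Toy ground-truth stage progress for one episode (hypothetical values).
progress = [0.0, 0.25, 0.5, 0.75, 1.0]

# The same frame t = 3 yields different targets depending on the reference.
for ref in range(len(progress)):
    print(ref, relative_target(progress, 3, ref))

target, ref = sample_relative_target(progress, 3, random.Random(0))
print("sampled reference:", ref, "target:", target)
```

Under these assumptions the target for a fixed frame depends on which reference is drawn, so it equals the absolute completion only when the reference has zero progress.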
